not much happened today
Thinking Machines previewed their new native interaction models designed for full-duplex multimodal interaction enabling real-time concurrent listening, speaking, watching, thinking, searching, and reacting, marking a…
https://news.smol.ai·platform·505 items·last fetched
Thinking Machines previewed their new native interaction models designed for full-duplex multimodal interaction enabling real-time concurrent listening, speaking, watching, thinking, searching, and reacting, marking a…
OpenAI rapidly expanded the GPT-5.5 family with multiple variants including gpt-image-2, GPT-5.5 Pro, and GPT-5.5 Cyber, receiving positive feedback for efficiency and usability. Codex evolved into a long-running agent…
OpenAI released GPT-Realtime-2, a voice model with GPT-5-class reasoning, tool use, interruption handling, and extended context windows up to 128K tokens, achieving top scores on Big Bench Audio and Conversational…
OpenAI released GPT-Realtime-2, a voice model with GPT-5-class reasoning, tool use, interruption handling, and extended context windows up to 128K tokens, achieving top scores on Big Bench Audio and Conversational…
Anthropic announced a new SpaceX compute partnership to significantly increase capacity for Claude products, doubling Claude Code's 5-hour rate limits for Pro, Max, Team, and Enterprise users, removing peak-hour limit…
Anthropic announced a new SpaceX compute partnership to significantly increase capacity for Claude products, doubling Claude Code's 5-hour rate limits for Pro, Max, Team, and Enterprise users, removing peak-hour limit…
AI Twitter Recap highlights the shift from model-centric AI to context pipelines and agent orchestration as key performance drivers. Notably, gpt-5.2-codex and gpt-5.3-codex showed significant benchmark improvements…
OpenAI rolled out GPT-5.5 Instant as the new default for ChatGPT and API, enhancing factuality, intelligence, image understanding, and tone with stronger personalization features like saved memories and Gmail…
xAI released Grok 4.3, improving cost/performance with a 53 Intelligence Index score, 4 points higher than Grok 4.20, and significant gains on GDPval-AA and τ²-Bench Telecom. However, accuracy tradeoffs raised…
OpenAI's GPT-5.5 achieves top-tier performance in long-horizon cyber tasks, matching or surpassing Claude Mythos Preview with a 71.4% pass rate and showing ongoing improvement beyond 100M tokens inference. OpenAI also…
OpenAI is expanding Codex from a coding tool to a general work surface with persistent context, tools, integrations, and team rollout, including Codex-only seats with $0 seat fee for Business/Enterprise customers…
vLLM v0.20.0 introduces significant improvements in memory and MoE serving efficiency, including TurboQuant 2-bit KV cache for 4× KV capacity and a 2.1% latency improvement. The update supports multiple hardware…
OpenAI loosens its Azure exclusivity, allowing distribution across Google TPU, AWS Trainium, and Bedrock with commitments through 2032 and revenue share through 2030. GPT-5.5 shows improved benchmarks but is not…
DeepSeek-V4 technical release features a 1.6T-parameter MoE with 49B active parameters and 1M-token context, showcasing hybrid attention and compressed KV schemes for major memory reductions. It ranks as the #2…
OpenAI launched GPT-5.5 as its new flagship model for "real work and powering agents," immediately available in ChatGPT and Codex but with delayed API access due to enhanced safety requirements. The model features…
Alibaba released Qwen3.6-27B, a dense, Apache 2.0 open coding model with thinking and non-thinking modes, outperforming the larger Qwen3.5-397B-A17B on multiple coding benchmarks including SWE-bench and Terminal-Bench…
OpenAI launched GPT-Image-2, enhancing image generation with improved text rendering, layout fidelity, editing, multilingual support, and "thinking" capabilities. It supports generating slides, infographics, diagrams…
Moonshot's Kimi K2.6 is a major open-weight 1T-parameter MoE model featuring 32B active parameters, 384 experts, MLA attention, 256K context window, native multimodality, and INT4 quantization. It supports day-0…
Anthropic launched Claude Design, a prototyping tool powered by Claude Opus 4.7, targeting design workflows and competing with Figma and others. Benchmarks show Opus 4.7 leading in coding and text tasks, with improved…
Anthropic launched Claude Opus 4.7, its most capable Opus model yet, featuring stronger coding and agentic performance, a new tokenizer, and improved long-context handling with a new xhigh reasoning tier. Benchmarks…
OpenAI expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now…
Harness engineering is emerging as a key discipline in AI agent development, emphasizing components like filesystems, memory, and retries beyond just models. OpenAI's Codex is expanding agentic coding workflows beyond…
GLM-5.1 has reached #3 on Code Arena, surpassing Gemini 3.1 and GPT-5.4, and matching Claude Sonnet 4.6 in coding performance. Z.ai now holds the #1 open model rank close to the top overall. The advisor pattern…
Anthropic's Mythos and OpenAI's upcoming restricted cyber-capable models are central to recent discussions, with debates on their security realism and evaluation methods. LangChain's Deep Agents deploy introduces an…
Meta Superintelligence Labs launched Muse Spark, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on meta.ai and the Meta AI app with a…
Anthropic strategically challenges OpenAI amid its upcoming IPO concerns by announcing a jump from $19B ARR in March to $30B ARR in April, highlighting a differential growth rate and higher cost efficiency. The company…
Hermes Agent is gaining attention as a leading open agent stack with features like self-improving skills, persistent memory, and a self-improvement loop. Its new Manim skill enables generation of math/technical…
Google introduced Skills in Chrome, enabling reusable browser workflows with Gemini prompts and a library of ready-made Skills, enhancing end-user agentization. Tencent teased HYWorld 2.0, an open-source 3D world model…
Gemma 4 was launched by Google under an Apache 2.0 license, marking a significant open-model release focused on reasoning, agentic workflows, multimodality, and on-device use. It outperforms models 10x larger and has…
Google DeepMind released Gemma 4, a family of open-weight, multimodal models with long-context support up to 256K tokens under an Apache 2.0 license, marking a major capability and licensing shift. The lineup includes…
Arcee’s Trinity-Large-Thinking was released with open weights under Apache 2.0, featuring a 400B total / 13B active model size and strong agentic performance, ranking #2 on PinchBench. Z.ai’s GLM-5V-Turbo is a vision…
Anthropic introduced computer use inside Claude Code for closed-loop verification in a research preview for Pro/Max users, enhancing reliable app iteration. OpenAI released a Codex plugin for Claude Code, enabling…
Anthropic is reportedly introducing a new AI model tier called Capybara, which is larger and more intelligent than Claude Opus 4.6, showing improved performance in coding, academic reasoning, and cybersecurity. The…
Anthropic advances agent infrastructure with a multi-agent harness emphasizing orchestration and "computer use" for complex software environments. Figma, GitHub, and Cursor launch design canvases with direct AI…
Google launched Gemini 3.1 Flash Live, a realtime voice and vision agent model with 2x longer conversation memory, supporting 70 languages and 128k context. Mistral AI released Voxtral TTS, a low-latency, open-weight…
Anthropic's closed-source coding product Claude Code experienced a significant source leak exposing over 500k lines of orchestration logic, including autonomous modes and memory systems, but not model weights. The leak…
ARC-AGI-3 benchmark introduced by @arcprize and François Chollet resets the frontier for general agentic reasoning with humans solving 100% of tasks versus under 1% for current models, focusing on zero-preparation…
Anthropic introduced Claude Cowork and Claude Code enabling desktop control of mouse, keyboard, and screen in a macOS research preview, expanding agent capabilities beyond APIs and browsers. The agent ecosystem is…
Cursor's Composer 2, built on Kimi K2.5, sparked discussion over model attribution and licensing, highlighting a shift toward post-trained derivatives of open-source models with domain-specific fine-tuning and…
Cursor launched Composer 2, a frontier-class coding model with major cost reductions and strong benchmark scores like 61.3 on CursorBench and 73.7 on SWE-bench Multilingual. The model was improved via a first continued…
MiniMax M2.7 is the headline model release, described as a "self-evolving agent" with strong performance metrics including 56.22% on SWE-Pro, 57.0% on Terminal Bench 2, and parity with Sonnet 4.6. It features recursive…
OpenAI released GPT-5.4 mini and GPT-5.4 nano, their most capable small models optimized for coding, multimodal understanding, and subagents, featuring a 400k context window and over 2x speed compared to GPT-5 mini…
Moonshot's Attention Residuals paper introduced an input-dependent attention mechanism over prior layers with a 1.25x compute advantage and less than 2% inference latency overhead, validated on Kimi Linear 48B total /…
MCP tools remain relevant for deterministic APIs despite ergonomic criticisms, with new web MCP support in Chrome v146 enabling continuous browsing agents. Persistent memory is emerging as a key differentiator for…
Harnesses, agent infrastructure, and the MCP protocol are central themes, with emphasis on how harnesses, sandboxes, filesystem access, skills, memory, and observability shape agent UI/UX and runtime environments…
NVIDIA’s Nemotron 3 Super is a 120B parameter / ~12B active open model featuring a hybrid Mamba-Transformer / SSM Latent MoE architecture and 1M context window, delivering up to 2.2x faster inference than GPT-OSS-120B…
Yann LeCun launched Advanced Machine Intelligence (AMI Labs) with a record $1.03B seed round at a $3.5B pre-money valuation, aiming to build AI models that understand the physical world through world models rather than…
RSI covers AI developments from 3/5/2026 to 3/9/2026, highlighting the emergence of LLMs autonomously training smaller LLMs, marking a significant "AutoML moment" in AI progress. Karpathy and Yi Tay discuss "vibe…
OpenAI rolled out GPT-5.4, achieving tied #1 on the Artificial Analysis Intelligence Index with Gemini 3.1 Pro Preview scoring 57 (up from 51 for GPT-5.2 xhigh). GPT-5.4 features a larger ~1.05M token context window…
OpenAI launched GPT-5.4 and GPT-5.4 Pro with unified mainline and Codex models, featuring native computer use, up to ~1M token context, and efficiency improvements including a new Codex `/fast` mode. Benchmarks showed…