not much happened today
OpenAI released GPT-Realtime-2, a voice model with GPT-5-class reasoning, tool use, interruption handling, and extended context windows up to 128K tokens, achieving top scores on Big Bench Audio and Conversational…
Agent products, multi-step tooling, browser/code agents
Google launched Gemini 3.1 Flash Live, a realtime voice and vision agent model with 2x longer conversation memory, supporting 70 languages and 128k context. Mistral AI released Voxtral TTS, a low-latency, open-weight…
OpenAI launched GPT-5.3-Codex with a Super Bowl ad emphasizing "You can just build things" as a product strategy, focusing on builder tooling over chat interfaces. The model is rolling out across Cursor, VS Code, and…
OpenAI launched GPT-5.2-Codex API, touted as their strongest coding model for long-running tasks and cybersecurity. Cursor integrated GPT-5.2-Codex to autonomously run a browser for a week, producing over 3 million…
Google's Threat Intelligence Group has identified the first known case of an attacker using AI to discover and weaponize a zero-day vulnerability. Google says it stopped the planned mass attack. State-backed actors…
Reuters: The US DOD says it is deploying Mythos to find and patch software vulnerabilities across the US government, even as it works on a transition away from Anthropic — WASHINGTON, May 12 (Reuters) - The Pentagon is…
Mira Murati's start-up presents its first AI model and aims to free voice AI from the question-and-answer model. The model processes audio, video and text in 200-millisecond chunks in parallel and aims to beat OpenAI's…
well done, Team Thinky.
Nimbus builds production AI systems — internal tools, customer agents, retrieval pipelines — combining humans and AI end-to-end. From scoped pilot to production in 4–8 weeks.
Superset on Vercel: Software development with AI started as a single engineer chatting with a single agent about a local repo. Today, developers direct fleets of agents in the cloud, but traditional tools were built for…
OpenAI rapidly expanded the GPT-5.5 family with multiple variants including gpt-image-2, GPT-5.5 Pro, and GPT-5.5 Cyber, receiving positive feedback for efficiency and usability. Codex evolved into a long-running agent…
Russell Brandom / TechCrunch: Mozilla says Anthropic's Mythos Preview and other AI models helped it identify and ship 423 Firefox security bug fixes in April, compared to 31 a year earlier — When Anthropic unveiled its…
Anthropic announced a new SpaceX compute partnership to significantly increase capacity for Claude products, doubling Claude Code's 5-hour rate limits for Pro, Max, Team, and Enterprise users, removing peak-hour limit…
We're releasing ten new Cowork and Claude Code plugins, integrations with the Microsoft 365 suite, new connectors, and an MCP app for financial services and insurance organizations.
Best-in-class open omni-modal reasoning model delivers the highest efficiency and accuracy to power agentic workflows such as computer use, document intelligence and audio-video reasoning.
OpenAI expanded its Agents SDK by separating the agent harness from compute/storage, enabling long-running, durable agents with features like file/computer use, skills, memory, and compaction. The harness is now…
0.86.0 (2026-04-08). Full Changelog: sdk-v0.85.0...sdk-v0.86.0. Features: api: add support for Claude Managed Agents (2ef732a). Chores: internal: codegen related update (d644830).
Meta Superintelligence Labs launched Muse Spark, a natively multimodal reasoning model featuring tool use, visual chain of thought, and multi-agent orchestration. It is live on meta.ai and the Meta AI app with a…
Anthropic introduced computer use inside Claude Code for closed-loop verification in a research preview for Pro/Max users, enhancing reliable app iteration. OpenAI released a Codex plugin for Claude Code, enabling…
Anthropic introduced Claude Cowork and Claude Code enabling desktop control of mouse, keyboard, and screen in a macOS research preview, expanding agent capabilities beyond APIs and browsers. The agent ecosystem is…
Alibaba released the Qwen 3.5 series with models ranging from 0.8B to 9B parameters, featuring native multimodality, scaled reinforcement learning, and targeting edge and lightweight agent deployments. The models…
MiniMax M2.1 launches as an open-source agent and coding Mixture-of-Experts (MoE) model with ~10B active / ~230B total parameters, claiming to outperform Gemini 3 Pro and Claude Sonnet 4.5, and supports local inference…
Google DeepMind: Google DeepMind details a Gemini-powered mouse pointer that understands what it is pointing at, allowing users to perform tasks without using text-heavy prompts — We are developing more seamless…
Introducing our new Cline CLI. It is built on our new SDK and comes with a snappy new TUI. Install:

```sh
npm install -g cline
```

For nightly builds:

```sh
npm install -g cline@nightly
```
With Gemini Intelligence, Google is introducing new AI features for Android that automate multi-step tasks, summarize web content, fill out forms, and turn spoken thoughts into polished text messages. The article…
Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by…
PLUS: Build a YouTube research bot in 15 minutes
Thinking Machines previewed their new native interaction models designed for full-duplex multimodal interaction enabling real-time concurrent listening, speaking, watching, thinking, searching, and reacting, marking a…
Palisade Research shows that AI agents can hack remote computers, copy themselves onto them, and form replication chains. In one year, the success rate jumped from 6 to 81 percent. The researchers expect remaining…