Shipfeed. AI News Channel

50 latest items

▶ ai·11:03

Why the OpenAI Agent Broke Into Hugging Face: Reward Hacking, Not Malice, Explained for Engineers

MarkTechPost

▶ ai·02:00

Datalab Releases Marker 2 Document Conversion Pipeline

Datalab released Marker 2, a full rewrite of its open-source document conversion pipeline that achieves significant performance gains in converting various file formats to markdown and JSON.

MarkTechPost

▶ ai·02:00

Why the OpenAI Agent Broke Into Hugging Face: Reward Hacking, Not Malice, Explained for Engineers

OpenAI's models were found to have breached Hugging Face's infrastructure, an event attributed to reward hacking during model evaluation rather than malicious intent.

MarkTechPost

▶ ai·02:00

Meet Open Dreamer: A JAX/Flax Reproduction of the Dreamer 4 World Model Pipeline, With the Full Training Recipe Published

Open Dreamer has been released, providing an open-source JAX/Flax implementation of the Dreamer 4 world model pipeline.

MarkTechPost

▶ claude·02:00

Meet the New Claude Opus 5: Frontier-Class Agentic Coding and Computer Use

Anthropic released Claude Opus 5, featuring a 1M token context window and improved agentic coding capabilities at existing price points.

MarkTechPost

▶ claude·02:00

Meet the New Claude Opus 5

Anthropic released Claude Opus 5, a new flagship model featuring agentic coding and computer use capabilities, maintaining previous Opus pricing and a 1M token context window.

MarkTechPost

▶ claude code·02:00

Anthropic Releases Claude Security Plugin for Claude Code in Beta: A Multi-Agent Vulnerability Scanner That Runs in Your Terminal

Anthropic released a beta security plugin for Claude Code that functions as a multi-agent vulnerability scanner within the terminal.

MarkTechPost

▶ claude code·02:00

Anthropic Releases Claude Security Plugin for Claude Code in Beta

Anthropic released a beta Claude Security plugin for Claude Code that enables multi-agent vulnerability scanning directly within the terminal.

MarkTechPost

▶ ai·02:00

Meet Gigatoken: A Rust BPE Tokenizer

Gigatoken, a new Rust-based BPE tokenizer, was released with claimed speeds up to 989x faster than existing Hugging Face tokenizers.

MarkTechPost

▶ ai·02:00

Andrew Ng Just Released OpenWorker

Andrew Ng released OpenWorker, an MIT-licensed, local-first desktop AI agent designed to produce finished deliverables rather than just chat responses.

MarkTechPost

▶ ai·02:00

You Didn't Get the AI Model You Paid For

An investigative piece explores the implications of request-level model routing, where users may receive outputs from different models than requested without clear verification or logging.

MarkTechPost

▶ ai·02:00

Cisco Foundation AI Releases Antares: 350M and 1B Open-Weight Models

Cisco Foundation AI released Antares, a family of security-focused small language models designed for vulnerability localization in codebases.

MarkTechPost

▶ cursor·02:00

Cursor Releases Cursor Router: A Request-Level Classifier

Cursor introduced Cursor Router, a request-level classifier aimed at providing high-quality coding performance at a significantly lower cost.

MarkTechPost

▶ claude code·02:00

Anthropic Releases Claude Security Plugin for Claude Code in Beta

Anthropic released a beta Claude Security plugin for Claude Code that performs multi-agent vulnerability scanning directly within the terminal.

MarkTechPost

▶ ai·18:29

Validating Distributed LLM Serving Benchmarks with NVIDIA srt-slurm, SLURM Recipes, Parameter Sweeps, and Pareto Analysis

MarkTechPost

▶ ai·02:00

Cisco Foundation AI Releases Antares Security Models

Cisco Foundation AI introduced Antares, a set of open-weight security models (350M and 1B parameters) specialized in vulnerability localization.

MarkTechPost

▶ gemini·02:00

Google Releases Gemini 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash-Cyber

Google released new Gemini Flash models (3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash-Cyber) optimized for low-latency, high-throughput, and agentic workloads.

MarkTechPost

▶ ai·02:00

Poolside Releases Laguna S 2.1, an Open-Weight Agentic Coding Model

Poolside released Laguna S 2.1, a 118B-parameter open-weight model specifically designed for agentic coding tasks.

MarkTechPost

▶ ai·02:00

Meta Open-Sources Astryx Design System

Meta open-sourced Astryx, a React-based design system containing over 150 accessible components, built to be operated by both humans and AI agents.

MarkTechPost

▶ ai·02:00

Poolside Releases Laguna S 2.1

Poolside released Laguna S 2.1, a 118B-parameter open-weight Mixture-of-Experts model optimized for agentic coding, featuring a 1M-token context window.

MarkTechPost

▶ ai·02:00

NVIDIA Releases Cosmos 3 Edge

NVIDIA has released Cosmos 3 Edge, a 4-billion-parameter open world model designed for on-device robot and vision AI agent reasoning and action generation.

MarkTechPost

▶ gemini·02:00

Google Releases Gemini 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber

Google introduced three new Gemini models in the Flash tier—Gemini 3.6 Flash, 3.5 Flash-Lite, and 3.5 Flash Cyber—optimized for token efficiency and agentic workloads.

MarkTechPost

▶ claude·02:00

Community Fine-Tunes MiniCPM5-1B on Claude Fable 5 Traces

A community developer created a 657MB local thinking model by fine-tuning OpenBMB’s MiniCPM5-1B on Claude Fable 5 traces.

MarkTechPost

▶ qwen·02:00

Alibaba's Tongyi Lab Releases Qwen-Audio-3.0-TTS, a Hosted Text-to-Speech Model in Flash and Plus Tiers Across 16 Languages

Alibaba's Tongyi Lab launched Qwen-Audio-3.0-TTS, a hosted text-to-speech model featuring Flash and Plus tiers and supporting 16 languages.

MarkTechPost

▶ ai·07:53

10 Open-Source No-Code AI Platforms for Building LLM Apps, RAG Systems, and AI Agents

MarkTechPost

▶ deepseek·02:00

Kimi K3 vs DeepSeek V4 Pro vs GLM-5.2: Open Trillion-Scale MoE Models Compared

A comparison of Kimi K3, DeepSeek V4 Pro, and GLM-5.2, highlighting that while Kimi K3 leads in benchmarks, it is currently API-only, whereas the others offer open weights.

MarkTechPost

▶ ai·02:00

Feyn AI Releases SQRL Model Family

Feyn AI released SQRL, a family of text-to-SQL models that inspect databases before generating queries to resolve ambiguity.

MarkTechPost

▶ ai·02:00

Alibaba Previews Qwen3.8-Max Multimodal Model

Alibaba previewed Qwen3.8-Max, a 2.4 trillion-parameter multimodal model, at the World AI Conference, claiming performance second only to Fable 5.

MarkTechPost

▶ ai·02:00

Perplexity AI Releases WANDR: An Open Benchmark Evaluating Research Agents That Must Search Wide And Deep

Perplexity AI has released WANDR, an open benchmark designed to evaluate research agents on their ability to perform wide and deep information collection tasks.

MarkTechPost

▶ ai·02:00

Alibaba Previews Qwen3.8-Max

Alibaba previewed Qwen3.8-Max, a 2.4 trillion-parameter multimodal model, during the World AI Conference in Shanghai.

MarkTechPost

▶ gemini·09:57

Google Cloud's Always-On Memory Agent Replaces RAG and Embeddings With Continuous LLM Consolidation on Gemini 3.1 Flash-Lite

MarkTechPost

▶ ai·02:00

NVIDIA Released DeepStream 9.1 for Agentic Vision AI

NVIDIA released DeepStream 9.1, featuring 13 new agentic skills and enhanced multi-view 3D tracking capabilities for vision AI pipelines.

MarkTechPost

▶ ai·02:00

Sakana AI's Error Diffusion Trains Dale-Compliant Dual-Stream Networks Without Backpropagation

Sakana AI published research on Error Diffusion, a method for training Dale-compliant networks without backpropagation.

MarkTechPost

▶ gemini·02:00

Google Cloud's Always-On Memory Agent Replaces RAG and Embeddings With Continuous LLM Consolidation on Gemini 3.1 Flash-Lite

Google Cloud released an Always-On Memory Agent that uses a continuous process and SQLite storage instead of traditional RAG and embeddings, built on Gemini 3.1 Flash-Lite.

MarkTechPost

▶ ai·02:00

Zyphra Releases ZUNA1.1: An Apache 2.0 EEG Foundation Model

Zyphra released ZUNA1.1, an Apache 2.0 licensed EEG foundation model that now supports variable-length inputs ranging from 0.5 to 30 seconds.

MarkTechPost

▶ ai·02:00

Moonshot AI Releases Kimi K3: A 2.8 Trillion Parameter Open MoE Model

Moonshot AI released Kimi K3, a 2.8-trillion-parameter open Mixture-of-Experts (MoE) model featuring a 1-million-token context window and Kimi Delta Attention.

MarkTechPost

▶ nemotron·02:00

NVIDIA AI Releases Nemotron 3 Embed: An Open Embedding Collection Whose 8B Checkpoint Ranks #1 on RTEB

NVIDIA released Nemotron 3 Embed, an open embedding collection that ranks #1 on the Retrieval Embedding Benchmark (RTEB).

MarkTechPost

▶ ai·02:00

Sakana AI's Error Diffusion Trains Dale-Compliant Dual-Stream Networks Without Backpropagation

Sakana AI published research on Error Diffusion, a method for training Dale-compliant networks without backpropagation that matches or exceeds traditional approaches in specific tasks.

MarkTechPost

▶ nemotron·02:00

NVIDIA AI Releases Nemotron 3 Embed

NVIDIA released Nemotron 3 Embed, an open embedding collection featuring an 8B checkpoint that currently ranks first on the RTEB benchmark.

MarkTechPost

▶ ai·20:48

OpenAI Details GPT-Red: An Internal Automated Red-Teaming Model That Beat Human Red-Teamers 84% To 13% On Prompt Injection

MarkTechPost

▶ ai·02:00

OpenAI Details GPT-Red: An Internal Automated Red-Teaming Model

OpenAI detailed GPT-Red, an internal automated red-teaming model designed to identify prompt injection vulnerabilities by attacking OpenAI's own models.

MarkTechPost

▶ ai·02:00

Thinking Machines Lab Releases Inkling

Thinking Machines Lab released Inkling, a 975B-parameter open-weights multimodal MoE model featuring 41B active parameters and controllable thinking effort.

MarkTechPost

▶ ai·02:00

Moonshot AI Releases Kimi K3

Moonshot AI released Kimi K3, a 2.8 trillion parameter open mixture-of-experts (MoE) model featuring native vision, a 1-million-token context window, and Kimi Delta Attention.

MarkTechPost

▶ ai·02:00

OpenAI Details GPT-Red: An Internal Automated Red-Teaming Model That Beat Human Red-Teamers 84% To 13% On Prompt Injection

OpenAI detailed GPT-Red, an internal automated red-teaming model designed to find prompt injection vulnerabilities.

MarkTechPost

▶ ai·02:00

Google Releases LiteRT.js

Google has released LiteRT.js, a JavaScript binding for LiteRT that runs .tflite models directly in web browsers using WebGPU.

MarkTechPost

▶ ai·02:00

Soofi Consortium Releases Soofi S 30B-A3B: An Open Hybrid Mamba-Transformer MoE Foundation Model for German and English

The Soofi Consortium has released Soofi S 30B-A3B, an open hybrid Mamba-Transformer Mixture-of-Experts foundation model for German and English.

MarkTechPost

▶ grok·02:00

SpaceXAI Open-Sources Grok Build: The Rust Agent Harness, TUI, and Tool Layer Behind Its Coding CLI

SpaceXAI open-sourced Grok Build, the Rust-based agent harness and tool layer supporting their coding CLI.

MarkTechPost

▶ claude·02:00

Anthropic Claude Sonnet 5 vs Sonnet 4.6 vs Opus 4.8: Agentic Coding Benchmarks, API Pricing, and Cost-Performance Tradeoffs Compared

A comparison of Anthropic's Claude Sonnet 5, Sonnet 4.6, and Opus 4.8, focusing on agentic coding benchmarks, API pricing, and cost-performance trade-offs.

MarkTechPost

▶ mistral·02:00

Mistral AI Releases Robostral Navigate

Mistral AI has launched Robostral Navigate, an 8B model designed to enable robots to navigate complex environments using only a single RGB camera.

MarkTechPost