Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention
From Gemma 4 to DeepSeek V4, How New Open-Weight LLMs Are Reducing Long-Context Costs
A learning-oriented workflow for understanding new open-weight model releases
How coding agents use tools, memory, and repo context to make LLMs work better in practice
From MHA and GQA to MLA, sparse attention, and hybrid architectures
A Round Up And Comparison of 10 Open-Weight LLM Releases in Spring 2026
And an Overview of Recent Inference-Scaling Papers
A 2025 review of large language models, from DeepSeek R1 and RLVR to inference-time scaling, benchmarks, architectures, and predictions for 2026.
In June, I shared a bonus article containing my curated and bookmarked research paper lists with the paid subscribers who make this Substack possible.
Understanding How DeepSeek's Flagship Open-Weight Models Evolved
Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
A Detailed Look at One of the Leading Open-Source LLMs
And How They Stack Up Against Qwen3
From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design
A topic-organized collection of 200+ LLM research papers from 2025
KV caching is one of the most critical techniques for efficient LLM inference in production.
Why build LLMs from scratch? It's probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they had a lot of fun doing it.
Understanding GRPO and New Insights from Reasoning Model Papers
Welcome to the next stage of large language models (LLMs): reasoning. LLMs have transformed how we process and generate text, but their success has been largely driven by statistical pattern recognition. However, new…
Inference-Time Compute Scaling Methods to Improve Reasoning Models
Methods and Strategies for Building and Refining Reasoning Models