§ feed · storyline

LLaDA: Large Language Diffusion Models

LLaDA 8B, a diffusion-based language model, matches LLaMA 3 8B performance while training on 2 trillion tokens using 0.13 million H800 GPU hours.

Feb 18 · 04:27:47 · primary fetch · 1 source · updated Feb 18 · 04:27:47

LLaDA (Large Language Diffusion Model) 8B is a diffusion-based language model that rivals LLaMA 3 8B while training on 7x fewer tokens (2 trillion) and using 0.13 million H800 GPU hours. It generates text by predicting uniformly masked tokens in a diffusion process, and it supports multi-turn dialogue and instruction following. Separately, StepFun AI released two large models: Step-Video-T2V 30B, a text-to-video model that generates up to 204 frames with high coherence and motion quality, and Step-Audio-Chat 132B, a voice-to-voice model.
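The masking step described above can be illustrated with a minimal sketch. It is not LLaDA's actual code: the `MASK` symbol, the function names, and the whitespace tokenization are hypothetical, and the masking ratio `t` is assumed to be drawn uniformly from [0, 1). Each token is masked independently with probability `t`, and a model would then be trained to recover the original tokens at the masked positions.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration only


def mask_tokens(tokens, t, rng):
    """Replace each token with MASK independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_positions(noised):
    """Indices a model would be asked to predict (the masked slots)."""
    return [i for i, tok in enumerate(noised) if tok == MASK]


rng = random.Random(0)
tokens = "diffusion models can generate text".split()

t = rng.random()                      # assumed: masking ratio sampled uniformly
noised = mask_tokens(tokens, t, rng)  # partially masked copy of the sequence
targets = masked_positions(noised)    # positions whose tokens must be recovered

print(f"t = {t:.2f}")
print(noised)
print([tokens[i] for i in targets])
```

At `t` close to 0 almost nothing is masked, and at `t` close to 1 almost everything is, so a single model sees the whole range from light infilling to generating nearly from scratch.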

Additionally, challenging multimodal benchmarks such as Scale AI's EnigmaEval and Cambridge's ZeroBench report current frontier models scoring zero, underscoring how difficult these tasks remain. The community also noted the return of diffusion models to language modeling: an architecture that was previously speculative has now been scaled successfully.


§ sources · 1 publication · timeline below

news.smol.ai · LLaDA: Large Language Diffusion Models · primary · 04:27:47