§ feed · storyline

Meta BLT: Tokenizer-free, Byte-level LLM

Meta AI introduces the Byte Latent Transformer (BLT), a tokenizer-free architecture that processes raw bytes via dynamic patching, outperforming Llama 3 on several benchmarks including CUTE.

Dec 14 · 06:38:19 · primary fetch · 1 source · updated Dec 14 · 06:38:19

Meta AI introduces the Byte Latent Transformer (BLT), a tokenizer-free architecture that dynamically groups raw bytes into patches to allocate compute efficiently and that outperforms Llama 3 on several benchmarks, including CUTE. The model was trained on approximately 1 trillion tokens and uses a three-block transformer design with local and global components. This approach challenges traditional tokenization and may enable new multimodal capabilities, such as direct file interaction without retrieval-augmented generation. Separately, Microsoft announced Phi-4, a 14B-parameter model that achieves state-of-the-art results on STEM and reasoning benchmarks, surpassing GPT-4o.
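To make the idea of dynamic byte patching more concrete, here is a minimal sketch, not Meta's implementation. The summary above does not say how patch boundaries are chosen. This sketch assumes an entropy-threshold rule: a new patch starts wherever the next byte is hard to predict. A toy bigram byte counter stands in for a learned byte-level model. The function names (`train_bigram`, `next_byte_entropy`, `patch_bytes`), the sample corpus, and the threshold value are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_bigram(data: bytes):
    """Count next-byte frequencies per preceding byte (toy stand-in for a byte LM)."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(data, data[1:]):
        counts[prev][nxt] += 1
    return counts


def next_byte_entropy(counts, prev: int) -> float:
    """Shannon entropy, in bits, of the next-byte distribution after `prev`."""
    dist = counts.get(prev)
    if not dist:
        # Unseen context: assume a uniform distribution over all 256 byte values.
        return 8.0
    total = sum(dist.values())
    return -sum((c / total) * math.log2(c / total) for c in dist.values())


def patch_bytes(data: bytes, counts, threshold: float):
    """Split `data` into patches, opening a new patch where entropy exceeds `threshold`."""
    if not data:
        return []
    patches, start = [], 0
    for i in range(1, len(data)):
        # Assumed rule: high uncertainty about byte i marks a patch boundary.
        if next_byte_entropy(counts, data[i - 1]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches


# No tokenizer or vocabulary: the input is just the UTF-8 bytes of the text.
corpus = "the cat sat on the mat. the dog sat on the log. ".encode("utf-8") * 20
model = train_bigram(corpus)
sample = "the cat sat on the log.".encode("utf-8")
patches = patch_bytes(sample, model, threshold=1.0)
print(patches)
```

Under this assumed rule, predictable stretches of bytes collapse into longer patches, while uncertain positions produce shorter ones. That is one way a model could spend less compute on easy regions and more on hard ones.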

DeepSeek AI launched new vision-language models based on its MoE architecture, with sizes ranging from 1.0B to 27B parameters. OpenAI released a new Projects feature for ChatGPT, and Cohere introduced Command R7B, its smallest and fastest model. Anthropic published research on "Best-of-N Jailbreaking" vulnerabilities across text, vision, and audio models. Industry discussion highlights a trend toward smaller frontier LLMs, with GPT-4 cited at approximately 1.8 trillion parameters and newer models said to be smaller.


§ sources · 1 publication · timeline below

news.smol.ai · Meta BLT: Tokenizer-free, Byte-level LLM · primary · 06:38:19