§ feed · storyline

DBRX: Best open model (just not most efficient)

Databricks Mosaic releases DBRX, an open-source mixture-of-experts model trained on 12 trillion tokens that outperforms Grok, Mixtral, and Llama 2 on several benchmarks.

Mar 27 · 23:33:19 · primary fetch1 sourceupdated Mar 27 · 23:33:19

Databricks Mosaic has released a new open-source model called DBRX that outperforms Grok, Mixtral, and Llama2 on evaluations while being about 2x more efficient than Llama2 and Grok. The model was trained on 12 trillion tokens using 3,000 H100 GPUs over 2 months, with an estimated compute cost of $10 million. It uses OpenAI's 100k tiktoken tokenizer and shows strong zero-shot code generation performance, even beating GPT-4 on the Humaneval benchmark. DBRX also upstreamed work to MegaBlocks open source. Despite its scale and efficiency, DBRX's performance on MMLU is only slightly better than Mixtral, raising questions about its scaling efficiency.

The focus of DBRX is on enabling users to train models efficiently, with MoE training being about 2x more FLOP-efficient than dense models, achieving similar quality with nearly 4x less compute than previous MPT models. This release is part of the ongoing competition for open-source AI leadership, including models like Dolly, MPT, and Mistral. "If it activates 36B params, the model's perf should be equivalent to a 72B dense model or even 80B," says Qwen's tech lead.

read full article on news.smol.ai ↗

§ sources1 publication · timeline below

news.smol.aiDBRX: Best open model (just not most efficient)primary23:33:19