
12/9/2023: The Mixtral Rush

Mistral AI released the Mixtral model weights without accompanying code, prompting rapid community implementations by DiscoResearch and Fireworks AI, though no significant benchmark gains over comparable models were reported.

Dec 10 · 00:30:00 · primary fetch · 1 source · updated Dec 10 · 00:30:00

Mixtral's weights were released without code, prompting the DiscoResearch community and Fireworks AI to implement the model rapidly. No significant benchmark improvements were reported, however, which limits its usefulness for local LLM use, though the release still marks progress for the small-models community. Discussions in the DiscoResearch Discord compared Mixtral's performance with models such as Hermes 2.5 and Hermes 2 on benchmarks including winogrande, truthfulqa_mc2, and arc_challenge. Technical topics included GPU requirements, multi-GPU setups, and quantization via GPTQ.
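To make the GPU-requirements discussion concrete, here is a rough back-of-the-envelope sketch of weight memory at different precisions. Everything in it is an assumption for illustration rather than a figure from the Discord: the parameter count (about 46.7B total is the commonly cited size for Mixtral 8x7B), the flat 4 bits per weight for GPTQ (real GPTQ checkpoints carry extra scale and zero-point data), the 20% overhead factor, and the helper names themselves.

```python
import math


def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def min_gpus(n_params: float, bits_per_weight: float,
             gpu_gib: float, overhead: float = 1.2) -> int:
    """Smallest GPU count whose combined memory fits the weights.

    `overhead` is a crude, assumed allowance for activations and the
    KV cache; actual usage depends on context length and runtime.
    """
    needed = weight_memory_gib(n_params, bits_per_weight) * overhead
    return math.ceil(needed / gpu_gib)


if __name__ == "__main__":
    N_PARAMS = 46.7e9  # assumed total parameter count, not from the digest
    for label, bits in [("fp16", 16), ("4-bit (GPTQ-style)", 4)]:
        gib = weight_memory_gib(N_PARAMS, bits)
        gpus = min_gpus(N_PARAMS, bits, gpu_gib=24)
        print(f"{label}: ~{gib:.0f} GiB of weights, at least {gpus} x 24 GiB GPUs")
```

The estimate ignores how a multi-GPU runtime actually shards layers, so treat the GPU count as a lower bound.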

Benchmarking strategies such as grammar-based evaluation and chain of thought (CoT) were explored, alongside sampling techniques such as Min P and Top P for improving response stability and creativity. Users also discussed the learning limitations of GPTs and how models adapt under varying conditions, emphasizing that min_p sampling makes higher temperature settings usable for creative output.
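The difference between the two sampling filters can be sketched in a few lines of plain Python. This is a minimal illustration, not the code of any particular inference library: the function names are hypothetical, and applying temperature before filtering is one possible ordering that real samplers may not share. Min P keeps every token whose probability is at least `min_p` times that of the most likely token, while Top P keeps the smallest set of top tokens whose cumulative probability reaches `top_p`.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, min_p=0.1):
    """Zero out tokens below min_p * (top probability), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def top_p_filter(probs, top_p=0.9):
    """Keep the most likely tokens until their mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [0.0] * len(probs)
    cumulative = 0.0
    for i in order:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(kept)
    return [p / total for p in kept]


def sample(probs, rng=random):
    """Draw one token index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [5.0, 4.0, 1.0, 0.0, -2.0]  # made-up example logits
    hot = softmax(logits, temperature=1.5)  # high temperature flattens the distribution
    print("min_p:", [round(p, 3) for p in min_p_filter(hot, 0.1)])
    print("top_p:", [round(p, 3) for p in top_p_filter(hot, 0.9)])
    print("sampled index:", sample(min_p_filter(hot, 0.1)))
```

Because the Min P cutoff scales with the top token's probability, it still prunes the unlikely tail when a high temperature flattens the distribution, which is why it is associated with allowing higher temperatures.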


§ sources · 1 publication

news.smol.ai · 12/9/2023: The Mixtral Rush · primary · 00:30:00