§ feed · storyline

DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost

DeepSeek releases V2, a 236B-parameter MoE model with Multi-Head Latent Attention that outperforms GPT-4 on AlignBench at roughly half the inference cost of comparable models.

May 7 · 01:37:03 · primary fetch · 1 source · updated May 7 · 01:37:03

DeepSeek-V2 is a new state-of-the-art MoE model with 236B parameters and a novel Multi-Head Latent Attention mechanism; it delivers faster inference and surpasses GPT-4 on AlignBench. Llama 3 120B shows strong creative writing skills, while Microsoft is reportedly developing a 500B-parameter LLM called MAI-1. Research from Scale AI highlights benchmark overfitting in models such as Mistral and Phi, whereas GPT-4, Claude, Gemini, and Llama remain robust.

In robotics, Tesla Optimus advances with superior data collection and teleoperation, LeRobot marks a move toward open-source robotics AI, and Nvidia's DrEureka automates robot skill training. Multimodal LLM hallucinations are surveyed with new mitigation strategies, and Google's Med-Gemini achieves SOTA on medical benchmarks with fine-tuned multimodal models.


§ sources · 1 publication · timeline below

news.smol.ai · DeepSeek-V2 beats Mixtral 8x22B with >160 experts at HALF the cost · primary · 01:37:03