shipfeed · AI news, curated daily


LMSys advances Llama 3 eval analysis

LMSys publishes granular Llama 3 evaluation analysis across 8 query subcategories and 7 prompt complexity levels, revealing uneven performance strengths in the 70b model.

May 10 · primary fetch · 1 source · updated May 10

LMSys is refining LLM evaluation by breaking down performance across 8 query subcategories and 7 prompt complexity levels, which reveals uneven strengths in models such as Llama-3-70b. DeepMind released AlphaFold 3, which extends molecular structure prediction to holistic modeling of protein-DNA-RNA complexes, with implications for biology and genetics research. OpenAI introduced the Model Spec, a public standard that clarifies model behavior and tuning; OpenAI is inviting community feedback and aims for models to learn directly from it.

Llama 3 has reached top positions on the LMSys leaderboard, nearly matching Claude-3-sonnet, though its performance varies notably on complex prompts. The analysis reflects how quickly model benchmarking and behavior shaping are changing.

§ sources · 1 publication
  1. news.smol.ai · LMSys advances Llama 3 eval analysis · primary