
Creating a LLM-as-a-Judge

Hamel Husain publishes a 6,000-word guide on building LLM judges with critique shadowing. The method aligns language models with domain experts and addresses the problem of untrusted, unused data in AI teams.

Oct 31 · 00:17:27 · primary fetch · 1 source · updated Oct 31 · 00:17:27

Anthropic released details on Claude 3.5 for SWE-bench and SWE-agent, OpenAI introduced SimpleQA, and DeepMind launched NotebookLM. Apple announced new M4 MacBooks, and Recraft v3 emerged as a new state-of-the-art (SOTA) image model. Hamel Husain presented a detailed 6,000-word treatise on creating LLM judges with a method called critique shadowing, which aligns LLMs with domain experts and addresses the problem of untrusted and unused data in AI teams.

His workflow relies on expert-reviewed datasets and iterative refinement of the judge prompt. Separately, Zep introduced a temporal knowledge graph memory layer intended to improve AI agent memory and reduce hallucinations. Anthropic also integrated Claude 3.5 Sonnet with GitHub Copilot, making it available to Copilot Chat users.


§ sources · 1 publication

news.smol.ai · Creating a LLM-as-a-Judge · primary · 00:17:27