
GraphRAG: The Marriage of Knowledge Graphs and RAG

Microsoft Research open-sources GraphRAG, a RAG technique that builds and clusters knowledge graphs from source documents to improve LLM answer quality at the cost of higher token usage and inference time.

Jul 3 · 03:30:30 · primary fetch · 1 source · updated Jul 3 · 03:30:30

Microsoft Research open-sourced GraphRAG, a retrieval-augmented generation (RAG) technique that extracts knowledge graphs from source documents and clusters them to improve LLM answers, at the cost of higher token usage and inference time. Gemma 2 models were released with a focus on efficient small LLMs, using techniques such as sliding window attention and RMSNorm, and nearly match the larger Llama 3 70B. Anthropic's Claude 3.5 Sonnet leads instruction-following and coding benchmarks, and Nvidia's Nemotron 340B model was released in June.
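To make the GraphRAG idea concrete, here is a minimal sketch of the "extract a graph, cluster it, retrieve by cluster" flow. It is a toy under stated assumptions, not Microsoft's implementation: entity extraction is a capitalized-word regex where the real system uses an LLM, connected components stand in for proper community detection, and the function names (`extract_entities`, `build_graph`, `cluster`, `retrieve`) are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations


def extract_entities(text):
    """Toy extractor (assumption): every capitalized word is an entity."""
    return sorted(set(re.findall(r"\b[A-Z][a-zA-Z0-9]+\b", text)))


def build_graph(docs):
    """Link entities that co-occur in the same document."""
    graph = defaultdict(set)
    for doc in docs:
        entities = extract_entities(doc)
        for entity in entities:
            graph[entity]  # make sure isolated entities still get a node
        for a, b in combinations(entities, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def cluster(graph):
    """Connected components, a simple stand-in for community detection."""
    seen, communities = set(), []
    for node in sorted(graph):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        communities.append(component)
    return communities


def retrieve(query, docs, communities):
    """Return documents that touch any community containing a query entity."""
    query_entities = set(extract_entities(query))
    relevant = set()
    for community in communities:
        if community & query_entities:
            relevant |= community
    return [d for d in docs if relevant & set(extract_entities(d))]


docs = [
    "Alice founded Acme.",
    "Acme acquired Bolt.",
    "Carol studies Zebras.",
]
communities = cluster(build_graph(docs))
context = retrieve("Who owns Bolt?", docs, communities)
```

In this toy corpus the query names only Bolt, yet `context` also contains the sentence about Alice and Acme, because all three entities fall in the same cluster. Passing a whole cluster's worth of text to the model, rather than a few nearest chunks, is one way to picture why the approach trades extra tokens for broader context.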

Qwen2-72B tops the Hugging Face Open LLM Leaderboard, excelling in math and long-range reasoning. Discussions of RAG highlighted its limitations and ways to improve context usage through function calls. A persona-driven approach to synthetic data generation introduced 1 billion personas, with a fine-tuned 7B model matching GPT-4 performance on math benchmarks. The 200 GB AutoMathText dataset was also noted for math data synthesis.

read full article on news.smol.ai ↗

§ sources · 1 publication · timeline below

news.smol.ai · GraphRAG: The Marriage of Knowledge Graphs and RAG · primary · 03:30:30