§ feed · storyline

Gemini launches context caching... or does it?

Google Gemini introduces context caching with a 33k minimum token input, offering a cost option between RAG and fine-tuning with no upper limit on cache duration.

Jun 18 · 23:26:50 · primary fetch1 sourceupdated Jun 18 · 23:26:50

Nvidia's Nemotron ranks #1 open model on LMsys and #11 overall, surpassing Llama-3-70b. Meta AI released Chameleon 7B/34B models after further post-training. Google's Gemini introduced context caching, offering a cost-efficient middle ground between RAG and finetuning, with a minimum input token count of 33k and no upper limit on cache duration. DeepSeek launched DeepSeek-Coder-V2, a 236B parameter model outperforming GPT-4 Turbo, Claude-3-Opus, and Gemini-1.5-Pro in coding tasks, supporting 338 programming languages and extending context length to 128K.

It was trained on 6 trillion tokens using the Group Relative Policy Optimization (GRPO) algorithm and is available on Hugging Face with a commercial license. These developments highlight advances in model performance, context caching, and large-scale coding models.

read full article on news.smol.ai ↗

§ sources1 publication · timeline below

news.smol.aiGemini launches context caching... or does it?primary23:26:50