§ feed · storyline

Gemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canary

Chrome Canary adds a Gemini Nano feature flag with a prompt API, supporting on-device models at 1.8B and 3.25B parameters with sub-100ms inference, and weights have been uploaded to HuggingFace.

Jun 25 · 09:02:13 · primary fetch1 sourceupdated Jun 25 · 09:02:13

The latest Chrome Canary now includes a feature flag for Gemini Nano, offering a prompt API and on-device optimization guide, with models Nano 1 and 2 at 1.8B and 3.25B parameters respectively, showing decent performance relative to Gemini Pro. The base and instruct-tuned model weights have been extracted and posted to HuggingFace. In AI model releases, Anthropic launched Claude 3.5 Sonnet, which outperforms GPT-4o on some benchmarks, is twice as fast as Opus, and is free to try. DeepSeek-Coder-V2 achieves 90.2% on HumanEval and 75.7% on MATH, surpassing GPT-4-Turbo-0409, with models up to 236B parameters and 128K context length.

GLM-0520 from Zhipu AI/Tsinghua ranks highly in coding and overall benchmarks. NVIDIA announced Nemotron-4 340B, an open model family for synthetic data generation. Research highlights include TextGrad, a framework for automatic differentiation on textual feedback; PlanRAG, an iterative plan-then-RAG decision-making technique; a paper on goldfish loss to mitigate memorization in LLMs; and a tree search algorithm for language model agents.

read full article on news.smol.ai ↗

§ sources1 publication · timeline below

news.smol.aiGemini Nano: 50-90% of Gemini Pro, <100ms inference, on device, in Chrome Canaryprimary09:02:13