Reflection 70B, by Matt from IT Department
Hyperwrite and Glaive release Reflection 70B, a fine-tuned Llama 3.1 70B model using a two-step thinking and reflection technique that draws comparisons to Claude 3.5 Sonnet.
Reflection Tuning technique has been used by a two-person team from Hyperwrite and Glaive to finetune llama-3.1-70b, showing strong performance improvements with minimal synthetic data. The approach builds on the concept of adding `thinking` and `reflection` steps to outputs, related to the Chain of Thought method.
Despite some criticisms like contamination concerns, worse coding performance, and reliance on system prompts, the model has received positive reception and comparisons to claude-3.5-sonnet. The work highlights efficient instruction tuning and synthetic data generation for large models.