How AI training scales
OpenAI publishes research showing the gradient noise scale predicts optimal batch sizes for neural network training, suggesting large batches will enable further scaling of AI systems.
We’ve discovered that the gradient noise scale, a simple statistical metric, predicts the parallelizability of neural network training on a wide range of tasks. Since complex tasks tend to have noisier gradients, increasingly large batch sizes are likely to become useful in the future, removing one potential limit to further growth of AI systems.
More broadly, these results show that neural network training need not be considered a mysterious art, but can be rigorized and systematized.
- openai.comHow AI training scalesprimary