
A Recipe for Training Neural Networks

Andrej Karpathy publishes a long-form guide on a practical recipe for training neural networks, covering common mistakes and a structured process to avoid them.

Apr 25 · 11:00:00 · primary fetch · 1 source · updated Apr 25 · 11:00:00

A few weeks ago I posted a tweet on “the most common neural net mistakes”, listing a few common gotchas related to training neural nets. The tweet got quite a bit more engagement than I anticipated (including a webinar :)). Clearly, a lot of people have personally encountered the large gap between “here is how a convolutional layer works” and “our convnet achieves state of the art results”. So I thought it could be fun to brush off my dusty blog to expand my tweet to the long form that this topic deserves. However, instead of going into an enumeration of more common errors or fleshing them out, I wanted to dig a bit deeper and talk about how one can avoid making these errors altogether (or fix them very fast).

The trick to doing so is to follow a certain process, which as far as I can tell is not very often documented. Let’s start with two important observations that motivate it.

1) Neural net training is a leaky abstraction

It is allegedly easy to get started with training neural nets. Numerous libraries and frameworks take pride in displaying 30-line miracle snippets that solve your data problems, giving the (false) impression that this stuff is plug and play. It’s common…

Read the full article on karpathy.github.io.

§ sources · 1 publication

karpathy.github.io · A Recipe for Training Neural Networks · primary · 11:00:00