§ feed · storyline

Language models can explain neurons in language models

OpenAI uses GPT-4 to generate and score explanations for every neuron in GPT-2, releasing the full dataset of explanations and scores publicly.

May 9 · 09:00:00 · primary fetch · 1 source · updated May 9 · 09:00:00

We use GPT-4 to automatically write explanations for the behavior of neurons in large language models and to score those explanations.

We release a dataset of these (imperfect) explanations and scores for every neuron in GPT-2.
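To make the scoring step concrete, here is a minimal sketch. It assumes an explanation is scored by how well activations predicted from it track the neuron's real activations, measured with a Pearson correlation. The function name `correlation_score` and the toy activation lists are hypothetical illustrations, not code or data from the release.

```python
import math


def correlation_score(real, simulated):
    """Pearson correlation between real and simulated per-token activations.

    Assumption: a higher correlation means the explanation predicts the
    neuron's behavior better. Returns 0.0 if either series is constant.
    """
    if len(real) != len(simulated) or len(real) < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    n = len(real)
    mean_r = sum(real) / n
    mean_s = sum(simulated) / n
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(real, simulated))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    if var_r == 0 or var_s == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(var_r * var_s)


# Hypothetical toy data: real activations of one neuron on five tokens,
# and the activations predicted from a candidate explanation.
real_activations = [0.0, 0.1, 2.3, 0.0, 1.9]
simulated_activations = [0.0, 0.0, 2.0, 0.0, 2.0]
score = correlation_score(real_activations, simulated_activations)
```

A score near 1 would mean the explanation accounts for most of the neuron's activation pattern on these tokens; a score near 0 would mean it accounts for little.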

§ sources · 1 publication · timeline below