microgpt
microgpt is a 200-line single-file Python implementation of a GPT, covering tokenizer, autograd, GPT-2-like architecture, Adam optimiser, and inference with no external dependencies.
This is a brief guide to my new art project microgpt, a single file of 200 lines of pure Python with no dependencies that trains a GPT and runs inference with it. This file contains the full algorithmic content of what is needed: a dataset of documents, a tokenizer, an autograd engine, a GPT-2-like neural network architecture, the Adam optimizer, a training loop, and an inference loop. Everything else is just efficiency. I cannot simplify this any further. This script is the culmination of multiple projects (micrograd, makemore, nanogpt, etc.) and a decade-long obsession to simplify LLMs to their bare essentials, and I think it is beautiful 🥹.
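To make "full algorithmic content" a little more concrete, here is a minimal sketch of two of the pieces listed above, written for this guide and not copied from microgpt.py: a scalar autograd engine in the spirit of micrograd, and an Adam update, wired into a tiny training loop that fits a straight line. The names `Value`, `adam_step` and `train`, the toy data, and the hyperparameters are all hypothetical choices for illustration.

```python
import math

# Hypothetical illustration, not code from microgpt.py.


class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # inputs of the op that produced this value
        self._local_grads = local_grads  # d(self)/d(child) for each child

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def __pow__(self, k):  # k is a plain number, not a Value
        return Value(self.data ** k, (self,), (k * self.data ** (k - 1),))

    def __neg__(self):
        return self * -1

    def __sub__(self, other):
        return self + (-other)

    __radd__ = __add__  # lets sum() start from the integer 0
    __rmul__ = __mul__

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for child in v._children:
                    visit(child)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for child, local in zip(v._children, v._local_grads):
                child.grad += local * v.grad  # accumulate: a value may be used twice


def adam_step(params, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over a list of Value parameters; t counts from 1."""
    for i, p in enumerate(params):
        m[i] = beta1 * m[i] + (1 - beta1) * p.grad       # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * p.grad ** 2  # second-moment estimate
        m_hat = m[i] / (1 - beta1 ** t)                  # bias corrections
        v_hat = v[i] / (1 - beta2 ** t)
        p.data -= lr * m_hat / (math.sqrt(v_hat) + eps)
        p.grad = 0.0  # reset for the next backward pass


def train(steps=500):
    """Fit w * x + b to points on the line y = 3x - 1 with mean squared error."""
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [3.0 * x - 1.0 for x in xs]
    w, b = Value(0.0), Value(0.0)
    params = [w, b]
    m, v = [0.0] * len(params), [0.0] * len(params)
    for t in range(1, steps + 1):
        loss = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) * (1.0 / len(xs))
        loss.backward()  # fills w.grad and b.grad
        adam_step(params, m, v, t)
    return w.data, b.data, loss.data


if __name__ == "__main__":
    w, b, final_loss = train()
    print(f"w={w:.3f} b={b:.3f} loss={final_loss:.2e}")
```

Run as a script, the fitted parameters should end up close to w = 3 and b = -1. A real GPT does the same forward, backward and update cycle, only with many more parameters and a transformer in place of `w * x + b`.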
It even breaks perfectly across 3 columns:

[Image: microgpt.py laid out across three columns]

Where to find it:

- This GitHub gist has the full source code: microgpt.py
- It's also available on this web page: https://karpathy.ai/microgpt.html
- Also available as a Google Colab notebook
- NEW: buy microgpt as a triptych on my art store at karpathy.art :)

The following is my guide on stepping an interested reader through the code.

Dataset

The fuel of large language models is a…