Learning to play Minecraft with Video PreTraining
OpenAI trains a neural network to play Minecraft via Video PreTraining on unlabeled gameplay footage, enabling it to craft diamond tools, a task that takes proficient humans over 20 minutes.
We trained a neural network to play Minecraft by Video PreTraining (VPT) on a massive unlabeled video dataset of human Minecraft play, while using only a small amount of labeled contractor data. With fine-tuning, our model can learn to craft diamond tools, a task that usually takes proficient humans over 20 minutes (24,000 actions).
Our model uses the native human interface of keypresses and mouse movements, which makes the approach quite general and represents a step towards general computer-using agents.
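
As a rough sanity check on the 24,000-action figure, the sketch below converts play time into a number of actions. It assumes the agent issues one action per game tick at 20 ticks per second; that rate is an assumption for illustration and is not stated above. The names `ACTIONS_PER_SECOND` and `actions_in` are hypothetical helpers.

```python
# Assumption: one action per game tick, at 20 ticks per second.
ACTIONS_PER_SECOND = 20


def actions_in(minutes: float, actions_per_second: int = ACTIONS_PER_SECOND) -> int:
    """Return how many actions are issued over the given span of play time."""
    seconds = minutes * 60
    return round(seconds * actions_per_second)


if __name__ == "__main__":
    # 20 minutes of play at the assumed rate.
    print(actions_in(20))  # 24000
```

Under that assumption, 20 minutes × 60 seconds × 20 actions per second gives 24,000 actions, which matches the number quoted for crafting diamond tools.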