§ feed · storyline
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions
OpenAI publishes research on the instruction hierarchy, a framework that trains LLMs to prioritize privileged instructions and to resist prompt injection and jailbreak attacks.
Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.
§ sources · 1 publication · timeline below