The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

OpenAI publishes research on an instruction hierarchy framework that trains LLMs to prioritise privileged instructions and resist prompt injection and jailbreak attacks.

Apr 19 · 21:00:00 · primary fetch · 1 source · updated Apr 19 · 21:00:00

Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow adversaries to overwrite a model's original instructions with their own malicious prompts.

§ sources · 1 publication · timeline below

openai.com · The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions · primary · 21:00:00