§ feed · storyline
Improving instruction hierarchy in frontier LLMs
IH-Challenge trains frontier LLMs to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.
§ sources · 1 publication · timeline below