Governance Practice

What Is Jailbreaking?

Jailbreaking is the practice of crafting inputs that circumvent an AI system's safety guardrails to elicit outputs the system was designed to refuse.

Definition

Jailbreaking — the practice of crafting inputs that circumvent an AI system's safety guardrails to elicit outputs the system was designed to refuse.

Jailbreaking exploits the gap between what a model is capable of producing and what its safety training intends to prevent. Techniques include role-play framing, hypothetical scenarios, encoded instructions, and adversarial suffixes. Jailbreaking is a persistent governance challenge because safety alignment is probabilistic, not absolute — no current technique fully prevents it. Red-teaming programmes deliberately attempt jailbreaks to find and close gaps before deployment.

Source: NIST AI 100-2; OWASP LLM Top 10 (2025)

Plain-language explanation

Primary source: NIST AI 100-2; OWASP LLM Top 10 (2025)

See where you stand on AI governance

Take the free 7-question maturity assessment and get a personalised action plan.

Free assessment — 3 minutes →