What Is Jailbreaking?
Jailbreaking is the practice of crafting inputs that circumvent an AI system's safety guardrails to elicit outputs the system was designed to refuse.
Jailbreaking — the practice of crafting inputs that circumvent an AI system's safety guardrails to elicit outputs the system was designed to refuse.
Jailbreaking exploits the gap between what a model is capable of producing and what its safety training intends to prevent. Techniques include role-play framing, hypothetical scenarios, encoded instructions, and adversarial suffixes. Jailbreaking is a persistent governance challenge because safety alignment is probabilistic, not absolute — no current technique fully prevents it. Red-teaming programmes deliberately attempt jailbreaks to find and close gaps before deployment.
Source: NIST AI 100-2; OWASP LLM Top 10 (2025)
Plain-language explanation
Jailbreaking exploits the gap between what a model is capable of producing and what its safety training intends to prevent. Techniques include role-play framing, hypothetical scenarios, encoded instructions, and adversarial suffixes. Jailbreaking is a persistent governance challenge because safety alignment is probabilistic, not absolute — no current technique fully prevents it. Red-teaming programmes deliberately attempt jailbreaks to find and close gaps before deployment.
See where you stand on AI governance
Take the free 7-question maturity assessment and get a personalised action plan.
Free assessment — 3 minutes →