What Are Guardrails?
Guardrails are technical and procedural controls that constrain an AI system's behaviour to keep its outputs and actions within acceptable bounds.
Source: NIST AI 600-1; OWASP LLM Top 10
Plain-language explanation
AI guardrails operate at multiple layers: input filtering (blocking harmful prompts), output filtering (blocking harmful responses), system prompts (instructions that constrain behaviour), and runtime monitoring. For agentic AI, guardrails also include action constraints, which limit the real-world actions the system can take without human approval. Guardrails are a defence-in-depth measure, not a guarantee: prompt injection and jailbreaking can bypass them, which is why they form only one layer of a broader control framework.
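As a rough illustration of how these layers can fit together, the Python sketch below chains an input filter, a system prompt, an output filter, an audit log standing in for runtime monitoring, and an approval check for agent actions. Everything in it is an assumption made for illustration: the phrase lists, the action names, and the functions `check_input`, `check_output`, `check_action` and `guarded_call` are hypothetical and do not come from NIST AI 600-1 or the OWASP LLM Top 10. Simple phrase matching like this is easy to bypass, which is the point made above about guardrails not being a guarantee.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical deny-lists for illustration only; production systems
# typically rely on classifiers and policy engines, not substring checks.
BLOCKED_INPUT_PHRASES = ("ignore previous instructions",)
BLOCKED_OUTPUT_PHRASES = ("begin private key",)

# Hypothetical agent actions that need a human sign-off before they run.
ACTIONS_REQUIRING_APPROVAL = {"send_email", "delete_file"}

# System prompt layer: instructions passed to the model on every call.
SYSTEM_PROMPT = "You are a helpful assistant. Refuse harmful requests."


@dataclass
class GuardrailResult:
    allowed: bool
    layer: str      # which layer made the final decision
    text: str = ""  # model response, only set when allowed


def check_input(prompt: str) -> bool:
    """Input filter: reject prompts containing a blocked phrase."""
    lowered = prompt.lower()
    return not any(phrase in lowered for phrase in BLOCKED_INPUT_PHRASES)


def check_output(response: str) -> bool:
    """Output filter: reject responses containing a blocked phrase."""
    lowered = response.lower()
    return not any(phrase in lowered for phrase in BLOCKED_OUTPUT_PHRASES)


def check_action(action: str, approved_by_human: bool) -> bool:
    """Action constraint: sensitive actions need explicit human approval."""
    return action not in ACTIONS_REQUIRING_APPROVAL or approved_by_human


def guarded_call(
    prompt: str,
    model: Callable[[str, str], str],
    audit_log: List[Tuple[str, str]],
) -> GuardrailResult:
    """Run one model call through the input and output layers.

    Every decision is appended to audit_log, a stand-in for runtime
    monitoring.
    """
    if not check_input(prompt):
        audit_log.append(("input_filter", "blocked"))
        return GuardrailResult(False, "input_filter")

    response = model(SYSTEM_PROMPT, prompt)

    if not check_output(response):
        audit_log.append(("output_filter", "blocked"))
        return GuardrailResult(False, "output_filter")

    audit_log.append(("passed", "allowed"))
    return GuardrailResult(True, "passed", response)
```

Here `model` is any callable that takes a system prompt and a user prompt and returns text, so a stub function can stand in for a real model during testing. Keeping each layer as a separate function means one layer can be replaced or tightened without touching the others, which fits the defence-in-depth framing.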