
What Are Guardrails?

Guardrails are technical and procedural controls that constrain an AI system's behaviour to keep its outputs and actions within acceptable bounds.

Definition

Guardrails: technical and procedural controls that constrain an AI system's behaviour to keep its outputs and actions within acceptable bounds.

AI guardrails operate at multiple layers: input filtering (blocking harmful prompts), output filtering (blocking harmful responses), system prompts (instructions constraining behaviour), and runtime monitoring. For agentic AI, guardrails also include action constraints — limiting what real-world actions the system can take without human approval. Guardrails are a defence-in-depth measure, not a guarantee; they can be bypassed through prompt injection and jailbreaking, which is why they form one layer of a broader control framework.

Source: NIST AI 600-1; OWASP LLM Top 10
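
The layering described above can be sketched in a few lines of Python. This is a minimal illustration, not an implementation taken from NIST AI 600-1 or the OWASP LLM Top 10: every name in it (`check_input`, `check_output`, `check_action`, `guarded_call`, the blocklists, and the set of auto-approved actions) is hypothetical, and the substring matching stands in for the classifiers or policy engines a real deployment would more likely use. Runtime monitoring is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical blocklists, for illustration only. Substring matching is easy
# to evade, which is one reason guardrails are only a single control layer.
BLOCKED_INPUT_TERMS = ("ignore previous instructions",)
BLOCKED_OUTPUT_TERMS = ("BEGIN PRIVATE KEY",)

# Hypothetical set of agent actions permitted without human approval.
AUTO_APPROVED_ACTIONS = {"search_docs", "read_file"}


@dataclass
class GuardrailResult:
    allowed: bool
    layer: str   # which layer made the decision
    reason: str


def check_input(prompt: str) -> GuardrailResult:
    """Input filtering: reject prompts that contain a blocked term."""
    lowered = prompt.lower()
    for term in BLOCKED_INPUT_TERMS:
        if term in lowered:
            return GuardrailResult(False, "input", f"blocked term: {term!r}")
    return GuardrailResult(True, "input", "ok")


def check_output(response: str) -> GuardrailResult:
    """Output filtering: withhold responses that contain a blocked term."""
    for term in BLOCKED_OUTPUT_TERMS:
        if term in response:
            return GuardrailResult(False, "output", f"blocked term: {term!r}")
    return GuardrailResult(True, "output", "ok")


def check_action(action: str, human_approved: bool = False) -> GuardrailResult:
    """Action constraint: anything off the allowlist needs human approval."""
    if action in AUTO_APPROVED_ACTIONS or human_approved:
        return GuardrailResult(True, "action", "ok")
    return GuardrailResult(False, "action", "requires human approval")


def guarded_call(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model call with the input and output layers."""
    verdict = check_input(prompt)
    if not verdict.allowed:
        return f"[refused at {verdict.layer} layer]"
    response = model(prompt)
    verdict = check_output(response)
    if not verdict.allowed:
        return f"[withheld at {verdict.layer} layer]"
    return response
```

Here `model` is any callable that maps a prompt to a response, so `guarded_call("Hello", my_model)` passes the prompt through both filters, while `check_action("send_payment")` is denied until a human approves it. Each check runs independently of the others, so a prompt that slips past the input filter can still be caught at the output or action layer.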
