
What Is Reinforcement Learning?

Reinforcement Learning is a machine-learning paradigm in which an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties, with the aim of maximising its cumulative reward over time.

Definition

Reinforcement Learning: a machine-learning paradigm in which an agent learns to make decisions by taking actions in an environment and receiving rewards or penalties, with the aim of maximising its cumulative reward over time.

Source: ISO/IEC 22989:2022 (AI concepts and terminology)

Plain-language explanation

Unlike supervised learning, which learns from labelled examples, reinforcement learning learns from trial-and-error feedback. It underpins reinforcement learning from human feedback (RLHF), a technique used to align large language models with human preferences. From a governance perspective, reward design is a key risk: a poorly specified reward can lead the agent to maximise the stated reward in unintended or unsafe ways, a failure mode known as "reward hacking".

Primary source: ISO/IEC 22989:2022 (AI concepts and terminology)
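
The trial-and-error loop and the reward-design risk described above can be illustrated with a small sketch. Everything in it is a hypothetical toy, not taken from the ISO/IEC standard or any real system: the action names, the reward numbers, and the `train` function are assumptions chosen for illustration. The agent uses a simple epsilon-greedy strategy (one of many possible learning methods) and only ever sees the proxy reward the designer specified, never the value the designer actually intended.

```python
import random

# Hypothetical toy environment (illustrative numbers only).
# "proxy_reward" is what the designer chose to measure and reward;
# "intended_value" is what the designer actually wanted, and is hidden
# from the agent.
ACTIONS = {
    "clean_room":          {"proxy_reward": 1.0, "intended_value": 1.0},
    "hide_mess_under_rug": {"proxy_reward": 1.5, "intended_value": -1.0},
    "do_nothing":          {"proxy_reward": 0.0, "intended_value": 0.0},
}


def train(episodes=2000, epsilon=0.1, seed=0):
    """Learn per-action reward estimates by trial and error (epsilon-greedy)."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    estimates = {name: 0.0 for name in names}  # running average reward
    counts = {name: 0 for name in names}       # times each action was tried

    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(names)               # explore: random action
        else:
            action = max(names, key=estimates.get)   # exploit: best so far

        # The environment returns a noisy version of the proxy reward.
        reward = ACTIONS[action]["proxy_reward"] + rng.gauss(0, 0.1)

        # Incremental update of the running average for the chosen action.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates


estimates = train()
learned = max(estimates, key=estimates.get)
print("Learned action:", learned)
print("Intended value of that action:", ACTIONS[learned]["intended_value"])
```

In this toy setup the agent settles on the action with the highest proxy reward even though its intended value is negative, which is the pattern that "reward hacking" refers to. A governance review of a real system would look for the same gap between what is rewarded and what is actually wanted.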

Related terms

RLHF (Reinforcement Learning from Human Feedback)
Machine Learning
AI Alignment
Model Evaluation
