AIRiskAware
AI Governance Glossary
Governance Concept

What Is a Multimodal Model?

A multimodal model is an AI model that can process or generate more than one type of data (for example, text, images, audio, or video) within a single system.

Definition

Multimodal Model: an AI model that can process or generate more than one type of data (for example, text, images, audio, or video) within a single system.

Multimodal models (for example, systems that accept an image and a text prompt together) expand what AI can do but widen the governance surface: each modality is a separate input channel that can carry bias, prompt injection, or sensitive data. Evaluation and red-teaming need to cover the combinations, not just each modality alone.
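As a rough illustration of what covering the combinations can mean in practice, the sketch below enumerates an evaluation matrix over every non-empty combination of modalities, paired with each risk named above. The modality list, the risk list, and the `evaluation_matrix` helper are hypothetical and chosen only for illustration; a real test plan would prune combinations the system does not actually accept.

```python
from itertools import combinations

# Hypothetical lists for illustration; adapt to the system under review.
MODALITIES = ["text", "image", "audio", "video"]
RISKS = ["bias", "prompt injection", "sensitive data"]


def evaluation_matrix(modalities, risks):
    """Return (modality_combo, risk) pairs for every non-empty
    combination of modalities, not only single modalities."""
    cases = []
    for size in range(1, len(modalities) + 1):
        for combo in combinations(modalities, size):
            for risk in risks:
                cases.append((combo, risk))
    return cases


cases = evaluation_matrix(MODALITIES, RISKS)
single = [case for case in cases if len(case[0]) == 1]

print(len(single), "cases when each modality is tested alone")
print(len(cases), "cases when combinations are included")
```

With four modalities and three risks, testing each modality alone gives 12 cases, while including combinations gives 45. The gap between the two numbers is the part of the governance surface that single-modality testing leaves uncovered.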

Source: Machine-learning literature

Related terms

Foundation Model, Large Language Model (LLM), Generative AI, Inference (AI)
