Governance Concept

What Is a Multimodal Model?

A multimodal model is an AI model that can process or generate more than one type of data, such as text, images, audio, or video, within a single system.

Definition

Multimodal models (for example, systems that accept an image and a text prompt together) expand what AI can do but widen the governance surface: each modality is a separate input channel that can carry bias, prompt injection, or sensitive data. Evaluation and red-teaming need to cover the combinations, not just each modality alone.

Source: Machine-learning literature
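
As a rough illustration of what covering the combinations means in practice, the sketch below enumerates every non-empty combination of modalities that an evaluation or red-teaming plan might need to include. The modality list and the function name `evaluation_matrix` are assumptions made for this example, not part of any standard or tool.

```python
from itertools import combinations

# Hypothetical modality list for illustration; adjust to the system under review.
MODALITIES = ["text", "image", "audio", "video"]


def evaluation_matrix(modalities):
    """Return every non-empty combination of modalities, smallest first."""
    combos = []
    for size in range(1, len(modalities) + 1):
        combos.extend(combinations(modalities, size))
    return combos


matrix = evaluation_matrix(MODALITIES)

# Split the plan into single-modality tests and combined-modality tests.
single = [combo for combo in matrix if len(combo) == 1]
combined = [combo for combo in matrix if len(combo) > 1]

print(len(single), "single-modality cases;", len(combined), "combined cases")
```

With four modalities, testing each one alone covers 4 cases, while the combined cases add 11 more. In a real review, only the combinations the system actually accepts would need to be tested.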
