What Is a Multimodal Model?
A multimodal model is an AI model that can process or generate more than one type of data (for example, text, images, audio, or video) within a single system.
Source: Machine-learning literature
Plain-language explanation
Multimodal models (for example, systems that accept an image and a text prompt together) expand what AI can do but widen the governance surface: each modality is a separate input channel that can carry bias, prompt injection, or sensitive data. Evaluation and red-teaming need to cover the combinations, not just each modality alone.
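As a rough illustration of covering the combinations, the sketch below enumerates test cases for every non-empty subset of modalities paired with each risk. The modality and risk names come from the examples above; the `build_test_matrix` helper and the shape of each case are hypothetical, not part of any particular evaluation framework.

```python
from itertools import combinations

# Illustrative lists drawn from the examples in the text (assumptions, not a standard).
MODALITIES = ["text", "image", "audio", "video"]
RISKS = ["bias", "prompt_injection", "sensitive_data"]


def build_test_matrix(modalities, risks):
    """Return one test case per (modality combination, risk) pair.

    Covers each modality alone as well as every combination of two or more.
    """
    cases = []
    for size in range(1, len(modalities) + 1):
        for combo in combinations(modalities, size):
            for risk in risks:
                cases.append({"modalities": combo, "risk": risk})
    return cases


matrix = build_test_matrix(MODALITIES, RISKS)
single = [c for c in matrix if len(c["modalities"]) == 1]
combined = [c for c in matrix if len(c["modalities"]) > 1]
print(len(single), "single-modality cases;", len(combined), "combined cases")
```

With four modalities and three risks, this yields 12 single-modality cases and 33 combined ones, so a test plan that covers only the individual channels would miss most of the matrix.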