Multimedia multimodal artificial intelligence (MMAI): Foundations, challenges, and future directions

Le Hoang Son; Oni Damilola.I

Author affiliations

Le Hoang Son, VNU Information Technology Institute, Vietnam National University, 144 Xuan Thuy Street, Cau Giay Ward, Ha Noi, Viet Nam
Oni Damilola.I, International School, Vietnam National University, HT1 Building, VNU Campus at Hoa Lac, Ha Noi, Viet Nam, https://orcid.org/0000-0002-1460-7183

Keywords:

AI 4.0, Content Generation, Fusion Techniques, Human-Computer Interaction, Multimedia, Multimodal AI, Self-Supervised Learning.

Abstract

Multimedia Multimodal Artificial Intelligence (MMAI) is a transformative paradigm that enables machines to process and synthesize multiple modalities, e.g., text, image, audio, and video, in order to understand and generate complex multimedia content. This review provides an in-depth exploration of MMAI, focusing on its foundational models, major challenges, and future directions. Drawing on recent research trends and literature, it presents a comprehensive analysis of multimodal AI, fusion techniques, self-supervised learning strategies, and real-world applications in areas such as healthcare, education, entertainment, and human-computer interaction. It also examines the theoretical foundations of MMAI, including multimodal representation, alignment, and fusion, which are essential for integrating heterogeneous data sources while maintaining coherence and relevance. The review further discusses the role of self-supervised learning in reducing dependence on labeled datasets by exploiting the underlying structure of multimodal data. Additionally, it highlights the ability of generative AI to create multimedia content, extending the limits of what AI can do in creative and practical domains. Despite this progress, many challenges persist, including technical limitations such as high computational cost, data inequality or heterogeneity, and limited model interpretability, as well as ethical concerns relating to privacy and bias. Finally, future research directions are outlined, including the development of scalable and efficient training methods, the integration of symbolic reasoning with deep learning, and the promotion of interdisciplinary collaboration. By synthesizing knowledge from leading studies and industry innovations, this review aims to serve as a blueprint for those who seek to exploit the full potential of AI-driven multimedia technologies in an increasingly interconnected world.

Published

02-03-2026

How to Cite

[1] L. H. Son and O. Damilola.I, “Multimedia multimodal artificial intelligence (MMAI): Foundations, challenges, and future directions”, J. Comput. Sci. Cybern., Mar. 2026.

Issue

Online First

Section

Articles

License

1. We hereby assign copyright of our article (the Work) in all forms of media, whether now known or hereafter developed, to the Journal of Computer Science and Cybernetics. We understand that the Journal of Computer Science and Cybernetics will act on my/our behalf to publish, reproduce, distribute and transmit the Work.
2. This assignment of copyright to the Journal of Computer Science and Cybernetics is done so on the understanding that permission from the Journal of Computer Science and Cybernetics is not required for me/us to reproduce, republish or distribute copies of the Work in whole or in part. We will ensure that all such copies carry a notice of copyright ownership and reference to the original journal publication.
3. We warrant that the Work is our results and has not been published before in its current or a substantially similar form and is not under consideration for another publication, does not contain any unlawful statements and does not infringe any existing copyright.
4. We also warrant that we have obtained the necessary permission from the copyright holder/s to reproduce in the article any materials including tables, diagrams or photographs not owned by me/us.
