Dr. Mubarak Shah will present a lecture on “Diffusion Models in Computer Vision” at the International Spring School Multimodal Foundation Models and Generative AI 2024 on April 29th in Rabat, Morocco.
Foundation models (FMs) are large deep learning neural networks, often with billions of parameters, trained on massive datasets and then adapted to a variety of downstream tasks with little or no supervision. For example, the BERT model released in 2018, one of the first bidirectional foundation models, had 340 million parameters and was trained on a 16 GB dataset. Just five years later, in 2023, OpenAI released GPT-4, whose parameter count and training data size have not been publicly disclosed. Rather than developing artificial intelligence (AI) from scratch, AI scientists use a foundation model as a starting point to develop AI models that power new applications faster and more cost-effectively. In recent years, this approach has significantly advanced the state of the art in Computer Vision, Natural Language Processing, Speech Analysis, and several other fields. In particular, multimodal foundation models, which are trained on multiple modalities simultaneously, have shown remarkable success in text- or audio-to-image, video, and 3D generation.
The purpose of this spring school is to provide a clear overview and an in-depth analysis of state-of-the-art research in Multimodal Foundation Models and Generative AI. The courses will be delivered by world-renowned experts in the field and will cover both theoretical and practical aspects of Multimodal Foundation Models and Generative AI.
The school aims to provide a stimulating opportunity for young researchers and Ph.D. students. The participants will benefit from direct interaction and discussions with world leaders in Computer Vision. Participants will also have the opportunity to present the results of their research and to interact with their scientific peers in a friendly and constructive environment.