
Abstract
Modern intelligence systems are increasingly designed to handle multiple modalities when solving real-world tasks. On the one hand, these multi-modal models are expected to deliver strong performance across diverse and complex inputs in practice. On the other hand, as model scale keeps growing, efficiency becomes a significant challenge for both training and inference. This creates a critical tension: how can we build large-scale models that achieve competitive performance while remaining efficient in practical multi-modal scenarios?
In this talk, I will introduce how model sparsity can serve as a powerful tool to address these challenges. Focusing on multi-modal learning, I will present a series of approaches that leverage model sparsity to improve model efficiency, capability, or both. These approaches span a range of topics, including vision-language retrieval, large language models (LLMs), and multi-modal large language models (MLLMs), illustrating how sparsity can support effective and efficient AI systems in multi-modal settings.