Dr. Afshin Dehghan
Apple
Sr. AIML Manager at Apple, leading the Multimodal Intelligence Team in the Hardware Technology group.
Topic:
Advancing Video Understanding: From Training-Free to Streaming Video LLM
Abstract
We trace the progression from training-free methods to architectures specifically designed for streaming video, with the goal of enabling real-time, proactive assistants. We'll discuss strategies for adapting existing VLMs to handle temporal reasoning and large-scale multimodal integration, both with and without fine-tuning. The talk will highlight recent work from our group, including SF-LLaVA, SF-LLaVA 1.5, and StreamBridge.