As video acquisition devices become increasingly widespread, the corpus of available video content is growing exponentially. With the emergence of the video-logging (vlog) phenomenon, the rise in the number of security cameras installed around the globe, the spread of aerial videography using UAVs and drones, and the increasing use of body-worn and dash cameras by police officers, it has become extremely challenging to browse, index, or extract information from this rich resource. One obstacle to research on video summarization is designing systems that can adapt to different user needs. Furthermore, effective training of such models remains a challenge. Lastly, while creating short summaries is useful, it is rather ineffective for browsing large databases, as one still has to watch the summaries to infer information about the original videos.
This dissertation addresses these challenges by proposing:
(1) A new research problem, Query-Focused Video Summarization, in which user preferences influence the summarization process, along with a probabilistic model for it. (2) The first Query-Focused Video Summarization dataset, a novel concept-level evaluation metric, and an improved framework. (3) A large-margin objective function to address the exposure bias problem common to most sequence-to-sequence learning settings, and a new probabilistic model that accepts user input specifying the expected length of the summary. (4) A framework that, given a video, produces a short textual synopsis, enabling users to quickly index a large video database without watching the videos.