TVSum: Summarizing Web Videos Using Titles

Yale Song; Jordi Vallmitjana; Amanda Stent; Alejandro Jaimes

CVPR 2015

Abstract

Video summarization is a challenging problem in part because knowing which part of a video is important requires prior knowledge about its main topic. We present TVSum, an unsupervised video summarization framework that uses title-based image search results to find visually important shots. We observe that a video title is often carefully chosen to be maximally descriptive of its main topic, and hence images related to the title can serve as a proxy for important visual concepts of the main topic. However, because titles are free-form, unconstrained, and often written ambiguously, images retrieved using the title can contain noise (images irrelevant to video content) and variance (images of different topics). To deal with this challenge, we developed a novel co-archetypal analysis technique that learns canonical visual concepts shared between video and images, but not in either alone, by finding a joint-factorial representation of two data sets. We introduce a new benchmark dataset, TVSum50, that contains 50 videos and their shot-level importance scores annotated via crowdsourcing. Experimental results on two datasets, SumMe and TVSum50, suggest our approach produces higher-quality summaries than several recently proposed approaches.
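To make the premise concrete, here is a minimal Python sketch of a naive title-image baseline. It is not TVSum's co-archetypal analysis. It assumes every shot and every title-search image has already been mapped to a fixed-length feature vector (the feature extractor is left unspecified), scores each shot by its mean cosine similarity to the images, and greedily keeps the top-scoring shots under a duration budget. The function names `score_shots` and `select_shots` and the greedy budget rule are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def score_shots(shot_feats: List[Vector], image_feats: List[Vector]) -> List[float]:
    """Hypothetical baseline: score each shot by its mean cosine similarity
    to the title-search images. Every image is weighted equally, so noisy
    or off-topic images shift all scores (not the paper's method)."""
    if not image_feats:
        raise ValueError("need at least one image feature vector")
    return [sum(cosine(s, img) for img in image_feats) / len(image_feats)
            for s in shot_feats]


def select_shots(scores: List[float], durations: List[float], budget: float) -> List[int]:
    """Greedily keep the highest-scoring shots whose total duration fits
    within the budget; returns the chosen shot indices in temporal order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    used = 0.0
    for i in order:
        if used + durations[i] <= budget:
            chosen.append(i)
            used += durations[i]
    return sorted(chosen)


if __name__ == "__main__":
    # Toy 2-D "features": shot 0 matches the images, shot 1 does not.
    shots = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    images = [[1.0, 0.1], [0.9, 0.0]]
    scores = score_shots(shots, images)
    print(select_shots(scores, durations=[4.0, 4.0, 4.0], budget=8.0))  # prints [0, 2]
```

Because every search-result image gets equal weight here, a few irrelevant images (the noise the abstract describes) or images of a different topic (the variance) shift every shot's score. Learning visual concepts shared by both the video and the images is the abstract's answer to that weakness.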

Topics

Machine Learning > Learning Types > Unsupervised Learning
Computer Vision > Processing > Video Understanding
Data Science & Analytics > Methods > Data Mining
Computer Vision > Applications > Computer Vision

Keywords

unsupervised learning, visual concept, video summarization, video processing, title-based search, co-archetypal analysis

Related papers

Long-Term Correlation Tracking 2015

Hierarchically-Constrained Optical Flow 2015

Propagated Image Filtering 2015

Web Scale Photo Hash Clustering on A Single Machine 2015

Expanding Object Detector's Horizon: Incremental Learning Framework for Object Detection in Videos 2015