A Dataset for Movie Description

Anna Rohrbach; Marcus Rohrbach; Niket Tandon; Bernt Schiele

CVPR 2015

Abstract

Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset containing transcribed ADs that are temporally aligned to full-length HD movies. In addition, we collected the aligned movie scripts that have been used in prior work, and we compare the two sources of descriptions. In total, the MPII Movie Description dataset (MPII-MD) contains a parallel corpus of over 68K sentences and video snippets from 94 HD movies. We characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are far more visual and describe precisely what is shown, rather than what should happen according to scripts written before movie production.

Topics

Computer Vision > Generation > Image Captioning
Interdisciplinary > Linguistics > Computational Linguistics
Natural Language Processing > Applications > Image Captioning
Computer Vision > Applications > Computer Vision

Keywords

dataset creation, video captioning, video understanding, visual description, video description, audio description, movie description
