The Aligned Multimodal Movie Treebank: An audio, video, dependency-parse treebank

Adam Yaari; Jan DeWitt; Henry Hu; Bennett Stankovits; Sue Felshin; Yevgeni Berzak; Helena Aparicio; Boris Katz; Ignacio Cases; Andrei Barbu

EMNLP 2022

Abstract

Treebanks have traditionally included only text and were derived from written sources such as newspapers or the web. We introduce the Aligned Multimodal Movie Treebank (AMMT), an English-language treebank derived from dialog in Hollywood movies. It includes transcriptions of the audio-visual streams with word-level alignment, as well as part-of-speech tags and dependency parses in the Universal Dependencies (UD) formalism. AMMT consists of 31,264 sentences and 218,090 words, which will make it the third-largest UD English treebank and the only multimodal treebank in UD. To help with the web-based annotation effort, we also introduce the Efficient Audio Alignment Annotator (EAAA), a companion tool that enables annotators to significantly speed up their annotation process.
