Multi-modal Summarization for Asynchronous Collection of Text, Image, Audio and Video

Haoran Li; Junnan Zhu; Cong Ma; Jiajun Zhang; Chengqing Zong

EMNLP 2017


Abstract

The rapid increase in multimedia data on the Internet necessitates multi-modal summarization of collections of text, images, audio and video. In this work, we propose an extractive Multi-modal Summarization (MMS) method that automatically generates a textual summary from a set of documents, images, audio and video related to a specific topic. The key idea is to bridge the semantic gaps between multi-modal contents. For audio information, we design an approach that selectively uses its transcription. For visual information, we learn joint representations of texts and images using a neural network. Finally, all the multi-modal aspects are considered to generate the textual summary by maximizing salience, non-redundancy, readability and coverage through budgeted optimization of submodular functions. We further introduce an MMS corpus in English and Chinese. Experimental results on this dataset demonstrate that our method outperforms other competitive baseline methods.
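The abstract frames summary generation as budgeted maximization of a submodular function. The sketch below illustrates that general idea with a cost-scaled greedy of the kind commonly used for budgeted extractive summarization. It is not the authors' implementation: the toy sentences, word weights, the word-coverage objective and the names `coverage`, `budgeted_greedy` and `r` (cost-scaling exponent) are all hypothetical, and the paper's actual objective additionally combines salience, non-redundancy, readability and multi-modal coverage terms.

```python
from typing import Callable, Dict, List, Set

# Toy data (hypothetical): each sentence is a bag of content words,
# and each word carries an assumed importance weight.
SENTENCES: List[List[str]] = [
    ["flood", "city"],
    ["rescue", "teams", "city"],
    ["flood", "city", "monday"],
    ["casualties", "reported"],
]
WEIGHTS: Dict[str, int] = {
    "flood": 3, "city": 2, "rescue": 2, "teams": 1,
    "monday": 1, "casualties": 2, "reported": 1,
}
# Cost of a sentence = its length in words (assumed; must be positive).
COSTS: Dict[int, float] = {i: len(s) for i, s in enumerate(SENTENCES)}


def coverage(selected: Set[int]) -> float:
    """Weighted word coverage: monotone and submodular in `selected`."""
    covered = set().union(*(SENTENCES[i] for i in selected))
    return sum(WEIGHTS.get(w, 0) for w in covered)


def budgeted_greedy(
    costs: Dict[int, float],
    objective: Callable[[Set[int]], float],
    budget: float,
    r: float = 1.0,
) -> Set[int]:
    """Cost-scaled greedy for a monotone submodular objective under a length budget."""
    selected: Set[int] = set()
    spent = 0.0
    remaining = set(costs)
    while remaining:
        base = objective(selected)
        # Candidate with the largest marginal gain per unit of scaled cost.
        best = max(
            sorted(remaining),
            key=lambda i: (objective(selected | {i}) - base) / costs[i] ** r,
        )
        remaining.discard(best)
        gain = objective(selected | {best}) - base
        # Keep it only if it still fits the budget and adds something.
        if spent + costs[best] <= budget and gain > 0:
            selected.add(best)
            spent += costs[best]
    # Safeguard: fall back to the best single affordable item if it scores higher.
    affordable = [i for i in sorted(costs) if costs[i] <= budget]
    if affordable:
        top = max(affordable, key=lambda i: objective({i}))
        if objective({top}) > objective(selected):
            return {top}
    return selected


print(budgeted_greedy(COSTS, coverage, budget=5))  # {0, 3}
```

With `budget=5` the greedy picks sentences 0 and 3 (coverage 8). With `budget=3` it can only afford sentence 0 (coverage 5), so the final singleton check returns `{2}` (coverage 6) instead, which is why that safeguard is usually kept alongside the cost-scaled greedy.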

Topics

Machine Learning > Core Methods > Representation Learning
Computer Science > Applications > Information Retrieval
Natural Language Processing > Applications > Summarization
Deep Learning > Models > Neural Networks
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

submodular optimization, extractive summarization, text generation, joint representation, submodular function, neural network, multi-modal summarization, semantic gap bridging

Related papers

Reinforced Video Captioning with Entailment Rewards 2017

Cross-lingual Character-Level Neural Morphological Tagging 2017

Inter-Weighted Alignment Network for Sentence Pair Modeling 2017

Investigating Different Syntactic Context Types and Context Representations for Learning Word Embeddings 2017

An Empirical Analysis of Edit Importance between Document Versions 2017