Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks

Yingying Zhang; Junyu Gao; Xiaoshan Yang; Chang Liu; Yan Li; Changsheng Xu

2020 AAAI AAAI 2020

Find Objects and Focus on Highlights: Mining Object Semantics for Video Highlight Detection via Graph Neural Networks

Abstract

Abstract With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection has the potential to significantly ease this situation, which discoveries moments of user's major or special interest in a video. Existing methods suffer from two problems. Firstly, most existing approaches only focus on learning holistic visual representations of videos but ignore object semantics for inferring video highlights. Secondly, current state-of-the-art approaches often adopt the pairwise ranking-based strategy, which cannot enjoy the global information to infer highlights. Therefore, we propose a novel video highlight framework, named VH-GNN, to construct an object-aware graph and model the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph to capture the complex interactions of object within each frame, and a temporal graph to obtain object-aware representation of each frame and capture the global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage aims to determine the highlight-probability and the second stage leverage the relationships between frames and focus on hard examples from the former stage. Extensive experiments on two standard datasets strongly evidence that VH-GNN obtains significant performance compared with state-of-the-arts.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — object semantics

🐣 Hot Topic Early Bird — visual representation

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yingying Zhang , Junyu Gao , Xiaoshan Yang , Chang Liu , Yan Li , Changsheng Xu

Topics

Artificial Intelligence > Core AI > Multimodal Learning Machine Learning > Core Methods > Representation Learning Deep Learning > Architectures > Graph Neural Networks Computer Vision > Analysis > Video Understanding Artificial Intelligence > Core AI > Computer Vision Deep Learning > Learning Types > Multi-Modal Learning

Keywords

temporal modeling visual representation video highlight detection video analysis temporal graph video representation graph neural network object semantics spatial temporal graph multi-stage loss

Download PDF

Related papers

Enhancing Pointer Network for Sentence Ordering with Pairwise Ordering Predictions 2020

CopyMTL: Copy Mechanism for Joint Extraction of Entities and Relations with Multi-Task Learning 2020

Neural Simile Recognition with Cyclic Multitask Learning and Local Attention 2020

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy 2020

Multi-Point Semantic Representation for Intent Classification 2020