Visual Semantic Search: Retrieving Videos via Complex Textual Queries

Dahua Lin; Sanja Fidler; Chen Kong; Raquel Urtasun

CVPR 2014

Abstract

In this paper, we tackle the problem of retrieving videos using complex natural language queries. Towards this goal, we first parse the sentential descriptions into a semantic graph, which is then matched to visual concepts using a generalized bipartite matching algorithm. Our approach exploits object appearance, motion and spatial relations, and learns the importance of each term using structure prediction. We demonstrate the effectiveness of our approach on a new dataset designed for semantic search in the context of autonomous driving, which exhibits complex and highly dynamic scenes with many objects. We show that our approach is able to locate a major portion of the objects described in the query with high accuracy, and improve the relevance in video retrieval.
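
To make the matching step concrete, the following is a minimal sketch of assigning parsed query nodes to video object tracks by maximizing a weighted score. It is not the paper's generalized bipartite matching or its learned model: it is a plain one-to-one assignment found by exhaustive search, and every name, feature, and weight in it (`QUERY_NODES`, `TRACKS`, `WEIGHTS`, `pair_score`, `best_matching`) is a hypothetical chosen for illustration.

```python
from itertools import permutations

# Hypothetical query nodes, as if parsed from "a moving car and a still pedestrian".
QUERY_NODES = [
    {"noun": "car", "motion": "moving"},
    {"noun": "pedestrian", "motion": "still"},
]

# Hypothetical object tracks detected in a video clip.
TRACKS = [
    {"id": "t0", "label": "pedestrian", "motion": "still"},
    {"id": "t1", "label": "car", "motion": "moving"},
    {"id": "t2", "label": "car", "motion": "still"},
]

# Assumed term weights; the abstract says term importance is learned,
# here they are simply fixed by hand.
WEIGHTS = {"appearance": 1.0, "motion": 0.5}


def pair_score(node, track, weights=WEIGHTS):
    """Weighted sum of agreement terms between one query node and one track."""
    appearance = 1.0 if node["noun"] == track["label"] else 0.0
    motion = 1.0 if node["motion"] == track["motion"] else 0.0
    return weights["appearance"] * appearance + weights["motion"] * motion


def best_matching(nodes, tracks):
    """Exhaustively search one-to-one assignments of nodes to distinct tracks.

    Assumes len(nodes) <= len(tracks); only practical for tiny inputs.
    """
    best_score, best_assignment = float("-inf"), None
    for chosen in permutations(range(len(tracks)), len(nodes)):
        score = sum(pair_score(n, tracks[j]) for n, j in zip(nodes, chosen))
        if score > best_score:
            best_score, best_assignment = score, chosen
    return best_score, [tracks[j]["id"] for j in best_assignment]


score, matched = best_matching(QUERY_NODES, TRACKS)
print(score, matched)
```

Exhaustive search keeps the sketch dependency-free; a real system would use a polynomial-time assignment solver and would also score relations between pairs of nodes, which this example omits.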

Topics

Computer Vision > Processing > Video Understanding
Computer Vision > Domain-Specific > Autonomous Driving
Computer Vision > Analysis > Video Understanding
Deep Learning > Learning Types > Representation Learning

Keywords

bipartite matching, semantic search, video retrieval, natural language queries, structure prediction, spatial relation, semantic graph
