Video Understanding

1592 directly classified papers

Papers

Context-Dependent Sentiment Analysis in User-Generated Videos ACL 2017

Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos CVPR 2017

Temporal Residual Networks for Dynamic Scene Recognition CVPR 2017

YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video CVPR 2017

Spatio-Temporal Self-Organizing Map Deep Network for Dynamic Object Detection From Videos CVPR 2017

Reasoning About Liquids via Closed-Loop Simulation RSS 2017

Safe Visual Navigation via Deep Learning and Novelty Detection RSS 2017

Supervising Neural Attention Models for Video Captioning by Human Gaze Data CVPR 2017

Deep Sequential Context Networks for Action Prediction CVPR 2017

Predicting Salient Face in Multiple-Face Videos CVPR 2017

Unsupervised Semantic Scene Labeling for Streaming Data CVPR 2017

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos CVPR 2017

A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering CVPR 2017

The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives CVPR 2017

Primary Object Segmentation in Videos Based on Region Augmentation and Reduction CVPR 2017

Optical Flow in Mostly Rigid Scenes CVPR 2017

DeMoN: Depth and Motion Network for Learning Monocular Stereo CVPR 2017

Online Video Object Segmentation via Convolutional Trident Network CVPR 2017

Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos CVPR 2017

Predicting Scene Parsing and Motion Dynamics in the Future NIPS 2017

Recurrent Ladder Networks NIPS 2017

Video Highlight Prediction Using Audience Chat Reactions EMNLP 2017

Unsupervised Learning of Disentangled Representations from Video NIPS 2017

Visual Interaction Networks: Learning a Physics Simulator from Video NIPS 2017

Representations of language in a model of visually grounded speech signal ACL 2017