COHESIV: Contrastive Object and Hand Embedding Segmentation In Video

Dandan Shan; Richard Higgins; David Fouhey

NeurIPS 2021

Abstract

In this paper we learn to segment hands and hand-held objects from motion. Our system takes a single RGB image and hand location as input to segment the hand and hand-held object. For learning, we generate responsibility maps that show how well a hand's motion explains other pixels' motion in video. We use these responsibility maps as pseudo-labels to train a weakly-supervised neural network using an attention-based similarity loss and contrastive loss. Our system outperforms alternate methods, achieving good performance on the 100DOH, EPIC-KITCHENS, and HO3D datasets.
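
The abstract does not spell out how a responsibility map is computed, so the following is only a toy sketch of the general idea that a pixel scores highly when its motion matches the hand's motion. It assumes a precomputed optical-flow grid and a known hand mask, and it scores each pixel with a Gaussian kernel on the difference between its flow and the mean hand flow. The function name `responsibility_map`, the `sigma` parameter, and the Gaussian scoring are illustrative assumptions, not the paper's formulation.

```python
import math


def responsibility_map(flow, hand_mask, sigma=1.0):
    """Toy responsibility map (hypothetical, not the paper's definition).

    flow:      H x W grid of (dx, dy) tuples, an assumed optical-flow field.
    hand_mask: H x W grid of booleans marking hand pixels.
    Returns an H x W grid of scores in (0, 1]; 1.0 means the pixel moves
    exactly like the average hand pixel.
    """
    hand_vectors = [
        flow[y][x]
        for y in range(len(flow))
        for x in range(len(flow[0]))
        if hand_mask[y][x]
    ]
    if not hand_vectors:
        raise ValueError("hand_mask selects no pixels")

    # Mean motion of the hand region.
    mean_dx = sum(v[0] for v in hand_vectors) / len(hand_vectors)
    mean_dy = sum(v[1] for v in hand_vectors) / len(hand_vectors)

    # Assumed scoring: Gaussian kernel on the squared flow difference.
    scores = []
    for row in flow:
        out_row = []
        for dx, dy in row:
            sq_dist = (dx - mean_dx) ** 2 + (dy - mean_dy) ** 2
            out_row.append(math.exp(-sq_dist / (2 * sigma ** 2)))
        scores.append(out_row)
    return scores


# Left two columns move right together (hand plus held object);
# the right column is static background.
flow = [
    [(2.0, 0.0), (2.0, 0.0), (0.0, 0.0)],
    [(2.0, 0.0), (2.0, 0.0), (0.0, 0.0)],
]
hand_mask = [
    [True, False, False],
    [True, False, False],
]
resp = responsibility_map(flow, hand_mask)
```

In this toy input the middle column, which moves with the hand, receives a score of 1.0, while the static right column scores much lower. Maps of this kind could then serve as pseudo-labels in the way the abstract describes.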

Topics

Machine Learning > Learning Types > Contrastive Learning
Machine Learning > Learning Types > Weakly Supervised Learning
Computer Science > Applications > Computer Vision

Keywords

contrastive learning, video segmentation, weakly-supervised learning, object segmentation, hand segmentation
