CVPR 2019
Learning Individual Styles of Conversational Gesture
Abstract
Human speech is often accompanied by hand and arm gestures. We present a method for cross-modal translation from "in-the-wild" monologue speech of a single speaker to their conversational gesture motion. We train on unlabeled videos for which we only have noisy pseudo ground truth from an automatic pose detection system. Our proposed model significantly outperforms baseline methods in a quantitative comparison. To support research toward obtaining a computational understanding of the relationship between gesture and speech, we release a large video dataset of person-specific gestures.
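The abstract notes that training uses only noisy pseudo ground truth from an automatic pose detector. The sketch below illustrates that supervision setup under stated assumptions. It uses a toy linear regressor, synthetic random features standing in for audio, and an L1 objective chosen for robustness to detection errors. None of these is claimed to be the paper's architecture, features, or loss. The function name `train_speech_to_pose` and all sizes are hypothetical.

```python
import numpy as np


def train_speech_to_pose(audio_feats, pseudo_poses, lr0=0.1, epochs=500):
    """Hypothetical toy model: fit a linear map from per-frame audio features
    to flattened 2D keypoints, using detector outputs as (noisy) targets.

    Minimizes an L1 loss (per-frame average of absolute residuals), which is
    less sensitive to gross detection errors than squared error, via
    subgradient descent with a decaying step size.
    """
    n, d = audio_feats.shape
    k = pseudo_poses.shape[1]
    W = np.zeros((d, k))
    for t in range(epochs):
        residual = audio_feats @ W - pseudo_poses
        # Subgradient of (1/n) * sum of |residual| with respect to W.
        grad = audio_feats.T @ np.sign(residual) / n
        W -= lr0 / (1.0 + t / 100.0) * grad
    return W


rng = np.random.default_rng(0)
n_frames, feat_dim, n_coords = 500, 8, 4      # toy sizes: 2 keypoints x (x, y)
X = rng.normal(size=(n_frames, feat_dim))     # stand-in for audio features
W_true = rng.normal(size=(feat_dim, n_coords))
clean_poses = X @ W_true                      # the (unobserved) true motion

# Pseudo ground truth: small jitter on every frame plus gross errors on ~10% of frames.
pseudo = clean_poses + rng.normal(0.0, 0.1, clean_poses.shape)
bad = rng.random(n_frames) < 0.1
pseudo[bad] += rng.normal(0.0, 5.0, (int(bad.sum()), n_coords))

W_hat = train_speech_to_pose(X, pseudo)
fit_err = np.abs(X @ W_hat - clean_poses).mean()
zero_err = np.abs(clean_poses).mean()         # baseline: always predict 0
print(f"L1 error vs clean poses: model {fit_err:.3f}, zero baseline {zero_err:.3f}")
```

The point of the toy is that the model never sees `clean_poses`, only the corrupted detector output, yet its error is measured against the clean motion.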
Topics
Artificial Intelligence > Core AI > Multimodal Learning
Machine Learning > Core Methods > Representation Learning
Machine Learning > Learning Types > Self-Supervised Learning
Computer Vision > Generation > Video Generation
Computer Vision > Analysis > Video Understanding
Deep Learning > Learning Types > Multi-Modal Learning