No Gestures Left Behind: Learning Relationships between Spoken Language and Freeform Gestures

Chaitanya Ahuja; Dong Won Lee; Ryo Ishii; Louis-Philippe Morency

EMNLP 2020

Abstract

We study relationships between spoken language and co-speech gestures in the context of two key challenges. First, the distributions of text and gestures are inherently skewed, making it important to model the long tail. Second, gesture predictions are made at a subword level, making it important to learn relationships between language and acoustic cues. We introduce AISLe, which combines adversarial learning with importance sampling to strike a balance between precision and coverage. We propose the use of a multimodal multiscale attention block to perform subword alignment without the need for explicit alignment between language and acoustic cues. Finally, to empirically study the importance of language in this task, we extend the dataset proposed in Ahuja et al. (2020) with automatically extracted transcripts for audio signals. We substantiate the effectiveness of our approach through large-scale quantitative and user studies, which show that our proposed methodology significantly outperforms previous state-of-the-art approaches for gesture generation. Code, data, and videos: https://github.com/chahuja/aisle
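The abstract names two mechanisms without giving their details. The following is a minimal sketch under stated assumptions, not the authors' implementation. It shows generic versions of the two underlying ideas. The first is inverse-frequency importance weights that oversample rare gesture categories to cover the long tail. The category labels are a hypothetical discretisation, and AISLe's actual weighting inside its adversarial objective may differ. The second is scaled dot-product cross-attention in which audio frames attend over word embeddings. This is one standard way to let word information reach subword time steps without an explicit word-to-frame alignment. It omits the adversarial loss and the "multiscale" part of the authors' attention block. All function names (`inverse_frequency_weights`, `cross_modal_attention`) are hypothetical.

```python
import math
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Return per-sample sampling probabilities proportional to 1 / count(label).

    `labels` is an assumed discretisation of each training clip's gesture
    (e.g. a cluster id); rarer labels receive proportionally larger weights.
    """
    counts = Counter(labels)
    raw = [1.0 / counts[y] for y in labels]
    total = sum(raw)
    return [w / total for w in raw]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def cross_modal_attention(audio_frames, word_keys, word_values):
    """Scaled dot-product attention with audio frames as queries over words.

    Each audio frame (subword time step) receives a soft mixture of word
    values, so no forced word-to-frame alignment is required.
    Returns (contexts, weights): one context vector and one attention
    distribution per audio frame.
    """
    d = len(word_keys[0])
    contexts, weights = [], []
    for q in audio_frames:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in word_keys]
        attn = softmax(scores)
        ctx = [sum(a * v[i] for a, v in zip(attn, word_values))
               for i in range(len(word_values[0]))]
        contexts.append(ctx)
        weights.append(attn)
    return contexts, weights


if __name__ == "__main__":
    # Long tail: three "beat" clips and one rare "point" clip (toy labels).
    labels = ["beat", "beat", "beat", "point"]
    probs = inverse_frequency_weights(labels)
    print(probs)  # the rare clip receives half of the total sampling mass
    rng = random.Random(0)
    batch = rng.choices(range(len(labels)), weights=probs, k=8)
    print(batch)

    # Two audio frames attending over two toy word embeddings.
    audio = [[1.0, 0.0], [0.0, 1.0]]
    words = [[1.0, 0.0], [0.0, 1.0]]
    contexts, attn = cross_modal_attention(audio, words, words)
    print(attn)
```

The queries come from the audio side because, per the abstract, predictions are made at a subword rate. Each output step therefore asks "which words matter now?" rather than requiring a precomputed word boundary for every frame.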

Keywords

adversarial learning, attention mechanism, multimodal learning, importance sampling, gesture generation, co-speech gesture, subword alignment
