Parsing Speech: a Neural Approach to Integrating Lexical and Acoustic-Prosodic Information

Trang Tran; Shubham Toshniwal; Mohit Bansal; Kevin Gimpel; Karen Livescu; Mari Ostendorf

NAACL 2018

Abstract

In conversational speech, the acoustic signal provides cues that help listeners disambiguate difficult parses. For automatically parsing spoken utterances, we introduce a model that integrates transcribed text and acoustic-prosodic features using a convolutional neural network over energy and pitch trajectories coupled with an attention-based recurrent neural network that accepts text and prosodic features. We find that different types of acoustic-prosodic features are individually helpful, and together give statistically significant improvements in parse and disfluency detection F1 scores over a strong text-only baseline. For this study with known sentence boundaries, error analyses show that the main benefit of acoustic-prosodic features is in sentences with disfluencies, attachment decisions are most improved, and transcription errors obscure gains from prosody.
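The abstract describes a convolutional network over energy and pitch trajectories whose output is fed, with the text, to a recurrent parser. A minimal sketch of that first stage might look like the following. It is illustrative only: the function names, filter widths, filter weights, and the max-pooling step are assumptions made for this example, not details taken from the paper, and the attention-based recurrent network is not shown.

```python
def conv1d_maxpool(frames, filters):
    """Slide each 1-D filter over time and keep its maximum activation.

    frames:  list of [energy, pitch] pairs, one per time frame of a word
             (hypothetical input layout, assumed for this sketch).
    filters: list of filters; each filter is a list of per-frame weight
             vectors, i.e. shape (width, channels).
    Returns one pooled activation per filter.
    """
    channels = len(filters[0][0])
    pooled = []
    for filt in filters:
        width = len(filt)
        # Zero-pad words that are shorter than the filter width.
        padded = list(frames) + [[0.0] * channels] * max(0, width - len(frames))
        best = float("-inf")
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            activation = sum(
                w * x
                for w_vec, x_vec in zip(filt, window)
                for w, x in zip(w_vec, x_vec)
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled


def word_input_vector(word_embedding, frames, filters):
    """Concatenate a word's text embedding with its pooled prosodic features."""
    return list(word_embedding) + conv1d_maxpool(frames, filters)


# Toy example: four frames of (energy, pitch) for a single word.
frames = [[0.1, 0.5], [0.2, 0.6], [0.4, 0.9], [0.3, 0.7]]
filters = [
    [[1.0, 0.0], [-1.0, 0.0]],             # width 2: drop in energy
    [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]],  # width 3: summed pitch
]
vec = word_input_vector([0.0, 1.0], frames, filters)
```

Here `vec` has one entry per embedding dimension plus one per filter, so every word yields a fixed-length vector regardless of its duration. In a trained model the filter weights would be learned jointly with the parser rather than set by hand.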

Topics

Artificial Intelligence > Core AI > Multimodal Learning; Natural Language Processing > Understanding > Parsing; Speech & Audio > Analysis > Prosody Analysis; Speech & Audio > Analysis > Speech Analysis; Computer Vision > Core AI > Multimodal Learning; Deep Learning > Learning Types > Multimodal Learning

Keywords

attention mechanism; disfluency detection; convolutional neural network; conversational speech; acoustic-prosodic feature; speech parsing; spoken language parsing; attention-based recurrent neural network; speech disfluency detection
