Incremental Dialogue Act Recognition: Token- vs Chunk-Based Classification

Eustace Ebhotemhen; Volha Petukhova; Dietrich Klakow

INTERSPEECH 2017

Abstract

This paper presents a machine-learning-based approach to incremental dialogue act classification, with a focus on the recognition of communicative functions associated with dialogue segments in a multidimensional space, as defined in the ISO 24617-2 dialogue act annotation standard. The main goal is to establish the nature of an increment whose processing will result in reliable overall system performance. We explore scenarios where increments are tokens or syntactically, semantically or prosodically motivated chunks. Combining local classification with meta-classifiers at the late-fusion decision level, we obtained state-of-the-art classification performance. Experiments were carried out on manually corrected transcriptions and on potentially erroneous ASR output. Chunk-based classification yields better results on the manual transcriptions, whereas token-based classification is more robust on the ASR output. It is also demonstrated that layered hierarchical and cascade training procedures result in better classification performance than the single-layered approach based on a joint classification predicting complex class labels.
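As a rough illustration of the setup the abstract describes, the sketch below processes an utterance increment by increment, where an increment is either a single token or a multi-token chunk, and fuses the outputs of several local classifiers at the decision level. Everything in it is an assumption made for illustration: the two labels, the cue-word heuristics, the chunk boundaries, and the score averaging that stands in for a trained meta-classifier are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scores = Dict[str, float]
LABELS = ("question", "inform")  # hypothetical communicative functions


def cue_classifier(prefix: Sequence[str]) -> Scores:
    """Toy local classifier: counts question cue words seen so far."""
    cues = {"what", "who", "where", "do", "can"}
    hits = sum(1 for tok in prefix if tok.lower() in cues)
    p_question = min(1.0, 0.2 + 0.4 * hits)
    return {"question": p_question, "inform": 1.0 - p_question}


def first_token_classifier(prefix: Sequence[str]) -> Scores:
    """Toy local classifier: looks only at the first token of the prefix."""
    starts = {"what", "who", "where", "do", "can", "is"}
    p_question = 0.8 if prefix and prefix[0].lower() in starts else 0.3
    return {"question": p_question, "inform": 1.0 - p_question}


def late_fusion(outputs: List[Scores]) -> Scores:
    """Average the local scores per label.

    This is a stand-in for a trained meta-classifier (an assumption).
    """
    return {lab: sum(o[lab] for o in outputs) / len(outputs) for lab in LABELS}


def classify_incrementally(
    increments: Sequence[Sequence[str]],
    classifiers: Sequence[Callable[[Sequence[str]], Scores]],
) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (prefix, fused label) after each increment is appended."""
    prefix: List[str] = []
    results = []
    for increment in increments:
        prefix.extend(increment)
        fused = late_fusion([clf(prefix) for clf in classifiers])
        results.append((tuple(prefix), max(fused, key=fused.get)))
    return results


CLASSIFIERS = (cue_classifier, first_token_classifier)
tokens = ["can", "you", "repeat", "that"]

# Token-based: one decision per token.
token_results = classify_incrementally([[t] for t in tokens], CLASSIFIERS)

# Chunk-based: one decision per chunk (boundaries here are made up).
chunk_results = classify_incrementally(
    [["can", "you"], ["repeat", "that"]], CLASSIFIERS
)

for prefix, label in token_results:
    print(" ".join(prefix), "->", label)
```

The only difference between the two scenarios in this sketch is how often a decision is emitted: the token-based run produces a hypothesis after every word, while the chunk-based run waits for a complete chunk.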

Topics

Machine Learning > Core Methods > Classification

Keywords

incremental classification; dialogue act recognition; token-based classification; chunk-based classification
