Leveraging Large Language Models to Refine Automatic Feedback Generation at Articulatory Level in Computer Aided Pronunciation Training

Huihang Zhong; Yanlu Xie; ZiJin Yao

INTERSPEECH 2024

Abstract

This study explores how Large Language Models (LLMs) can refine automatic feedback generation in Computer-Aided Pronunciation Training (CAPT). Specifically, it evaluates the impact of two factors on the effectiveness of automatically generated pronunciation feedback: (1) supplying mispronunciation detection results at different levels of granularity as prompts for GPT-4 models, and (2) fine-tuning GPT-4 models on specific prompt-feedback pairs to optimize feedback generation. Second language (L2) learners rate the feedback generated by each approach for comprehensibility and helpfulness. The results highlight both the potential of using LLMs for automatic feedback generation and the effectiveness of articulatory-level representations. Our accessible demonstrations invite further exploration.
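To make the first factor concrete, the sketch below shows one way a detection result could be turned into a text prompt at two levels of granularity. It is an illustration only, not the authors' pipeline: the `Mispronunciation` class, its field names, the `build_prompt` function, the prompt wording, and the English example word are all assumptions, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class Mispronunciation:
    """One detected error; all field names are hypothetical."""
    target_phone: str    # phone the learner should have produced
    produced_phone: str  # phone the detector reported
    feature: str         # articulatory feature that differs
    expected: str        # feature value for the target phone
    observed: str        # feature value for the produced phone


def build_prompt(word: str, error: Mispronunciation, level: str) -> str:
    """Turn a detection result into a text prompt at the chosen granularity."""
    base = (
        f"A learner said the word '{word}' and pronounced "
        f"/{error.target_phone}/ as /{error.produced_phone}/."
    )
    if level == "phone":
        # Phone level: only the substitution itself is reported.
        detail = ""
    elif level == "articulatory":
        # Articulatory level: also state which feature went wrong and how.
        detail = (
            f" The {error.feature} was {error.observed} "
            f"but should be {error.expected}."
        )
    else:
        raise ValueError(f"unknown level: {level}")
    return base + detail + " Give short, concrete corrective feedback."


# Illustrative example: /θ/ (dental) realized as /s/ (alveolar).
error = Mispronunciation("θ", "s", "place of articulation", "dental", "alveolar")
phone_prompt = build_prompt("think", error, "phone")
articulatory_prompt = build_prompt("think", error, "articulatory")
print(phone_prompt)
print(articulatory_prompt)
```

The two prompts differ only in the articulatory detail. That is the kind of contrast the study's first factor compares when learners rate the resulting feedback.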

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

pronunciation training; feedback generation; pronunciation assessment; articulatory phonetics; large language model; computer aided pronunciation training; automatic feedback generation
