Papers
A Dataset and Two-pass System for Reading Miscue Detection
Raj Gothi, Rahul Kumar, Mildred Pereira et al.
Adding User Feedback To Enhance CB-Whisper
Raul Monteiro
A demonstrator for articulation-based command word recognition
Joao Vitor Possamai de Menezes, Arne-Lukas Fietkau, Tom Diener et al.
A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding
Gaëlle Laperrière, Sahar Ghannay, Bassam Jabaian et al.
Adversarial Robustness Analysis in Automatic Pathological Speech Detection Approaches
Mahdi Amiri, Ina Kodrasi
Aerodynamics of Sakata labial-velar oral stops
Lorenzo Maselli, Véronique Delvaux
Affricates in Lushootseed
Ted Kye
AFL-Net: Integrating Audio, Facial, and Lip Modalities with a Two-step Cross-attention for Robust Speaker Diarization in the Wild
YongKang Yin, Xu Li, Ying Shan et al.
A Framework for Phoneme-Level Pronunciation Assessment Using CTC
Xinwei Cao, Zijian Fan, Torbjørn Svendsen et al.
A Functional Trade-off between Prosodic and Semantic Cues in Conveying Sarcasm
Zhu Li, Xiyuan Gao, Yuqing Zhang et al.
Age-related Differences in Acoustic Cues for the Perception of Checked Syllables in Shengzhou Wu
Bingliang Zhao, Jiangping Kong, Xiyu Wu
AG-LSEC: Audio Grounded Lexical Speaker Error Correction
Rohit Paturi, Xiang Li, Sundararajan Srinivasan
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer
Himanshu Maurya, Atli Sigurgeirsson
A Joint Noise Disentanglement and Adversarial Training Framework for Robust Speaker Verification
Xujiang Xing, Mingxing Xu, Thomas Fang Zheng
A Language Modeling Approach to Diacritic-Free Hebrew TTS
Amit Roth, Arnon Turetzky, Yossi Adi
A Layer-Anchoring Strategy for Enhancing Cross-Lingual Speech Emotion Recognition
Shreya G. Upadhyay, Carlos Busso, Chi-Chun Lee
A layer-wise analysis of Mandarin and English suprasegmentals in SSL speech models
Anton de la Fuente, Dan Jurafsky
AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators
Jaden Pieper, Stephen Voran
All Ears: Building Self-Supervised Learning based ASR models for Indian Languages at scale
Vasista Sai Lodagala, Abhishek Biswas, Shoutrik Das et al.
All Neural Low-latency Directional Speech Extraction
Ashutosh Pandey, Sanha Lee, Juan Azcarreta et al.
A Low-Bitrate Neural Audio Codec Framework with Bandwidth Reduction and Recovery for High-Sampling-Rate Waveforms
Yang Ai, Ye-Xin Lu, Xiao-Hang Jiang et al.
A multimodal analysis of different types of laughter expression in conversational dialogues
Kexin Wang, Carlos Ishi, Ryoko Hayashi
A multimodal approach to study the nature of coordinative patterns underlying speech rhythm
Jinyu Li, Leonardo Lancia
A Multimodal Framework for the Assessment of the Schizophrenia Spectrum
Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik et al.
A Multitask Training Approach to Enhance Whisper with Open-Vocabulary Keyword Spotting
Yuang Li, Min Zhang, Chang Su et al.