Papers
8,761 papers found
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss
Muhammad Shakeel, Yui Sudo, Yifan Peng et al.
Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems
Chin Yuen Kwok, Jia Qi Yip, Eng Siong Chng
Contrastive Feedback Mechanism for Simultaneous Speech Translation
Haotian Tan, Sakriani Sakti
Contrastive Learning and Inter-Speaker Distribution Alignment Based Unsupervised Domain Adaptation for Robust Speaker Verification
Zuoliang Li, Wu Guo, Bin Gu et al.
Contrastive Learning Approach for Assessment of Phonological Precision in Patients with Tongue Cancer Using MRI Data
Tomas Arias-Vergara, Paula Andrea Pérez-Toro, Xiaofeng Liu et al.
Controlling Emotion in Text-to-Speech with Natural Language Prompts
Thomas Bott, Florian Lux, Ngoc Thang Vu
ConvoCache: Smart Re-Use of Chatbot Responses
Conor Atkins, Ian Wood, Mohamed Ali Kaafar et al.
Convolution-Augmented Parameter-Efficient Fine-Tuning for Speech Recognition
Kwangyoun Kim, Suwon Shon, Yi-Te Hsu et al.
COSMIC: Data Efficient Instruction-tuning For Speech In-Context Learning
Jing Pan, Jian Wu, Yashesh Gaur et al.
CreakVC: a voice conversion tool for modulating creaky voice
Harm Lameris, Joakim Gustafson, Éva Székely
CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions
Mario Zusag, Laurin Wagner, Bernhard Thallinger
Cross-Attention-Guided WaveNet for EEG-to-MEL Spectrogram Reconstruction
Hao Li, Yuan Fang, Xueliang Zhang et al.
Crosslinguistic Comparison of Acoustic Variation in the Vowel Sequences /ia/ and /io/ in Four Romance Languages
Johanna Cronenberg, Ioana Chitoran, Lori Lamel et al.
Cross-Linguistic Intelligibility of Non-Compositional Expressions in Spoken Context
Iuliia Zaitova, Irina Stenger, Wei Xue et al.
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
Lifeng Zhou, Yuke Li, Rui Deng et al.
Cross-modal Features Interaction-and-Aggregation Network with Self-consistency Training for Speech Emotion Recognition
Ying Hu, Huamin Yang, Hao Huang et al.
Cross-Modality Diffusion Modeling and Sampling for Speech Recognition
Chia-Kai Yeh, Chih-Chun Chen, Ching-Hsien Hsu et al.
Cross-transfer Knowledge between Speech and Text Encoders to Evaluate Customer Satisfaction
Luis Felipe Parra-Gallego, Tilak Purohit, Bogdan Vlasenko et al.
CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting
Sichen Jin, Youngmoon Jung, Seungjin Lee et al.
CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection
Yongyi Zang, Jiatong Shi, You Zhang et al.
Custom wake word detection
Kesavaraj V, Charan Devarkonda, Vamshiraghusimha Narasinga et al.
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
Tzu-Quan Lin, Hung-yi Lee, Hao Tang
Dataset-Distillation Generative Model for Speech Emotion Recognition
Fabian Ritter-Gutierrez, Kuan-Po Huang, Jeremy H. M. Wong et al.
DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition
Xin Jing, Luyang Zhang, Jiangjian Xie et al.
DBD-CI: Doubling the Band Density for Bilateral Cochlear Implants
Mingyue Shi, Huali Zhou, Qinglin Meng et al.