Papers
Bridging Emotions Across Languages: Low Rank Adaptation for Multilingual Speech Emotion Recognition
Lucas Goncalves, Donita Robinson, Elizabeth Richerson et al.
Bridging Language Gaps in Audio-Text Retrieval
Zhiyong Yan, Heinrich Dinkel, Yongqing Wang et al.
BS-PLCNet 2: Two-stage Band-split Packet Loss Concealment Network with Intra-model Knowledge Distillation
Zihan Zhang, Xianjun Xia, Chuanzeng Huang et al.
BTS: Bridging Text and Sound Modalities for Metadata-Aided Respiratory Sound Classification
June-Woo Kim, Miika Toikkanen, Yera Choi et al.
CALL system using pitch-accent feature representations reflecting listeners’ subjective adequacy
Ikuyo Masuda-Katsuse, Ayako Shirose
Can Large Language Models Understand Spatial Audio?
Changli Tang, Wenyi Yu, Guangzhi Sun et al.
Can Modelling Inter-Rater Ambiguity Lead To Noise-Robust Continuous Emotion Predictions?
Ya-Tse Wu, Jingyao Wu, Vidhyasaharan Sethu et al.
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
Tiantian Feng, Dimitrios Dimitriadis, Shrikanth S. Narayanan
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
Zakaria Aldeneh, Takuya Higuchi, Jee-weon Jung et al.
CaptainA self-study mobile app for practising speaking: task completion assessment and feedback with generative AI
Nhan Phan, Anna von Zansen, Maria Kautonen et al.
CDSD: Chinese Dysarthria Speech Database
Yan Wan, Mengyi Sun, Xinchen Kang et al.
CEC: A Noisy Label Detection Method for Speaker Recognition
Yao Shen, Yingying Gao, Yaqian Hao et al.
Centroid Estimation with Transformer-Based Speaker Embedder for Robust Target Speaker Extraction
Woon-Haeng Heo, Joongyu Maeng, Yoseb Kang et al.
Challenge of Singing Voice Synthesis Using Only Text-To-Speech Corpus With FIRNet Source-Filter Neural Vocoder
Takuma Okamoto, Yamato Ohtani, Sota Shimizu et al.
Challenges of German Speech Recognition: A Study on Multi-ethnolectal Speech Among Adolescents
Martha Schubert, Daniel Duran, Ingo Siegert
Challenging margin-based speaker embedding extractors by using the variational information bottleneck
Themos Stafylakis, Anna Silnova, Johan Rohdin et al.
Characterizing code-switching: Applying Linguistic Principles for Metric Assessment and Development
Jie Chi, Electra Wallington, Peter Bell
Children’s Speech Recognition through Discrete Token Enhancement
Vrunda N. Sukhadia, Shammur Absar Chowdhury
Classification of Room Impulse Responses and its application for channel verification and diarization
Yuri Khokhlov, Tatiana Prisyach, Anton Mitrofanov et al.
Clever Hans Effect Found in Automatic Detection of Alzheimer's Disease through Speech
Yin-Long Liu, Rui Feng, Jia-Hong Yuan et al.
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge
Chen Chen, Zehua Liu, Xiaolou Li et al.
Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations
Kunal Dhawan, Nithin Rao Koluguri, Ante Jukić et al.
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
Yi Lu, Yuankun Xie, Ruibo Fu et al.