Papers
Autoregressive cross-interlocutor attention scores meaningfully capture conversational dynamics
Matthew McNeill, Rivka Levitan
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning
Jongsuk Kim, Jiwon Shin, Junmo Kim
AVR: synergizing foundation models for audio-visual humor detection
Sarthak Sharma, Orchid Chetia Phukan, Drishti Singh et al.
Backchannel prediction, based on who, when and what
Yo-Han Park, Wencke Liermann, Yong-Seok Choi et al.
Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques
Mun-Hak Lee, Jae-Hong Lee, DoHee Kim et al.
Balance, Multiple Augmentation, and Re-synthesis: A Triad Training Strategy for Enhanced Audio Deepfake Detection
Thien-Phuc Doan, Long Nguyen-Vu, Kihun Hong et al.
Beam-search SIEVE for low-memory speech recognition
Martino Ciaperoni, Athanasios Katsamanis, Aristides Gionis et al.
Behavioral evidence for higher speech rate convergence following natural than artificial time altered speech
Jérémy Giroud, Jessica Lei, Kirsty Phillips et al.
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan
BESST Dataset: A Multimodal Resource for Speech-based Stress Detection and Analysis
Jan Pešán, Vojtěch Juřík, Martin Karafiát et al.
Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models
Matthew Perez, Aneesha Sampath, Minxue Niu et al.
Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis
Christina Tånnander, Shivam Mehta, Jonas Beskow et al.
Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications
Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
Wangyou Zhang, Kohei Saijo, Jee-weon Jung et al.
Bilingual and Code-switching TTS Enhanced with Denoising Diffusion Model and GAN
Huai-Zhe Yang, Chia-Ping Chen, Shan-Yun He et al.
Bilingual Rhotic Production Patterns: A Generational Comparison of Spanish-English Bilingual Speakers in Canada
Ioana Colgiu, Laura Spinu, Rajiv Rao et al.
Binaural Selective Attention Model for Target Speaker Extraction
Hanyu Meng, Qiquan Zhang, Xiangyu Zhang et al.
Bird Whisperer: Leveraging Large Pre-trained Acoustic Model for Bird Call Classification
Muhammad Umer Sheikh, Hassan Abid, Bhuiyan Sanjid Shafique et al.
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation
Hui-Peng Du, Ye-Xin Lu, Yang Ai et al.
Blind Zero-Shot Audio Restoration: A Variational Autoencoder Approach for Denoising and Inpainting
Veranika Boukun, Jakob Drefs, Jörg Lücke
Boosting Cross-Corpus Speech Emotion Recognition using CycleGAN with Contrastive Learning
Jincen Wang, Yan Zhao, Cheng Lu et al.
Boosting CTC-based ASR using inter-layer attention-based CTC loss
Keigo Hojo, Yukoh Wakabayashi, Kengo Ohta et al.
Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding
Takafumi Moriya, Takanori Ashihara, Masato Mimura et al.
Boosting the Transferability of Adversarial Examples with Gradient-Aligned Ensemble Attack for Speaker Recognition
Zhuhai Li, Jie Zhang, Wu Guo et al.
Bridging Child-Centered Speech Language Identification and Language Diarization via Phonetics
Yujia Wang, Hexin Liu, Leibny Paola Garcia