Research Explorer

Autoregressive cross-interlocutor attention scores meaningfully capture conversational dynamics

Matthew McNeill, Rivka Levitan

2024 INTERSPEECH

AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning

Jongsuk Kim, Jiwon Shin, Junmo Kim

2024 INTERSPEECH

AVR: synergizing foundation models for audio-visual humor detection

Sarthak Sharma, Orchid Chetia Phukan, Drishti Singh et al.

2024 INTERSPEECH

Backchannel prediction, based on who, when and what

Yo-Han Park, Wencke Liermann, Yong-Seok Choi et al.

2024 INTERSPEECH

Balanced-Wav2Vec: Enhancing Stability and Robustness of Representation Learning Through Sample Reweighting Techniques

Mun-Hak Lee, Jae-Hong Lee, DoHee Kim et al.

2024 INTERSPEECH

Balance, Multiple Augmentation, and Re-synthesis: A Triad Training Strategy for Enhanced Audio Deepfake Detection

Thien-Phuc Doan, Long Nguyen-Vu, Kihun Hong et al.

2024 INTERSPEECH

Beam-search SIEVE for low-memory speech recognition

Martino Ciaperoni, Athanasios Katsamanis, Aristides Gionis et al.

2024 INTERSPEECH

Behavioral evidence for higher speech rate convergence following natural than artificial time altered speech

Jérémy Giroud, Jessica Lei, Kirsty Phillips et al.

2024 INTERSPEECH

Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models

Ruchao Fan, Natarajan Balaji Shankar, Abeer Alwan

2024 INTERSPEECH

BESST Dataset: A Multimodal Resource for Speech-based Stress Detection and Analysis

Jan Pešán, Vojtěch Juřík, Martin Karafiát et al.

2024 INTERSPEECH

Beyond Binary: Multiclass Paraphasia Detection with Generative Pretrained Transformers and End-to-End Models

Matthew Perez, Aneesha Sampath, Minxue Niu et al.

2024 INTERSPEECH

Beyond graphemes and phonemes: continuous phonological features in neural text-to-speech synthesis

Christina Tånnander, Shivam Mehta, Jonas Beskow et al.

2024 INTERSPEECH

Beyond Levenshtein: Leveraging Multiple Algorithms for Robust Word Error Rate Computations And Granular Error Classifications

Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann

2024 INTERSPEECH

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement

Wangyou Zhang, Kohei Saijo, Jee-weon Jung et al.

2024 INTERSPEECH

Bilingual and Code-switching TTS Enhanced with Denoising Diffusion Model and GAN

Huai-Zhe Yang, Chia-Ping Chen, Shan-Yun He et al.

2024 INTERSPEECH

Bilingual Rhotic Production Patterns: A Generational Comparison of Spanish-English Bilingual Speakers in Canada

Ioana Colgiu, Laura Spinu, Rajiv Rao et al.

2024 INTERSPEECH

Binaural Selective Attention Model for Target Speaker Extraction

Hanyu Meng, Qiquan Zhang, Xiangyu Zhang et al.

2024 INTERSPEECH

Bird Whisperer: Leveraging Large Pre-trained Acoustic Model for Bird Call Classification

Muhammad Umer Sheikh, Hassan Abid, Bhuiyan Sanjid Shafique et al.

2024 INTERSPEECH

BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation

Hui-Peng Du, Ye-Xin Lu, Yang Ai et al.

2024 INTERSPEECH

Blind Zero-Shot Audio Restoration: A Variational Autoencoder Approach for Denoising and Inpainting

Veranika Boukun, Jakob Drefs, Jörg Lücke

2024 INTERSPEECH

Boosting Cross-Corpus Speech Emotion Recognition using CycleGAN with Contrastive Learning

Jincen Wang, Yan Zhao, Cheng Lu et al.

2024 INTERSPEECH

Boosting CTC-based ASR using inter-layer attention-based CTC loss

Keigo Hojo, Yukoh Wakabayashi, Kengo Ohta et al.

2024 INTERSPEECH

Boosting Hybrid Autoregressive Transducer-based ASR with Internal Acoustic Model Training and Dual Blank Thresholding

Takafumi Moriya, Takanori Ashihara, Masato Mimura et al.

2024 INTERSPEECH

Boosting the Transferability of Adversarial Examples with Gradient-Aligned Ensemble Attack for Speaker Recognition

Zhuhai Li, Jie Zhang, Wu Guo et al.

2024 INTERSPEECH

Bridging Child-Centered Speech Language Identification and Language Diarization via Phonetics

Yujia Wang, Hexin Liu, Leibny Paola Garcia

2024 INTERSPEECH
