NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Alon Vinnikov; Amir Ivry; Aviv Hurvitz; Igor Abramovski; Sharon Koubi; Ilya Gurvich; Shai Peer; Xiong Xiao; Benjamin Martinez Elizalde; Naoyuki Kanda; Xiaofei Wang; Shalev Shaer; Stav Yagev; Yossi Asher; Sunit Sivasankaran; Yifan Gong; Min Tang; Huaming Wang; Eyal Krupka

INTERSPEECH 2024

Abstract

We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR) Challenge, datasets, and baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in meeting scenarios, with single-channel and known-geometry multi-channel tracks, using a single device. We launch two new datasets: First, a benchmark dataset of 280 English meetings, averaging 6 minutes each, capturing a broad spectrum of acoustic and conversational patterns across 30 rooms with 4-8 attendees. Second, a 1000-hour simulated training dataset, synthesized for real-world generalization, incorporating 15,000 real acoustic transfer functions. The NOTSOFAR-1 Challenge aims to advance research in the field of DASR, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmark datasets.

Topics

Machine Learning > Optimization & Theory > Optimization
Speech & Audio > Recognition > Automatic Speech Recognition
Speech & Audio > Analysis > Speaker Verification

Keywords

speaker diarization; benchmark dataset; distant meeting transcription; multi-channel; automatic speech recognition; far-field audio; data-driven method
