INTERSPEECH 2024

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Abstract

We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR) Challenge, together with its datasets and baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in meeting scenarios recorded with a single device, and offers two tracks: single-channel and known-geometry multi-channel. We launch two new datasets. The first is a benchmark dataset of 280 English meetings, averaging 6 minutes each, that captures a broad spectrum of acoustic and conversational patterns across 30 rooms with 4-8 attendees. The second is a 1000-hour simulated training dataset, synthesized for real-world generalization and incorporating 15,000 real acoustic transfer functions. The NOTSOFAR-1 Challenge aims to advance research in DASR by providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive, high-quality training and benchmark datasets.
