Proactive Hearing Assistants that Isolate Egocentric Conversations

Guilin Hu; Malek Itani; Tuochao Chen; Shyamnath Gollakota

EMNLP 2025

Abstract

We introduce proactive hearing assistants that automatically identify and separate the wearer’s conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer’s self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants and totaling 6.8 hours, show that the system generalizes in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement.
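The dual-rate design described in the abstract can be illustrated with a small scheduling sketch. It assumes a 16 kHz sample rate (so one 12.5 ms hop is 200 samples per ear), a slow-model refresh every 80 hops (1 s) over a 4 s look-back window, and uses hypothetical placeholder functions `slow_model` and `fast_model` that do simple gain arithmetic instead of neural separation. None of these names or values come from the paper. The slow model is called synchronously here; an on-device system would more plausibly run it in the background and let the fast model reuse its most recent output.

```python
import math
from typing import List, Tuple

SAMPLE_RATE = 16_000   # assumption: the abstract does not state a sample rate
HOP_MS = 12.5          # fast-model period stated in the abstract
HOP = int(SAMPLE_RATE * HOP_MS / 1000)  # 200 samples per ear per hop
SLOW_EVERY = 80        # hypothetical: slow model refreshes once per 80 hops (1 s)
CONTEXT_S = 4          # hypothetical: slow model looks back over the last 4 s


def slow_model(left: List[float], right: List[float]) -> float:
    """Hypothetical placeholder for the slow context model.

    Returns a single summary value (RMS over both ears) standing in for
    whatever long-range conversational context a real model would produce.
    """
    samples = left + right
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def fast_model(left: List[float], right: List[float],
               context: float) -> Tuple[List[float], List[float]]:
    """Hypothetical placeholder for the lightweight streaming extractor.

    Scales the current hop by a gain derived from the cached context,
    where a real model would predict the conversation partners' speech.
    """
    gain = 1.0 / (1.0 + context)
    return [s * gain for s in left], [s * gain for s in right]


def run_dual_rate(left: List[float],
                  right: List[float]) -> Tuple[List[float], List[float], int]:
    """Process binaural audio hop by hop; trailing samples that do not
    fill a whole hop are dropped. Returns (out_left, out_right, slow_calls)."""
    n_hops = len(left) // HOP
    context = 0.0
    out_l: List[float] = []
    out_r: List[float] = []
    slow_calls = 0
    for i in range(n_hops):
        start, end = i * HOP, (i + 1) * HOP
        if i % SLOW_EVERY == 0:
            # Refresh long-range context using only audio up to the current hop.
            ctx_start = max(0, end - CONTEXT_S * SAMPLE_RATE)
            context = slow_model(left[ctx_start:end], right[ctx_start:end])
            slow_calls += 1
        # The fast model runs on every hop and reuses the cached context.
        yl, yr = fast_model(left[start:end], right[start:end], context)
        out_l.extend(yl)
        out_r.extend(yr)
    return out_l, out_r, slow_calls
```

For 2 s of binaural input this schedule makes 160 fast-model calls but only 2 slow-model calls, which is the motivation for the split: per-hop cost stays small while longer-range context is refreshed at a much lower rate.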

Topics

Machine Learning > Application Areas > Efficient Computing
Deep Learning > Techniques > Model Architecture
Computer Science > Applications > Robotics
Speech & Audio > Processing > Speech Enhancement
Artificial Intelligence > Core AI > Speech Processing

Keywords

speech separation, streaming model, conversation analysis, binaural audio, conversational dynamics, egocentric audio, hearing assistant, hearing assistance