Referring to Screen Texts with Voice Assistants

Shruti Bhargava; Anand Dhoot; Ing-marie Jonsson; Hoang Long Nguyen; Alkesh Patel; Hong Yu; Vincent Renkens

2023 ACL ACL 2023

Referring to Screen Texts with Voice Assistants

Abstract

AbstractVoice assistants help users make phone calls, send messages, create events, navigate and do a lot more. However assistants have limited capacity to understand their users’ context. In this work, we aim to take a step in this direction. Our work dives into a new experience for users to refer to phone numbers, addresses, email addresses, urls, and dates on their phone screens. We focus on reference understanding, which is particularly interesting when, similar to visual grounding, there are multiple similar texts on screen. We collect a dataset and propose a lightweight general purpose model for this novel experience. Since consuming pixels directly is expensive, our system is designed to rely only on text extracted from the UI. Our model is modular, offering flexibility, better interpretability and efficient run time memory.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision

🧭 Keyword Pioneer — reference understanding

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio