Building Multimodal Simulations for Natural Language

James Pustejovsky; Nikhil Krishnaswamy

EACL 2017

Abstract

In this tutorial, we introduce a computational framework and modeling language (VoxML) for composing multimodal simulations of natural language expressions within a 3D simulation environment (VoxSim). We demonstrate how to construct voxemes, which are visual object representations of linguistic entities. We also show how to compose events and actions over these objects, within a restricted domain of dynamics. This gives us the building blocks to simulate narratives of multiple events or participate in a multimodal dialogue with synthetic agents in the simulation environment. To our knowledge, this is the first time such material has been presented as a tutorial within the CL community.

This will be of relevance to students and researchers interested in modeling actionable language, natural language communication with agents and robots, spatial and temporal constraint solving through language, referring expression generation, embodied cognition, and minimal model creation.

Multimodal simulation of language, particularly motion expressions, brings together a number of existing lines of research from the computational linguistics, semantics, robotics, and formal logic communities, including action and event representation (Di Eugenio, 1991), modeling gestural correlates to NL expressions (Kipp et al., 2007; Neff et al., 2008), and action event modeling (Kipper and Palmer, 2000; Yang et al., 2015). We combine an approach to event modeling with a scene generation approach akin to those found in work by Coyne and Sproat (2001), Siskind (2011), and Chang et al. (2015). Mapping natural language expressions through a formal model and a dynamic logic interpretation into a visualization of the event described provides an environment for grounding concepts and referring expressions that is interpretable by both a computer and a human user. This opens a variety of avenues for humans to communicate with computerized agents and robots, as in (Matuszek et al., 2013; Lauria e

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Multimodal Learning
Natural Language Processing > Generation > Dialogue Systems
Artificial Intelligence > Core AI > Robotics
Artificial Intelligence > Core AI > Language

Keywords

multimodal learning, referring expression, natural language understanding, language grounding, dialogue system, embodied cognition

Related papers

Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages 2017

Learning and Knowledge Transfer with Memory Networks for Machine Comprehension 2017

Is this a Child, a Girl or a Car? Exploring the Contribution of Distributional Similarity to Learning Referential Word Meanings 2017

Building Web-Interfaces for Vector Semantic Models with the WebVectors Toolkit 2017

Assessing Convincingness of Arguments in Online Debates with Limited Number of Features 2017