Grounding language acquisition by training semantic parsers using captioned videos

Candace Ross; Andrei Barbu; Yevgeni Berzak; Battushig Myanganbayar; Boris Katz

EMNLP 2018

Abstract

We develop a semantic parser that is trained in a grounded setting using videos paired with sentence captions. This setting is both data-efficient, requiring little annotation, and similar to the experience of children, who observe their environment and listen to speakers. The semantic parser recovers the meaning of English sentences despite not having access to any annotated sentences. It does so despite the ambiguity inherent in vision, where a sentence may refer to any combination of objects, object properties, relations, or actions taken by any agent in a video. For this task, we collected a new dataset for grounded language acquisition. Learning a grounded semantic parser, one that turns sentences into logical forms using captioned videos, can significantly expand the range of data that parsers can be trained on, lower the effort of training a semantic parser, and ultimately lead to a better understanding of child language acquisition.

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Interdisciplinary > Linguistics > Computational Linguistics
Natural Language Processing > Applications > Semantic Parsing
Natural Language Processing > Resources & Methods > Multimodal NLP

Keywords

visual language grounding, language acquisition, logical form, semantic parser, vision language, grounded language, video caption, grounded language acquisition, captioned video
