EMNLP 2016

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding