Building a Bridge: A Method for Image-Text Sarcasm Detection Without Pretraining on Image-Text Data

Xinyu Wang; Xiaowen Sun; Tan Yang; Hongbo Wang

EMNLP 2020

Abstract

Sarcasm detection in social media posts that combine text and images is becoming more challenging. Previous work on image-text sarcasm detection mainly fused summaries of the text and the image: separate sub-models read the text and the image to produce summaries, which were then fused. Recently, multi-modal models based on the BERT architecture, such as ViLBERT, have been proposed. However, they can only be pretrained on image-text data. In this paper, we propose an image-text model for sarcasm detection that uses the pretrained BERT and ResNet without any further pretraining. BERT and ResNet have been pretrained on far larger text-only and image-only datasets than the available image-text data. We connect the vector spaces of BERT and ResNet to make use of this larger body of data. We use the pretrained Multi-Head Attention of BERT to model the text and image. In addition, we propose a 2D-Intra-Attention to extract the relationships between words and images. In experiments, our model outperforms the state-of-the-art model.
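The abstract does not say how the BERT and ResNet vector spaces are connected, so the following is only a rough sketch of one common way such a bridge could look: a linear projection maps image region features into the text embedding dimension, the two sequences are concatenated, and self-attention runs over the joint sequence. Everything here is an assumption made for illustration: the function names `bridge` and `self_attention`, the dimensions (2048 for a ResNet-50-style feature map, 768 for BERT-base, a 7x7 region grid), the random stand-in features, and the single-head attention with identity projections in place of BERT's pretrained Multi-Head Attention. It is not the authors' implementation and does not cover the 2D-Intra-Attention.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bridge(image_feats, W, b):
    """Hypothetical bridge: project (n_regions, d_img) image features
    into the text space, giving (n_regions, d_txt)."""
    return image_feats @ W + b


def self_attention(seq):
    """Single-head scaled dot-product self-attention.
    Identity Q/K/V projections are used here for brevity (an assumption);
    a real model would use learned, pretrained projections."""
    d = seq.shape[-1]
    scores = seq @ seq.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ seq, weights


rng = np.random.default_rng(0)

# Assumed sizes: ResNet-50-style channels, BERT-base hidden size.
d_img, d_txt = 2048, 768
n_tokens, n_regions = 12, 49  # 49 = 7x7 grid of image regions (assumed)

# Random stand-ins for pretrained BERT token embeddings and ResNet features.
text_emb = rng.standard_normal((n_tokens, d_txt))
image_feats = rng.standard_normal((n_regions, d_img))

# Bridge parameters; in practice these would be learned during fine-tuning.
W = rng.standard_normal((d_img, d_txt)) * 0.02
b = np.zeros(d_txt)

image_emb = bridge(image_feats, W, b)

# Treat projected image regions as extra "tokens" after the text tokens.
joint = np.concatenate([text_emb, image_emb], axis=0)

# Every position can now attend to both words and image regions.
attended, weights = self_attention(joint)
```

In this sketch, the block `weights[:n_tokens, n_tokens:]` holds the attention each word places on each image region, which is the kind of word-image relationship the abstract refers to.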

Topics

Artificial Intelligence > Core AI > Multimodal Learning
Deep Learning > Architectures > Transformers
Natural Language Processing > Applications > Sentiment Analysis
Deep Learning > Learning Types > Multi-Modal Learning

Keywords

sentiment analysis, text classification, attention mechanism, multimodal learning, sarcasm detection, BERT model, multi-modal learning, visual reasoning, image processing, vision language, image-text sarcasm detection
