2020 COLING COLING 2020

Hitachi at SemEval-2020 Task 8: Simple but Effective Modality Ensemble for Meme Emotion Recognition

Abstract

AbstractUsers of social networking services often share their emotions via multi-modal content, usually images paired with text embedded in them. SemEval-2020 task 8, Memotion Analysis, aims at automatically recognizing these emotions of so-called internet memes. In this paper, we propose a simple but effective Modality Ensemble that incorporates visual and textual deep-learning models, which are independently trained, rather than providing a single multi-modal joint network. To this end, we first fine-tune four pre-trained visual models (i.e., Inception-ResNet, PolyNet, SENet, and PNASNet) and four textual models (i.e., BERT, GPT-2, Transformer-XL, and XLNet). Then, we fuse their predictions with ensemble methods to effectively capture cross-modal correlations. The experiments performed on dev-set show that both visual and textual features aided each other, especially in subtask-C, and consequently, our system ranked 2nd on subtask-C.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Machine Learning
🧭 Keyword Pioneer — pre-trained vision model
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio