COLING 2024

Towards Multi-modal Sarcasm Detection via Disentangled Multi-grained Multi-modal Distilling

Abstract

Multi-modal sarcasm detection aims to identify whether a given sample with multi-modal information (i.e., text and image) is sarcastic, and it has received increasing attention due to the rapid growth of multi-modal posts on modern social media. However, mainstream models process the input of each modality in a holistic manner, resulting in redundant and unrefined information. Moreover, the representations of different modalities are entangled in one common latent space to perform complex cross-modal interactions, neglecting the heterogeneity and distribution gap of different modalities. To address these issues, we propose a novel framework, DMMD (short for Disentangled Multi-grained Multi-modal Distilling), for multi-modal sarcasm detection, which conducts multi-grained knowledge distilling (i.e., intra-subspace and inter-subspace) based on the disentangled multi-modal representations. Concretely, the representations of each modality are disentangled explicitly into modality-agnostic and modality-specific subspaces. We then transfer cross-modal knowledge by conducting intra-subspace knowledge distilling in a self-adaptive pattern. We also apply mutual learning to regularize the underlying inter-subspace consistency. Extensive experiments on a commonly used benchmark demonstrate the efficacy of our DMMD over cutting-edge methods. More encouragingly, visualization results indicate that the multi-modal representations display meaningful distributional patterns, and we hope these findings will be helpful to the community of multi-modal knowledge transfer.
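As a rough illustration of the mutual-learning idea mentioned above, the following minimal sketch computes a symmetric KL divergence between the class distributions predicted from two subspaces. The abstract does not give the loss that DMMD actually uses, so the symmetric-KL form, the function names, and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_loss(logits_a, logits_b):
    """Hypothetical consistency term: symmetric KL between two predictions.

    logits_a and logits_b stand in for sarcasm/non-sarcasm scores produced
    from two different subspaces (an assumption, not the paper's definition).
    """
    p = softmax(logits_a)
    q = softmax(logits_b)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


if __name__ == "__main__":
    agnostic_logits = [2.0, 0.5]   # illustrative values only
    specific_logits = [1.0, 1.5]   # illustrative values only
    print(mutual_learning_loss(agnostic_logits, specific_logits))
```

The term is zero when both subspaces predict the same distribution and grows as they disagree, which is the sense in which such a loss could regularize inter-subspace consistency.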
