DUET: Cross-Modal Semantic Grounding for Contrastive Zero-Shot Learning

Zhuo Chen; Yufeng Huang; Jiaoyan Chen; Yuxia Geng; Wen Zhang; Yin Fang; Jeff Z. Pan; Huajun Chen

AAAI 2023

Abstract

Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training. Attributes, i.e., annotations of class-level visual characteristics, are among the most effective and widely used forms of semantic information for zero-shot image classification. However, current methods often fail to discriminate subtle visual distinctions between images, owing not only to the shortage of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present DUET, a transformer-based, end-to-end ZSL method that integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance; and (3) propose a multi-task learning policy that accounts for multi-modal objectives. We find that DUET can achieve state-of-the-art performance on three standard ZSL benchmarks and on a knowledge-graph-equipped ZSL benchmark, that its components are effective, and that its predictions are interpretable.
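
The abstract does not give DUET's loss functions. As a rough illustration only, the sketch below shows a generic InfoNCE-style contrastive loss over attribute-level embeddings, plus a weighted sum that combines several task losses into one multi-task objective. The function names, the temperature of 0.1, the cosine similarity, and the task weights are all hypothetical choices for this sketch, not the paper's formulation.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Generic InfoNCE loss for one anchor (hypothetical stand-in, not DUET's loss).

    Returns -log of the softmax probability assigned to the positive among
    {positive} plus the negatives, using temperature-scaled cosine similarity.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


def multi_task_loss(losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses; the weights are hypothetical hyperparameters."""
    return sum(weights[name] * value for name, value in losses.items())


# Hypothetical attribute-level embeddings: the anchor and positive stand for the
# same attribute seen in two images; the negatives come from images lacking it.
anchor = [1.0, 0.0]
aligned = info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
total = multi_task_loss({"classification": 1.0, "contrastive": aligned},
                        {"classification": 1.0, "contrastive": 0.5})
```

A well-aligned positive yields a much smaller loss than a misaligned one, which is the pressure a contrastive objective exerts toward separating fine-grained attributes. Summing weighted task losses is only one simple way to realize a multi-task policy.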

Keywords

contrastive learning, zero-shot learning, multi-task learning, cross-modal learning, attribute recognition, pre-trained language model, semantic grounding, cross-modal semantic grounding, attribute-level contrastive learning