Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning

Liqi Yan; Cheng Han; Zenglin Xu; Dongfang Liu; Qifan Wang

2023 IJCAI IJCAI 2023

Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning

Abstract

Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches have been introduced to learn fixed textual or visual prompts while freezing the pre-trained model in downstream tasks. Despite the effectiveness of prompt tuning, what do those learnable prompts learn remains unexplained. In this work, we explore whether prompts in the fine-tuning can learn knowledge-aware prompts from the pre-training, by designing two different sets of prompts in pre-training and fine-tuning phases respectively. Specifically, we present a Video-Language Prompt tuning (VL-Prompt) approach for video captioning, which first efficiently pre-train a video-language model to extract key information (e.g., actions and objects) with flexibly generated Knowledge-Aware Prompt (KAP). Then, we design a Video-Language Prompt (VLP) to transfer the knowledge from the knowledge-aware prompts and fine-tune the model to generate full captions. Experimental results show the superior performance of our approach over several state-of-the-art baselines. We further demonstrate that the video-language prompts are well learned from the knowledge-aware prompts.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Natural Language Processing

🧭 Keyword Pioneer — knowledge-aware prompt

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Liqi Yan , Cheng Han , Zenglin Xu , Dongfang Liu , Qifan Wang

Topics

Artificial Intelligence > Core AI > Multimodal Learning Natural Language Processing > Generation > Text Generation

Keywords

video captioning vision-language model prompt tuning knowledge-aware prompt

Download PDF

Related papers

Analyzing Intentional Behavior in Autonomous Agents under Uncertainty 2023

Deep Hashing-based Dynamic Stock Correlation Estimation via Normalizing Flow 2023

U-Match: Two-view Correspondence Learning with Hierarchy-aware Local Context Aggregation 2023

Artificial Agents Inspired by Human Motivation Psychology for Teamwork in Hazardous Environments 2023

Proportionally Fair Online Allocation of Public Goods with Predictions 2023