Knowledge Acquisition for Human-In-The-Loop Image Captioning

Ervine Zheng; Qi Yu; Rui Li; Pengcheng Shi; Anne Haake

2023 AISTATS AISTATS 2023

Knowledge Acquisition for Human-In-The-Loop Image Captioning

Abstract

Image captioning offers a computational process to understand the semantics of images and convey them using descriptive language. However, automated captioning models may not always generate satisfactory captions due to the complex nature of the images and the quality/size of the training data. We propose an interactive captioning framework to improve machine-generated captions by keeping humans in the loop and performing an online-offline knowledge acquisition (KA) process. In particular, online KA accepts a list of keywords specified by human users and fuses them with the image features to generate a readable sentence that captures the semantics of the image. It leverages a multimodal conditioned caption completion mechanism to ensure the appearance of all user-input keywords in the generated caption. Offline KA further learns from the user inputs to update the model and benefits caption generation for unseen images in the future. It is built upon a Bayesian transformer architecture that dynamically allocates neural resources and supports uncertainty-aware model updates to mitigate overfitting. Our theoretical analysis also proves that Offline KA automatically selects the best model capacity to accommodate the newly acquired knowledge. Experiments on real-world data demonstrate the effectiveness of the proposed framework.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — bayesian transformer

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Ervine Zheng , Qi Yu , Rui Li , Pengcheng Shi , Anne Haake

Topics

Artificial Intelligence > Core AI > Human-AI Interaction Machine Learning > Learning Types > Active Learning Machine Learning > Optimization & Theory > Bayesian Inference Computer Vision > Generation > Image Captioning Deep Learning > Learning Types > Multi-Modal Learning

Keywords

bayesian inference multimodal learning image captioning human-in-the-loop learning knowledge acquisition bayesian transformer

Download PDF

Related papers

Safe Sequential Testing and Effect Estimation in Stratified Count Data 2023

Who Should Predict? Exact Algorithms For Learning to Defer to Humans 2023

An Online and Unified Algorithm for Projection Matrix Vector Multiplication with Application to Empirical Risk Minimization 2023

Stochastic Gradient Descent-Ascent: Unified Theory and New Efficient Methods 2023

The Ordered Matrix Dirichlet for State-Space Models 2023