Attention Beam: An Image Captioning Approach (Student Abstract)

Anubhav Shrimal; Tanmoy Chakraborty

AAAI 2021

Abstract

The aim of image captioning is to generate a textual description of a given image. Though seemingly an easy task for humans, it is challenging for machines, as it requires the ability to comprehend the image (computer vision) and then generate a human-like description of it (natural language understanding). In recent times, encoder-decoder based architectures have achieved state-of-the-art results for image captioning. Here, we present a beam search heuristic on top of the encoder-decoder based architecture that gives better quality captions on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
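
For readers unfamiliar with the decoding step mentioned above, the following is a minimal sketch of plain beam search over a decoder's next-token distribution. It is not the authors' attention-based heuristic, which the abstract does not detail. The names `beam_search`, `step_fn`, `toy_step`, and the toy probability table are all hypothetical and stand in for a trained encoder-decoder captioning model; scores are unnormalized sums of log-probabilities.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (tokens so far, summed log-probability)


def beam_search(
    step_fn: Callable[[List[str]], Dict[str, float]],
    start: str,
    end: str,
    beam_width: int = 3,
    max_len: int = 20,
) -> List[str]:
    """Generic beam search (illustrative, not the paper's method).

    step_fn is an assumed interface: given the tokens generated so far,
    it returns a dict mapping each candidate next token to its probability.
    """
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens).items():
                if prob > 0.0:
                    candidates.append((tokens + [tok], score + math.log(prob)))
        if not candidates:
            break

        # Keep only the beam_width highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == end:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break

    # Fall back to unfinished hypotheses if nothing reached the end token.
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])[0]


# Hypothetical toy "decoder": the next-token distribution depends only on
# the previous token. A real captioning model would also condition on the image.
TOY_TABLE: Dict[str, Dict[str, float]] = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_step(tokens: List[str]) -> Dict[str, float]:
    return TOY_TABLE[tokens[-1]]


greedy = beam_search(toy_step, "<s>", "</s>", beam_width=1)
wider = beam_search(toy_step, "<s>", "</s>", beam_width=2)
print(" ".join(greedy))
print(" ".join(wider))
```

With a beam width of 1 the search is greedy and commits to the locally best first word; with a width of 2 it keeps the second-best first word alive and finds a caption with higher overall probability under this toy table. Practical systems often add length normalization to the score, which this sketch omits.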

Topics

Computer Vision > Generation > Image Captioning
Artificial Intelligence > Core AI > Computer Vision
Deep Learning > Techniques > Attention

Keywords

computer vision, attention mechanism, image captioning, natural language understanding, beam search, encoder-decoder architecture
