2024 IJCAI IJCAI 2024

GEM: Generating Engaging Multimodal Content

Abstract

Generating engaging multimodal content is a key objective in numerous applications, such as the creation of online advertisements that captivate user attention through a synergy of images and text. In this paper, we introduce GEM, a novel framework engineered for the generation of engaging multimodal image-text posts. The GEM framework operates in two primary phases. Initially, GEM integrates a pre-trained engagement discriminator with a technique for deriving an effective continuous prompt tailored for the stable diffusion model. Subsequently, GEM unveils an iterative algorithm dedicated to producing coherent and compelling image-sentence pairs centered around a specified topic of interest. Through a combination of experimental analysis and human evaluations, we establish that the image-sentence pairs generated by GEM not only surpass several established baselines in terms of engagement but also in achieving superior alignment.

๐ŸŒ‰ Interdisciplinary Bridge โ€” Data Science & Analytics and Machine Learning
๐Ÿฃ Hot Topic Early Bird โ€” multimodal generation
๐Ÿ Cross-Pollinator โ€” Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio