Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis

Jiapeng Zhu; Ceyuan Yang; Kecheng Zheng; Yinghao Xu; Zifan Shi; Yifei Zhang; Qifeng Chen; Yujun Shen

CVPR 2025

Abstract

Due to the difficulty in scaling up, generative adversarial networks (GANs) seem to be falling out of grace with the task of text-conditioned image synthesis. Sparsely activated mixture-of-experts (MoE) has recently been demonstrated as a valid solution to training large-scale models with limited resources. Inspired by this, we present Aurora, a GAN-based text-to-image generator that employs a collection of experts to learn feature processing, together with a sparse router to adaptively select the most suitable expert for each feature point. We adopt a two-stage training strategy, which first learns a base model at 64x64 resolution, followed by an upsampler to produce 512x512 images. Trained with only public data, our approach encouragingly closes the performance gap between GANs and industry-level diffusion models while maintaining a fast inference speed. We release the code and checkpoints at https://github.com/zhujiapeng/Aurora to facilitate further development by the community.
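The routing idea in the abstract (one expert chosen per feature point) can be illustrated with a minimal sketch. This is not the Aurora implementation: the class name SparseMoELayer, the linear router, the linear experts, the softmax gate, and all sizes are assumptions made for illustration, and the code uses plain Python rather than a deep learning framework. It only shows that each feature point is processed by a single selected expert, so compute per point stays constant as the number of experts grows.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random Gaussian weights, scaled by 1/sqrt(cols)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class SparseMoELayer:
    """Hypothetical top-1 sparse MoE layer (illustrative only)."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        # Router: one score per expert for each feature point (assumed linear).
        self.router = random_matrix(num_experts, dim, rng)
        # Experts: assumed to be simple linear maps of size dim x dim.
        self.experts = [random_matrix(dim, dim, rng) for _ in range(num_experts)]

    def route(self, feature):
        """Return (index of the selected expert, its gate probability)."""
        probs = softmax(matvec(self.router, feature))
        index = max(range(len(probs)), key=probs.__getitem__)
        return index, probs[index]

    def forward(self, features):
        """Process each feature point with only its selected expert."""
        outputs, choices = [], []
        for feature in features:
            index, gate = self.route(feature)
            expert_out = matvec(self.experts[index], feature)
            # Scale by the gate value so the router output affects the result.
            outputs.append([gate * value for value in expert_out])
            choices.append(index)
        return outputs, choices


def demo():
    """Route a flattened 4x4 feature map with 8 channels through 4 experts."""
    rng = random.Random(1)
    features = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
    layer = SparseMoELayer(dim=8, num_experts=4)
    return layer.forward(features)
```

Calling demo() returns the processed feature points together with the expert index chosen for each point, which makes it easy to inspect how the hypothetical router distributes spatial locations across experts.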

Topics

Machine Learning > Learning Types > Adversarial Learning
Deep Learning > Architectures > Neural Networks
Deep Learning > Models > Generative Models
Computer Vision > Generation > Image Generation
Artificial Intelligence > Core AI > Computer Vision

Keywords

image generation; text-to-image synthesis; feature processing; generative adversarial network; mixture of experts; sparse activation; text-conditioned image synthesis
