Exploring DCN-like architecture for fast image generation with arbitrary resolution

Shuai Wang; Zexian Li; Tianhui Song; Xubin Li; Tiezheng Ge; Bo Zheng; Limin Wang

2024 NIPS NeurIPS 2024

Exploring DCN-like architecture for fast image generation with arbitrary resolution

Abstract

Arbitrary-resolution image generation still remains a challenging task in AIGC, as it requires handling varying resolutions and aspect ratios while maintaining high visual quality. Existing transformer-based diffusion methods suffer from quadratic computation cost and limited resolution extrapolation capabilities, making them less effective for this task. In this paper, we propose FlowDCN, a purely convolution-based generative model with linear time and memory complexity, that can efficiently generate high-quality images at arbitrary resolutions. Equipped with a new design of learnable group-wise deformable convolution block, our FlowDCN yields higher flexibility and capability to handle different resolutions with a single model.FlowDCN achieves the state-of-the-art 4.30 sFID on $256\times256$ ImageNet Benchmark and comparable resolution extrapolation results, surpassing transformer-based counterparts in terms of convergence speed (only $\frac{1}{5}$ images), visual quality, parameters ($8\%$ reduction) and FLOPs ($20\%$ reduction). We believe FlowDCN offers a promising solution to scalable and flexible image synthesis.

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning

🧭 Keyword Pioneer — arbitrary resolution

🐝 Cross-Pollinator — Artificial Intelligence, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Speech & Audio

Authors

Shuai Wang , Zexian Li , Tianhui Song , Xubin Li , Tiezheng Ge , Bo Zheng , Limin Wang

Topics

Deep Learning > Architectures > Neural Networks Deep Learning > Models > Diffusion Models Deep Learning > Models > Generative Models Computer Vision > Generation > Image Generation Deep Learning > Optimization & Theory > Efficient Computing Deep Learning > Architectures > Convolutional Neural Networks

Keywords

image generation deformable convolution generative model diffusion model convolutional neural network convolutional network arbitrary resolution resolution extrapolation

Download PDF

Related papers

SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers 2024

Training for Stable Explanation for Free 2024

NeuralSolver: Learning Algorithms For Consistent and Efficient Extrapolation Across General Tasks 2024

Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch 2024

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence 2024