Dragonite: Single-Step Drag-based Image Editing with Geometric-Semantic Guidance
Abstract
Recent interactive image editing methods have made notable progress, yet achieving both precise control and real-time performance remains a challenge. Drag-based methods support fine-grained geometric manipulation but suffer from low image fidelity and slow runtime, while text-based approaches improve realism but lack precise, pixel-level control. To overcome these limitations, we introduce Dragonite, an intuitive and efficient framework that unifies geometric and semantic manipulation for image editing. Dragonite uses a Dual Guidance Module that fuses geometric deformation vectors with semantic guidance cues in a joint representation space, enabling precise manipulation of both content and semantics. By combining a single-step latent optimization mechanism with an enhanced interpolation method, Dragonite achieves efficient interactive editing while maintaining high precision through integrated geometric and semantic guidance. Extensive evaluations on the DragBench benchmark demonstrate that Dragonite effectively resolves the trade-off between speed and accuracy, enabling real-time, high-fidelity image editing.