Deformable One-shot Face Stylization via DINO Semantic Guidance

Yang Zhou; Zichong Chen; Hui Huang

2024 CVPR CVPR 2024

Deformable One-shot Face Stylization via DINO Semantic Guidance

Abstract

This paper addresses the complex issue of one-shot face stylization focusing on the simultaneous consideration of appearance and structure where previous methods have fallen short. We explore deformation-aware face stylization that diverges from traditional single-image style reference opting for a real-style image pair instead. The cornerstone of our method is the utilization of a self-supervised vision transformer specifically DINO-ViT to establish a robust and consistent facial structure representation across both real and style domains. Our stylization process begins by adapting the StyleGAN generator to be deformation-aware through the integration of spatial transformers (STN). We then introduce two innovative constraints for generator fine-tuning under the guidance of DINO semantics: i) a directional deformation loss that regulates directional vectors in DINO space and ii) a relative structural consistency constraint based on DINO token self-similarities ensuring diverse generation. Additionally style-mixing is employed to align the color generation with the reference minimizing inconsistent correspondences. This framework delivers enhanced deformability for general one-shot face stylization achieving notable efficiency with a fine-tuning duration of approximately 10 minutes. Extensive qualitative and quantitative comparisons demonstrate our superiority over state-of-the-art one-shot face stylization methods. Code is available at https://github.com/zichongc/DoesFS

🌉 Interdisciplinary Bridge — Computer Vision and Deep Learning and Machine Learning

🧭 Keyword Pioneer — face stylization

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Yang Zhou , Zichong Chen , Hui Huang

Topics

Machine Learning > Learning Types > Self-Supervised Learning Deep Learning > Architectures > Transformers Deep Learning > Models > Generative Models Computer Vision > Generation > Image Generation

Keywords

vision transformer image generation self-supervised learning face stylization stylegan generator deformable generation

Download PDF

Related papers

DUSt3R: Geometric 3D Vision Made Easy 2024

Bezier Everywhere All at Once: Learning Drivable Lanes as Bezier Graphs 2024

NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows 2024

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization 2024

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models 2024