MBTI: Metric-Based Textual Inversion for Fine-Grained Image Generation

Byungkwan Chae; Youngjae Choi; Heewon Kim

WACV 2026

Abstract

Diffusion-based data augmentation increases semantic diversity but often fails to preserve class-defining cues in fine-grained categories, which harms few-shot classification. We present Metric-Based Textual Inversion (MBTI), a textual-inversion training scheme that explicitly leverages inter-class relations while learning a pseudo-token per class from a few images. At each iteration, MBTI selects top-K support embeddings from previously learned classes and pushes the current embedding token away from these supports using a diffusion-based distance computed on denoiser noise predictions under matched latents and timesteps. This metric-aware objective enlarges inter-class margins while retaining class-specific attributes, yielding more representative synthetic samples. Across few-shot, fine-grained classification settings, MBTI consistently improves accuracy over textual inversion-based baselines and produces sharper, more discriminative details.
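
The selection-and-repulsion step described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation, and the abstract does not give the exact formulas. The sketch assumes that the distance is a mean squared difference between noise predictions, that "top-K" means the K previously learned tokens closest to the current one, and that the repulsion is a hinge penalty with a margin. The function `toy_denoiser` and all names and values are hypothetical stand-ins for a real diffusion denoiser and learned pseudo-tokens.

```python
import math

def toy_denoiser(latent, t, token):
    """Hypothetical stand-in for a denoiser's noise prediction given a token."""
    return [math.tanh(z + t * w) for z, w in zip(latent, token)]

def diffusion_distance(token_a, token_b, latents, timesteps):
    """Assumed metric: mean squared gap between the two noise predictions,
    evaluated on the same (latent, timestep) pairs for both tokens."""
    total, count = 0.0, 0
    for latent, t in zip(latents, timesteps):
        eps_a = toy_denoiser(latent, t, token_a)
        eps_b = toy_denoiser(latent, t, token_b)
        total += sum((a - b) ** 2 for a, b in zip(eps_a, eps_b))
        count += len(eps_a)
    return total / count

def top_k_supports(current, learned, latents, timesteps, k):
    """Assumption: pick the k previously learned tokens closest to `current`."""
    ranked = sorted(
        learned.items(),
        key=lambda item: diffusion_distance(current, item[1], latents, timesteps),
    )
    return [name for name, _ in ranked[:k]]

def repulsion_loss(current, supports, latents, timesteps, margin=0.5):
    """Assumed hinge form: penalize supports that lie closer than `margin`."""
    penalties = [
        max(0.0, margin - diffusion_distance(current, s, latents, timesteps))
        for s in supports
    ]
    return sum(penalties) / len(penalties)

# Illustrative values only: two matched latents and timesteps, three toy tokens.
latents = [[0.1, -0.3, 0.5], [0.4, 0.0, -0.2]]
timesteps = [0.2, 0.8]
current = [0.5, -0.1, 0.3]
learned = {
    "class_a": [0.5, -0.1, 0.3],   # identical to `current`, so distance 0
    "class_b": [-2.0, 1.5, -1.0],  # clearly different token
}

nearest = top_k_supports(current, learned, latents, timesteps, k=1)
loss = repulsion_loss(current, [learned[n] for n in nearest], latents, timesteps)
```

In an actual training loop this penalty would presumably be added to the usual textual-inversion denoising loss and minimized by gradient descent on the current token. How the two terms are weighted is not stated in the abstract.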
