WACV 2026

TaxonRL: Reinforcement Learning with Intermediate Rewards for Interpretable Fine-Grained Visual Reasoning

Abstract

Traditional vision-language models struggle with fine-grained taxonomic reasoning, particularly when distinguishing visually similar species within the same genus or family. We propose a reinforcement learning approach that uses Group Relative Policy Optimization (GRPO) with intermediate rewards to decompose the reasoning process into hierarchical taxonomic predictions. Our method incentivizes models to reason explicitly about species-level, genus-level, and family-level features before making a final classification. This structured approach is designed not only to boost accuracy but also to yield a transparent, verifiable decision-making process. On the challenging Birds-to-Words dataset, our approach achieves 91.7% accuracy on same-species verification, exceeding human performance (77.3%), while generating interpretable reasoning traces. We also demonstrate cross-domain generalization, with substantial gains on primate verification. These results suggest that structured biological reasoning, enforced through intermediate rewards, provides a powerful framework for fine-grained visual discrimination.
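
As a rough illustration of the idea, the sketch below combines a hierarchical intermediate reward with the group-relative advantage normalization that GRPO is generally described as using. It is a minimal sketch under stated assumptions, not the paper's implementation: the reward weights, the dictionary keys, the placeholder labels, and the function names `intermediate_reward` and `group_relative_advantages` are all hypothetical, and the abstract does not specify how predictions are parsed from the model output or how the levels are weighted.

```python
from statistics import mean, pstdev

# Hypothetical weights: the abstract does not state the actual reward values.
LEVEL_WEIGHTS = {"family": 0.1, "genus": 0.2, "species": 0.3}
FINAL_WEIGHT = 1.0  # assumed credit for the final same-species decision


def intermediate_reward(pred, gold):
    """Partial credit per correct taxonomic level, plus credit for the final answer.

    `pred` and `gold` are plain dicts here; a real system would first have to
    parse these fields out of the model's generated reasoning trace.
    """
    reward = sum(
        weight
        for level, weight in LEVEL_WEIGHTS.items()
        if pred.get(level) == gold.get(level)
    )
    if pred.get("same_species") == gold.get("same_species"):
        reward += FINAL_WEIGHT
    return reward


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Usage: score a group of sampled responses to the same prompt.
gold = {"family": "F1", "genus": "G1", "species": "S1", "same_species": True}
group = [
    {"family": "F1", "genus": "G1", "species": "S1", "same_species": True},
    {"family": "F1", "genus": "G2", "species": "S9", "same_species": False},
    {"family": "F7", "genus": "G8", "species": "S9", "same_species": False},
]
rewards = [intermediate_reward(p, gold) for p in group]
advantages = group_relative_advantages(rewards)
```

In this toy setup, a response that gets only the family right still receives a higher advantage than one that gets nothing right, which is the kind of graded signal an intermediate reward is meant to provide.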
