CVPR 2017

Link the Head to the "Beak": Zero Shot Learning From Noisy Text Description at Part Precision

Abstract

In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We show that visual text terms can be encouraged to attend to their relevant parts, while image connections to non-visual text terms vanish without any supervision. This learning process enables a term like "beak" to be linked only to a part like the head, for instance, while non-visual terms like "migrate" do not affect classifier prediction, all without part-text annotation. Images are encoded by a part-based CNN that detects bird parts and learns part-specific representations. Part-based visual classifiers are predicted from text descriptions of unseen visual classifiers to facilitate classification without training images (also known as zero-shot recognition). We performed our experiments on the CUB200 dataset and improved the zero-shot recognition results from 34.2% to 44.0%. We also created a large-scale benchmark on 404 North American Bird Images with text descriptions, where we also showed that our method outperforms existing methods.
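
The idea of predicting part-based classifiers from text can be illustrated with a toy sketch. It is not the paper's model: the function names, the per-part linear maps, the feature sizes, and all numeric values below are hypothetical, chosen only to show how a visual term such as "beak" could contribute through the head part alone while a non-visual term such as "migrate" contributes nothing once its connections are zero.

```python
# Hypothetical sketch: names, shapes, and weights are illustrative assumptions,
# not the paper's architecture or learned parameters.

def predict_part_classifier(W_p, text_vec):
    """Map a class's text-term vector t to a classifier for one part: c_p = W_p t."""
    return [sum(w * t for w, t in zip(row, text_vec)) for row in W_p]


def class_score(part_feats, part_weights, text_vec):
    """Score an image against a class: sum over parts of <x_p, W_p t>."""
    score = 0.0
    for x_p, W_p in zip(part_feats, part_weights):
        c_p = predict_part_classifier(W_p, text_vec)
        score += sum(x * c for x, c in zip(x_p, c_p))
    return score


def zero_shot_predict(part_feats, part_weights, class_texts):
    """Pick the unseen class whose text-predicted classifier scores highest."""
    scores = {name: class_score(part_feats, part_weights, t)
              for name, t in class_texts.items()}
    return max(scores, key=scores.get), scores


# Text terms, in order: ["beak", "wing", "migrate"].
# Each part has a 2-d feature; each W_p is a 2 x 3 map from terms to that part.
W_HEAD = [[1.0, 0.0, 0.0],   # "beak" connects to the head part only
          [0.0, 0.0, 0.0]]
W_WING = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]   # "wing" connects to the wing part only
PART_WEIGHTS = [W_HEAD, W_WING]  # the "migrate" column is zero everywhere

# Text-term vectors for two made-up unseen classes.
CLASS_TEXTS = {
    "species_a": [1.0, 0.0, 1.0],  # description mentions "beak" and "migrate"
    "species_b": [0.0, 1.0, 0.0],  # description mentions "wing"
}

# Part features for one made-up test image: [head feature, wing feature].
IMAGE_PARTS = [[2.0, 0.5], [0.1, 0.3]]

PREDICTION, SCORES = zero_shot_predict(IMAGE_PARTS, PART_WEIGHTS, CLASS_TEXTS)
print(PREDICTION, SCORES)
```

In this toy setup, dropping "migrate" from the description of `species_a` leaves its score unchanged, because every connection from that term to the image parts is zero.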
