FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory

Anwesan Pal; Sahil Wadhwa; Ayush Jaiswal; Xu Zhang; Yue Wu; Rakesh Chada; Pradeep Natarajan; Henrik I. Christensen

2023 ICCV ICCV 2023

FashionNTM: Multi-turn Fashion Image Retrieval via Cascaded Memory

Abstract

Multi-turn textual feedback-based fashion image retrieval focuses on a real-world setting, where users can iteratively provide information to refine retrieval results until they find an item that fits all their requirements. In this work, we present a novel memory-based method, called FashionNTM, for such a multi-turn system. Our framework incorporates a new Cascaded Memory Neural Turing Machine (CM-NTM) approach for implicit state management, thereby learning to integrate information across all past turns to retrieve new images, for a given turn. Unlike vanilla Neural Turing Machine (NTM), our CM-NTM operates on multiple inputs, which interact with their respective memories via individual read and write heads, to learn complex relationships. Extensive evaluation results show that our proposed method outperforms the previous state-of-the-art algorithm by 50.5%, on Multi-turn FashionIQ -- the only existing multi-turn fashion dataset currently, in addition to having a relative improvement of 12.6% on Multi-turn Shoes -- an extension of the single-turn Shoes dataset that we created in this work. Further analysis of the model in a real-world interactive setting demonstrates two important capabilities of our model -- memory retention across turns, and agnosticity to turn order for non-contradictory feedback. Finally, user study results show that images retrieved by FashionNTM were favored by 83.1% over other multi-turn models.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Vision

🧭 Keyword Pioneer — fashion recognition

🐣 Hot Topic Early Bird — multi-turn dialogue

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Anwesan Pal , Sahil Wadhwa , Ayush Jaiswal , Xu Zhang , Yue Wu , Rakesh Chada , Pradeep Natarajan , Henrik I. Christensen

Topics

Artificial Intelligence > Core AI > Memory Computer Vision > Analysis > Image Retrieval Computer Vision > Generation > Image Retrieval

Keywords

image retrieval multi-turn dialogue memory network fashion recognition

Download PDF

Related papers

PVT++: A Simple End-to-End Latency-Aware Visual Tracking Framework 2023

Periodically Exchange Teacher-Student for Source-Free Object Detection 2023

Stable and Causal Inference for Discriminative Self-supervised Deep Visual Representations 2023

Minimal Solutions to Uncalibrated Two-view Geometry with Known Epipoles 2023

3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for Robust 6D Pose Estimation 2023