2020 AACL AACL 2020

BCTH: A Novel Text Hashing Approach via Bayesian Clustering

Abstract

AbstractSimilarity search is to find the most similar items for a certain target item. The ability of similarity search at large scale plays a significant role in many information retrieval applications, and thus has received much attention. Text hashing is a promising strategy, which utilizes binary encoding to represent documents, obtaining attractive performance. This paper makes the first attempt to utilize Bayesian Clustering for Text Hashing, dubbed as BCTH. Specifically, BCTH is able to map documents to binary codes by utilizing multiple Bayesian Clusterings in parallel, where each Bayesian Clustering is responsible for one bit. Our approach employs the bit-balanced constraint to maximize the amount of information in each bit. Meanwhile, the bit-uncorrected constraint is adopted to keep the independence among all bits. The time complexity of BCTH is linear, where the hash codes and hash function are jointly learned. The experimental results, based on four widely-used datasets, demonstrate that BCTH is competitive, compared with currently competitive baselines in the perspective of both precision and training speed.

🚀 Conference Pioneer — AACL 2020
🌉 Interdisciplinary Bridge — Computer Science and Data Science & Analytics and Machine Learning
🧭 Keyword Pioneer — binary encoding
🐣 Hot Topic Early Bird — information retrieval
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Speech & Audio