Is the ranking of PubMed similar articles good enough? An evaluation of text similarity methods for three datasets

Mariana Neves; Ines Schadock; Beryl Eusemann; Gilbert Schönfelder; Bettina Bert; Daniel Butzke

ACL 2023

Abstract

The use of seed articles in information retrieval provides many advantages, such as a longer context and more details about the topic being searched for. Given a seed article (identified by its PMID), PubMed provides a pre-compiled list of similar articles to support the user in finding equivalent papers in the biomedical literature. We aimed to perform a quantitative evaluation of PubMed Similar Articles based on three existing biomedical text similarity datasets, namely RELISH, TREC-COVID, and SMAFIRA-c. Further, we carried out a survey and an evaluation of various text similarity methods on these three datasets. Our experiments considered the original title and abstract from PubMed as well as automatically detected sections and manually annotated relevant sentences. We provide an overview of which methods perform best on each dataset and compare them to the ranking in PubMed Similar Articles. While results varied considerably among the datasets, we were able to obtain better performance than PubMed for all of them. Datasets and source code are available at: https://github.com/mariananeves/reranking
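The evaluation setting above boils down to two steps: rank candidate articles by their similarity to a seed article, then score that ranking against relevance judgements. The block below is a minimal sketch of those two steps. It assumes a plain bag-of-words cosine similarity (a simple stand-in, not necessarily one of the methods evaluated in the paper) and nDCG@k as the ranking metric (an assumption here; see the paper for the metrics actually reported). The function names `rank_by_seed` and `ndcg_at_k`, the toy texts, the relevance grades and the "baseline" order are all hypothetical and are not real PubMed output.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_by_seed(seed: str, candidates: dict[str, str]) -> list[str]:
    """Return candidate IDs sorted by similarity to the seed text (highest first).

    Ties are broken by ID so the ordering is deterministic.
    """
    seed_vec = Counter(tokenize(seed))
    scores = {pid: cosine(seed_vec, Counter(tokenize(text)))
              for pid, text in candidates.items()}
    return sorted(scores, key=lambda pid: (-scores[pid], pid))


def ndcg_at_k(ranking: list[str], relevance: dict[str, int], k: int) -> float:
    """nDCG@k with graded relevance; unjudged documents get gain 0."""
    def dcg(gains: list[int]) -> float:
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    gains = [relevance.get(pid, 0) for pid in ranking[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(gains) / idcg if idcg else 0.0


# Hypothetical seed article and candidate titles (not taken from the datasets).
seed = "zebrafish model of cardiac regeneration after injury"
candidates = {
    "A": "cardiac regeneration in zebrafish after cryoinjury",
    "B": "deep learning for protein structure prediction",
    "C": "mouse model of heart injury and regeneration",
}
relevance = {"A": 2, "C": 1, "B": 0}  # hypothetical graded judgements
baseline = ["C", "A", "B"]            # hypothetical pre-compiled ranking

ranking = rank_by_seed(seed, candidates)
print(ranking, ndcg_at_k(ranking, relevance, k=3))  # ['A', 'C', 'B'] 1.0
print(ndcg_at_k(baseline, relevance, k=3))
```

Swapping `cosine` for a stronger scorer (for example BM25 or an embedding model) leaves the evaluation step unchanged, which is what makes it possible to compare several similarity methods, and a pre-compiled ranking, on the same judgements.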
