On the Use of Web Search to Improve Scientific Collections

Krutarth Patel; Cornelia Caragea; Sujatha Das Gollapalli

EMNLP 2020

Abstract

Despite the advancements in search engine features, ranking methods, technologies, and the availability of programmable APIs, current-day open-access digital libraries still rely on crawl-based approaches for acquiring their underlying document collections. In this paper, we propose a novel search-driven framework for acquiring documents for such scientific portals. Within our framework, publicly-available research paper titles and author names are used as queries to a Web search engine. We were able to obtain ~267,000 unique research papers through our fully-automated framework using ~76,000 queries, resulting in almost 200,000 more papers than the number of queries. Moreover, through a combination of title and author name search, we were able to recover 78% of the original searched titles.
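
The search-driven acquisition described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `build_queries` and `collect_pdf_urls`, the exact-phrase quoting of titles, the `.pdf` suffix filter, and the stubbed `fake_search` backend are all assumptions, since the abstract does not say how queries are formatted or which search API is used.

```python
from typing import Callable, Iterable, List


def build_queries(titles: Iterable[str], authors: Iterable[str]) -> List[str]:
    """Turn paper titles and author names into search query strings."""
    # Assumption: titles are quoted for exact-phrase matching.
    queries = [f'"{t.strip()}"' for t in titles if t.strip()]
    # Assumption: author names are sent as plain, unquoted queries.
    queries += [a.strip() for a in authors if a.strip()]
    return queries


def collect_pdf_urls(
    queries: Iterable[str],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Run every query and keep each PDF URL once, in first-seen order."""
    seen = set()
    ordered = []
    for query in queries:
        for url in search(query):
            # Assumption: a ".pdf" suffix is a good-enough document filter.
            if url.lower().endswith(".pdf") and url not in seen:
                seen.add(url)
                ordered.append(url)
    return ordered


def fake_search(query: str) -> List[str]:
    """Hypothetical stand-in for a real Web search API (canned results)."""
    canned = {
        '"On the Use of Web Search to Improve Scientific Collections"': [
            "https://example.org/a.pdf",
            "https://example.org/about.html",
        ],
        "Cornelia Caragea": [
            "https://example.org/a.pdf",
            "https://example.org/b.pdf",
        ],
    }
    return canned.get(query, [])


if __name__ == "__main__":
    qs = build_queries(
        ["On the Use of Web Search to Improve Scientific Collections"],
        ["Cornelia Caragea"],
    )
    for found in collect_pdf_urls(qs, fake_search):
        print(found)
```

A single query can return several distinct documents, which is how a search-driven pipeline can yield more papers than queries: the abstract reports roughly 267,000 papers from roughly 76,000 queries, a surplus of about 191,000.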

Topics

Machine Learning > Learning Types > Unsupervised Learning
Natural Language Processing > Applications > Information Retrieval
Computer Science > Applications > Information Retrieval
Computer Science > Applications > Document Analysis
Data Science & Analytics > Applications > Information Retrieval
Machine Learning > Application Areas > Information Retrieval

Keywords

information retrieval, document retrieval, web search, scientific paper, open access, document collection, digital library, digital libraries, document acquisition, scientific collection, scientific portal
