2024 EMNLP EMNLP 2024

Comparing Edge-based and Node-based Methods on a Citation Prediction Task

Abstract

AbstractCitation Prediction, estimating whether paper a cites paper b, is particularly interesting in a forecasting setting where the model is trained on papers published before time t, and evaluated on papers published after h, where h is the forecast horizon. Performance improves with t (larger training sets) and degrades with h (longer forecast horizons). The trade-off between edge-based methods and node-based methods depends on t. Because edges grow faster than nodes, larger training sets favor edge-based methods.We introduce a new forecast-based Citation Prediction benchmark of 3 million papers to quantify these trends.Our benchmark shows that desirable policies for combining edge- and node-based methods depend on h and t.We release our benchmark, evaluation scripts, and embeddings.

🌉 Interdisciplinary Bridge — Data Science & Analytics and Deep Learning and Machine Learning
🧭 Keyword Pioneer — edge-based method
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio