Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization

Wenhao Gao; Tianfan Fu; Jimeng Sun; Connor Coley

NeurIPS 2022

Abstract

Molecular optimization is a fundamental goal in the chemical sciences and is of central interest to drug and material design. In recent years, significant progress has been made in solving challenging problems across various aspects of computational molecular optimization, emphasizing high validity, diversity, and, most recently, synthesizability. Despite this progress, many papers report results on trivial or self-designed tasks, which makes it harder to directly assess the performance of new methods. Moreover, the sample efficiency of the optimization (the number of molecules evaluated by the oracle) is rarely discussed, despite being an essential consideration for realistic discovery applications. To fill this gap, we have created an open-source benchmark for practical molecular optimization, PMO, to facilitate the transparent and reproducible evaluation of algorithmic advances in molecular optimization. This paper thoroughly investigates the performance of 25 molecular design algorithms on 23 single-objective (scalar) optimization tasks with a particular focus on sample efficiency. Our results show that most "state-of-the-art" methods fail to outperform their predecessors under a limited oracle budget of 10K queries and that no existing algorithm can efficiently solve certain molecular optimization problems in this setting. We analyze the influence of optimization algorithm choices, molecular assembly strategies, and oracle landscapes on optimization performance to inform future algorithm development and benchmarking. PMO provides a standardized experimental setup to comprehensively evaluate and compare new molecular optimization methods with existing ones. All code can be found at https://github.com/wenhao-gao/mol_opt.
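
To make the notion of an oracle budget concrete, the following is a minimal sketch of a wrapper that counts how many distinct molecules have been scored and refuses further evaluations once a limit (10K by default, matching the budget named in the abstract) is reached. The names `BudgetedOracle`, `BudgetExceeded`, and `toy_score` are hypothetical and are not taken from the PMO code base. The decision not to charge repeated molecules against the budget is likewise an assumption made for illustration, and `toy_score` is a placeholder rather than a real chemical property.

```python
from typing import Callable, Dict, List


class BudgetExceeded(Exception):
    """Raised when a new molecule is submitted after the budget is spent."""


class BudgetedOracle:
    """Wrap a scoring function and count distinct molecules evaluated.

    Assumption for this sketch: a molecule that was already scored is
    served from a cache and does not consume budget.
    """

    def __init__(self, score_fn: Callable[[str], float], budget: int = 10_000):
        self.score_fn = score_fn
        self.budget = budget
        self.cache: Dict[str, float] = {}  # molecule string -> score
        self.history: List[float] = []     # best score after each new query

    @property
    def n_queries(self) -> int:
        """Number of distinct molecules scored so far."""
        return len(self.cache)

    def __call__(self, smiles: str) -> float:
        if smiles in self.cache:
            return self.cache[smiles]
        if self.n_queries >= self.budget:
            raise BudgetExceeded(f"budget of {self.budget} queries is spent")
        score = self.score_fn(smiles)
        self.cache[smiles] = score
        # Track the running best so progress can be plotted against queries.
        best = score if not self.history else max(self.history[-1], score)
        self.history.append(best)
        return score


def toy_score(smiles: str) -> float:
    """Hypothetical stand-in objective: fraction of 'C' characters."""
    return smiles.count("C") / max(len(smiles), 1)


if __name__ == "__main__":
    oracle = BudgetedOracle(toy_score, budget=3)
    for candidate in ["CCO", "CCO", "CC", "O"]:
        print(candidate, oracle(candidate), oracle.n_queries)
```

Recording the running best score after each new query, as `history` does here, is one simple way to compare methods by how quickly they improve rather than only by where they end up.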

Topics

Machine Learning > Optimization & Theory > Optimization
Machine Learning > Application Areas > Domain Adaptation
Machine Learning > Application Areas > Efficient Computing
Healthcare & Medicine > Research > Bioinformatics
Data Science & Analytics > Methods > Time Series
Interdisciplinary > Science > Quantum Computing
Machine Learning > Core Methods > Optimization
Machine Learning > Learning Types > Optimization
Interdisciplinary > Science > Bioinformatics
Deep Learning > Learning Types > Generative Model

Keywords

molecular optimization; sample efficiency; benchmark evaluation; optimization algorithm; drug design; drug discovery; molecular design; molecular generation; oracle query
