GEMv2: Multilingual NLG Benchmarking in a Single Line of Code

Sebastian Gehrmann; Abhik Bhattacharjee; Abinaya Mahendiran; Alex Wang; Alexandros Papangelis; Aman Madaan; Angelina McMillan-Major; Anna Shvets; Ashish Upadhyay; Bernd Bohnet; Bingsheng Yao; Bryan Wilie; Chandra Bhagavatula; Chaobin You; Craig Thomson; Cristina Gârbacea; Dakuo Wang; Daniel Deutsch; Deyi Xiong; Di Jin; Dimitra Gkatzia; Dragomir Radev; Elizabeth Clark; Esin Durmus; Faisal Ladhak; Filip Ginter; Genta Indra Winata; Hendrik Strobelt; Hiroaki Hayashi; Jekaterina Novikova; Jenna Kanerva; Jenny Chim; Jiawei Zhou; Jordan Clive; Joshua Maynez; João Sedoc; Juraj Juraska; Kaustubh Dhole; Khyathi Raghavi Chandu; Laura Perez Beltrachini; Leonardo F. R. Ribeiro; Lewis Tunstall; Li Zhang; Mahim Pushkarna; Mathias Creutz; Michael White; Mihir Sanjay Kale; Moussa Kamal Eddine; Nico Daheim; Nishant Subramani; Ondřej Dušek; Paul Pu Liang; Pawan Sasanka Ammanamanchi; Qi Zhu; Ratish Puduppully; Reno Kriz; Rifat Shahriyar; Ronald Cardenas; Saad Mahamood; Salomey Osei; Samuel Cahyawijaya; Sanja Štajner; Sebastien Montella; Shailza Jolly; Simon Mille; Tahmid Hasan; Tianhao Shen; Tosin Adewumi; Vikas Raunak; Vipul Raheja; Vitaly Nikolaev; Vivian Tsai; Yacine Jernite; Ying Xu; Yisi Sang; Yixin Liu; Yufang Hou

EMNLP 2022

Abstract

Evaluations in machine learning rarely use the latest metrics, datasets, or human evaluation in favor of remaining compatible with prior work. This compatibility, often facilitated through leaderboards, thus leads to outdated but standardized evaluation practices. We posit that the standardization is taking place in the wrong spot. Evaluation infrastructure should enable researchers to use the latest methods, and what should be standardized instead is how to incorporate these new evaluation advances. We introduce GEMv2, the new version of the Generation, Evaluation, and Metrics Benchmark, which uses a modular infrastructure for dataset, model, and metric developers to benefit from each other’s work. GEMv2 supports 40 documented datasets in 51 languages and ongoing online evaluation for all datasets, and our interactive tools make it easier to add new datasets to the living benchmark.

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning

Computer Science > Applications > Software Engineering

Keywords

natural language generation, evaluation metrics, modular infrastructure
