Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI

ICML 2025

34 authors

Shayne Longpre, Kevin Klyman, Ruth Elisabeth Appel, Sayash Kapoor, Rishi Bommasani, Michelle Sahar, Sean McGregor, Avijit Ghosh, Borhane Blili-Hamelin, Nathan Butters, Alondra Nelson, Dr. Amit Elazari, Andrew Sellars, Casey John Ellis, Dane Sherrets, Dawn Song, Harley Geiger, Ilona Cohen, Lauren Mcilvenny, Madhulika Srikumar, Mark M. Jaycox, Markus Anderljung, Nadine Farid Johnson, Nicholas Carlini, Nicolas Miailhe, Nik Marda, Peter Henderson, Rebecca S. Portnoff, Rebecca Weiss, Victoria Westerhoff, Yacine Jernite, Rumman Chowdhury, Percy Liang, Arvind Narayanan
