ICML 2025

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

Authors

Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Isaac Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum Stuart McDougall, Kola Ayonrinde, Demian Till, Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda

Related papers

Scaling Sparse Feature Circuits For Studying In-Context Learning 2025

Incremental Gradient Descent with Small Epoch Counts is Surprisingly Slow on Ill-Conditioned Problems 2025

SToFM: a Multi-scale Foundation Model for Spatial Transcriptomics 2025

Batch List-Decodable Linear Regression via Higher Moments 2025

GS-Bias: Global-Spatial Bias Learner for Single-Image Test-Time Adaptation of Vision-Language Models 2025