ICML 2025

Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors