EMNLP 2024

FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization