Yushi Yang
2 papers · 2025 · 1 conference · across top CS/AI conferences
Achievements
Interdisciplinary Bridge
Cross-Pollinator (15)
The Questioner
Conferences
EMNLP (2)
Keywords
direct preference optimization (1)
model behavior (1)
neural network analysis (1)
ai safety (1)
language model (1)
decision boundary (1)
model explanation (1)
counterfactual explanation (1)
mechanistic interpretability (1)
safety fine-tuning (1)
activation editing (1)
neuron analysis (1)
toxicity reduction (1)
large language model (1)
language model safety (1)
self-generated explanation (1)