Thomas Kwa
3 papers · 2024 · 1 conference · across top CS/AI conferences
Achievements
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (11)
🗺️ Taxonomy Completionist (15)
Conferences
NIPS (3)
Keywords
mechanistic interpretability (2)
neural network (2)
neural network verification (2)
policy optimization (1)
kl divergence (1)
reinforcement learning from human feedback (1)
formal verification (1)
heavy-tailed distribution (1)
reward misspecification (1)
reward hacking (1)
formal guarantee (1)
accuracy lower bound (1)
proof transferability (1)
causal model (1)
circuit discovery (1)
interchange intervention training (1)
performance bound (1)
transformer model (1)
transformer architecture (1)
accuracy bound (1)