Olivia Watkins
6 papers · 2021–2024 · 3 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (3)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (15)
👑 Triple Crown
Conferences
NIPS (3)
ICML (2)
ICLR (1)
Keywords
reinforcement learning (3)
policy gradient (1)
policy learning (1)
text-to-image generation (1)
reward function (1)
intrinsic motivation (1)
diffusion model (1)
language model (1)
exploration bonus (1)
safety fine-tuning (1)
attack success rate (1)
goal generation (1)
interactive feedback (1)
harmfulness evaluation (1)
advice distillation (1)
jailbreak benchmark (1)
Papers
A StrongREJECT for Empty Jailbreaks · NIPS 2024