Olivia Watkins

6 papers · 2021–2024 · 3 top CS/AI conferences

Achievements

🌍 Conference Polyglot (3) · 🌉 Interdisciplinary Bridge · 🧭 Keyword Pioneer · 🐝 Cross-Pollinator (15) · 👑 Triple Crown

Conferences

NIPS (3) · ICML (2) · ICLR (1)

Top co-authors

Pieter Abbeel (6) · Yuqing Du (3) · Trevor Darrell (3) · Abhishek Gupta (2) · Sam Toyer (2) · Jacob Andreas (2) · Justin Svegliato (2) · Jessy Lin (1) · Moonkyung Ryu (1) · Luke Bailey (1)

Keywords

reinforcement learning (3) · policy gradient (1) · policy learning (1) · text-to-image generation (1) · reward function (1) · intrinsic motivation (1) · diffusion model (1) · language model (1) · exploration bonus (1) · safety fine-tuning (1) · attack success rate (1) · goal generation (1) · interactive feedback (1) · harmfulness evaluation (1) · advice distillation (1) · jailbreak benchmark (1)

Papers

A StrongREJECT for Empty Jailbreaks · NIPS 2024

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game · ICLR 2024

Learning to Model the World With Language · ICML 2024

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models · NIPS 2023

Guiding Pretraining in Reinforcement Learning with Large Language Models · ICML 2023

Teachable Reinforcement Learning via Advice Distillation · NIPS 2021