Reinforcement Learning Without Explicit Rewards: Theory and Practice

Weitong ZHANG

AAAI 2026

Abstract

In this New Faculty Highlights paper, I begin with reward-free exploration, which uses intrinsic rewards to learn broad state and skill coverage and remains robust under misspecification during efficient fine-tuning. I then cover guided generation methods that preserve the prior policy and mitigate reward hacking, and AI for science and healthcare, including practical RL for autonomous laboratories and automatic diagnosis. The impact of this work is evidenced by publications, adoption, and awards. Building on it, my future work will pursue imitation learning and contextual multi-task RL that connect behavioral cloning with interactive policies without explicit reward design, as well as personalized, multi-task offline-to-online adaptation with in-context demonstrations. In parallel, I am broadening the impact of AI for science and healthcare through existing collaborations. I will close with a talk that surveys these results and outlines an agenda for reinforcement learning without explicit rewards.
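The abstract does not specify which intrinsic reward is used for reward-free exploration. As a generic illustration only, not the author's method, the sketch below uses a count-based bonus 1/sqrt(N(s) + 1) on a toy 1-D chain with no extrinsic reward: the agent always steps to the neighbouring state with the larger bonus, so it is pushed toward rarely visited states and covers the chain quickly. The chain environment, the bonus form, the tie-breaking rule, and the names `intrinsic_bonus` and `explore_chain` are all assumptions made for this example.

```python
import math


def intrinsic_bonus(counts: dict[int, int], state: int) -> float:
    """Hypothetical count-based bonus 1 / sqrt(N(s) + 1).

    Unvisited states get the maximum bonus of 1.0; the bonus shrinks with visits.
    """
    return 1.0 / math.sqrt(counts.get(state, 0) + 1)


def explore_chain(n_states: int, steps: int) -> dict[int, int]:
    """Reward-free exploration on a toy chain of states 0..n_states-1 (an assumption).

    No extrinsic reward is used: at each step the agent moves to the
    neighbouring state with the largest intrinsic bonus. Ties go to the
    higher-index neighbour. Returns the visit count of every state reached.
    """
    if n_states < 2:
        raise ValueError("the chain needs at least two states")
    counts: dict[int, int] = {0: 1}  # the agent starts in state 0
    state = 0
    for _ in range(steps):
        # Valid moves are one step left or right, staying inside the chain.
        neighbours = [s for s in (state - 1, state + 1) if 0 <= s < n_states]
        # Greedy on the intrinsic bonus; the state index breaks ties.
        state = max(neighbours, key=lambda s: (intrinsic_bonus(counts, s), s))
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__":
    print(explore_chain(5, 4))  # {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}
```

With five states and only four steps, the bonus steers the agent to a new state every time, so every state is visited exactly once. A policy without the bonus has no reason to prefer unvisited states.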

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Machine Learning > Learning Types > Unsupervised Learning
Reinforcement Learning > Methods > Deep RL

Keywords

imitation learning, intrinsic reward, behavioral cloning, reward-free exploration, guided generation, offline-to-online adaptation
