Deep RL

3861 directly classified papers

Papers

Threshold UCT: Cost-Constrained Monte Carlo Tree Search with Pareto Curves AAAI 2025

Bootstrapped Reward Shaping AAAI 2025

The Distributional Reward Critic Framework for Reinforcement Learning Under Perturbed Rewards AAAI 2025

RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors AAAI 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model ACL 2025

Deep Implicit Imitation Reinforcement Learning in Heterogeneous Action Settings AAAI 2025

FedAA: A Reinforcement Learning Perspective on Adaptive Aggregation for Fair and Robust Federated Learning AAAI 2025

Probabilistic Shielding for Safe Reinforcement Learning AAAI 2025

DUO: Diverse, Uncertain, On-Policy Query Generation and Selection for Reinforcement Learning from Human Feedback AAAI 2025

GLAM: Global-Local Variation Awareness in Mamba-based World Model AAAI 2025

Highly Parallelized Reinforcement Learning Training with Relaxed Assignment Dependencies AAAI 2025

Enhancing AMR Parsing with Group Relative Policy Optimization ACL 2025

Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL ACL 2025

Efficient Reinforcement Learning in Probabilistic Reward Machines AAAI 2025

Query-efficient Attack for Black-box Image Inpainting Forensics via Reinforcement Learning AAAI 2025

Reducing AUV Energy Consumption Through Dynamic Sensor Directions Switching via Deep Reinforcement Learning AAAI 2025

Removing Prompt-template Bias in Reinforcement Learning from Human Feedback ACL 2025

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning AAAI 2025

Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning AAAI 2025

Epistemic Bellman Operators AAAI 2025

SMoSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks AAAI 2025

Adversarial Preference Learning for Robust LLM Alignment ACL 2025

Intelligent OPC Engineer Assistant for Semiconductor Manufacturing AAAI 2025

Learning Joint Behaviors with Large Variations AAAI 2025

Partially Observable Reference Policy Programming IJCAI 2025