Research Explorer
Reinforcement Learning
2932 directly classified papers
Papers per year
2003: 1
2006: 11
2007: 18
2008: 23
2009: 14
2010: 22
2011: 24
2012: 34
2013: 26
2014: 24
2015: 14
2016: 23
2017: 79
2018: 182
2019: 255
2020: 284
2021: 333
2022: 319
2023: 315
2024: 457
2025: 419
2026: 55
Papers
Large Language Models with Reinforcement Learning from Human Feedback Approach for Enhancing Explainable Sexism Detection
COLING 2025
Towards Human Understanding of Paraphrase Types in Large Language Models
COLING 2025
Why Does ChatGPT “Delve” So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models
COLING 2025
LLMSR@XLLM25: A Language Model-Based Pipeline for Structured Reasoning Data Construction
ACL 2025
Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning
ACL 2025
The Power of Simplicity in LLM-Based Event Forecasting
ACL 2025
Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
EMNLP 2025
Positive Experience Reflection for Agents in Interactive Text Environments
ACL 2025
Team XSZ at BioLaySumm2025: Section-Wise Summarization, Retrieval-Augmented LLM, and Reinforcement Learning Fine-Tuning for Lay Summaries
ACL 2025
The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation
ACL 2025
LLMs Protégés: Tutoring LLMs with Knowledge Gaps Improves Student Learning Outcome
ACL 2025
LookAlike: Consistent Distractor Generation in Math MCQs
ACL 2025
Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback
ACL 2025
Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic
ACL 2025
bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning
ACL 2025
Enhancing Persona Consistency for LLMs’ Role-Playing using Persona-Aware Contrastive Learning
ACL 2025
Search-in-Context: Efficient Multi-Hop QA over Long Contexts via Monte Carlo Tree Search with Dynamic KV Retrieval
ACL 2025
A Constrained Text Revision Agent via Iterative Planning and Searching
ACL 2025
Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement
ACL 2025
Understand the Implication: Learning to Think for Pragmatic Understanding
ACL 2025
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning
ACL 2025
Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL
ACL 2025
Sparse Rewards Can Self-Train Dialogue Agents
ACL 2025
MWPO: Enhancing LLMs Performance through Multi-Weight Preference Strength and Length Optimization
ACL 2025
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
ACL 2025