Reinforcement Learning

2932 directly classified papers

Papers

Large Language Models with Reinforcement Learning from Human Feedback Approach for Enhancing Explainable Sexism Detection (COLING 2025)

Towards Human Understanding of Paraphrase Types in Large Language Models (COLING 2025)

Why Does ChatGPT “Delve” So Much? Exploring the Sources of Lexical Overrepresentation in Large Language Models (COLING 2025)

LLMSR@XLLM25: A Language Model-Based Pipeline for Structured Reasoning Data Construction (ACL 2025)

Sparks of Tabular Reasoning via Text2SQL Reinforcement Learning (ACL 2025)

The Power of Simplicity in LLM-Based Event Forecasting (ACL 2025)

Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization (EMNLP 2025)

Positive Experience Reflection for Agents in Interactive Text Environments (ACL 2025)

Team XSZ at BioLaySumm2025: Section-Wise Summarization, Retrieval-Augmented LLM, and Reinforcement Learning Fine-Tuning for Lay Summaries (ACL 2025)

The Fellowship of the LLMs: Multi-Model Workflows for Synthetic Preference Optimization Dataset Generation (ACL 2025)

LLMs Protégés: Tutoring LLMs with Knowledge Gaps Improves Student Learning Outcome (ACL 2025)

LookAlike: Consistent Distractor Generation in Math MCQs (ACL 2025)

Direct Repair Optimization: Training Small Language Models For Educational Program Repair Improves Feedback (ACL 2025)

Temporalizing Confidence: Evaluation of Chain-of-Thought Reasoning with Signal Temporal Logic (ACL 2025)

bea-jh at BEA 2025 Shared Task: Evaluating AI-powered Tutors through Pedagogically-Informed Reasoning (ACL 2025)

Enhancing Persona Consistency for LLMs’ Role-Playing using Persona-Aware Contrastive Learning (ACL 2025)

Search-in-Context: Efficient Multi-Hop QA over Long Contexts via Monte Carlo Tree Search with Dynamic KV Retrieval (ACL 2025)

A Constrained Text Revision Agent via Iterative Planning and Searching (ACL 2025)

Thinking Before Running! Efficient Code Generation with Thorough Exploration and Optimal Refinement (ACL 2025)

Understand the Implication: Learning to Think for Pragmatic Understanding (ACL 2025)

Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise Rewards for Mathematical Reasoning (ACL 2025)

Should I Trust You? Detecting Deception in Negotiations using Counterfactual RL (ACL 2025)

Sparse Rewards Can Self-Train Dialogue Agents (ACL 2025)

MWPO: Enhancing LLMs Performance through Multi-Weight Preference Strength and Length Optimization (ACL 2025)

MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification (ACL 2025)