Learning Theory

5312 directly classified papers

Papers

Order Matters: Investigate the Position Bias in Multi-constraint Instruction Following ACL 2025

Compute Optimal Scaling of Skills: Knowledge vs Reasoning ACL 2025

Benchmarking Deep Search over Heterogeneous Enterprise Data EMNLP 2025

User Behavior Prediction as a Generic, Robust, Scalable, and Low-Cost Evaluation Strategy for Estimating Generalization in LLMs ACL 2025

LR²Bench: Evaluating Long-chain Reflective Reasoning Capabilities of Large Language Models via Constraint Satisfaction Problems ACL 2025

Do LLMs Adhere to Label Definitions? Examining Their Receptivity to External Label Definitions EMNLP 2025

OpenHuEval: Evaluating Large Language Model on Hungarian Specifics ACL 2025

Scaling Laws for Multilingual Language Models ACL 2025

From Remembering to Metacognition: Do Existing Benchmarks Accurately Evaluate LLMs? EMNLP 2025

EXECUTE: A Multilingual Benchmark for LLM Token Understanding ACL 2025

From Input Perception to Predictive Insight: Modeling Model Blind Spots Before They Become Errors EMNLP 2025

How Numerical Precision Affects Arithmetical Reasoning Capabilities of LLMs ACL 2025

Mind the Gap: A Closer Look at Tokenization for Multiple-Choice Question Answering with LLMs EMNLP 2025

Low-Perplexity LLM-Generated Sequences and Where To Find Them ACL 2025

A Theory of Response Sampling in LLMs: Part Descriptive and Part Prescriptive ACL 2025

Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs ACL 2025

Training Dynamics Underlying Language Model Scaling Laws: Loss Deceleration and Zero-Sum Learning ACL 2025

Language Models Grow Less Humanlike beyond Phase Transition ACL 2025

PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models ACL 2025

Understanding Stragglers in Large Model Training Using What-if Analysis OSDI 2025

Language Models Resist Alignment: Evidence From Data Compression ACL 2025

Limited Generalizability in Argument Mining: State-Of-The-Art Models Learn Datasets, Not Arguments ACL 2025

GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration ACL 2025

On the Generalization vs Fidelity Paradox in Knowledge Distillation ACL 2025

Veracity Bias and Beyond: Uncovering LLMs’ Hidden Beliefs in Problem-Solving Reasoning ACL 2025