Papers
17,973 papers found
APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport
Zhuo Li, Yuege Feng, Dandan Guo et al.
A Position Paper on the Automatic Generation of Machine Learning Leaderboards
Roelien C. Timmer, Yufang Hou, Stephen Wan
A Probabilistic Inference Scaling Theory for LLM Self-Correction
Zhe Yang, Yichang Zhang, Yudong Wang et al.
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
Xiaopeng Ke, Hexuan Deng, Xuebo Liu et al.
ArabEmoNet: A Lightweight Hybrid 2D CNN-BiLSTM Model with Attention for Robust Arabic Speech Emotion Recognition
Ali Abouzeid, Bilal Elbouardi, Mohamed Maged et al.
ArabicWeb-Edu: Educational Quality Data for Arabic LLM Training
Majd Hawasly, Tasnim Mohiuddin, Hamdy Mubarak et al.
AraEval: An Arabic Multi-Task Evaluation Suite for Large Language Models
Alhanoof Althnian, Norah A. Alzahrani, Shaykhah Z. Alsubaie et al.
AraHalluEval: A Fine-grained Hallucination Evaluation Framework for Arabic LLMs
Aisha Alansari, Hamzah Luqman
AraHealthQA 2025: The First Shared Task on Arabic Health Question Answering
Hassan Alhuzali, Walid Al-Eisawi, Muhammad Abdul-Mageed et al.
AraReasoner: Evaluating Reasoning-Based LLMs for Arabic NLP
Ahmed Abul Hasanaath, Aisha Alansari, Ahmed Ashraf et al.
AraSafe: Benchmarking Safety in Arabic LLMs
Hamdy Mubarak, Abubakr Mohamed, Majd Hawasly
Archaeology at TSAR 2025 Shared Task: Teaching Small Models to do CEFR Simplifications
Rareş-Alexandru Roşcan, Sergiu Nisioi
A Reasoner for Real-World Event Detection: Scaling Reinforcement Learning via Adaptive Perplexity-Aware Sampling Strategy
Xiaoyun Zhang, Jingqing Ruan, Xing Ma et al.
Are BabyLMs Deaf to Gricean Maxims? A Pragmatic Evaluation of Sample-efficient Language Models
Raha Askari, Sina Zarrieß, Özge Alacam et al.
Are Checklists Really Useful for Automatic Evaluation of Generative Tasks?
Momoka Furuhashi, Kouta Nakayama, Takashi Kodama et al.
Are Economists Always More Introverted? Analyzing Consistency in Persona-Assigned LLMs
Manon Reusens, Bart Baesens, David Jurgens
Are Generative Models Underconfident? Better Quality Estimation with Boosted Model Probability
Tu Anh Dinh, Jan Niehues
Are Knowledge and Reference in Multilingual Language Models Cross-Lingually Consistent?
Xi Ai, Mahardika Krisna Ihsani, Min-Yen Kan
Are Language Models Consequentialist or Deontological Moral Reasoners?
Keenan Samway, Max Kleiman-Weiner, David Guzman Piedrahita et al.
Are Large Language Models Chronically Online Surfers? A Dataset for Chinese Internet Meme Explanation
Yubo Xie, Chenkai Wang, Zongyang Ma et al.
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
Omer Nahum, Nitay Calderon, Orgad Keller et al.