Research Explorer
Evaluation
1654 directly classified papers
Papers per year
2005: 1
2006: 1
2007: 1
2008: 2
2009: 1
2010: 3
2011: 2
2012: 3
2013: 5
2014: 4
2015: 4
2016: 11
2017: 19
2018: 32
2019: 39
2020: 72
2021: 110
2022: 202
2023: 222
2024: 351
2025: 569
Papers
Memorization ≠ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
EMNLP 2025
The Emperor’s New Reasoning: Format Imitation Overshadows Genuine Mathematical Understanding in SFT
EMNLP 2025
Memorization or Reasoning? Exploring the Idiom Understanding of LLMs
EMNLP 2025
From Understanding to Generation: An Efficient Shortcut for Evaluating Language Models
EMNLP 2025
Transitive self-consistency evaluation of NLI models without gold labels
EMNLP 2025
Persona-Augmented Benchmarking: Evaluating LLMs Across Diverse Writing Styles
EMNLP 2025
DCR: Quantifying Data Contamination in LLMs Evaluation
EMNLP 2025
Drivel-ology: Challenging LLMs with Interpreting Nonsense with Depth
EMNLP 2025
Agent-as-Judge for Factual Summarization of Long Narratives
EMNLP 2025
Scalable and Culturally Specific Stereotype Dataset Construction via Human-LLM Collaboration
EMNLP 2025
Towards a Holistic and Automated Evaluation Framework for Multi-Level Comprehension of LLMs in Book-Length Contexts
EMNLP 2025
Adaptively profiling models with task elicitation
EMNLP 2025
Co-Eval: Augmenting LLM-based Evaluation with Machine Metrics
EMNLP 2025
OWL: Probing Cross-Lingual Recall of Memorized Texts via World Literature
EMNLP 2025
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
EMNLP 2025
Large Language Models Badly Generalize across Option Length, Problem Types, and Irrelevant Noun Replacements
EMNLP 2025
BOUQuET: dataset, Benchmark and Open initiative for Universal Quality Evaluation in Translation
EMNLP 2025
EGOILLUSION: Benchmarking Hallucinations in Egocentric Video Understanding
EMNLP 2025
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
EMNLP 2025
Do LLMs Behave as Claimed? Investigating How LLMs Follow Their Own Claims using Counterfactual Questions
EMNLP 2025
Can LLMs Extract Frame-Semantic Arguments?
EMNLP 2025
Are Language Models Consequentialist or Deontological Moral Reasoners?
EMNLP 2025
PatentScore: Multi-dimensional Evaluation of LLM-Generated Patent Claims
EMNLP 2025
Can Large Language Models Outperform Non-Experts in Poetry Evaluation? A Comparative Study Using the Consensual Assessment Technique
EMNLP 2025
UTER: Capturing the Human Touch in Evaluating Morphologically Rich and Low-Resource Languages
NAACL 2025