Research Explorer
Data Augmentation
3622 directly classified papers
Papers per year
2002: 2
2006: 1
2008: 2
2009: 1
2011: 3
2012: 3
2013: 9
2014: 8
2015: 7
2016: 35
2017: 45
2018: 108
2019: 239
2020: 329
2021: 477
2022: 518
2023: 607
2024: 561
2025: 546
2026: 121
Papers
TopXGen: Topic-Diverse Parallel Data Generation for Low-Resource Machine Translation
EMNLP 2025
Evaluating Automatic Speech Recognition Systems for Korean Meteorological Experts
EMNLP 2025
SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity
EMNLP 2025
Linguistic Alignment Predicts Learning in Small Group Tutoring Sessions
EMNLP 2025
FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation
EMNLP 2025
Beyond the Scientific Document: A Citation-Aware Multi-Granular Summarization Approach with Heterogeneous Graphs
EMNLP 2025
Rethinking Data Selection at Scale: Random Selection is Almost All You Need
EMNLP 2025
Speaking at the Right Level: Literacy-Controlled Counterspeech Generation with RAG-RL
EMNLP 2025
Enhancing Hate Speech Classifiers through a Gradient-assisted Counterfactual Text Generation Strategy
EMNLP 2025
Improving Influence-based Instruction Tuning Data Selection for Balanced Learning of Diverse Capabilities
EMNLP 2025
Moral Framing in Politics (MFiP): A new resource and models for moral framing
EMNLP 2025
GReX: A Graph Neural Network-Based Rerank-then-Expand Method for Detecting Conflicts Among Legal Articles in Korean Criminal Law
EMNLP 2025
Enhancing Large Vision-Language Models with Ultra-Detailed Image Caption Generation
EMNLP 2025
Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text
EMNLP 2025
Machine-generated text detection prevents language model collapse
EMNLP 2025
GRAID: Synthetic Data Generation with Geometric Constraints and Multi-Agentic Reflection for Harmful Content Detection
EMNLP 2025
LLM-Driven Completeness and Consistency Evaluation for Cultural Heritage Data Augmentation in Cross-Modal Retrieval
EMNLP 2025
VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
EMNLP 2025
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance
EMNLP 2025
Corrupted but Not Broken: Understanding and Mitigating the Negative Impacts of Corrupted Data in Visual Instruction Tuning
EMNLP 2025
Exploring Quality and Diversity in Synthetic Data Generation for Argument Mining
EMNLP 2025
Doubling Your Data in Minutes: Ultra-fast Tabular Data Generation via LLM-Induced Dependency Graphs
EMNLP 2025
Recontextualizing Revitalization: A Mixed Media Approach to Reviving the Nüshu Language
EMNLP 2025
SynC-LLM: Generation of Large-Scale Synthetic Circuit Code with Hierarchical Language Models
EMNLP 2025
Evaluating the Effectiveness and Scalability of LLM-Based Data Augmentation for Retrieval
EMNLP 2025