Research Explorer
Data Augmentation
3622 directly classified papers
Papers per year
2002: 2
2006: 1
2008: 2
2009: 1
2011: 3
2012: 3
2013: 9
2014: 8
2015: 7
2016: 35
2017: 45
2018: 108
2019: 239
2020: 329
2021: 477
2022: 518
2023: 607
2024: 561
2025: 546
2026: 121
Papers
AQuilt: Weaving Logic and Self-Inspection into Low-Cost, High-Relevance Data Synthesis for Specialist LLMs
EMNLP 2025
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
EMNLP 2025
Pragyaan: Designing and Curating High-Quality Cultural Post-Training Datasets for Indian Languages
EMNLP 2025
Scaling Low-Resource MT via Synthetic Data Generation with LLMs
EMNLP 2025
CondenseLM: LLMs-driven Text Dataset Condensation via Reward Matching
EMNLP 2025
PolyNorm: Few-Shot LLM-Based Text Normalization for Text-to-Speech
EMNLP 2025
More Data or Better Data? A Critical Analysis of Data Selection and Synthesis for Mathematical Reasoning
EMNLP 2025
CharacterCraft: Bridging the Literature-Reality Dialogue Gap for Practical Role-Playing Agents
EMNLP 2025
ROSE: A Reward-Oriented Data Selection Framework for LLM Task-Specific Instruction Tuning
EMNLP 2025
NLP for preserving Torlak, a vulnerable low-resource Slavic language
COLING 2025
VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction
COLING 2025
Dunamu ML at the Financial Misinformation Detection Challenge Task: Improving Supervised Fine-Tuning with LLM-based Data Augmentation
COLING 2025
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition
EMNLP 2025
CompCap: Improving Multimodal Large Language Models with Composite Captions
ICCV 2025
Transplant Then Regenerate: A New Paradigm for Text Data Augmentation
EMNLP 2025
We Need to Measure Data Diversity in NLP — Better and Broader
EMNLP 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
CVPR 2025
Assessing the Role of Data Quality in Training Bilingual Language Models
EMNLP 2025
S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting
CVPR 2025
MODA: Motion-Drift Augmentation for Inertial Human Motion Analysis
CVPR 2025
All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising
CVPR 2025
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
CVPR 2025
Low-Biased General Annotated Dataset Generation
CVPR 2025
Generative Hard Example Augmentation for Semantic Point Cloud Segmentation
CVPR 2025
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning
CVPR 2025