Data Augmentation

3622 directly classified papers

Papers

What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices ACL 2025

Data-Constrained Synthesis of Training Data for De-Identification ACL 2025

Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains ACL 2025

A Survey on Efficient Large Language Model Training: From Data-centric Perspectives ACL 2025

Interactive platform for the exploration of large-scale ‘living’ systematic maps ACL 2025

K/DA: Automated Data Generation Pipeline for Detoxifying Implicitly Offensive Language in Korean ACL 2025

Overlapping Context with Variable-Length Stride Increases Diversity when Training Large Language Model for Code ACL 2025

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation ACL 2025

D-GEN: Automatic Distractor Generation and Evaluation for Reliable Assessment of Generative Models ACL 2025

Explainable Depression Detection in Clinical Interviews with Personalized Retrieval-Augmented Generation ACL 2025

Anastasia at SemEval-2025 Task 9: Subtask 1, Ensemble Learning with Data Augmentation and Focal Loss for Food Risk Classification ACL 2025

HTU at SemEval-2025 Task 11: Divide and Conquer - Multi-Label emotion classification using 6 DziriBERTs submodels with Label-fused Iterative Mask Filling technique for low-resource data augmentation ACL 2025

Tuebingen at SemEval-2025 Task 10: Class Weighting, External Knowledge and Data Augmentation in BERT Models ACL 2025

Ustnlp16 at SemEval-2025 Task 9: Improving Model Performance through Imbalance Handling and Focal Loss ACL 2025

Scalable Vision Language Model Training via High Quality Data Curation ACL 2025

ScanEZ: Integrating Cognitive Models with Self-Supervised Learning for Spatiotemporal Scanpath Prediction ACL 2025

IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation ACL 2025

SOMD2025: A Challenging Shared Tasks for Software Related Information Extraction ACL 2025

BrightCookies at SemEval-2025 Task 9: Exploring Data Augmentation for Food Hazard Classification ACL 2025

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia ACL 2025

MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ACL 2025

Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction ACL 2025

TRATES: Trait-Specific Rubric-Assisted Cross-Prompt Essay Scoring ACL 2025

TeleAI at SemEval-2025 Task 11: Bridging the Gap in Text-Based Emotion Detection with Prompt Engineering and Data Augmentation ACL 2025

Angeliki Linardatou at SemEval-2025 Task 11: Multi-label Emotion Detection ACL 2025