The Good, the Bad, and the Debatable: A Survey on the Impacts of Data for In-Context Learning

Stephanie Schoch; Yangfeng Ji

EMNLP 2025

Abstract

In-context learning is an emergent learning paradigm that enables an LLM to learn an unseen task by seeing a number of demonstrations in the context window. The quality of the demonstrations is of paramount importance as 1) context window size limitations restrict the number of demonstrations that can be presented to the model, and 2) the model must identify the task and potentially learn new, unseen input-output mappings from the limited demonstration set. An increasing body of work has also shown the sensitivity of predictions to perturbations on the demonstration set. Given this importance, this work presents a survey on the current literature pertaining to the relationship between data and in-context learning. We present our survey in three parts: the “good” – qualities that are desirable when selecting demonstrations, the “bad” – qualities of demonstrations that can negatively impact the model, as well as issues that can arise in presenting demonstrations, and the “debatable” – qualities of demonstrations with mixed results or factors modulating data impacts.

Topics

Artificial Intelligence > Learning Paradigms > Few-Shot Learning
Natural Language Processing > Resources & Methods > Large Language Models
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > In-Context Learning
Deep Learning > Learning Types > In-Context Learning

Keywords

few-shot learning, in-context learning, prompt engineering, data quality, context window, demonstration selection, large language model, demonstration set
