SynthTextEval: Synthetic Text Data Generation and Evaluation for High-Stakes Domains

Krithika Ramesh; Daniel Smolyak; Zihao Zhao; Nupoor Gandhi; Ritu Agarwal; Margrét V. Bjarnadóttir; Anjalie Field

EMNLP 2025

Abstract

We present SynthTextEval, a toolkit for conducting comprehensive evaluations of synthetic text. The fluency of large language model (LLM) outputs has made synthetic text potentially viable for numerous applications, such as reducing the risk of privacy violations in the development and deployment of AI systems in high-stakes domains. Realizing this potential, however, requires principled, consistent evaluations of synthetic data across multiple dimensions: its utility in downstream systems, the fairness of these systems, the risk of privacy leakage, general distributional differences from the source text, and qualitative feedback from domain experts. SynthTextEval allows users to conduct evaluations along all of these dimensions over synthetic data that they upload or generate using the toolkit’s generation module. While our toolkit can be run over any data, we highlight its functionality and effectiveness over datasets from two high-stakes domains: healthcare and law. By consolidating and standardizing evaluation metrics, we aim to improve the viability of synthetic text and, in turn, privacy preservation in AI development.
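To make two of the dimensions above concrete, the following is a minimal sketch of how privacy leakage and distributional difference might be estimated for a synthetic corpus. It is not SynthTextEval's API: the function names (`ngram_leakage`, `unigram_dist`, `js_divergence`), the whitespace tokenization, and the choice of n-gram overlap and Jensen-Shannon divergence as metrics are all assumptions made for illustration, and the example sentences are toy data.

```python
import math
from collections import Counter


def ngrams(text, n):
    """Return the set of word n-grams in a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_leakage(source_docs, synthetic_docs, n=3):
    """Hypothetical leakage proxy: share of distinct synthetic n-grams
    that also occur verbatim somewhere in the source corpus."""
    source = set().union(*(ngrams(d, n) for d in source_docs))
    synthetic = set().union(*(ngrams(d, n) for d in synthetic_docs))
    if not synthetic:
        return 0.0
    return len(synthetic & source) / len(synthetic)


def unigram_dist(docs):
    """Relative frequency of each token across a list of documents."""
    counts = Counter(tok for d in docs for tok in d.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) between two token distributions.
    0 means identical distributions; 1 means disjoint vocabularies."""
    vocab = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in vocab}

    def kl(a):
        return sum(a[t] * math.log2(a[t] / m[t]) for t in a if a[t] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


# Toy corpora, invented purely for illustration.
source_docs = ["the patient was admitted on monday"]
synthetic_docs = ["the patient was discharged"]

leakage = ngram_leakage(source_docs, synthetic_docs, n=3)
jsd = js_divergence(unigram_dist(source_docs), unigram_dist(synthetic_docs))
```

A high overlap rate suggests the generator is copying source text, while a high divergence suggests the synthetic text has drifted from the source distribution; in practice the two pull against each other, which is one reason to report several dimensions together rather than a single score.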

Topics

Artificial Intelligence > Core AI > Responsible AI
Artificial Intelligence > Core AI > Privacy
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Application Areas > Data Augmentation
Machine Learning > Application Areas > Privacy
Machine Learning > Optimization & Theory > Evaluation
Machine Learning > Learning Types > Evaluation
Machine Learning > Learning Types > Privacy
Data Science & Analytics > Methods > Data Mining
Data Science & Analytics > Applications

Keywords

domain adaptation, text generation, privacy preservation, synthetic data generation, evaluation framework, synthetic datum, privacy leakage, synthetic text, large language model, data evaluation, high-stakes domain, evaluation toolkit
