2025 ACL ACL 2025

SYNTHVERIFY: Enhancing Zero-Shot Claim Verification through Step-by-Step Synthetic Data Generation

Abstract

AbstractClaim verification is a fundamental task in natural language processing (NLP), involving the assessment of whether available evidence supports or refutes a given claim. While large language models (LLMs) have shown promise in this area, they continue to struggle with domain-specific knowledge. Synthetic data generation has emerged as an effective solution to this challenge. However, existing methods are often either inefficient to scale across multiple domains or overly reliant on external documents. We introduce SYNTHVERIFY, a novel step-by-step prompting-based synthetic data generation framework designed to enhance zero-shot claim verification. Our core insight is that guiding generation with domain-specific claim patterns and structured evidence plans can bridge LLMs’ knowledge gaps in specialized domains without requiring access to external corpora or sacrificing generalizability. Using SYNTHVERIFY, we construct a diverse synthetic dataset for zero-shot verification, enabling instruction fine-tuning tailored to the verification task. Empirical results across multiple specialized domains demonstrate significant accuracy improvements, including a 20.1-point gain on the Llama-3-8B model. Our results highlight the effectiveness of structured synthetic data generation in addressing the limitations of verification systems, particularly in domain-specific tasks.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio