Do We Need Language-Specific Fact-Checking Models? The Case of Chinese

Caiqi Zhang; Zhijiang Guo; Andreas Vlachos

EMNLP 2024

Abstract

This paper investigates the potential benefits of language-specific fact-checking models, focusing on the case of Chinese using the CHEF dataset. To better reflect real-world fact-checking, we first develop a novel Chinese document-level evidence retriever, achieving state-of-the-art performance. We then demonstrate the limitations of translation-based methods and multilingual language models, highlighting the need for language-specific systems. To better analyze token-level biases in different systems, we construct an adversarial dataset based on the CHEF dataset, where each instance has a large word overlap with the original one but holds the opposite veracity label. Experimental results on the CHEF dataset and our adversarial dataset show that our proposed method outperforms translation-based methods and multilingual language models and is more robust to biases, emphasizing the importance of language-specific fact-checking systems.
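The adversarial criterion described above (large word overlap, opposite veracity label) can be illustrated with a minimal sketch. This is not the authors' construction procedure: the character-level Jaccard measure, the 0.8 threshold, the label names, and the example claim pair are all assumptions made for illustration, and the example pair is not taken from CHEF.

```python
from collections import Counter


def char_overlap(a: str, b: str) -> float:
    """Jaccard overlap over character multisets.

    Characters are used instead of whitespace tokens because Chinese text
    is not space-delimited. This measure is an assumption for illustration.
    """
    ca, cb = Counter(a), Counter(b)
    union = sum((ca | cb).values())
    if union == 0:
        return 0.0
    return sum((ca & cb).values()) / union


def is_adversarial_pair(orig_claim: str, orig_label: str,
                        adv_claim: str, adv_label: str,
                        threshold: float = 0.8) -> bool:
    """True if the two claims overlap heavily but carry different labels.

    The threshold value is hypothetical, not a figure from the paper.
    """
    overlap = char_overlap(orig_claim, adv_claim)
    return overlap >= threshold and orig_label != adv_label


# Hypothetical claim pair (not from CHEF):
# "The Great Wall is located in China" vs. its negation.
original = ("长城位于中国", "SUPPORTED")
adversarial = ("长城不位于中国", "REFUTED")

print(f"{char_overlap(original[0], adversarial[0]):.3f}")  # 0.857
print(is_adversarial_pair(*original, *adversarial))        # True
```

A model that relies on surface tokens rather than meaning would tend to assign both claims in such a pair the same label, which is the kind of bias the adversarial dataset is meant to expose.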

Topics

Natural Language Processing > Applications > Fact-Checking
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Resources & Methods > Multilingual NLP
Computer Science > Applications > Information Retrieval
Data Science & Analytics > Applications > Information Retrieval
Machine Learning > Application Areas > Information Retrieval
Artificial Intelligence > Core AI > Natural Language Processing
Artificial Intelligence > Core AI > Information Retrieval
Machine Learning > Learning Types > Multi-Lingual Learning

Keywords

multilingual model, evidence retrieval, language-specific model, Chinese language, adversarial dataset
