Assessing Web Search Credibility and Response Groundedness in Chat Assistants

Ivan Vykopal; Matúš Pikuliak; Simon Ostermann; Marian Simko

EACL 2026


Abstract

Chat assistants increasingly integrate web search functionality, enabling them to retrieve and cite external sources. While this promises more reliable answers, it also raises the risk of amplifying misinformation from low-credibility sources. In this paper, we introduce a novel methodology for evaluating assistants' web search behavior, focusing on source credibility and on the groundedness of responses with respect to the cited sources. Using 100 claims across five misinformation-prone topics, we assess GPT-4o, GPT-5, Perplexity, and Qwen Chat. Our findings reveal differences between the assistants: Perplexity achieves the highest source credibility, whereas GPT-4o cites non-credible sources more often on sensitive topics. This work provides the first systematic comparison of the fact-checking behavior of commonly used chat assistants, offering a foundation for evaluating AI systems in high-stakes information environments.
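The abstract compares assistants by how often they cite non-credible sources, broken down by topic. The following is a minimal sketch of that kind of aggregation, not the paper's method: it assumes a hypothetical domain-level credibility lookup (`CREDIBILITY`), hypothetical helper names (`domain_of`, `non_credible_share`), and a hypothetical topic name, and it simply skips domains that have no label.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical domain credibility labels (illustrative only, not from the paper).
CREDIBILITY = {
    "who.int": "credible",
    "reuters.com": "credible",
    "example-rumors.net": "non-credible",
}


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def non_credible_share(citations):
    """Share of labeled citations that point to non-credible domains.

    citations: iterable of (assistant, topic, url) tuples.
    Returns {(assistant, topic): fraction in [0, 1]}; unlabeled domains are ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # [non-credible count, labeled count]
    for assistant, topic, url in citations:
        label = CREDIBILITY.get(domain_of(url))
        if label is None:
            continue  # no credibility label for this domain: skip it
        key = (assistant, topic)
        counts[key][1] += 1
        if label == "non-credible":
            counts[key][0] += 1
    return {key: bad / total for key, (bad, total) in counts.items()}


# Hypothetical citations collected from assistant responses.
cites = [
    ("GPT-4o", "health", "https://www.who.int/news"),
    ("GPT-4o", "health", "https://example-rumors.net/post"),
    ("Perplexity", "health", "https://www.reuters.com/article"),
]
print(non_credible_share(cites))
# {('GPT-4o', 'health'): 0.5, ('Perplexity', 'health'): 0.0}
```

Rating credibility at the domain level keeps the lookup simple, but it cannot distinguish individual articles on the same site. A real evaluation would also need a separate check of whether each response's statements are actually supported by the sources it cites.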

Topics

Natural Language Processing > Applications > Fact-Checking
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

web search, source credibility, chat assistant
