Can I trust You? LLMs as conversational agents

Marc Döbler; Raghavendran Mahendravarman; Anna Moskvina; Nasrin Saef

2024 EACL EACL 2024

Can I trust You? LLMs as conversational agents

Abstract

AbstractWith the rising popularity of LLMs in the public sphere, they become more and more attractive as a tool for doing one’s own research without having to rely on search engines or specialized knowledge of a scientific field. But using LLMs as a source for factual information can lead one to fall prey to misinformation or hallucinations dreamed up by the model. In this paper we examine the gpt-4 LLM by simulating a large number of potential research queries and evaluate how many of the generated references are factually correct as well as existent.

❓ The Questioner

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🐣 Hot Topic Early Bird — conversational agent

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Marc Döbler , Raghavendran Mahendravarman , Anna Moskvina , Nasrin Saef

Topics

Artificial Intelligence > Core AI > AI Safety Artificial Intelligence > Core AI > Human-AI Interaction Natural Language Processing > Applications > Fact-Checking Artificial Intelligence > Core AI > Large Language Models Machine Learning > Learning Types > Evaluation

Keywords

fact verification factuality evaluation hallucination detection conversational agent large language model reference verification

Download PDF

Related papers

A Dataset for Metaphor Detection in Early Medieval Hebrew Poetry 2024

PRILoRA: Pruned and Rank-Increasing Low-Rank Adaptation 2024

Overview of the Hate Speech Detection in Turkish and Arabic Tweets (HSD-2Lang) Shared Task at CASE 2024 2024

Evaluating In-Context Learning for Computational Literary Studies: A Case Study Based on the Automatic Recognition of Knowledge Transfer in German Drama 2024

Selam@DravidianLangTech 2024:Identifying Hate Speech and Offensive Language 2024