Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

Minh-Vuong Nguyen; Linhao Luo; Fatemeh Shiri; Dinh Phung; Yuan-Fang Li; Thuy-Trang Vu; Gholamreza Haffari

ACL 2024


Abstract

Large language models (LLMs) have demonstrated strong reasoning abilities when prompted to generate chain-of-thought (CoT) explanations alongside answers. However, previous research on evaluating LLMs has focused solely on answer accuracy, neglecting the correctness of the generated CoT. In this paper, we delve deeper into the CoT reasoning capabilities of LLMs in multi-hop question answering by utilizing knowledge graphs (KGs). We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT. Through experiments conducted on 5 different families of LLMs across 2 multi-hop question-answering datasets, we find that LLMs possess sufficient knowledge to perform reasoning. However, there exists a significant disparity between answer accuracy and faithfulness of the CoT generated by LLMs, indicating that they often arrive at correct answers through incorrect reasoning.
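The abstract separates answer accuracy from the correctness of the reasoning that produced the answer. As a rough illustration only, and not the authors' evaluation code, the sketch below assumes each CoT step has already been parsed into a (head, relation, tail) triple. It checks whether the steps form a connected path of facts in a KG that runs from the question entity to the answer. The function name `path_is_grounded`, the toy KG, and the parsing assumption are all hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical representation: one KG fact or one parsed CoT step.
Triple = Tuple[str, str, str]


def path_is_grounded(cot: List[Triple], kg: Set[Triple],
                     question_entity: str, answer: str) -> bool:
    """Return True if every CoT step is a KG fact and the steps chain
    from question_entity to answer (a hypothetical faithfulness check)."""
    if not cot:
        return False
    current = question_entity
    for head, relation, tail in cot:
        # Each step must start where the previous step ended.
        if head != current:
            return False
        # Each step must be an edge that actually exists in the KG.
        if (head, relation, tail) not in kg:
            return False
        current = tail
    # The chain must terminate at the predicted answer.
    return current == answer


# Toy KG for the question "In which city was the director of Inception born?"
KG: Set[Triple] = {
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
}

# Two CoTs that both reach the correct answer, "London".
faithful = [
    ("Inception", "directed_by", "Christopher Nolan"),
    ("Christopher Nolan", "born_in", "London"),
]
unfaithful = [("Inception", "set_in", "London")]  # step not in the toy KG
```

Both chains end at the correct answer, but only `faithful` is grounded in the toy KG, so a metric that scores answers alone would rate them equally. This is the kind of gap between answer accuracy and reasoning faithfulness the abstract describes.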

Topics

Artificial Intelligence > Core AI > Interpretability
Natural Language Processing > Applications > Question Answering
Knowledge & Reasoning > Representation > Knowledge Graphs
Knowledge & Reasoning > Reasoning > Causal Inference
Artificial Intelligence > Core AI > Large Language Models
Artificial Intelligence > Core AI > Reasoning

Keywords

question answering, chain-of-thought reasoning, knowledge graph reasoning, knowledge graph, large language model evaluation, multi-hop question answering, multi-hop reasoning, large language model, reasoning faithfulness