CLOMO: Counterfactual Logical Modification with Large Language Models

Yinya Huang; Ruixin Hong; Hongming Zhang; Wei Shao; Zhicheng Yang; Dong Yu; Changshui Zhang; Xiaodan Liang; Linqi Song

ACL 2024

Abstract

In this study, we investigate the counterfactual reasoning capabilities of large language models (LLMs). Our primary objective is to cultivate counterfactual thought processes within LLMs and rigorously assess their validity. Specifically, we introduce a novel task, Counterfactual Logical Modification (CLOMO), together with a high-quality human-annotated benchmark. In this task, LLMs must modify a given argumentative text so that a predetermined logical relationship holds. To evaluate a generation model's counterfactual capabilities effectively, we propose a new evaluation metric, the decomposed Self-Evaluation Score (SES), which directly evaluates the natural-language output of LLMs instead of casting the task as a multiple-choice problem. Our analysis shows that the proposed automatic metric aligns well with human preference. Experimental results show that while LLMs demonstrate a notable capacity for logical counterfactual thinking, a discernible gap remains between their current abilities and human performance. Code and data are available at https://github.com/Eleanor-H/CLOMO.

Topics

Artificial Intelligence > Core AI > Interpretability
Machine Learning > Optimization & Theory > Learning Theory
Artificial Intelligence > Core AI > Reasoning
Natural Language Processing > Applications > Natural Language Inference
Natural Language Processing > Applications > Text Generation

Keywords

benchmark evaluation, logical reasoning, text generation, counterfactual reasoning, large language model, logical modification, argument modification
