Halu-NLP at SemEval-2024 Task 6: MetaCheckGPT - A Multi-task Hallucination Detection using LLM uncertainty and meta-models

Rahul Mehta; Andrew Hoblitzell; Jack O’keefe; Hyeju Jang; Vasudeva Varma

2024 NAACL NAACL 2024

Halu-NLP at SemEval-2024 Task 6: MetaCheckGPT - A Multi-task Hallucination Detection using LLM uncertainty and meta-models

Abstract

AbstractHallucinations in large language models(LLMs) have recently become a significantproblem. A recent effort in this directionis a shared task at Semeval 2024 Task 6,SHROOM, a Shared-task on Hallucinationsand Related Observable Overgeneration Mis-takes. This paper describes our winning so-lution ranked 1st and 2nd in the 2 sub-tasksof model agnostic and model aware tracks re-spectively. We propose a meta-regressor basedensemble of LLMs based on a random forestalgorithm that achieves the highest scores onthe leader board. We also experiment with var-ious transformer based models and black boxmethods like ChatGPT, Vectara, and others. Inaddition, we perform an error analysis com-paring ChatGPT against our best model whichshows the limitations of the former

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Rahul Mehta , Andrew Hoblitzell , Jack O’keefe , Hyeju Jang , Vasudeva Varma

Topics

Artificial Intelligence > Core AI > Responsible AI Machine Learning > Application Areas > Fairness

Keywords

hallucination detection random forest large language model

Download PDF

Related papers

Working Alliance Transformer for Psychotherapy Dialogue Classification 2024

Named Entity Recognition Under Domain Shift via Metric Learning for Life Sciences 2024

Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study 2024

TelME: Teacher-leading Multimodal Fusion Network for Emotion Recognition in Conversation 2024

Extractive Summarization with Text Generator 2024