Exploring Prompting Large Language Models as Explainable Metrics

Ghazaleh Mahmoudi

IJCNLP 2023

Abstract

This paper describes the IUST NLP Lab submission to the Prompting Large Language Models as Explainable Metrics Shared Task at the Eval4NLP 2023 Workshop on Evaluation & Comparison of NLP Systems. We propose a zero-shot prompt-based strategy for explainable evaluation of summarization using Large Language Models (LLMs). Our experiments, which employ both few-shot and zero-shot approaches, demonstrate the promising potential of LLMs as evaluation metrics in Natural Language Processing (NLP), particularly for summarization. Our best prompt achieved a Kendall correlation of 0.477 with human evaluations on the text summarization test data.
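To make the reported figure concrete, the following is a minimal sketch of how a Kendall correlation between metric scores and human scores can be computed. It assumes the tau-b variant, which corrects for ties; the abstract does not say which variant the shared task used. The function name `kendall_tau_b` and the example scores are hypothetical and are not taken from the paper.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b between two equally long score lists (illustrative sketch)."""
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")

    concordant = discordant = ties_x_only = ties_y_only = 0
    # Compare every pair of items once.
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both lists: excluded from both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both lists order the pair the same way
        else:
            discordant += 1  # the lists disagree on the order

    untied = concordant + discordant
    denom = sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denom == 0:
        return float("nan")  # undefined when one list is constant
    return (concordant - discordant) / denom


# Hypothetical scores for five summaries (not data from the paper).
human_scores = [1, 2, 3, 4, 5]
llm_scores = [1, 3, 2, 4, 5]
print(kendall_tau_b(human_scores, llm_scores))
```

In this toy example, nine of the ten pairs are ordered the same way by both score lists and one pair is swapped, so the correlation is (9 - 1) / 10 = 0.8. A value of 0.477 therefore indicates moderate agreement in how the metric and the human annotators rank summaries.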

Topics

Artificial Intelligence > Core AI > Interpretability
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Zero-Shot Learning
Natural Language Processing > Generation > Summarization
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

prompt engineering, text summarization, evaluation metric, zero-shot prompting, human evaluation, explainable metrics, prompt-based strategy, large language model
