HIT-MI&T Lab’s Submission to Eval4NLP 2023 Shared Task

Rui Zhang; Fuhai Song; Hui Huang; Jinghao Yuan; Muyun Yang; Tiejun Zhao

AACL 2023

Abstract

Recently, Large Language Models (LLMs) have boosted research in natural language processing and shown impressive capabilities across numerous domains, including machine translation evaluation. This paper presents the methods we developed for the machine translation evaluation sub-task of the Eval4NLP 2023 Shared Task. Based on the provided LLMs, we propose a generation-based method and a probability-based method for evaluation, explore different strategies for selecting demonstrations for in-context learning, and try different ensemble methods to further improve evaluation accuracy. Experimental results on the development and test sets demonstrate the effectiveness of our proposed methods.
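The abstract does not give the scoring formulas, so the following is only an illustration of two generic ideas it names. A common way to turn an LLM's output into a probability-based score is to take the probability-weighted average over the candidate score tokens, and a simple way to ensemble several evaluators is to average their z-normalized segment scores. The sketch assumes a hypothetical 1 to 5 score scale and that per-score log-probabilities have already been extracted from the model; the function names are hypothetical, and this is not necessarily the method used in the paper.

```python
import math
from statistics import mean, pstdev


def probability_based_score(score_logprobs: dict[int, float]) -> float:
    """Expected score under the model's distribution over candidate score tokens.

    Assumption: score_logprobs maps each candidate score (e.g. 1..5, a
    hypothetical scale) to the log-probability the LLM assigns to its token.
    """
    # Renormalize over the candidate scores only (a stable softmax over log-probs).
    max_lp = max(score_logprobs.values())
    weights = {s: math.exp(lp - max_lp) for s, lp in score_logprobs.items()}
    total = sum(weights.values())
    # Probability-weighted average of the candidate scores.
    return sum(s * w / total for s, w in weights.items())


def z_normalize(scores: list[float]) -> list[float]:
    """Rescale one evaluator's scores to zero mean and unit variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant evaluator carries no ranking information.
        return [0.0 for _ in scores]
    return [(x - mu) / sigma for x in scores]


def ensemble(systems: list[list[float]]) -> list[float]:
    """Average z-normalized scores from several evaluators, segment by segment."""
    normalized = [z_normalize(s) for s in systems]
    return [mean(col) for col in zip(*normalized)]


if __name__ == "__main__":
    probs = {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.4, 5: 0.2}
    lp = {s: math.log(p) for s, p in probs.items()}
    print(round(probability_based_score(lp), 3))  # 3.5
    # Two hypothetical evaluators scoring the same three segments.
    print(ensemble([[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]))
```

Averaging the expected score, rather than taking only the most likely score token, yields finer-grained scores that break ties between segments. Z-normalizing before averaging keeps an evaluator with a wider score range from dominating the ensemble.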

Topics

Natural Language Processing > Applications > Machine Translation
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

in-context learning, ensemble method, machine translation evaluation, probability-based evaluation, large language model
