Evaluating Large Language Models on Controlled Generation Tasks

Jiao Sun; Yufei Tian; Wangchunshu Zhou; Nan Xu; Qian Hu; Rahul Gupta; John Wieting; Nanyun Peng; Xuezhe Ma

EMNLP 2023

Abstract

While recent studies have examined the abilities of large language models on various benchmark tasks, including question generation, reading comprehension, and multilingual tasks, few have examined the controllability of large language models on generation tasks. We present an extensive analysis of various benchmarks, including a sentence planning benchmark with different granularities. After comparing large language models against state-of-the-art finetuned smaller models, we present a spectrum showing where large language models fall behind, are comparable to, or exceed the ability of smaller models. We conclude that *large language models struggle at meeting fine-grained hard constraints*.

Topics

Natural Language Processing > Generation > Text Generation
Natural Language Processing > Resources & Methods > Large Language Models

Keywords

benchmark evaluation, text generation, language model, controlled generation, generation constraint
