INTERSPEECH 2023

Pragmatic Pertinence: A Learnable Confidence Metric to Assess the Subjective Quality of LM-Generated Text

Abstract

To be perceived as trustworthy, artificially generated text must be sufficiently congruent with the available discourse history. Pre-trained language models (LMs) operating in generative mode are capable of predicting locally coherent phrases, but those do not always reflect salient syntactic, semantic, or pragmatic facets of prior content. This paper introduces a learnable evaluation metric to assess the pragmatic pertinence of LM-generated text for a given history. Pertinence is closely aligned with qualitative human judgments of acceptability, thereby emerging as a blend of sensibleness and specificity. Experiments conducted across different domains using different learning architectures show that this approach circumvents the issue of multiple valid ground-truths, while providing a reliable quantitative ranking of generated text completion candidates in context. Pertinence scoring could thus prove useful for the detection of hallucinations.
