EMNLP 2024
CUTE: Measuring LLMs’ Understanding of Their Tokens
Abstract
Large Language Models (LLMs) show remarkable performance on a wide variety of tasks. Most LLMs split text into multi-character tokens and process them as atomic units without direct access to individual characters. This raises the question: To what extent can LLMs learn orthographic information? To answer this, we propose a new benchmark, CUTE, which features a collection of tasks designed to test the orthographic knowledge of LLMs. We evaluate popular LLMs on CUTE, finding that most of them seem to know the spelling of their tokens, yet fail to use this information effectively to manipulate text, calling into question how much of this knowledge is generalizable.
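To make the tokenization point concrete, the following is a minimal sketch of why characters are hidden from a model that consumes token IDs. It is an illustration only: the toy vocabulary `TOY_VOCAB` and the helpers `tokenize` and `spell` are hypothetical names invented here, and the greedy longest-match scheme is an assumption for simplicity, not the tokenizer of any evaluated LLM or part of the CUTE benchmark.

```python
# Toy illustration only: this vocabulary and these helpers are hypothetical,
# not taken from CUTE or from any real tokenizer.
TOY_VOCAB = {
    "straw": 0, "berry": 1,
    "s": 2, "t": 3, "r": 4, "a": 5, "w": 6, "b": 7, "e": 8, "y": 9,
}


def tokenize(text, vocab=TOY_VOCAB):
    """Split text into integer token IDs by greedy longest-match lookup."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest substring starting at position i first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No vocabulary entry matches, not even a single character.
            raise ValueError(f"cannot tokenize {text[i:]!r}")
    return ids


def spell(ids, vocab=TOY_VOCAB):
    """Recover the characters behind token IDs using the vocabulary table."""
    inverse = {idx: piece for piece, idx in vocab.items()}
    return [ch for idx in ids for ch in inverse[idx]]


ids = tokenize("strawberry")
print(ids)         # the model-side view: two opaque integers
print(spell(ids))  # the character view, available only via the lookup table
```

In this sketch the ten-character word becomes two integers, so a character-level task such as spelling requires the model to have learned what `spell` does by table lookup. That is the kind of orthographic knowledge the abstract says CUTE is designed to probe.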
Topics
Artificial Intelligence > Core AI > Interpretability
Natural Language Processing > Resources & Methods > Large Language Models
Natural Language Processing > Resources & Methods > Text Representation
Artificial Intelligence > Core AI > Large Language Models
Deep Learning > Models > Large Language Models
Machine Learning > Optimization & Theory > Evaluation
Machine Learning > Learning Types > Evaluation