TLUE: A Tibetan Language Understanding Evaluation Benchmark

Fan Gao; Cheng Huang; Yutong Liu; Nyima Tashi; Xiangxiang Wang; Thupten Tsering; Ban Ma-bao; Renzeng Duojie; Gadeng Luosang; Rinchen Dongrub; Dorje Tashi; Xiao Feng Cd; Yongbin Yu; Hao Wang

EMNLP 2025


Abstract

Large language models (LLMs) have made tremendous progress in recent years, but low-resource languages such as Tibetan remain significantly underrepresented in their evaluation. Although Tibetan is spoken by over seven million people, it has largely been neglected in the development and assessment of LLMs. To address this gap, we present the Tibetan Language Understanding Evaluation Benchmark (TLUE), the first large-scale benchmark for measuring the proficiency of LLMs in the Tibetan language. TLUE comprises two major components: a comprehensive multi-task understanding benchmark spanning 5 domains and 67 subdomains, and a safety benchmark encompassing 7 subdomains. We evaluate a diverse set of state-of-the-art LLMs on TLUE. Experimental results show that most of these models perform below the random baseline, highlighting the considerable challenges they face in Tibetan language processing. TLUE provides a crucial foundation for future research in Tibetan language understanding and underscores the importance of greater inclusivity in the development of large language models.
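
To make the "below the random baseline" comparison concrete, the following is a minimal sketch that assumes the understanding benchmark is scored as multiple-choice accuracy. The abstract does not state the question format or the number of options, so the `Item` record, its field names, and the four-option example data are hypothetical and are not taken from TLUE.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One hypothetical multiple-choice question (not the TLUE schema)."""
    subdomain: str
    num_options: int   # assumed number of answer choices
    gold: str          # reference answer label
    predicted: str     # model's answer label


def accuracy(items):
    """Fraction of items where the prediction matches the reference."""
    if not items:
        raise ValueError("no items to score")
    return sum(i.gold == i.predicted for i in items) / len(items)


def random_baseline(items):
    """Expected accuracy of uniform random guessing over each item's options."""
    if not items:
        raise ValueError("no items to score")
    return sum(1 / i.num_options for i in items) / len(items)


def below_random(items):
    """True when the model scores strictly below uniform guessing."""
    return accuracy(items) < random_baseline(items)


# Illustrative data only: five four-option questions, one answered correctly.
items = [
    Item("subdomain_a", 4, "A", "B"),
    Item("subdomain_a", 4, "B", "B"),
    Item("subdomain_b", 4, "C", "D"),
    Item("subdomain_b", 4, "D", "A"),
    Item("subdomain_b", 4, "A", "C"),
]

print(accuracy(items), random_baseline(items), below_random(items))
```

Under these assumptions, a model that answers one of five four-option questions correctly scores 0.2 against a guessing baseline of 0.25, which is the situation the abstract describes for most evaluated models.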



Topics

Natural Language Processing > Resources & Methods > Large Language Models
Natural Language Processing > Resources & Methods > Multilingual NLP
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Evaluation
Machine Learning > Learning Types > Multi-Lingual Learning

Keywords

benchmark evaluation, multilingual nlp, llm evaluation, language understanding, low-resource language, evaluation benchmark, multilingual evaluation, multilingual natural language processing, large language model, language benchmark, tibetan language
