DharmaBench: Evaluating Language Models on Buddhist Texts in Sanskrit and Tibetan

Kai Golan Hashiloni; Shay Cohen; Asaf Shina; Jingyi Yang; Orr Meir Zwebner; Nicola Bajetta; Guy Bilitski; Rebecca Sundén; Guy Maduel; Ryan Conlon; Ari Barzilai; Daniel Mass; Shanshan Jia; Aviv Naaman; Sonam Choden; Sonam Jamtsho; Yadi Qu; Harunaga Isaacson; Dorji Wangchuk; Shai Fine; Orna Almogi; Kfir Bar

IJCNLP 2025

Abstract

We assess the capabilities of large language models on tasks involving Buddhist texts written in Sanskrit and Classical Tibetan, two typologically distinct, low-resource historical languages. To this end, we introduce DharmaBench, a benchmark suite comprising 13 classification and detection tasks grounded in Buddhist textual traditions: six in Sanskrit and seven in Tibetan, with four shared across both. The tasks are curated from scratch, tailored to the linguistic and cultural characteristics of each language. We evaluate a range of models, from proprietary systems like GPT-4o to smaller, domain-specific open-weight models, analyzing their performance across tasks and languages. All datasets and code are publicly released, under the CC-BY-4 License and the Apache-2.0 License respectively, to support research on historical language processing and the development of culturally inclusive NLP systems.

Topics

Machine Learning > Application Areas > Domain Adaptation
Natural Language Processing > Applications > Text Classification
Natural Language Processing > Resources & Methods > Large Language Models
Natural Language Processing > Resources & Methods > Multilingual NLP
Interdisciplinary > Linguistics > Computational Linguistics

Keywords

text classification, language model evaluation, Sanskrit language, low-resource language, multilingual evaluation, historical language, historical language processing, Buddhist text, Tibetan language

Related papers

Cold Starts and Hard Cases: A Two-Stage SFT-RLVR Approach for Legal Machine Translation (Just-NLP L-MT shared task), 2025

Rethinking Information Synthesis in Multimodal Question Answering: A Multi-Agent Perspective, 2025

MELAC: Massive Evaluation of Large Language Models with Alignment of Culture in Persian Language, 2025

From Anger to Joy: How Nationality Personas Shape Emotion Attribution in Large Language Models, 2025
