Tracking the evolution of LLM capabilities for Belarusian with OpenAI Evals

Vladislav Poritski; Oksana Volchek; Maksim Aparovich; Volha Harytskaya; Pavel Smrz

EACL 2026

Abstract

We examine how the capabilities of large language models (LLMs) have evolved on eight Belarusian language tasks contributed in 2023 to OpenAI’s Evals framework. We evaluate state-of-the-art models on both the original development sets and newly created test sets. The results demonstrate significant but non-uniform progress over this period: some tasks are almost saturated, while others show only minor improvement over trivial baselines. Error analysis shows that certain challenges have not yet been addressed, e.g., misidentification of non-words as legitimate vocabulary, or conversion from modern to classical orthography. We release the datasets and the generated completions (https://doi.org/10.5281/zenodo.18163825).

Topics

Natural Language Processing > Resources & Methods > Multilingual NLP
Machine Learning > Optimization & Theory > Evaluation

Keywords

llm evaluation, low-resource language, performance tracking, belarusian language, openai eval

Related papers

Investigating Gender Stereotypes in Large Language Models via Social Determinants of Health 2026

A Benchmark for Audio Reasoning Capabilities of Multimodal Large Language Models 2026

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection 2026

Generative Personality Simulation via Theory-Informed Structured Interview 2026

Word Surprisal Correlates with Sentential Contradiction in LLMs 2026