Evaluating Financial Literacy of Large Language Models through Domain Specific Languages for Plain Text Accounting

Alexei Gustavo Figueroa Rosero; Paul Grundmann; Julius Freidank; Wolfgang Nejdl; Alexander Loeser

2025 COLING COLING 2025

Evaluating Financial Literacy of Large Language Models through Domain Specific Languages for Plain Text Accounting

Abstract

AbstractLarge language models (LLMs) have proven highly effective for a wide range of tasks, including code generation. Recently, advancements in their capabilities have shown promise in areas like mathematical reasoning, chain-of-thought processes and self-reflection. However, their effectiveness in domains requiring nuanced understanding of financial contexts, such as accounting, remains unclear. In this study, we evaluate how well LLMs perform in generating code for domain-specific languages (DSLs) in accounting, using Beancount as a case study. We create a set of tasks based on common financial ratios, to evaluate the numeracy and financial literacy of LLMs. Our findings reveal that while LLMs are state-of-the art in generative tasks, they struggle severely with accounting, often producing inaccurate calculations and misinterpreting financial scenarios. We characterize these shortcomings through a comprehensive evaluation, shedding light on the limitations of LLMs in understanding and handling money-related tasks.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Interdisciplinary and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Alexei Gustavo Figueroa Rosero , Paul Grundmann , Julius Freidank , Wolfgang Nejdl , Alexander Loeser

Topics

Natural Language Processing > Generation > Text Generation Interdisciplinary > Social > Education Artificial Intelligence > Core AI > Large Language Models Natural Language Processing > Resources & Methods > Language Modeling Machine Learning > Learning Types > Evaluation

Keywords

text generation code generation domain-specific language financial literacy large language model

Download PDF

Related papers

Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection 2025

TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution 2025

Positive Text Reframing under Multi-strategy Optimization 2025

RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration 2025

Two-stage Incomplete Utterance Rewriting on Editing Operation 2025