Synthetic Data Generation Using Large Language Models for Financial Question Answering

Chetan Harsha; Karmvir Singh Phogat; Sridhar Dasaratha; Sai Akhil Puranam; Shashishekar Ramakrishna

2025 COLING COLING 2025

Synthetic Data Generation Using Large Language Models for Financial Question Answering

Abstract

AbstractRecent research has shown excellent performance of large language models (LLMs) for answering questions requiring multi-step financial reasoning. While the larger models have been used with zero-shot or few-shot prompting, the smaller variants need fine-tuning on training data containing questions and the corresponding answers that includes detailed reasoning demonstrations. To alleviate the significant cost of creating a data set with complex questions and corresponding answers, we explore the use of synthetic data for financial question answering using a multi-step LLM based approach to generate question as well as the answers with reasoning steps. We consider standard as well as conversational financial question answering scenarios. We experiment with synthetic data generation for three different real financial reasoning problems that already have manually collected data sets created with the help of financial experts. Using the same document sources, we use the proposed LLM based approach to generate synthetic questions and answers. To measure the effectiveness, we train multiple small language models (SLMs) on these synthetic data and compare the performance with that of the same SLMs trained on the real data. We further perform extensive experimental analysis generating important evidence on the potential of using synthetic data in financial reasoning tasks.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Chetan Harsha , Karmvir Singh Phogat , Sridhar Dasaratha , Sai Akhil Puranam , Shashishekar Ramakrishna

Topics

Natural Language Processing > Applications > Question Answering Machine Learning > Learning Types > Transfer Learning Artificial Intelligence > Core AI > Large Language Models Machine Learning > Learning Types > Synthetic Data Generation

Keywords

question answering synthetic data generation few-shot prompting financial question answering multi-step reasoning financial reasoning large language model

Download PDF

Related papers

Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection 2025

TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution 2025

Positive Text Reframing under Multi-strategy Optimization 2025

RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration 2025

Two-stage Incomplete Utterance Rewriting on Editing Operation 2025