RE-FIN: Retrieval-based Enrichment for Financial data

Lorenzo Malandri; Fabio Mercorio; Mario Mezzanzanica; Filippo Pallucchini

2025 COLING COLING 2025

RE-FIN: Retrieval-based Enrichment for Financial data

Abstract

AbstractEnriching sentences with knowledge from qualitative sources benefits various NLP tasks and enhances the use of labeled data in model training. This is crucial for Financial Sentiment Analysis (FSA), where texts are often brief and contain implied information. We introduce RE-FIN (Retrieval-based Enrichment for FINancial data), an automated system designed to retrieve information from a knowledge base to enrich financial sentences, making them more knowledge-dense and explicit. RE-FIN generates propositions from the knowledge base and employs Retrieval-Augmented Generation (RAG) to augment the original text with relevant information. A large language model (LLM) rewrites the original sentence, incorporating this data. Since the LLM does not create new content, the risk of hallucinations is significantly reduced. The LLM generates multiple new sentences using different relevant information from the knowledge base; we developed an algorithm to select one that best preserves the meaning of the original sentence while avoiding excessive syntactic similarity. Results show that enhanced sentences present lower perplexity than the original ones and improve performances on FSA.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Computer Science and Machine Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Lorenzo Malandri , Fabio Mercorio , Mario Mezzanzanica , Filippo Pallucchini

Topics

Computer Science > Applications > Information Retrieval Natural Language Processing > Applications > Sentiment Analysis Artificial Intelligence > Core AI > Large Language Models Machine Learning > Learning Types > Retrieval-Augmented Generation Natural Language Processing > Generation > Retrieval-Augmented Generation

Keywords

knowledge base hallucination mitigation retrieval-augmented generation text rewriting financial sentiment analysis knowledge enrichment text enrichment

Download PDF

Related papers

Navigating Dialectal Bias and Ethical Complexities in Levantine Arabic Hate Speech Detection 2025

TaCIE: Enhancing Instruction Comprehension in Large Language Models through Task-Centred Instruction Evolution 2025

Positive Text Reframing under Multi-strategy Optimization 2025

RAM2C: A Liberal Arts Educational Chatbot based on Retrieval-augmented Multi-role Multi-expert Collaboration 2025

Two-stage Incomplete Utterance Rewriting on Editing Operation 2025