DeFuzzRAG: Handling Fuzzy Time Expressions for Temporal Robustness in Retrieval-Augmented Generation
Abstract
Abstract Large Language Models (LLMs) have achieved remarkable success across reasoning and knowledge-intensive tasks, yet their static pretraining leaves them unable to handle rapidly evolving or domain-specific knowledge. Retrieval-Augmented Generation (RAG) addresses this by grounding LLM outputs in dynamically retrieved evidence, improving factual accuracy and reducing hallucinations. However, standard RAG pipelines struggle with temporally sensitive queries, especially when documents contain fuzzy or indirect time expressions (e.g., “a few years later”). This leads to Temporal Misalignment, where topically relevant but temporally incorrect results are retrieved. To overcome this, we propose DeFuzzRAG, a lightweight framework that enhances temporal robustness in RAG. DeFuzzRAG employs a small local language model to infer concrete time scopes from vague expressions and applies metadata-based filtering to realign retrieval with the query’s temporal intent. Experiments on a benchmark of fuzzified queries demonstrate that DeFuzzRAG substantially improves retrieval accuracy, raising Hit Rate by 15.7% while maintaining efficiency and model-agnostic integration. Our findings highlight the importance of temporal reasoning in RAG and establish DeFuzzRAG as a practical, plug-and-play solution for deploying temporally robust LLM systems in real-world settings.