Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models

Seonmin Koo; JINSUNG KIM; Chanjun Park; Heuiseok Lim

2024 EMNLP EMNLP 2024

Search if you don’t know! Knowledge-Augmented Korean Grammatical Error Correction with Large Language Models

Abstract

AbstractGrammatical error correction (GEC) system is a practical task used in the real world, showing high achievements alongside the development of large language models (LLMs). However, these achievements have been primarily obtained in English, and there is a relative lack of performance for non-English data, such as Korean. We hypothesize that this insufficiency occurs because relying solely on the parametric knowledge of LLMs makes it difficult to thoroughly understand the given context in the Korean GEC. Therefore, we propose a Knowledge-Augmented GEC (KAGEC) framework that incorporates evidential information from external sources into the prompt for the GEC task. KAGEC first extracts salient phrases from the given source and retrieves non-parametric knowledge based on these phrases, aiming to enhance the context-aware generation capabilities of LLMs. Furthermore, we conduct validations for fine-grained error types to identify those requiring a retrieval-augmented manner when LLMs perform Korean GEC. According to experimental results, most LLMs, including ChatGPT, demonstrate significant performance improvements when applying KAGEC.

🌉 Interdisciplinary Bridge — Artificial Intelligence and Deep Learning and Natural Language Processing

🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Reinforcement Learning, Robotics, Security & Privacy, Speech & Audio

Authors

Seonmin Koo , JINSUNG KIM , Chanjun Park , Heuiseok Lim

Topics

Artificial Intelligence > Learning Paradigms > Transfer Learning Natural Language Processing > Applications > Text Classification Natural Language Processing > Resources & Methods > Multilingual NLP Artificial Intelligence > Core AI > Large Language Models Deep Learning > Learning Types > Retrieval-Augmented Generation Natural Language Processing > Applications > Text Processing

Keywords

grammatical error correction retrieval-augmented generation knowledge retrieval external knowledge korean language knowledge augmentation large language model

Download PDF

Related papers

EmbodiedBERT: Cognitively Informed Metaphor Detection Incorporating Sensorimotor Information 2024

Mitigating Matthew Effect: Multi-Hypergraph Boosted Multi-Interest Self-Supervised Learning for Conversational Recommendation 2024

Learning to Extract Structured Entities Using Language Models 2024

Towards Understanding Jailbreak Attacks in LLMs: A Representation Space Analysis 2024

CSSL: Contrastive Self-Supervised Learning for Dependency Parsing on Relatively Free Word Ordered and Morphologically Rich Low Resource Languages 2024