Building Resource-Constrained Language Agents: A Korean Case Study on Chemical Toxicity Information

Hojun Cho; Donghu Kim; Soyoung Yang; Chan Lee; Hunjoo Lee; Jaegul Choo

EMNLP 2025

Abstract

Language agents powered by large language models (LLMs) face significant deployment challenges in resource-constrained environments, particularly for specialized domains and less-common languages. This paper presents Tox-chat, a Korean chemical toxicity information agent devised within these limitations. We propose two key innovations: a context-efficient architecture that reduces token consumption through hierarchical section search, and a scenario-based dialogue generation methodology that effectively distills tool-using capabilities from larger models. Experimental evaluations demonstrate that our fine-tuned 8B parameter model substantially outperforms both untuned models and baseline approaches in terms of DB faithfulness and preference. Our work offers valuable insights for researchers developing domain-specific language agents under practical constraints.
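
The idea behind hierarchical section search can be illustrated with a minimal sketch. It is not the authors' implementation: the `Document` class, the helper names, the placeholder section titles, and the whitespace token count are all assumptions made for illustration. The sketch only shows why exposing section titles first, and then reading one selected section, places fewer tokens in the context than loading a whole document.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """A hypothetical database entry split into titled sections."""
    name: str
    sections: dict[str, str]  # section title -> section body


def count_tokens(text: str) -> int:
    """Crude whitespace token count, a stand-in for a real tokenizer."""
    return len(text.split())


def list_sections(doc: Document) -> str:
    """Step 1: expose only the section titles to the model."""
    return "\n".join(doc.sections)


def read_section(doc: Document, title: str) -> str:
    """Step 2: return the body of the one section that was selected."""
    return doc.sections[title]


def flat_context(doc: Document) -> str:
    """Baseline: place every title and body in the context at once."""
    return "\n".join(f"{t}\n{b}" for t, b in doc.sections.items())


# Placeholder data; the titles and bodies are invented for this example.
doc = Document(
    name="example-chemical",
    sections={
        "Overview": "placeholder text " * 50,
        "Acute toxicity": "placeholder text " * 50,
        "Handling and storage": "placeholder text " * 50,
    },
)

# Hierarchical context: the title list plus the single chosen section.
hierarchical = list_sections(doc) + "\n" + read_section(doc, "Acute toxicity")
flat_tokens = count_tokens(flat_context(doc))
hier_tokens = count_tokens(hierarchical)
print(flat_tokens, hier_tokens)
```

In a real agent the section choice would come from the model after it reads the title list; here it is hard-coded to keep the sketch self-contained.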

Topics

Artificial Intelligence > Core AI > Agent Systems
Artificial Intelligence > Core AI > Large Language Models
Natural Language Processing > Applications > Information Retrieval
Natural Language Processing > Applications > Dialogue Systems
Natural Language Processing > Resources & Methods > Large Language Models
Machine Learning > Learning Types > Domain Adaptation
Machine Learning > Learning Types > Knowledge Distillation
Deep Learning > Learning Types > Knowledge Distillation

Keywords

domain adaptation, knowledge distillation, dialogue generation, hierarchical search, dialogue system, language agent, large language model, token consumption
