LoCt-Instruct: An Automatic Pipeline for Constructing Datasets of Logical Continuous Instructions

Hongyu Sun; Yusuke Sakai; Haruki Sakajo; Shintaro Ozaki; Kazuki Hayashi; Hidetaka Kamigaito; Taro Watanabe

EMNLP 2025

Abstract

Continuous instruction following closely mirrors real-world tasks by requiring models to solve sequences of interdependent steps, yet existing multi-step instruction datasets suffer from three key limitations: (1) lack of logical coherence across turns, (2) narrow topical breadth and depth, and (3) reliance on rigid templates or heavy manual effort. We introduce LoCt-Pipeline, a novel pipeline that leverages modern LLMs’ reasoning capabilities to assemble rich, topic-related single-instruction data into multi-turn dialogues, producing chains that are logically coherent, progressively deepen in content, and span diverse domains without fixed templates or extensive human annotation. We use this pipeline to construct LoCt-Instruct, a dataset for assessing models’ problem-solving abilities. The generated chains serve as a testbed for benchmarking a variety of models, including reasoning-oriented architectures, instruction-tuned variants, and state-of-the-art closed-source LLMs, on their capacity to follow and correctly respond to each step. Our results reveal a substantial performance gap between current LLMs and human solvers, highlighting the need for more robust continuous instruction following. We publicly release the dataset and the end-to-end pipeline.
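To make the idea of step-wise evaluation on a chain of interdependent instructions concrete, the following is a minimal illustrative sketch and not the authors' released pipeline or scoring code. The names `Turn`, `run_chain`, `summarize`, and `toy_model` are hypothetical, and exact-match scoring is an assumption chosen only for simplicity; the abstract does not specify how responses are judged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Turn:
    instruction: str  # instruction shown to the model at this step
    reference: str    # expected answer for this step (hypothetical field)


def run_chain(chain: List[Turn], model: Callable[[List[str], str], str]) -> List[bool]:
    """Feed the turns in order; each call sees the dialogue history so far."""
    history: List[str] = []
    results: List[bool] = []
    for turn in chain:
        answer = model(history, turn.instruction)
        # Assumption: exact string match stands in for the real scoring method.
        results.append(answer.strip() == turn.reference.strip())
        # Later turns depend on earlier ones, so the model's own answer is carried forward.
        history.extend([turn.instruction, answer])
    return results


def summarize(results: List[bool]) -> Dict[str, object]:
    """Report per-step accuracy and whether the whole chain was solved."""
    return {
        "step_accuracy": sum(results) / len(results),
        "chain_solved": all(results),
    }


def toy_model(history: List[str], instruction: str) -> str:
    """Stand-in for an LLM call: answers only one canned instruction."""
    canned = {"Sort the list [3, 1, 2].": "[1, 2, 3]"}
    return canned.get(instruction, "unknown")


chain = [
    Turn("Sort the list [3, 1, 2].", "[1, 2, 3]"),
    Turn("Return the largest element of your previous answer.", "3"),
]
results = run_chain(chain, toy_model)
print(summarize(results))
```

In this toy chain the second instruction is only meaningful given the answer to the first, which is the property that distinguishes a continuous instruction chain from a set of unrelated prompts. Reporting both a per-step score and an all-steps-correct flag is one possible design; it separates partial progress from full completion of the chain.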

Topics

Machine Learning > Learning Types > Self-Supervised Learning
Natural Language Processing > Generation > Dialogue Systems
Natural Language Processing > Generation > Text Generation
Artificial Intelligence > Core AI > Large Language Models
Machine Learning > Learning Types > Evaluation
Machine Learning > Learning Paradigms > Multi-Task Learning
Natural Language Processing > Applications > Natural Language Understanding

Keywords

instruction following, multi-turn dialogue, logical coherence, problem solving, dataset construction, large language model, continuous instruction
