FOCUS: A Benchmark for Targeted Socratic Question Generation via Source-Span Grounding

Surawat Pothong; Machi Shimmei; Naoya Inoue; Paul Reisert; Ana Brassard; Wenzhi Wang; Shoichi Naito; Jungmin Choi; Kentaro Inui

AACL 2025

Abstract

We present FOCUS, a benchmark and task setting for Socratic question generation that delivers more informative and targeted feedback to learners. Unlike prior datasets, which rely on broad typologies and lack grounding in the source text, FOCUS introduces a new formulation: each Socratic question is paired with a fine-grained, 11-type typology and an explicit source span from the argument it targets. This design supports clearer, more actionable feedback and facilitates interpretable model evaluation. FOCUS includes 440 annotated instances with moderate partial-match agreement, establishing it as a reliable benchmark. Baseline experiments with representative state-of-the-art models reveal, through detailed error analysis, that even strong models struggle with span selection and context-sensitive categories. An extension study on the LogicClimate dataset further confirms the generalizability of the task and annotation framework. FOCUS sets a new standard for pedagogically grounded and informative Socratic question generation.
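
To make the formulation concrete, the following is a minimal Python sketch of how one instance (argument, question, type label, source span) might be represented, together with one plausible reading of a partial-match check between two annotated spans. The class name, field names, the placeholder type label, and the overlap criterion are all assumptions for illustration; the abstract does not specify the dataset schema, the 11 type labels, or the exact agreement measure, and the example argument is made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GroundedQuestion:
    """One annotated instance. All field names are hypothetical."""
    argument: str        # full text of the source argument
    question: str        # the Socratic question
    question_type: str   # one label from the 11-type typology (labels not listed here)
    span_start: int      # character offset where the targeted span begins
    span_end: int        # exclusive character offset where the span ends

    def span_text(self) -> str:
        """Return the source span the question targets."""
        return self.argument[self.span_start:self.span_end]


def spans_partially_match(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """Assumed criterion: two half-open [start, end) spans match partially
    when they share at least one character."""
    return max(a[0], b[0]) < min(a[1], b[1])


# Made-up example, not taken from the dataset.
argument = "Everyone I know likes this policy, so it must be good."
target = "Everyone I know"
start = argument.find(target)

example = GroundedQuestion(
    argument=argument,
    question="Who is included in 'everyone I know'?",
    question_type="PLACEHOLDER_TYPE",  # stands in for a real typology label
    span_start=start,
    span_end=start + len(target),
)
```

Under this reading, two annotators who select overlapping but not identical spans would still count as agreeing, which is one way a partial-match agreement score could be computed.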

Topics

Artificial Intelligence > Core AI > Human-AI Interaction
Machine Learning > Learning Types > Weakly Supervised Learning
Natural Language Processing > Applications > Question Generation

Keywords

benchmark dataset, educational feedback, socratic question generation, source-span grounding, question typology
