2026 EACL EACL 2026

PictureStories: Predicting the Task Adherence of Language Learner Answers to a Picture Story-Based Writing Task

Abstract

AbstractWe investigate the automated evaluation of English language learner answers to writing tasks featuring picture stories.This is usually limited to language proficiency only, neglecting the context of the picture. Instead, our analysis focuses on task adherence, which for example allows detection of off-topic answers.Since there is a lack of suitable training and evaluation data, our first step is to build the PictureStories dataset.To this end, we develop a marking rubric that covers task adherence with respect to both form and content. Six annotators mark 713 learner answers written in response to one of five picture stories.Having assembled the dataset, we then explore to what extent task adherence can be predicted automatically. Our experiments assume a scenario where no or just a few labelled answers are available for the picture story which is being marked.For form-focused criteria, we find that it is beneficial to finetune models across tasks.With content-focused criteria, few-shot prompting Qwen emerges as the best-performing method. We examine the trade-off between including the story image vs. example answers in the prompt and find that examples suffice in many cases. While for some LLMs, few-shot prompting results may look promising on the surface, we demonstrate that a much simpler method can do just as well when shown the same examples.

🧭 Keyword Pioneer — task adherence
🐝 Cross-Pollinator — Artificial Intelligence, Computer Science, Computer Vision, Data Science & Analytics, Deep Learning, Healthcare & Medicine, Interdisciplinary, Knowledge & Reasoning, Machine Learning, Mathematics & Optimization, Natural Language Processing, Speech & Audio