
CHI '26 · Best paper · full-paper review · confidence high

When Scaffolding Breaks: Investigating Student Interaction with LLM-Based Writing Support in Real-Time K-12 EFL Classrooms

Junho Myung, Hyunseung Lim, Hana Oh, Hyoungwook Jin, Nayeon Kang, So-Yeon Ahn, Hwajung Hong, Alice Oh, Juho Kim

A strong field deployment showing that LLM scaffolding improves grammar but can also demotivate weaker students, increase reliance, skew teacher attention, and undermine classroom equity in real-time school settings. Its value lies in showing that classroom AI changes social organization, not just individual writing performance.


Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form
descriptive knowledge · typical · 92/268
Novelty type
empirical finding · typical · 68/268
Abstraction level
practice · typical · 85/268
Generalization target
user population · typical · 75/268
Validation mode
field deployment · typical · 9/268

Evidence profile

Evidence strength
strong · typical · 158/268
Claim alignment
strong · typical · 231/268
Overclaim risk
low · typical · 53/268

Review Summary

This paper is best read as a careful empirical correction to optimistic assumptions about classroom LLM scaffolding. Its contribution is not merely that an LLM can support writing in school, which many readers would already expect, but that the same step-by-step support can function very differently across proficiency groups once it is embedded in a live, time-constrained classroom. The deployment scale matters: 157 eighth-grade students over six weeks, 14,863 query-response pairs from the 133 students who consented to data sharing, qualitative analysis of student queries and affective responses, and naturalistic classroom observation. That combination gives the paper stronger ecological validity than many prior studies of AI writing support that rely on asynchronous use, adult learners, or tightly controlled tasks.

The most important finding is the asymmetry in who benefits and who pays the cost. Higher-performing students appear to use the system to offload lower-order tasks such as vocabulary translation or correctness checking, preserving room for their own higher-order composition work. Lower-performing students, by contrast, more often outsource sentence generation and become frustrated by iterative hints, especially under time pressure. That makes the paper valuable not only as a descriptive study of usage patterns but also as a warning that scaffolding can break when learners lack enough prior knowledge to use indirect guidance productively.

The classroom observations deepen the contribution by showing that the tool also reshapes participation and teacher awareness: some students rely on the system instead of speaking up, extroverted students still dominate teacher attention, and AI-polished outputs can hide who is actually struggling. The authors also bound their claims appropriately through a substantive limitations section covering the single-school Korean context, sampled qualitative logs, and limited access to direct student self-report.
Overall, this is a high-value CHI paper because it reframes LLM classroom support as a socio-technical intervention whose effects depend on proficiency, timing, and classroom dynamics, not just model capability.

What Changed

Canon before

The common assumption is that LLM-based scaffolding in K-12 classrooms will uniformly benefit learners by providing step-by-step guidance, which leads to improved engagement and learning outcomes without significant drawbacks.

Departure from common sense

Contrary to the assumption that step-by-step scaffolding by LLMs is universally beneficial, the paper shows that such scaffolding can demotivate lower-proficiency students and increase their reliance on the system, especially within real-time, time-constrained classroom settings.

Actual novelty

This work presents a large-scale real-time deployment and analysis of an LLM-based scaffolding system in K-12 EFL classrooms, revealing nuanced interaction differences across proficiency levels, identifying unintended negative consequences on motivation and classroom dynamics, and proposing design guidelines for equitable and effective integration.

Evidence

The evidence combines a six-week classroom deployment with 157 eighth-grade students (133 of whom consented to data sharing), analysis of 14,863 query-response pairs across 3,733 conversation threads, qualitative coding of interaction patterns, classroom observations, and explicit limitation statements that bound generalization. Together, these support claims about differential use by proficiency, motivational downsides, and classroom equity effects.

“We deployed WriteAid with eighth-grade students over a six-week period, during which they completed two separate writing tasks. A total of 157 students used the system, and 133 of them consented to share their data for analysis. As a result, we collected 14,863 query-response pairs with a total of 3,733 conversation threads from student-LLM interactions”

actual novelty · 1 Introduction · confidence 0.98

“• We identify limitations in LLM scaffolding, revealing that for lower-proficiency students in time-constrained classrooms, it can be counterproductive by fostering dependency and demotivation.”

departure from common sense · 1 Introduction · confidence 0.97

“This study presents several limitations and opportunities for future research. The study was conducted with eighth-grade students at a single public middle school in South Korea. Therefore, the findings may not be generalizable to students of different age groups (e”

limitation · 7 Limitation · confidence 0.98

“To complement the system logs and written outputs, we conducted a naturalistic observation study [4] to better understand how students engaged with WriteAid during in-class writing activities. Observations were carried out across three class sessions (45 minutes each).”

validation scope · 4.3 Observation Study · confidence 0.93

Limits

Method limits

The study was conducted in one South Korean middle school with eighth-grade EFL learners; qualitative analysis covered only a 9.3% random sample of interaction logs; and the authors could not collect detailed student satisfaction surveys or interviews because that would have disrupted classroom activity.

Deployment limits

Findings are specific to real-time middle-school EFL writing classrooms in a Korean curriculum context and may not transfer to other ages, subjects, languages, school settings, or LLMs.

Boundary conditions

Effects vary by proficiency level, classroom time pressure, and the local Korean-English curriculum context; the system was designed for teacher-supervised in-class writing rather than open-ended or asynchronous use.

Position in field

This paper fills a gap in CHI and educational AI research by moving beyond asynchronous or adult-focused LLM studies to a real-time K-12 classroom deployment, showing both learning benefits and social costs of scaffolding.

Abstract