CHI '26 · Honorable mention · full-paper review · confidence medium-high

Will They Try Again? A Large-Scale RCT on Scaffolds that Support Persistence in an Intelligent Tutoring System

Michael W Asher, Yumou Wei, Adam Daniel Reynolds, Amy Ogan, Paulo F. Carvalho

This is a strong CHI-scale field RCT with a clear behavioral outcome and an unusually large sample. The main value is empirical: it shows that a prompt and a nudge can work together additively rather than substituting for each other, but the claims should remain anchored to the tested tutoring context and subject areas.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: causal knowledge typical · 31/268
Novelty type: empirical finding typical · 68/268
Abstraction level: system typical · 61/268
Generalization target: user population typical · 75/268
Validation mode: controlled experiment typical · 47/268

Evidence profile

Evidence strength: strong typical · 158/268
Claim alignment: strong typical · 231/268
Overclaim risk: medium typical · 210/268

Review Summary

This paper is best read as a high-confidence empirical contribution to persuasive design in educational systems rather than as a new interaction primitive. The central idea is straightforward but important: instead of assuming that an explicit motivational prompt and an interface-level default nudge are substitutes, the authors test whether they can be complementary. That is a meaningful departure from the common-sense intuition that “one intervention should be enough,” and the paper backs it with a very large randomized controlled trial in a real intelligent tutoring system. The scale is a major strength: a sample of 164,532 students and 17 million practice problems is far beyond the typical CHI evaluation, and it gives the results unusual weight for a behavioral intervention paper. The evidence also supports a fairly narrow but solid novelty claim: the paper does not invent a new theory of persistence, but it does provide causal evidence that two familiar persuasive mechanisms can be combined and that their effects are additive rather than redundant.

The limitations matter, though. The prompt was delivered probabilistically to reduce fatigue, so the observed effect is not a clean always-on treatment effect. The empirical setting is also confined to mathematics and science learning, which means the generalization target should stay close to digital tutoring contexts with similar retry decisions. Overall, this is a strong, well-validated CHI paper with a clear practical takeaway, but its strongest contribution is the scale and rigor of the causal evidence, not broad theoretical generalization.

What Changed

Canon before

Prior CHI work on persistence scaffolds and persuasive design often treats prompts and nudges as alternative interventions; this paper tests whether they can be combined without redundancy in a real tutoring system.

Departure from common sense

The paper’s non-obvious move is to treat an interface-level default nudge and an explicit persuasive prompt as potentially complementary rather than redundant. That is a departure from the common-sense expectation that one strong cue should make the other unnecessary.

Actual novelty

The paper’s novelty is a large-scale causal test of whether implicit nudges and explicit persuasive prompts can be combined to increase persistence after failure in a real-world intelligent tutoring system, with evidence that their effects are additive rather than overlapping.
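To make “additive rather than overlapping” concrete, assume a four-arm layout (control, prompt only, nudge only, both). Additivity then means the combined arm's gain over control roughly equals the sum of the two single-intervention gains, so the interaction contrast is near zero; redundancy would show up as a negative contrast. The sketch below is illustrative only: the arm names and retry rates are hypothetical, not the paper's results, and the contrast is computed on the risk-difference (percentage-point) scale. The paper's actual statistical model is not described in this review, and additivity on this scale does not guarantee additivity on another scale, such as the log-odds scale of a logistic regression.

```python
from dataclasses import dataclass


@dataclass
class ArmRetryRates:
    """Hypothetical retry rates, in percent of failed attempts followed by a retry."""
    control: float
    prompt_only: float
    nudge_only: float
    both: float


def interaction_contrast(rates: ArmRetryRates) -> float:
    """Combined gain minus the sum of single-intervention gains (percentage points).

    ~0  -> additive effects
    < 0 -> sub-additive (partly redundant) effects
    > 0 -> super-additive effects
    """
    prompt_gain = rates.prompt_only - rates.control
    nudge_gain = rates.nudge_only - rates.control
    combined_gain = rates.both - rates.control
    return combined_gain - (prompt_gain + nudge_gain)


# Illustrative numbers only (not from the paper).
additive = ArmRetryRates(control=30.0, prompt_only=33.0, nudge_only=40.0, both=43.0)
redundant = ArmRetryRates(control=30.0, prompt_only=33.0, nudge_only=40.0, both=40.0)

print(interaction_contrast(additive))   # 0.0
print(interaction_contrast(redundant))  # -3.0
```

In the redundant case the combined arm is no better than the nudge alone, so the prompt's 3-point gain disappears from the combination, which is exactly the pattern the paper reports not finding.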

Evidence

The paper reports a randomized controlled trial in Siyavula’s math and science intelligent tutoring system with 164,532 students in Grades 8-12 and 17 million practice problems. It compares a brief persuasive prompt and a visual default nudge, alone and in combination, to measure persistence after failure. The discussion and limitations sections also note probabilistic prompt delivery and subject-area scope.

“First, we present a large-scale, causal investigation about whether implicit (nudges) versus explicit (persuasive prompts) approaches to persuasive design can combine to increase student persistence after failure in a real-world, digital learning environment”

actual novelty · Abstract / Introduction contributions · confidence 0.70

“This asymmetry suggests that the prompt’s influence may be somewhat less dependent on its continued presence than that of the nudge, which appeared to have more transient, context-bound effect”

departure from common sense · Section 3 (The Present Research: Combining Persuasive Prompts with Default Nudges) · confidence 0.55

“Second, while we have argued for the generalizability of our findings, our empirical work was conducted entirely within mathematics and science learning”

limitation · Section 6.3 (Limitations and next steps) · confidence 0.82

“We conducted a randomized controlled trial in an intelligent tutoring system for math and science, involving 164,532 students (Grades 8-12) who completed 17 million practice problems”

validation scope · Abstract; Section 4.2 (RCT design) and Section 5.1 (RQ1/RQ2) · confidence 0.80

Limits

Method limits

The prompt was delivered probabilistically to reduce fatigue, which complicates interpretation of the prompt’s full-dose effect and may attenuate estimates. The study is also limited to math and science learning, so heterogeneity across other domains remains untested.
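The attenuation concern can be made explicit with a simple dilution model. Assume, hypothetically, that the prompt appears on a fraction p of eligible failures and has an effect only when shown; the average effect across all eligible failures is then p times the per-exposure effect, and dividing by p recovers the full-dose effect. The abstract, however, reports that the prompt spilled over to untreated problems, which breaks the "no effect when absent" assumption: with spillover, naive rescaling overstates the per-exposure effect. The function names, the delivery probability, and the effect sizes below are hypothetical; the delivery probability used in the study is not given in this review.

```python
def observed_effect(per_exposure: float, delivery_prob: float, spillover: float = 0.0) -> float:
    """Average effect over all eligible failures (percentage points).

    Hypothetical model: a shown prompt contributes `per_exposure`; an unshown
    one contributes `spillover` (0 means no carry-over to untreated problems).
    """
    if not 0.0 < delivery_prob <= 1.0:
        raise ValueError("delivery_prob must be in (0, 1]")
    return delivery_prob * per_exposure + (1.0 - delivery_prob) * spillover


def naive_full_dose(observed: float, delivery_prob: float) -> float:
    """Rescale an observed (diluted) effect to a per-exposure effect.

    Only valid when there is no spillover to unprompted problems.
    """
    if not 0.0 < delivery_prob <= 1.0:
        raise ValueError("delivery_prob must be in (0, 1]")
    return observed / delivery_prob


# Hypothetical: 4-point per-exposure effect, prompt shown on half of failures.
no_spill = observed_effect(4.0, 0.5)                    # 2.0
with_spill = observed_effect(4.0, 0.5, spillover=1.0)   # 2.5

print(naive_full_dose(no_spill, 0.5))    # 4.0 (recovers the per-exposure effect)
print(naive_full_dose(with_spill, 0.5))  # 5.0 (overstates it)
```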

Deployment limits

The interventions were evaluated in one intelligent tutoring system context, so deployment claims should stay close to digital learning environments with similar retry mechanics and failure moments. The prompt’s probabilistic delivery also means operational performance may differ under always-on deployment.

Boundary conditions

The paper itself notes that the prompt was probabilistic and that the empirical work was conducted entirely within mathematics and science learning. Effects may vary by problem type, subject, and student characteristics, and the additive pattern may not hold when retry choices are structured differently.

Position in field

This paper sits at the intersection of persuasive design, educational technology, and large-scale field experimentation. Its main contribution is not a new interface primitive but evidence that two familiar intervention families can be combined in a large-scale tutoring setting without simple redundancy.

Abstract

Persistence after failure is critical for learning—but when students make mistakes in intelligent tutoring systems, they often choose not to try again. How can digital platforms encourage students to persist at these moments? We conducted a randomized controlled trial in an intelligent tutoring system for math and science, involving 164,532 students (Grades 8-12) who completed 17 million practice problems. We tested two scalable interventions: a brief persuasive prompt encouraging students to try again, and a visual default nudge that highlighted the retry option. Both interventions increased persistence after failure, and when combined, their effects were additive—suggesting they operate through distinct psychological mechanisms. The nudge had a much larger immediate effect, but the prompt showed proportionally greater spillover to untreated problems. These findings advance theories of persuasive design, demonstrating that implicit, interface-level nudges and explicit motivational prompts can be combined to avoid redundancy while amplifying impact.