Will They Try Again? A Large-Scale RCT on Scaffolds that Support Persistence in an Intelligent Tutoring System
This is a strong CHI-scale field RCT with a clear behavioral outcome and an unusually large sample size. The main value is empirical: it shows that a prompt and a nudge can work together rather than being redundant, but the claims should remain anchored to the tested tutoring context and subject areas.
Axes Lens
Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.
Contribution shape
- Knowledge form: causal knowledge · typical · 31/268
- Novelty type: empirical finding · typical · 68/268
- Abstraction level: system · typical · 61/268
- Generalization target: user population · typical · 75/268
- Validation mode: controlled experiment · typical · 47/268
Evidence profile
- Evidence strength: strong · typical · 158/268
- Claim alignment: strong · typical · 231/268
- Overclaim risk: medium · typical · 210/268
Review Summary
This paper is best read as a high-confidence empirical contribution to persuasive design in educational systems rather than as a new interaction primitive. The central idea is straightforward but important: instead of assuming that an explicit motivational prompt and an interface-level default nudge are substitutes, the authors test whether they can be complementary. That is a meaningful departure from the common-sense intuition that “one intervention should be enough”, and the paper backs it with a very large randomized controlled trial in a real intelligent tutoring system. The scale is a major strength: 164,532 students and 17 million practice problems are far beyond the typical CHI evaluation and give the results unusual weight for a behavioral intervention paper. The evidence also supports a fairly narrow but solid novelty claim: the paper does not invent a new theory of persistence, but it does provide causal evidence that two familiar persuasive mechanisms can be combined and that their effects are additive rather than redundant.
The limitations matter, though. The prompt was delivered probabilistically to reduce fatigue, so the observed effect is not a clean always-on treatment effect. The empirical setting is also confined to mathematics and science learning, which means the generalization target should stay close to digital tutoring contexts with similar retry decisions. Overall, this is a strong, well-validated CHI paper with a clear practical takeaway, but its strongest contribution is the scale and rigor of the causal evidence, not broad theoretical generalization.
What Changed
Canon before
Prior CHI work on persistence scaffolds and persuasive design often treats prompts and nudges as alternative interventions; this paper tests whether they can be combined without redundancy in a real tutoring system.
Departure from common sense
The paper’s non-obvious move is to treat an interface-level default nudge and an explicit persuasive prompt as potentially complementary rather than redundant. That is a departure from the common-sense expectation that one strong cue should make the other unnecessary.
Actual novelty
The paper’s novelty is a large-scale causal test of whether implicit nudges and explicit persuasive prompts can be combined to increase persistence after failure in a real-world intelligent tutoring system, with evidence that their effects are additive rather than overlapping.
Evidence
The paper reports a randomized controlled trial in Siyavula’s math and science intelligent tutoring system with 164,532 students in Grades 8-12 and 17 million practice problems. It compares a brief persuasive prompt and a visual default nudge, alone and in combination, to measure persistence after failure. The discussion and limitations sections also note probabilistic prompt delivery and subject-area scope.
“First, we present a large-scale, causal investigation about whether implicit (nudges) versus explicit (persuasive prompts) approaches to persuasive design can combine to increase student persistence after failure in a real-world, digital learning environment”
actual novelty · Abstract / Introduction contributions · confidence 0.70
“This asymmetry suggests that the prompt’s influence may be somewhat less dependent on its continued presence than that of the nudge, which appeared to have more transient, context-bound effect”
departure from common sense · Section 3 (The Present Research: Combining Persuasive Prompts with Default Nudges) · confidence 0.55
“Second, while we have argued for the generalizability of our findings, our empirical work was conducted entirely within mathematics and science learnin”
limitation · Section 6.3 (Limitations and next steps) · confidence 0.82
“We conducted a randomized controlled trial in an intelligent tutoring system for math and science, involving 164,532 students (Grades 8-12) who completed 17 million practice problems”
validation scope · Abstract; Section 4.2 (RCT design) and Section 5.1 (RQ1/RQ2) · confidence 0.80
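One way to read "additive rather than overlapping" is as a statement about the interaction term in a 2×2 design. The sketch below assumes four arms (control, prompt only, nudge only, both) and uses retry rates that are purely hypothetical, not figures from the paper. The function name `factorial_effects` is likewise an illustration, not the authors' analysis code.

```python
def factorial_effects(control, prompt_only, nudge_only, both):
    """Compute simple effects and the interaction from four cell means.

    Inputs are outcome rates (e.g. share of failures followed by a retry)
    in each arm of an assumed 2x2 prompt-by-nudge design.
    """
    prompt_effect = prompt_only - control
    nudge_effect = nudge_only - control
    # What the combined arm would show if the two effects simply added up.
    additive_prediction = control + prompt_effect + nudge_effect
    # Interaction: how far the combined arm departs from pure additivity.
    interaction = both - additive_prediction
    return {
        "prompt_effect": prompt_effect,
        "nudge_effect": nudge_effect,
        "additive_prediction": additive_prediction,
        "interaction": interaction,
    }


# Hypothetical retry rates for illustration only (NOT from the paper).
hypothetical_rates = {
    "control": 0.40,
    "prompt_only": 0.44,
    "nudge_only": 0.46,
    "both": 0.50,
}

result = factorial_effects(**hypothetical_rates)
for name, value in result.items():
    print(f"{name}: {value:+.3f}")
```

An interaction near zero corresponds to the additive pattern. A clearly negative value would indicate overlap between the two interventions, and a clearly positive one would indicate synergy. A real analysis would also need uncertainty estimates, which this sketch omits.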
Limits
Method limits
The prompt was delivered probabilistically to reduce fatigue, which complicates interpretation of the prompt’s full-dose effect and may attenuate estimates. The study is also limited to math and science learning, so heterogeneity across other domains remains untested.
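The attenuation concern can be made concrete with a simple dilution calculation. The sketch below assumes, as a simplification not confirmed by the paper, that the prompt changes behavior only on the failure moments where it is actually shown, with no carryover to other attempts. The delivery probability and effect sizes are hypothetical placeholders, and both function names are illustrative.

```python
def assigned_arm_effect(effect_when_shown, delivery_probability):
    """Average effect across all failure moments in the prompt arm.

    Assumes the prompt only matters on the moments where it is displayed.
    """
    if not 0.0 <= delivery_probability <= 1.0:
        raise ValueError("delivery_probability must be between 0 and 1")
    return delivery_probability * effect_when_shown


def implied_effect_when_shown(observed_effect, delivery_probability):
    """Back out the per-display effect from the diluted arm-level effect."""
    if not 0.0 < delivery_probability <= 1.0:
        raise ValueError("delivery_probability must be in (0, 1]")
    return observed_effect / delivery_probability


# Hypothetical numbers for illustration only (NOT from the paper).
p_shown = 0.5          # assumed share of failure moments that show the prompt
full_dose = 0.08       # assumed lift in retry rate when the prompt is shown

diluted = assigned_arm_effect(full_dose, p_shown)
recovered = implied_effect_when_shown(diluted, p_shown)
print(f"arm-level effect: {diluted:.3f}, implied per-display effect: {recovered:.3f}")
```

If the prompt's influence persists after it disappears, this scaling would overstate the full-dose effect, so it is best treated as a rough bound rather than an estimate.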
Deployment limits
The interventions were evaluated in one intelligent tutoring system context, so deployment claims should stay close to digital learning environments with similar retry mechanics and failure moments. The prompt’s probabilistic delivery also means operational performance may differ under always-on deployment.
Boundary conditions
The paper itself notes that the prompt was probabilistic and that the empirical work was conducted entirely within mathematics and science learning. Effects may vary by problem type, subject, and student characteristics, and the additive pattern may not hold when retry choices are structured differently.
Position in field
This paper sits at the intersection of persuasive design, educational technology, and large-scale field experimentation. Its main contribution is not a new interface primitive but evidence that two familiar intervention families can be combined in a high-scale tutoring setting without simple redundancy.