CHI '26 · Best paper · full-paper review · confidence high

“I Don’t Think RAI Applies to My Model” – Engaging Non-champions with Sticky Stories for Responsible AI Work

Nadia Nahar, Chenyang Yang, Yanxin Chen, Wesley Hanwen Deng, Ken Holstein, Motahhare Eslami, Christian Kästner

This paper’s real contribution is not just another RAI artifact, but a reframing of the adoption problem: many practitioners are not persuaded by existing governance tools at all. By designing sticky, tailored harm narratives and showing measurable gains in engagement and harm discovery, the authors offer a credible intervention for the neglected non-champion majority.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: generative knowledge · typical · 35/268
Novelty type: tool · typical · 14/268
Abstraction level: practice · typical · 85/268
Generalization target: user population · typical · 75/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: strong · typical · 158/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268
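
The counts above are easier to compare as shares of the 268-review set. The following is a minimal Python sketch, assuming each n/268 figure is the number of reviews in the set that share this paper's value on that axis. The page does not define the threshold behind the "typical" tag, so none is computed here. The names `AXES`, `share`, and `SHARES` are illustrative, not part of any tool.

```python
# Counts copied from the Axes Lens; the denominator is the size of the review set.
TOTAL_REVIEWS = 268

AXES = {
    "Knowledge form": 35,
    "Novelty type": 14,
    "Abstraction level": 85,
    "Generalization target": 75,
    "Validation mode": 136,
    "Evidence strength": 158,
    "Claim alignment": 231,
    "Overclaim risk": 210,
}


def share(count: int, total: int = TOTAL_REVIEWS) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage of the review set per axis (assumed reading of the n/268 figures).
SHARES = {axis: share(n) for axis, n in AXES.items()}

for axis, pct in SHARES.items():
    print(f"{axis}: {pct}%")
```

Under that reading, the novelty-type value is the least common of the eight (about 5% of reviews) and the claim-alignment value the most common (about 86%).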

Review Summary

This is a strong and timely CHI contribution because it identifies a practical failure mode in Responsible AI work that many organizations likely recognize but rarely study directly: the people whom RAI processes most need to influence are often not the already committed champions, but the larger set of practitioners who see these activities as irrelevant, abstract, or bureaucratic. The paper’s conceptual move is therefore important. Rather than assuming that better checklists or more governance structure will solve the problem, it treats motivation and attention as the bottleneck.

The sticky stories intervention is compelling because it is not merely “use stories”; it operationalizes a specific design theory around five qualities (concrete, severe, surprising, diverse, and relevant) and then builds a compound AI pipeline to generate such stories at scale. That combination of theory-informed design and technical implementation gives the contribution substance beyond a simple design recommendation.

The empirical package is also persuasive. The offline evaluation shows that the generated stories differ meaningfully from a zero-shot baseline on the intended qualities, while the user study with 29 practitioners connects those design differences to behavioral and reflective outcomes that matter: more time spent, broader harm categories surfaced, and richer critical reflection. The qualitative findings are especially useful because they show how sticky stories appear to move some participants from indifference or resistance toward recognition of overlooked risks.

At the same time, the paper is not limitation-free. The authors appropriately acknowledge that some stories can become overly dramatic and less relatable, that the generation pipeline is substantially more expensive than a baseline approach, and that the follow-up evidence is too limited to support claims about long-term transformation. Those caveats keep the paper from overclaiming.

Overall, this looks like a high-value contribution to HCI and RAI practice: it offers a concrete intervention, a plausible mechanism for why it works, and evidence that engagement itself can be designed for rather than assumed.

What Changed

Canon before

Prior research and dominant Responsible AI interventions largely center RAI champions and assume that governance artifacts, checklists, templates, or generic harm stories can motivate practitioners to engage with fairness concerns. A common expectation is that exposure to algorithmic harm narratives should broadly raise awareness across practitioner populations.

Departure from common sense

The paper argues that many non-champion practitioners do not find standard RAI tools or familiar media-style harm stories motivating at all; instead they often see RAI as irrelevant or bureaucratic and engage only superficially. This challenges the intuitive belief that simply providing governance processes or cautionary stories is enough to trigger meaningful ethical reflection.

Actual novelty

The core novelty is the design and evaluation of sticky stories: LLM-generated, practitioner-relevant narratives intentionally optimized to be concrete, severe, surprising, diverse, and relevant. The paper contributes both the intervention concept and a scalable compound AI generation pipeline, then shows in a user study that these stories outperform baseline stories in engaging mostly non-champion practitioners and in broadening harm identification and reflection.

Evidence

Evidence comes from a formative organizational study identifying an engagement gap among non-champions, an offline evaluation comparing sticky-story generation against a zero-shot baseline on five target qualities and cost, and a mixed-design user study with 29 practitioners assessing time spent, diversity of harms surfaced, and qualitative reflection. The paper also includes a limited two-month follow-up survey for transparency about possible downstream effects.
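
One plausible way to operationalize "diversity of harms surfaced" is to count the distinct harm categories named in each study condition. The sketch below is an assumption about the measure, not the authors' coding scheme. The function name `distinct_categories`, the category labels, and the example data are all invented for illustration and do not come from the paper.

```python
from collections import defaultdict


def distinct_categories(harms):
    """Group (condition, category) pairs into a set of distinct categories per condition."""
    by_condition = defaultdict(set)
    for condition, category in harms:
        by_condition[condition].add(category)
    return dict(by_condition)


# Invented example data: harms one hypothetical participant named, already coded by category.
coded_harms = [
    ("baseline", "allocation"),
    ("baseline", "allocation"),  # a repeated category adds no breadth
    ("sticky", "allocation"),
    ("sticky", "privacy"),
    ("sticky", "reputational"),
]

# Breadth = number of distinct categories per condition.
breadth = {cond: len(cats) for cond, cats in distinct_categories(coded_harms).items()}
print(breadth)
```

Counting sets rather than raw mentions keeps a participant who repeats one kind of harm many times from looking broader than one who names several different kinds.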

“Drawing on theories from psychology and business communication [44, 70], we developed an LLM-based system that generates scenarios to embody five qualities that are known to drive engagement and memorability: Scenarios should be concrete, severe, surprising, diverse, and relevant”

actual novelty · 1 Introduction · confidence 0.98

“rast: Governance champions actively advanced RAI and designed governance structures, whereas many data scientists remained dismissive and disengaged regarding RAI concerns. Even when required to participate in activities, non-champions checked the boxes with minimal engagement. The key bottleneck was ”

departure from common sense · 1 Introduction · confidence 0.97

“While insufficient to measure long-term transformations (which may require years), this still captures some concrete behavioral and social impact over time, even if modest, indicating whether participants engaged with RAI concepts in their professional communities beyond our study.”

limitation · 5.1.5 Study Protocol. · confidence 0.98

“To evaluate the impact of sticky stories on practitioner engagement, we conducted a mixed-design user study that combined both within-subject and between-subject elements. Unlike prior work that often focuses only on champions, we deliberately sought to include non-champions, and ended up with a range of practitioners with varied levels of RAI motivation. Each ”

validation scope · 5.1 Study Design · confidence 0.97

Limits

Method limits

The paper explicitly notes that the two-month follow-up is insufficient to measure long-term transformations. The mixed-design study only partially controls for learning and ordering effects, and the think-aloud protocol may shape participant behavior despite minimal prompting. The follow-up response count is too small to support statistical conclusions.

Deployment limits

Sticky stories give up some relevance: some generated stories can feel overly dramatic and hard to relate to. The generation pipeline is also substantially more resource-intensive than the baseline, requiring 5.5x the time and 46x the token usage, which may limit lightweight deployment.
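
To make the overhead concrete, the two multipliers can be applied to any baseline budget. The sketch below assumes the 5.5x and 46x figures apply per generation run. The `RunCost` type, the `sticky_cost` helper, and the baseline numbers in the example (60 seconds, 2,000 tokens) are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

# Multipliers reported for the sticky-story pipeline relative to the zero-shot baseline.
TIME_MULTIPLIER = 5.5
TOKEN_MULTIPLIER = 46


@dataclass(frozen=True)
class RunCost:
    seconds: float
    tokens: int


def sticky_cost(baseline: RunCost) -> RunCost:
    """Scale a baseline run's cost by the reported time and token multipliers."""
    return RunCost(
        seconds=baseline.seconds * TIME_MULTIPLIER,
        tokens=baseline.tokens * TOKEN_MULTIPLIER,
    )


# Hypothetical baseline run: 60 seconds and 2,000 tokens (illustrative values only).
example = sticky_cost(RunCost(seconds=60.0, tokens=2_000))
print(example)
```

With that made-up baseline, a sticky-story run would take 330 seconds and 92,000 tokens, which shows why token usage rather than latency is likely to dominate the deployment cost.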

Boundary conditions

The demonstrated benefits are strongest for non-champion practitioners in harm-identification tasks where stories can be tailored to practitioners’ own ML systems. The paper itself notes little difference among already motivated champions, so the intervention is not equally necessary across all practitioner profiles. Claims about sustained organizational change remain bounded because long-term transformation was not measured.

Position in field

This work shifts Responsible AI research from supporting already motivated champions toward motivating the larger population of non-champions. It combines narrative theory, practitioner-centered design, and LLM-based generation into a practical intervention that complements rather than replaces governance templates and checklists. Its contribution is especially notable in reframing engagement itself as a central bottleneck in operationalizing RAI.

Abstract

Responsible AI (RAI) tools—checklists, templates, and governance processes—often engage RAI champions, individuals intrinsically motivated to advocate ethical practices, but fail to reach non-champions, who frequently dismiss them as bureaucratic tasks. To explore this gap, we shadowed meetings and interviewed data scientists at an organization, finding that practitioners perceived RAI as irrelevant to their work. Building on these insights and theoretical foundations, we derived design principles for engaging non-champions, and introduced sticky stories—narratives of unexpected ML harms designed to be concrete, severe, surprising, diverse, and relevant, unlike widely circulated media to which practitioners are desensitized. Using a compound AI system, we generated and evaluated sticky stories through human and LLM assessments at scale, confirming they embodied the intended qualities. In a study with 29 practitioners, we found that, compared to regular stories, sticky stories significantly increased time spent on harm identification, broadened the range of harms recognized, and fostered deeper reflection.