CHI '26 · Honorable mention · full-paper review · confidence medium-high

Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models

Sieun Kim, Yeeun Jo, Sungmin Na, Hyunseung Lim, Eunchae Lee, Yu Min Choi, Soohyun Cho, Hwajung Hong

This is a strong CHI-style empirical paper because it does not just advocate participation; it shows the double edge of involving stereotype targets in red-teaming. The contribution is credible and timely: participants gain strategic expertise and agency, but the study also documents real psychological costs and clear limits on generalization.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: descriptive knowledge (typical · 92/268)
Novelty type: empirical finding (typical · 68/268)
Abstraction level: practice (typical · 85/268)
Generalization target: user population (typical · 75/268)
Validation mode: mixed methods (typical · 136/268)

Evidence profile

Evidence strength: strong (typical · 158/268)
Claim alignment: strong (typical · 231/268)
Overclaim risk: medium (typical · 210/268)
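
The counts above are easier to compare as shares of the 268-review set. The following is a minimal sketch that only converts the listed counts to percentages; the review does not state what cutoff separates "typical" from "rare", so no threshold is applied here. The names `axis_counts` and `share` are illustrative, not part of the review tooling.

```python
# Counts copied from the Axes Lens lines; TOTAL is the stated size of the review set.
TOTAL = 268
axis_counts = {
    "Knowledge form: descriptive knowledge": 92,
    "Novelty type: empirical finding": 68,
    "Abstraction level: practice": 85,
    "Generalization target: user population": 75,
    "Validation mode: mixed methods": 136,
    "Evidence strength: strong": 158,
    "Claim alignment: strong": 231,
    "Overclaim risk: medium": 210,
}


def share(count: int, total: int = TOTAL) -> float:
    """Fraction of reviews in the set that share this axis value."""
    return count / total


shares = {axis: share(n) for axis, n in axis_counts.items()}

# Print the axes from least to most common value.
for axis, s in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{axis:<45} {s:6.1%}")
```

Read this way, the least common value on the list (empirical finding as the novelty type) still covers roughly a quarter of the set, while strong claim alignment covers most of it.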

Review Summary

This paper’s main strength is that it treats participatory red-teaming as a genuinely ambivalent practice rather than a simple inclusion win. The abstract and findings support a nuanced empirical contribution: people stigmatized by stereotypes can leverage lived discrimination as strategic expertise for eliciting harmful model behavior, yet the same process can increase distress, negative affect, stigma consciousness, and lower collective self-esteem. That combination makes the paper feel important in CHI terms because it speaks to both the promise and the ethical burden of involving affected communities in AI evaluation. The novelty is not a new algorithm or interface artifact; it is a mixed-methods empirical account that reframes stereotype targets as both knowledgeable evaluators and vulnerable participants. The validation is appropriately scoped for that claim: a single study with 20 participants, short-session red-teaming, and pre/post psychological measures. The authors are also explicit about the limits: one cultural context, small N, no longitudinal follow-up, and no control condition to isolate identity-targeted effects. That honesty strengthens the paper rather than weakening it. My only caution is that the broader normative implication—how and when to operationalize this approach safely—still depends on context, and the paper should not be read as proving that participatory red-teaming is broadly beneficial. It shows that the practice can surface valuable harms and empower participants, but only under conditions where ethical safeguards are taken seriously and where researchers accept that psychological costs are part of the design space, not an edge case.

What Changed

Canon before

Participatory red-teaming is often discussed as a way to include affected communities in evaluating harms, but the paper positions stereotype targets as uniquely able to detect subtle bias while also being vulnerable to psychological harm.

Departure from common sense

The paper argues that targets of stereotyping should be involved as red-teamers because their lived experience makes subtle harms more detectable, even though this means asking stigmatized people to confront discriminatory content and potential psychological risk.

Actual novelty

The paper’s contribution is an empirical mixed-methods account showing that participatory red-teaming can convert lived discrimination into strategic expertise for bias detection, while also documenting psychological costs and a sense of empowerment among participants.

Evidence

The paper reports a single mixed-methods study with 20 participants, a 45-minute red-teaming task, pre/post psychological surveys, and interviews. Quantitatively, distress, negative affect, and stigma consciousness increased and collective self-esteem decreased, while individual self-esteem and positive affect stayed stable. Qualitatively, participants turned lived discrimination into prompting expertise and also described hurt, shame, and empowerment. The discussion and limitation sections explicitly frame the work as bounded by small N, one cultural context, no longitudinal follow-up, and no control condition.
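
A pre/post design like the one described above is usually summarized with within-participant differences. The sketch below shows one common way to do that (mean change, a paired t statistic, and a standardized effect size). This review does not say which test the authors used, so the choice of a paired t statistic is an assumption, and the scores are made-up placeholders, not data from the paper.

```python
from math import sqrt
from statistics import mean, stdev


def paired_summary(pre, post):
    """Summarize within-participant change between two matched score lists."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need matched pre/post scores for at least 2 participants")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    m = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return {
        "n": n,
        "mean_diff": m,            # positive means the score rose after the session
        "t": m / (sd / sqrt(n)),   # paired t statistic with n - 1 degrees of freedom
        "d_z": m / sd,             # standardized mean change
    }


# Hypothetical distress scores for six participants (NOT from the paper).
pre_scores = [2, 3, 2, 4, 3, 2]
post_scores = [3, 4, 2, 5, 4, 4]
summary = paired_summary(pre_scores, post_scores)
print(summary)
```

With only 20 participants, a summary like this describes change within the sample; it says nothing about why the change occurred, which is the gap the missing control condition leaves open.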

“ensive content. Red-teaming—where adversarial prompts are crafted to expose harmful behaviors and assess risks—offers a dynamic approach to surfacing underlying stereotypical bias in large language mod”

actual novelty · Abstract · confidence 0.97

“ilton, Emily Tseng, Jina Suh, Lama Ahmad, Ram Shankar Siva Kumar, Julian Posada, Benjamin Shestakofsky, et al. 2024. The human factor in ai red teaming: Perspectives from social and collaborative computing. In Companion Publication of the 2024 Conference on Computer-Supported Cooperative Work and Social Computing”

limitation · Section 7.5 Limitation and Future Work · confidence 0.99

“Warning: This article contains stereotypical and offensive content. Red-teaming—where adversarial prompts are crafted to expose harmful behaviors and assess risks—offers a dynamic approach to surfacing underlying stereotypical bias in large language mod”

validation scope · Abstract and Method · confidence 0.98

Limits

Method limits

The study is limited by a small sample size (N=20), a single cultural context, and no longitudinal follow-up. The design also lacks control conditions, making it hard to separate identity-targeted effects from general exposure to harmful AI content.
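
To make the control-condition point concrete: with a comparison group that red-teams for stereotypes not aimed at its own identity, the identity-targeted part of the effect could be estimated as a difference in differences. The sketch below is a hypothetical illustration of that arithmetic only. The paper ran no such condition, and every number and name here is invented for the example.

```python
from statistics import mean


def mean_change(pre, post):
    """Average post-minus-pre change across matched participants."""
    if len(pre) != len(post) or not pre:
        raise ValueError("need matched, non-empty pre/post scores")
    return mean(after - before for before, after in zip(pre, post))


def difference_in_differences(target_pre, target_post, control_pre, control_post):
    """Change in the identity-targeted group minus change in the control group."""
    return mean_change(target_pre, target_post) - mean_change(control_pre, control_post)


# Hypothetical distress scores (NOT from the paper).
target_pre, target_post = [2, 3, 2, 4], [4, 4, 3, 6]    # own-group stereotypes
control_pre, control_post = [2, 3, 3, 4], [3, 3, 4, 5]  # other harmful content

did = difference_in_differences(target_pre, target_post, control_pre, control_post)
print(f"target change:  {mean_change(target_pre, target_post):.2f}")
print(f"control change: {mean_change(control_pre, control_post):.2f}")
print(f"difference in differences: {did:.2f}")
```

Without the second group, only the first of these three quantities is observable, which is why the reported changes cannot be attributed specifically to identity-targeted content.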

Deployment limits

The approach requires careful safety protocols because it deliberately exposes stigmatized participants to discriminatory content. The paper’s own framing suggests deployment should prioritize psychological safeguarding and ethical treatment alongside empowerment.

Boundary conditions

Findings are bounded by the specific stereotype group studied, the South Korean educational-meritocracy context, and the short-session participatory red-teaming format. Effects may differ across stereotypes, cultures, and longer-term deployments.

Position in field

This sits at the intersection of participatory AI evaluation, harm elicitation, and HCI work on stigmatized or affected communities. Its value is less a new tool than an empirical argument for and against involving targets of stereotyping in red-teaming.

Abstract

Red-teaming—where adversarial prompts are crafted to expose harmful behaviors and assess risks—offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived experience for red-teaming while safeguarding psychological well-being. We conducted an empirical study of participatory red-teaming with 20 individuals stigmatized by stereotypes against nonprestigious college graduates in South Korea’s rigid educational meritocracy. Through mixed-methods analysis, we found participants transformed experienced discrimination into strategic expertise for identifying biases, while facing psychological costs such as stress and negative reflections on group identity. Notably, red-team participation enhanced their sense of agency and empowerment through their role as guardians of the AI ecosystem. We discuss the implications for designing participatory red-teaming that prioritizes both the ethical treatment and the empowerment of stigmatized groups.