Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models
This is a strong CHI-style empirical paper because it does not just advocate participation; it shows the double edge of involving stereotype targets in red-teaming. The contribution is credible and timely: participants gain strategic expertise and agency, but the study also documents real psychological costs and clear limits on generalization.
Axes Lens
Rare contribution shape, typical evidence profile. The point here is not a score but to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.
Contribution shape
- Knowledge form: descriptive knowledge · typical · 92/268
- Novelty type: empirical finding · typical · 68/268
- Abstraction level: practice · typical · 85/268
- Generalization target: user population · typical · 75/268
- Validation mode: mixed methods · typical · 136/268
Evidence profile
- Evidence strength: strong · typical · 158/268
- Claim alignment: strong · typical · 231/268
- Overclaim risk: medium · typical · 210/268
Review Summary
This paper’s main strength is that it treats participatory red-teaming as a genuinely ambivalent practice rather than a simple inclusion win. The abstract and findings support a nuanced empirical contribution: people stigmatized by stereotypes can use lived discrimination as strategic expertise for eliciting harmful model behavior, yet the same process can increase distress, negative affect, and stigma consciousness, and lower collective self-esteem. That combination matters in CHI terms because it speaks to both the promise and the ethical burden of involving affected communities in AI evaluation. The novelty is not a new algorithm or interface artifact; it is a mixed-methods empirical account that reframes stereotype targets as both knowledgeable evaluators and vulnerable participants. The validation is appropriately scoped for that claim: a single study with 20 participants, short-session red-teaming, and pre/post psychological measures. The authors are also explicit about the limits: one cultural context, small N, no longitudinal follow-up, and no control condition to isolate identity-targeted effects. That honesty strengthens the paper rather than weakening it. My only caution is that the broader normative implication (how and when to operationalize this approach safely) still depends on context, and the paper should not be read as proving that participatory red-teaming is broadly beneficial. It shows that the practice can surface harms that matter and empower participants, but only where ethical safeguards are taken seriously and where researchers accept that psychological costs are part of the design space, not an edge case.
What Changed
Canon before
Participatory red-teaming is often discussed as a way to include affected communities in evaluating harms, but the paper positions stereotype targets as uniquely able to detect subtle bias while also being vulnerable to psychological harm.
Departure from common sense
The paper argues that targets of stereotyping should be involved as red-teamers because their lived experience makes subtle harms more detectable, even though this means asking stigmatized people to confront discriminatory content and potential psychological risk.
Actual novelty
The paper’s contribution is an empirical mixed-methods account showing that participatory red-teaming can convert lived discrimination into strategic expertise for bias detection, while also documenting psychological costs and a sense of empowerment among participants.
Evidence
The paper reports a single mixed-methods study with 20 participants, a 45-minute red-teaming task, pre/post psychological surveys, and interviews. Quantitatively, distress, negative affect, and stigma consciousness increased and collective self-esteem decreased, while individual self-esteem and positive affect stayed stable. Qualitatively, participants turned lived discrimination into prompting expertise and also described hurt, shame, and empowerment. The discussion and limitation sections explicitly frame the work as bounded by small N, one cultural context, no longitudinal follow-up, and no control condition.
“[...] Red-teaming—where adversarial prompts are crafted to expose harmful behaviors and assess risks—offers a dynamic approach to surfacing underlying stereotypical bias in large language mod[...]”
actual novelty · Abstract · confidence 0.97
“ilton, Emily Tseng, Jina Suh, Lama Ahmad, Ram Shankar Siva Kumar, Julian Posada, Benjamin Shestakofsky, et al. 2024. The human factor in ai red teaming: Perspectives from social and collaborative computing. In Companion Publication of the 2024 Conference on Computer-Supported Cooperative Work and Social Computing”
limitation · Section 7.5 Limitation and Future Work · confidence 0.99
“Warning: This article contains stereotypical and offensive content. Red-teaming—where adversarial prompts are crafted to expose harmful behaviors and assess risks—offers a dynamic approach to surfacing underlying stereotypical bias in large language mod[...]”
validation scope · Abstract and Method · confidence 0.98
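For readers who want to see what a pre/post comparison of this kind involves, the sketch below summarizes within-participant change on one measure. It is an illustration only: this summary does not state which statistical test the authors used, so the paired t statistic and the effect size d_z are assumptions, the function name `paired_change` is hypothetical, and the scores are made-up placeholders, not data from the paper.

```python
import math
from statistics import mean, stdev


def paired_change(pre, post):
    """Summarize within-participant change between two measurement points.

    Hypothetical helper for illustration; not the paper's analysis code.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need equal-length samples from at least two participants")
    # Each participant serves as their own baseline: post minus pre.
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))  # paired t statistic
    d_z = mean_diff / sd_diff  # standardized mean change
    return {"n": n, "mean_diff": mean_diff, "t": t_stat, "df": n - 1, "d_z": d_z}


# Placeholder scores on an arbitrary scale (NOT values reported in the paper).
pre_scores = [2.0, 3.0, 2.5, 4.0, 3.5]
post_scores = [3.0, 3.5, 2.5, 5.0, 4.5]
result = paired_change(pre_scores, post_scores)
print(result)
```

With only 20 participants, any such statistic has wide uncertainty, which is consistent with the small-N caveat the paper itself raises.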
Limits
Method limits
The study is limited by a small sample size (N=20), a single cultural context, and no longitudinal follow-up. The design also lacks control conditions, making it hard to separate identity-targeted effects from general exposure to harmful AI content.
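To make the missing-control point concrete, the sketch below shows the kind of comparison a control condition would allow: the mean pre/post change in an identity-targeted group minus the mean change in a comparison group exposed to harmful content that does not target their identity. The paper ran no such condition, so the group definitions, the function names, and all numbers here are hypothetical placeholders.

```python
from statistics import mean


def mean_change(pre, post):
    """Average post-minus-pre change across participants in one group."""
    if len(pre) != len(post) or not pre:
        raise ValueError("need non-empty, equal-length pre and post lists")
    return mean(after - before for before, after in zip(pre, post))


def difference_in_change(target_pre, target_post, control_pre, control_post):
    """Change in the identity-targeted group beyond the change in the control group."""
    return mean_change(target_pre, target_post) - mean_change(control_pre, control_post)


# Placeholder distress scores (NOT data from the paper).
target_pre, target_post = [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]      # identity-targeted content
control_pre, control_post = [2.0, 3.0, 4.0], [2.5, 3.5, 4.5]    # generic harmful content

extra_change = difference_in_change(target_pre, target_post, control_pre, control_post)
print(extra_change)
```

Without the second group, only the first term is observable, so the measured change cannot be attributed specifically to identity-targeted content.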
Deployment limits
The approach requires careful safety protocols because it deliberately exposes stigmatized participants to discriminatory content. The paper’s own framing suggests deployment should prioritize psychological safeguarding and ethical treatment alongside empowerment.
Boundary conditions
Findings are bounded by the specific stereotype group studied, the South Korean educational-meritocracy context, and the short-session participatory red-teaming format. Effects may differ across stereotypes, cultures, and longer-term deployments.
Position in field
This sits at the intersection of participatory AI evaluation, harm elicitation, and HCI work on stigmatized or affected communities. Its value is less a new tool than an empirical argument for and against involving targets of stereotyping in red-teaming.