CHI '26 · Honorable mention · full-paper review · confidence medium-high

Skin-Deep Bias: How Avatar Appearances Shape Perceptions of AI Hiring

Ka Hei Carrie Lau, Philipp Stark, Efe Bozkir, Enkelejda Kasneci

This is a solid, well-scoped CHI paper with a clear empirical contribution: avatar appearance changes fairness attributions in AI hiring even when trust stays high. The study is strongest as a bounded mixed-methods finding about simulated interviews, not as a general theory of hiring fairness.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: descriptive knowledge · typical · 92/268
Novelty type: empirical finding · typical · 68/268
Abstraction level: task · typical · 36/268
Generalization target: task class · typical · 63/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: strong · typical · 158/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268

Review Summary

This paper’s main value is that it cleanly separates two often-conflated reactions to AI interviewers: trust and justice attribution. The reported pattern (identity matching does not affect trust, while mismatch increases perceived ethnic bias and partial matches reduce distributive justice) makes the contribution more interesting than a generic “avatars matter” result. In CHI terms, that is a meaningful empirical finding because it shows that social appearance cues can alter fairness judgments even when the interaction is otherwise standardized and the outcome is fixed. The use of a real-time LLM-based conversational interview, plus self-report, sentiment analysis, and eye tracking, gives the paper a stronger methodological profile than a single-survey vignette study.

At the same time, the paper is best read as a bounded task-level study rather than a broad claim about hiring systems in the wild. The manipulation is narrow: black/white race presentation and male/female sex presentation, one avatar per condition, and a standardized rejection after a simulated interview. Those choices are appropriate for internal validity, but they also tie the findings to a specific design family; extending them to broader identity expressions, real organizational stakes, or longitudinal applicant experiences requires caution. The paper is therefore strong as an empirical contribution to HCI and algorithmic fairness, especially for designers of AI interview interfaces, but it should be cited as evidence about perceived fairness under controlled simulation rather than as a comprehensive account of hiring justice.

What Changed

Canon before

Prior CHI and HCI work on algorithmic fairness in hiring has largely centered on technical bias mitigation, system accuracy, and procedural fairness, with less emphasis on how applicants infer justice from the social appearance of an AI interviewer.

Departure from common sense

A notable result is that trust can remain high even when identity cues shift fairness judgments: the paper reports that identity matching did not affect trust, yet mismatch increased perceived ethnic bias and partial matches reduced distributive justice. That separates interpersonal-like trust from justice attribution in a way that is not obvious from common assumptions about avatar realism or similarity.

Actual novelty

The paper’s novelty is not a new hiring algorithm but a new empirical lens on AI interview perception: it studies participant–avatar identity alignment in a real-time LLM-based conversational interview and combines self-report, sentiment analysis, and eye tracking. The contribution is an empirical finding about how avatar appearance shapes fairness attributions, framed as extending the Computers-Are-Social-Actors paradigm.

Evidence

The paper validates its claims in a controlled crowdsourced interview study with 215 participants, using a 2×2 between-subjects manipulation of avatar race and sex, a standardized rejection outcome, and multimodal measures. The evidence supports a bounded claim about how identity cues affect trust, fairness, and bias perceptions in simulated AI hiring.
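The match levels used throughout this review (full, partial, none) follow from comparing a participant's race and sex with the avatar's. The sketch below shows one way to derive them for the 2×2 design. The review does not describe the paper's actual coding scheme, so the names `Identity`, `match_level`, and `racial_mismatch`, and the label strings, are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

# Levels of the two manipulated avatar traits, as described in this review.
RACES = ("black", "white")
SEXES = ("female", "male")


@dataclass(frozen=True)
class Identity:
    """Race and sex presentation of a participant or an avatar (hypothetical type)."""
    race: str
    sex: str


# The four avatar conditions of the 2x2 between-subjects design.
AVATAR_CONDITIONS = [Identity(race, sex) for race, sex in product(RACES, SEXES)]


def match_level(participant: Identity, avatar: Identity) -> str:
    """Return 'full', 'partial', or 'none' by counting shared identities.

    Assumption: 'partial' means exactly one of race/sex is shared,
    following the abstract's wording ("sharing only one identity").
    """
    shared = int(participant.race == avatar.race) + int(participant.sex == avatar.sex)
    return ("none", "partial", "full")[shared]


def racial_mismatch(participant: Identity, avatar: Identity) -> bool:
    """True when participant and avatar differ in race presentation."""
    return participant.race != avatar.race


if __name__ == "__main__":
    participant = Identity("black", "female")
    for avatar in AVATAR_CONDITIONS:
        print(avatar, match_level(participant, avatar), racial_mismatch(participant, avatar))
```

Note that the two partial-match cases are not equivalent with respect to race: a participant can share only race or only sex with the avatar, which is why racial mismatch is kept as a separate predicate in this sketch.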

“and other media as if they were social actors [62], which suggests that some reactions relevant to fairness unfold implicitly and are not fully captured by explicit ratings. This highlights the value of multimodal process measures when studying fairness in AI interviews. In our study, all participants received the same scripted rejection, so di”

actual novelty · Introduction (1) and Table 1 positioning · confidence 0.66

departure from common sense · Discussion (5.1) and Results (4.2) · confidence 0.72

“Although we framed the study as a simulated job interview, it had no real employment consequences and thus simulated a hiring domain without real stakes for participant”

limitation · Limitations and Future Work (5.3.1 and 5.3.3) · confidence 0.80

“post-outcome attributions in AI-mediated interviews. 5.1.3 Implicit Behavior: Sentiment and Focal Attention. For RQ3, implicit measures diverged across modalities. Sentiment showed only a minor match interaction in polarity, and subjectivity showed no reliable effects. This is consistent with evidence that consequential evaluations limit negative responses due to social desirability and impression-management norms, particularly in AI-based interviews where awareness of automated assessment can hinder expressive responses [39]. Moreover, prior work shows that lexicon-based sentiment analyzers can disagree on polarity for the same data [55], so each captures only part of the affective signal. We therefore interpret these sentiment patterns cautiously and treat them as a complementary signal to our main findings on perceived fairness and bias. By contrast, gaze was more concentrated on the interviewer’s face under racial mismatch, as indicated by a higher normalized K-coefficient. We interpret this as suggestive of heightened vigilance for observable identity cues. Although we did not test this mechanism directly, this interpretation”

validation scope · Method (3.1/3.1.1/3.2) and Limitations (5.3.1) · confidence 0.78
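The quoted passage relies on a "normalized K-coefficient" without defining it. As a rough illustration only, the sketch below computes the ambient/focal coefficient K as it is usually defined in the eye-tracking literature: the mean difference between each fixation's standardized duration and the standardized amplitude of the saccade that follows it. Whether the paper's normalization matches this form is an assumption, and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_k(window, reference):
    """Ambient/focal coefficient K for `window`, standardized against `reference`.

    Both arguments are lists of (fixation_duration, next_saccade_amplitude)
    pairs. Z-scores use the reference sample (for example, a whole recording),
    so K for a sub-window can be positive (focal: long fixations followed by
    short saccades) or negative (ambient: short fixations, long saccades).
    """
    durations = [d for d, _ in reference]
    amplitudes = [a for _, a in reference]
    mu_d, sd_d = mean(durations), pstdev(durations)
    mu_a, sd_a = mean(amplitudes), pstdev(amplitudes)
    if sd_d == 0 or sd_a == 0:
        raise ValueError("reference sample needs variation in durations and amplitudes")
    return mean([(d - mu_d) / sd_d - (a - mu_a) / sd_a for d, a in window])


if __name__ == "__main__":
    # Made-up values: durations in ms, amplitudes in degrees of visual angle.
    recording = [(100, 8.0), (120, 7.0), (400, 1.0), (450, 0.5)]
    print("scanning segment:", coefficient_k(recording[:2], recording))
    print("dwelling segment:", coefficient_k(recording[2:], recording))
```

Standardizing against the full reference sample matters: if K were computed over the same set used for the z-scores, it would always come out as zero, so the measure is only informative for sub-windows, conditions, or participants compared against a shared baseline.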

Limits

Method limits

The study is experimentally controlled but narrow: it uses a single simulated interview flow, a fixed rejection script, and a limited 2×2 avatar manipulation. The multimodal measures strengthen inference, but because each condition used a single avatar, the design cannot separate identity cues from that avatar's other visual properties.

Deployment limits

The findings do not directly establish behavior in real hiring systems with actual employment stakes, organizational consequences, or repeated interactions. Deployment should be cautious in settings where candidate stakes, interviewer diversity, or avatar design vary substantially from the study context.

Boundary conditions

Effects are bounded to photorealistic avatars varying only in black/white race presentation and male/female sex presentation, within a crowdsourced English-speaking sample and a standardized rejection scenario. The results speak to perceived fairness and bias after rejection, not to acceptance decisions or long-term applicant behavior.

Position in field

This paper sits at the intersection of algorithmic fairness, social perception, and HCI-mediated hiring. Its contribution is to show that avatar identity cues can shape justice-related evaluations of AI interviewers, extending social-actor framing into fairness attribution rather than only trust or anthropomorphism.

Abstract

Artificial intelligence is increasingly used in hiring, raising concerns about how applicants perceive these systems. While prior work on algorithmic fairness has emphasized technical bias mitigation, little is known about how avatar identity cues influence applicants’ justice attributions in an interview context. We conducted a crowdsourcing study with 215 participants who completed an interview with photorealistic AI avatars varied in phenotypic traits (race and sex), followed by a standardized rejection. Using self-reports, sentiment analysis, and eye tracking, we measured perceptions of trust, fairness, and bias. Results show that racial mismatch heightened perceptions of ethnic bias, while partial match (sharing only one identity) reduced fairness judgments compared to both full and no match. This work extends the Computers-Are-Social-Actors paradigm by demonstrating that avatar appearances shape justice-related evaluations of AI. We contribute to HCI by revealing how identity cues influence fairness attributions and offer actionable insights for designing equitable AI interview systems.