CHI '26 · Honorable mention · full-paper review · confidence medium-high

Peeking Ahead of the Field Study: Exploring VLM Personas as Support Tools for Embodied Studies in HCI

Xinyue Gui, Ding Xia, Mark Colley, Yuan Li, Vishal Chauhan, Anubhav Anubhav, Zhongyi Zhou, Ehsan Javanmardi, Stela Hanbyeol Seo, Chia-Ming Chang, Manabu Tsukada, Takeo Igarashi

This is a method paper with a clear CHI-relevant ambition: using VLM personas to pretest embodied field-study outcomes before running expensive human studies. The contribution is credible because it is validated against a real-world comparison, but the scope is narrow and the paper is careful enough to show that mimicry is partial, not a replacement for field evidence.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: method knowledge · typical · 29/268
Novelty type: method · typical · 21/268
Abstraction level: task · typical · 36/268
Generalization target: task class · typical · 63/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: strong · typical · 158/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268

Review Summary

The paper’s core move is a genuine methodological reframing: instead of treating field studies as something that can only be done after full human recruitment and deployment, it asks whether VLM personas can serve as a preparatory proxy for embodied HCI studies. That is a meaningful CHI contribution because it targets a pain point that is both practical and epistemic: field studies are expensive, slow, and error-prone, yet they remain the standard for embodied interaction evidence. The paper does not merely speculate; it validates the idea through parallel studies, one with 20 human participants in a real-world street-crossing task and one with 20 VLM personas in a video-based version of the same task, then compares behavioral and subjective responses and adds interviews with five HCI researchers. That gives the work a solid mixed-methods backbone and makes the claim more than a conceptual proposal.

At the same time, the evidence also shows why the contribution should be read as bounded. The paper itself emphasizes that the personas lack behavioral variability and depth, and it flags prompt dependence and limited interpretability as important constraints. So the strongest reading is not that VLM personas can replace field studies, but that they can support formative exploration, study preparation, and data augmentation in a specific class of embodied tasks. In field terms, this is a promising method contribution with real novelty, but its generalization is still task-specific and contingent on model quality, prompt design, and the availability of suitable visual context.

What Changed

Canon before

Field studies are treated as the gold standard for embodied HCI evidence, while synthetic or model-based proxies are usually considered insufficient for validating human behavior in situ.

Departure from common sense

The paper challenges the default assumption that embodied field evidence must come only from real participants by proposing VLM personas as a fast, low-cost proxy for simulating field-study outcomes.

Actual novelty

The paper’s novelty is in operationalizing VLM personas for an embodied AV-pedestrian task and evaluating them through parallel human and VLM studies with shared measures, including persona construction from participant questionnaires and comparison of response patterns.

Evidence

The paper explicitly frames field studies as costly and error-prone, then proposes VLM personas as a low-cost evaluation method. It validates the idea with parallel studies: 20 human participants in a real-world street-crossing study and 20 VLM personas in a video-based study, using the same behavioral and subjective measures plus interviews with five HCI researchers. The evidence supports a method contribution with bounded scope rather than a general claim about replacing field studies.
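The pattern described here, close agreement on averages but weaker agreement on variability, can be checked with two simple summaries. The following is a minimal sketch, not the authors' analysis: the function name `compare_groups`, the `sd_ratio` measure, and the sample values are all hypothetical and were invented for illustration. The sketch assumes each group has at least two observations and that the human sample is not constant.

```python
from statistics import mean, stdev


def compare_groups(human, persona):
    """Summarize agreement in central tendency and spread between two samples."""
    return {
        "mean_human": mean(human),
        "mean_persona": mean(persona),
        # Absolute gap between group means (same unit as the inputs).
        "mean_gap": abs(mean(human) - mean(persona)),
        # Ratio of sample standard deviations; a value well below 1
        # means the persona sample varies less than the human sample.
        "sd_ratio": stdev(persona) / stdev(human),
    }


# Hypothetical crossing times in seconds, invented for illustration only.
# These are NOT data from the paper.
human_times = [3.0, 4.0, 5.0, 6.0, 7.0]
persona_times = [4.8, 4.9, 5.0, 5.1, 5.2]

summary = compare_groups(human_times, persona_times)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

In this toy input the two groups share a mean while the persona spread is about a tenth of the human spread, which is the kind of result that matching averages alone would hide.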

“Enhancing Shuttle–Pedestrian Communication: An Exploratory Evaluation of External HMI Systems Including Participants Experienced in Interacting with Automated Shuttles. Future Transportation 5, 4 (2025), 1”

actual novelty · Introduction + Scope/Experimental Setup + Study 1/2 overview · confidence 0.60

“This raises our research question: To what extent can VLM personas mimic human responses in field studies?”

departure from common sense · Abstract/Introduction (motivation for VLM personas as field-study support) · confidence 0.66

“Enriching social media personas with personality traits: A deep learning approach using the big five classes. In International conference on human-computer interaction. Springer, 101–120”

limitation · Discussion (guidelines) + Limitation and Future work + Conclusion · confidence 0.78

“We conducted parallel studies: 1) one real-world study with 20 participants, and 2) one video-study using 20 VLM personas, both on a street-crossing task”

validation scope · Abstract + Scope/Experimental Setup + Study 1/2 + Measures · confidence 0.72

Limits

Method limits

The method depends on prompt-driven persona construction and model reasoning, and the paper notes that interpretability remains limited. The evaluation is also tied to a specific embodied street-crossing setup and a particular set of metrics, so the method is not demonstrated as broadly transferable across all embodied study designs.

Deployment limits

The approach is positioned as support for formative studies, field-study preparation, and human data augmentation rather than a replacement for real participants. Deployment is constrained by the quality of the underlying VLM, the prompt design, and the availability of suitable visual inputs for the target task.

Boundary conditions

Evidence is limited to a street-crossing task in an autonomous-vehicle pedestrian context with predefined conditions and a video-based VLM persona setup. The paper itself indicates uncertainty about generalization to other visual formats and spatial reasoning demands.

Position in field

This sits in the emerging CHI space of model-based proxies for human studies, but with a stronger embodied focus than typical text-only LLM persona work. Its contribution is methodological: showing that VLM personas can approximate some response patterns while also exposing where they fail to capture variability and depth.

Abstract

Field studies are irreplaceable but costly, time-consuming, and error-prone, which need careful preparation. Inspired by rapid-prototyping in manufacturing, we propose a fast, low-cost evaluation method using Vision-Language Model (VLM) personas to simulate outcomes comparable to field results. While LLMs show human-like reasoning and language capabilities, autonomous vehicle (AV)-pedestrian interaction requires spatial awareness, emotional empathy, and behavioral generation. This raises our research question: To what extent can VLM personas mimic human responses in field studies? We conducted parallel studies: 1) one real-world study with 20 participants, and 2) one video-study using 20 VLM personas, both on a street-crossing task. We compared their responses and interviewed five HCI researchers on potential applications. Results show that VLM personas mimic human response patterns (e.g., average crossing times of 5.25 s vs. 5.07 s) but lack the behavioral variability and depth. They show promise for formative studies, field study preparation, and human data augmentation.