CHI '26 · Best paper · full-paper review · confidence high

Small Talk, Big Impact? LLM-based Conversational Agents to Mitigate Passive Fatigue in Conditional Automated Driving

Lewis Cockram, Yueteng Yu, Jorge Pardo, Xiaomeng Li, Andry Rakotonirainy, Jonny Kuo, Sebastien Demmel, Mike Lenné, Ronald Schroeter

This is a compelling CHI contribution because it tests an LLM-based conversational intervention in a live L3 prototype rather than only in simulation, and it pairs that deployment with mixed-method evidence plus a useful archetype analysis. The contribution is strongest as a proof-of-concept with design implications, not as definitive evidence for production deployment or long-term behavioral change.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: causal knowledge · typical · 31/268
Novelty type: artifact · typical · 20/268
Abstraction level: interaction · typical · 22/268
Generalization target: user population · typical · 75/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: strong · typical · 158/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268
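
The counts above are easier to compare as shares of the review set. The following is a minimal sketch that converts each count into a percentage of the 268 reviews; the counts are copied from the axes above, while the names `AXIS_COUNTS`, `share`, and `format_shares` are hypothetical and not part of any tooling behind this page. The cutoff that separates "typical" from "rare" is not stated here, so the sketch reports shares only and does not classify them.

```python
# Counts copied from the axes listed above; each is "reviews with the same
# label" out of the 268-review set. All names here are illustrative.
REVIEW_SET_SIZE = 268

AXIS_COUNTS = {
    "Knowledge form: causal knowledge": 31,
    "Novelty type: artifact": 20,
    "Abstraction level: interaction": 22,
    "Generalization target: user population": 75,
    "Validation mode: mixed methods": 136,
    "Evidence strength: strong": 158,
    "Claim alignment: strong": 231,
    "Overclaim risk: medium": 210,
}


def share(count, total=REVIEW_SET_SIZE):
    """Return the fraction of reviews in the set that carry the same label."""
    if not 0 <= count <= total:
        raise ValueError(f"count {count} is outside 0..{total}")
    return count / total


def format_shares(counts, total=REVIEW_SET_SIZE):
    """Map each axis label to its share of the set, formatted to one decimal."""
    return {label: f"{share(c, total):.1%}" for label, c in counts.items()}


if __name__ == "__main__":
    for label, pct in format_shares(AXIS_COUNTS).items():
        print(f"{label}: {pct}")
```

Read this way, the contribution-shape labels are shared by far fewer reviews (for example, 20 of 268 for an artifact novelty type) than the evidence-profile labels (231 of 268 for strong claim alignment).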

Review Summary

The paper’s strongest contribution is its shift in evaluation context. Instead of treating conversational fatigue mitigation as a simulator-only or Wizard-of-Oz concept, it places a real-time conversational agent into an authentic conditional automated driving setup and studies how people respond under more ecologically credible conditions. That matters because the introduction explicitly frames prior literature as constrained by limited realism, and the method section shows a between-subjects test-track study with 40 participants in an L3 prototype vehicle. The validation is also appropriately mixed: the authors combine sleepiness ratings, interviews, and behavioral observation rather than relying on a single subjective metric. Within those bounds, the conclusion’s claim that brief conversational exchanges woke drivers up and re-engaged attention is reasonably well supported.

The second contribution is more design-oriented and is easy to overlook if one focuses only on the fatigue claim. The paper does not merely say that some users liked the agent and others did not. It extracts three preference orientations and connects them to established conversational-agent archetypes, which gives the work a reusable HCI framing. That move helps translate a driving-specific study into broader guidance about adaptive conversational design in safety-critical settings. In other words, the paper contributes both an artifact demonstration and a way of thinking about user heterogeneity.

At the same time, the paper is not a final deployment study, and its own limitations section is clear about that. The AV and CA were both prototypes, the autonomy stack was rule- and map-based rather than production-grade, latency fluctuated because of network conditions, and participants only experienced a single session. Those constraints limit claims about robustness, habituation, and real-world transfer.
So the right expert reading is that this is a strong, award-level proof-of-concept with meaningful ecological validity and thoughtful design interpretation, but its conclusions should remain bounded to prototype L3 conditions rather than generalized too quickly to production autonomous driving.

What Changed

Canon before

Prior work assumed that mitigating passive fatigue in automated driving primarily relied on non-driving-related tasks or gamified interactions, usually studied in driving simulators or Wizard-of-Oz setups that lack ecological validity.

Departure from common sense

Contrary to common assumptions that passive fatigue interventions require active, demanding tasks or intrusive alerts, this paper argues that brief, natural conversational exchanges can mitigate passive fatigue by embedding safety goals into environment-related dialogue that re-engages attention without overt distraction.

Actual novelty

The paper contributes one of the first field studies of a real-time conversational agent during authentic L3 automated driving, and it adds an empirical archetype analysis showing three user preference orientations aligned with established CA design frameworks.

Evidence

Grounded evidence comes from the introduction, method, conclusion, and limitations sections. The paper contrasts its approach with simulator and Wizard-of-Oz studies, reports a between-subjects closed-track study with 40 participants in an L3 prototype vehicle plus interviews and sleepiness ratings, claims one of the first real-time field deployments of a CA in authentic automated driving, and explicitly limits transfer due to prototype autonomy, fluctuating latency, and single-session exposure.

“Thematic user insights into drivers’ experiences and perceptions, highlighting the factors that shape the acceptability of CAs in safety-critical contexts. (3) Emerging user archetypes that align with established CA design frameworks, revealing how safety-first, entertainment-seeking, and socially oriented drivers value different aspects of conversational support.”

actual novelty · 1 Introduction · confidence 0.98

“Addressing passive fatigue is not simply a technical challenge but a safety-critical issue in Human-AV interactions [11, 47]. Various countermeasures for passive fatigue have been proposed, such as non-driving-related tasks (NDRTs) [17, 22, 30, 48] or gamification [41] to stimulate driver engagement during automated driving.”

departure from common sense · 1 Introduction · confidence 0.96

“While the on-road environment adds ecological validity, both the vehicle and the agent were prototypes. The AV operated on a rule- and map-based platform rather than production-level autonomy, and the agent occasionally experienced network-related delays due to the setting. The agent latency fluctuated, which may have provided an inconsistent experience between users”

limitation · 7 Limitations · confidence 0.99

“A between-subjects test-track study was conducted to investigate the impact of a real-time CA on passive fatigue in the context of conditional automated driving. Perceived usefulness, acceptability, and usability were also assessed via semi-structured interviews. Forty participants each completed a fifty-minute drive in an L3 prototype vehicle on a closed test-track”

validation scope · 4 Method · confidence 0.97

Limits

Method limits

The study uses a between-subjects test-track design with 40 participants and a single session of roughly 50–55 minutes, so it cannot establish longitudinal adoption, habituation, or stability of effects over repeated exposure. The paper also notes possible age-related differences and natural variation in communication style that were not experimentally controlled.

Deployment limits

The system was deployed on a prototype AV platform using rule- and map-based autonomy rather than production autonomy, and the CA ran as a mobile phone application with bone-conduction headphones. Network-related delays and fluctuating latency may have made the interaction inconsistent, limiting direct transfer to production vehicles and open-road deployment.

Boundary conditions

The claims are bounded to conditional automated driving in a closed test-track setting with monotonous driving, short voice-based interactions, and a prototype L3 vehicle. The archetypes are presented as fluid orientations rather than fixed categories, and the paper notes that effects and preferences may shift with longer exposure, different populations, and different social presentation choices.

Position in field

This paper advances passive-fatigue mitigation research by moving beyond simulator and Wizard-of-Oz studies to a live, real-time conversational agent in an authentic L3 driving context. It also extends CA design discourse by linking empirically observed driver orientations to established archetype frameworks, yielding both ecological validation and design guidance.

Abstract

Passive fatigue during conditional automated driving can compromise driver readiness and safety. This paper presents findings from a test-track study with 40 participants in a real-world automated driving scenario. In this scenario, a Large Language Model (LLM) based conversational agent (CA) was designed to check in with drivers and re-engage them with their surroundings. Drawing on in-car video recordings, sleepiness ratings and interviews, we analysed how drivers interacted with the agent and how these interactions shaped alertness. Results show the CA is helpful for supporting vigilance during passive fatigue. Thematic analysis of acceptability further revealed three user preference profiles that implicate future intention to use CAs. Positioning empirically observed profiles within existing CA archetype frameworks highlights the need for adaptive design sensitive to diverse user groups. This work underscores the potential of CAs as proactive Human–Machine Interface (HMI) interventions, demonstrating how natural language can support context-aware interaction during automated driving.