CHI '26 · Best paper · full-paper review · confidence high

When Workout Buddies Are Virtual: AI Agents and Human Peers in a Longitudinal Physical Activity Study

Alessandro Silacci, Mauro Cherubini, Arianna Boldi, Amon Rapp, Maurizio Caon

This best-paper study matters because it rejects the easy story that better AI support comes from acting more human. Across a six-month trial, the authors show a sharper result: human peers feel more socially present, but AI peers can be more dependable and alliance-building. That distinction gives designers a clearer target than vague anthropomorphism.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: causal knowledge · typical · 31/268
Novelty type: empirical finding · typical · 68/268
Abstraction level: practice · typical · 85/268
Generalization target: user population · typical · 75/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: strong · typical · 158/268
Claim alignment: strong · typical · 231/268
Overclaim risk: low · typical · 53/268
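
The counts above are easier to compare as shares of the review set. The sketch below assumes each `n/268` figure is the number of reviews in the set that carry the same label; the review does not state this, nor the cutoff it uses for "typical", so the code only converts counts to percentages. The names `axis_counts` and `share` are illustrative.

```python
# Counts copied from the Axes Lens lines above; 268 is the size of the review set.
REVIEW_SET_SIZE = 268

axis_counts = {
    "Knowledge form: causal knowledge": 31,
    "Novelty type: empirical finding": 68,
    "Abstraction level: practice": 85,
    "Generalization target: user population": 75,
    "Validation mode: mixed methods": 136,
    "Evidence strength: strong": 158,
    "Claim alignment: strong": 231,
    "Overclaim risk: low": 53,
}


def share(count, total=REVIEW_SET_SIZE):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Assumption: each count is the number of reviews sharing that label.
shares = {axis: share(n) for axis, n in axis_counts.items()}

for axis, pct in shares.items():
    print(f"{axis}: {pct}% of reviews")
```

Under that assumption, the labels range from about 11.6% of the set (causal knowledge) to about 86.2% (strong claim alignment).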

Review Summary

This paper stands out because it does not merely ask whether an AI workout companion can motivate people; it asks what kind of relationship such a companion actually creates over time, and whether that relationship differs in kind from a human peer relationship. The answer is more interesting than a simple human-versus-AI ranking. The study’s central contribution is the “partnership paradox”: human peers generated stronger social presence and a richer sense of authentic connection, while AI peers generated a stronger working alliance by being more reliable, steady, and task-focused. That is an important correction to a common design instinct in HCI and industry, where conversational agents are often pushed toward human imitation under the assumption that authenticity-like behavior is the route to motivation. Here, the evidence suggests that imitation is not the main value proposition. AI support appears to work best when it is coherent with its artificial role and dependable in its encouragement.

Methodologically, the paper is strong because it combines a six-month randomized controlled design with multiple outcome types: behavioral data, repeated questionnaires, and qualitative interviews. That gives the authors enough leverage to make a nuanced claim rather than a hype-driven one. Just as importantly, the discussion and design implications are disciplined. The paper does not claim that AI replaces human connection; instead, it argues that AI and human peers have complementary strengths. This leads to a practical design lesson with broad relevance: systems for sustained motivation may be better when they orchestrate authentic human accountability together with AI consistency, rather than trying to collapse both into a single pseudo-human agent.

The limitations still matter, especially the young academic sample, high attrition, and constrained chat-based implementation, but they narrow the scope rather than undermine the core insight. Overall, this is a strong empirical contribution that sharpens theory about relatedness, social presence, and alliance in human-agent interaction while offering actionable guidance for future health-support systems.

What Changed

Canon before

The dominant assumption has been that AI agents should mimic human authenticity or coaching authority to sustain motivation, and that social presence requires human-like mutual caring. Human peers were assumed to be the most effective source of authentic motivation, even though scalable social support of that kind is scarce.

Departure from common sense

The paper departs from common sense by showing that AI agents do not need to replicate human authenticity to be effective. Human peers evoke stronger social presence, but AI peers provide steadier encouragement and a stronger working alliance. Relatedness support can therefore arise from reliability and consistency rather than authenticity, which contradicts the assumption that only authentic human connection suffices for lasting motivation.

Actual novelty

The paper presents a six-month RCT that evaluates large language model-powered simulated exercising peers (SEPs) against human peers and solo controls. It reveals a partnership paradox between social presence and working alliance, and it yields a design insight: AI agents are more effective as reliable, consistent supporters than as imitations of human authenticity.

Evidence

The paper provides strong evidence from a six-month randomized controlled trial with 280 recruited participants across four conditions, combining longitudinal step-count analysis, repeated questionnaires, and post-study interviews. The strongest support is for a differentiated relational effect: human peers produced stronger social presence, while AI peers produced stronger working alliance. The evidence is substantial but not unlimited because attrition was high, the sample was mainly young adults in an academic setting, and the intervention used a constrained chat-based SEP design.

“ion? To examine these questions, we carried out a six-month randomized controlled trial (N = 280) with four conditions: human–human dyads (HUM), AI peers with either a human-like avatar (SEPH) or a cyborg-like avatar (SEPC), and a no-peer control (CON). Our study makes three contributions. First, we provide one of the first long-term empirical evaluations of an LLM-powered SEP in supporting physic”

actual novelty · 1 Introduction · confidence 0.98

“y encouragement and non-judgmental support. These findings advance HCI research on human-agent interaction by showing that AI peers should not aim to replicate human authenticity, but rather complement it with reliability and coherence.”

departure from common sense · 7 Conclusion · confidence 0.97

“While this study provides novel insights into the role of human and AI peers in supporting physical activity, several limitations should be acknowledged. Our sample consisted primarily of young adults in an academic setting, which may limit the generalizability of the findings to older adults, clinical populations, or individuals with different lifestyles and cultural backgrounds”

limitation · 6 Limitations · confidence 0.99

“Of these, 250 synchronized their step data for at least a portion of the study (120 female, 77 male, 2 other/non-binary), while 128 participants completed the full six-month data synchronization (68 female, 58 male, 2 other/non-binary; mean age = 21”

validation scope · 4 Results · confidence 0.95

Limits

Method limits

The sample consisted primarily of young adults in an academic setting, limiting generalizability. The study relied on step counts, questionnaires, and interviews, which do not fully capture the complexity of physical activity motivation. Attrition was substantial, and the SEP condition used chat-based companions with simple avatars and limited adaptability. Self-report measures may also reflect recall or social desirability bias, and LLM-generated responses required researcher monitoring.
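
The scale of that attrition can be worked out from two figures quoted in the Evidence section: N = 280 recruited and 128 participants who completed the full six-month data synchronization. The sketch below assumes the recruited N is the right denominator and treats "completed full synchronization" as the retention criterion; the paper may define attrition differently. The function name `retention_and_attrition` is illustrative.

```python
# Figures quoted in the Evidence section of this review.
RECRUITED = 280            # N reported for the randomized controlled trial
COMPLETED_FULL_SYNC = 128  # participants who synchronized step data for all six months


def retention_and_attrition(completed, recruited):
    """Return (retention, attrition) as fractions of the recruited sample."""
    retention = completed / recruited
    return retention, 1 - retention


# Assumption: retention is measured against all recruited participants.
retention, attrition = retention_and_attrition(COMPLETED_FULL_SYNC, RECRUITED)
print(f"retention: {retention:.1%}, attrition: {attrition:.1%}")
```

Under these assumptions, roughly 45.7% of recruited participants provided complete six-month step data and about 54.3% did not, which is consistent with the review's description of attrition as substantial.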

Deployment limits

The intervention was implemented as chat-based SEP companions with simple avatars and limited adaptability, so findings may not transfer directly to richer multimodal or more personalized systems. The study context centered on physical activity support with step-count goals and a six-month horizon, leaving longer-term sustainability and broader health-domain deployment unresolved. Future deployments would need stronger automated safety filters and domain-specific guardrails for LLM output.

Boundary conditions

The findings apply most directly to young adults in an academic setting participating in a six-month physical activity intervention with human peers, AI peers, or no-peer control. The AI peers were explicitly bounded, chat-based companions rather than fully embodied or highly adaptive agents. The central paradox may depend on this design context, the measured outcomes of social presence and working alliance, and the specific motivational task of sustaining walking activity.

Position in field

This paper makes a notable contribution to HCI and human-agent interaction by moving beyond the simple question of whether AI can imitate human support. Instead, it shows that human and AI peers may contribute through different relational mechanisms, with authenticity and reliability functioning as complementary rather than interchangeable assets. The work is especially valuable because it studies LLM-based peer support longitudinally and ties empirical findings to concrete design implications for hybrid human-AI motivational systems.

Abstract

Physical inactivity remains a critical global health issue, yet scalable strategies for sustained motivation are scarce. Conversational agents designed as simulated exercising peers (SEPs) represent a promising alternative, but their long-term impact is unclear. We report a six-month randomized controlled trial (N=280) comparing individuals exercising alone, with a human peer, or with a large language model-driven SEP. Results revealed a partnership paradox: human peers evoked stronger social presence, while AI peers provided steadier encouragement and more reliable working alliances. Humans motivated through authentic comparison and accountability, whereas AI peers fostered consistent, low-stakes support. These complementary strengths suggest that AI agents should not mimic human authenticity but augment it with reliability. Our findings advance human-agent interaction research and point to hybrid designs where human presence and AI consistency jointly sustain physical activity.