When Workout Buddies Are Virtual: AI Agents and Human Peers in a Longitudinal Physical Activity Study
This best-paper study matters because it rejects the easy story that better AI support comes from acting more human. Across a six-month trial, the authors show a sharper result: human peers feel more socially present, but AI peers can be more dependable and alliance-building. That distinction gives designers a clearer target than vague anthropomorphism.
Axes Lens
Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.
Contribution shape
- Knowledge form: causal knowledge · typical · 31/268
- Novelty type: empirical finding · typical · 68/268
- Abstraction level: practice · typical · 85/268
- Generalization target: user population · typical · 75/268
- Validation mode: mixed methods · typical · 136/268
Evidence profile
- Evidence strength: strong · typical · 158/268
- Claim alignment: strong · typical · 231/268
- Overclaim risk: low · typical · 53/268
Review Summary
This paper stands out because it does not merely ask whether an AI workout companion can motivate people. It asks what kind of relationship such a companion creates over time, and whether that relationship differs in kind from a human peer relationship. The answer is more interesting than a simple human-versus-AI ranking.
The study’s central contribution is the “partnership paradox”: human peers generated stronger social presence and a richer sense of authentic connection, while AI peers generated a stronger working alliance by being more reliable, steady, and task-focused. That is an important correction to a common design instinct in HCI and industry, where conversational agents are often pushed toward human imitation on the assumption that seemingly authentic behavior is the route to motivation. Here, the evidence suggests that imitation is not the main value proposition. AI support appears to work best when it is coherent with its artificial role and dependable in its encouragement.
Methodologically, the paper is strong because it combines a six-month randomized controlled design with multiple outcome types: behavioral data, repeated questionnaires, and qualitative interviews. That gives the authors enough leverage to make a nuanced claim rather than a hype-driven one. Just as importantly, the discussion and design implications are disciplined. The paper does not claim that AI replaces human connection; instead, it argues that AI and human peers have complementary strengths. This leads to a practical design lesson with broad relevance: systems for sustained motivation may be better when they orchestrate authentic human accountability together with AI consistency, rather than trying to collapse both into a single pseudo-human agent.
The limitations still matter, especially the young academic sample, high attrition, and constrained chat-based implementation, but they narrow the scope rather than undermine the core insight.
Overall, this is a strong empirical contribution that sharpens theory about relatedness, social presence, and alliance in human-agent interaction while offering actionable guidance for future health-support systems.
What Changed
Canon before
The dominant assumption has been that AI agents should mimic human authenticity or coaching authority to sustain motivation, and that social presence requires human-like mutual caring. Human peers are assumed to be the most effective source of authentic motivation, yet scalable social support of that kind is scarce.
Departure from common sense
The paper departs from this view by showing that AI agents do not need to replicate human authenticity to be effective. Human peers evoke stronger social presence, but AI peers provide steadier encouragement and a stronger working alliance. Relatedness support can arise from reliability and consistency rather than authenticity, contradicting the assumption that only authentic human connection suffices for lasting motivation.
Actual novelty
The paper presents a six-month RCT evaluating LLM-powered simulated exercising peers (SEPs) against human peers and a no-peer control. It reveals a partnership paradox between social presence and working alliance, and yields a design insight: AI agents are more effective as reliable, consistent supporters than as imitators of human authenticity.
Evidence
The paper provides strong evidence from a six-month randomized controlled trial with 280 recruited participants across four conditions, combining longitudinal step-count analysis, repeated questionnaires, and post-study interviews. The strongest support is for a differentiated relational effect: human peers produced stronger social presence, while AI peers produced stronger working alliance. The evidence is substantial but bounded: attrition was high, the sample was mainly young adults in an academic setting, and the intervention used a constrained chat-based SEP design.
“ion? To examine these questions, we carried out a six-month randomized controlled trial (N = 280) with four conditions: human–human dyads (HUM), AI peers with either a human-like avatar (SEPH) or a cyborg-like avatar (SEPC), and a no-peer control (CON). Our study makes three contributions. First, we provide one of the first long-term empirical evaluations of an LLM-powered SEP in supporting physic”
actual novelty · 1 Introduction · confidence 0.98
“y encouragement and non-judgmental support. These findings advance HCI research on human-agent interaction by showing that AI peers should not aim to replicate human authenticity, but rather complement it with reliability and coherence.”
departure from common sense · 7 Conclusion · confidence 0.97
“While this study provides novel insights into the role of human and AI peers in supporting physical activity, several limitations should be acknowledged. Our sample consisted primarily of young adults in an academic setting, which may limit the generalizability of the findings to older adults, clinical populations, or individuals with different lifestyles and cultural backgrounds”
limitation · 6 Limitations · confidence 0.99
“ Of these, 250 synchronized their step data for at least a portion of the study (120 female, 77 male, 2 other/non-binary), while 128 participants completed the full six-month data synchronization (68 female, 58 male, 2 other/non-binary; mean age = 21”
validation scope · 4 Results · confidence 0.95
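To put the attrition noted above in concrete terms, here is a minimal Python sketch that uses only two counts quoted in this review: 280 recruited participants and 128 with complete six-month step synchronization. The function and variable names are illustrative assumptions, not the authors' analysis code, and the paper may define attrition differently.

```python
# Counts quoted in this review; names below are hypothetical, chosen for illustration.
RECRUITED = 280   # N reported for the randomized controlled trial
FULL_SYNC = 128   # participants with complete six-month step synchronization

# Gender breakdown quoted for the full-synchronization group.
FULL_SYNC_BREAKDOWN = {"female": 68, "male": 58, "other/non-binary": 2}


def completion_rate(completed: int, recruited: int) -> float:
    """Return the share of recruited participants who reached a milestone."""
    if recruited <= 0:
        raise ValueError("recruited must be positive")
    return completed / recruited


# Sanity check: the quoted breakdown should add up to the quoted group size.
assert sum(FULL_SYNC_BREAKDOWN.values()) == FULL_SYNC

full_rate = completion_rate(FULL_SYNC, RECRUITED)
attrition = 1 - full_rate

print(f"Completed full six-month sync: {full_rate:.1%}")
print(f"Did not complete full sync:    {attrition:.1%}")
```

On these counts, fewer than half of the recruited participants contributed a complete six-month step record, which is why attrition is treated here as a limit on scope.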
Limits
Method limits
The sample consisted primarily of young adults in an academic setting, limiting generalizability. The study relied on step counts, questionnaires, and interviews, which do not fully capture the complexity of physical activity motivation. Attrition was substantial, and the SEP condition used chat-based companions with simple avatars and limited adaptability. Self-report measures may also reflect recall or social desirability bias, and LLM-generated responses required researcher monitoring.
Deployment limits
The intervention was implemented as chat-based SEP companions with simple avatars and limited adaptability, so findings may not transfer directly to richer multimodal or more personalized systems. The study context centered on physical activity support with step-count goals and a six-month horizon, leaving longer-term sustainability and broader health-domain deployment unresolved. Future deployments would need stronger automated safety filters and domain-specific guardrails for LLM output.
Boundary conditions
The findings apply most directly to young adults in an academic setting participating in a six-month physical activity intervention with human peers, AI peers, or a no-peer control. The AI peers were explicitly bounded, chat-based companions rather than fully embodied or highly adaptive agents. The central paradox may depend on this design context, the measured outcomes of social presence and working alliance, and the specific motivational task of sustaining walking activity.
Position in field
This paper makes a notable contribution to HCI and human-agent interaction by moving beyond the simple question of whether AI can imitate human support. Instead, it shows that human and AI peers may contribute through different relational mechanisms, with authenticity and reliability functioning as complementary rather than interchangeable assets. The work is especially valuable because it studies LLM-based peer support longitudinally and ties empirical findings to concrete design implications for hybrid human-AI motivational systems.