CHI '26 · Best paper · full-paper review · confidence high

Generative Muscle Stimulation: Providing Users with Physical Assistance by Constraining Multimodal-AI with Embodied Knowledge

Yun Ho, Romain Nith, Peili Jiang, Steven He, Bruno Felalaga, Shan-Yuan Teng, Rhea Seeralan, Pedro Lopes

This is a strong best-paper contribution because it turns EMS from a fixed-program assistance technique into a context-aware generative system, and the paper backs that claim with both ablation and user-study evidence. What makes it especially convincing is that the authors do not just demo novelty: they show why contextual cues, pose, and EMS knowledge each matter, and they are explicit about latency, hallucinations, lab constraints, and the practical limits of EMS hardware.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: generative knowledge (typical · 35/268)
Novelty type: system architecture (typical · 35/268)
Abstraction level: system (typical · 61/268)
Generalization target: design family (typical · 38/268)
Validation mode: mixed methods (typical · 136/268)

Evidence profile

Evidence strength: strong (typical · 158/268)
Claim alignment: strong (typical · 231/268)
Overclaim risk: low (typical · 53/268)

Review Summary

The paper’s main contribution is not merely adding an LLM to an EMS pipeline, but redefining what EMS assistance can be in HCI. Prior systems largely encoded fixed stimulation scripts for narrow tasks. Here, the authors instead propose a system that generates EMS instructions from a user’s request and sensed context, then constrains those instructions with embodied knowledge such as feasible movements, joint limits, and kinematic structure. That combination is the real novelty: multimodal AI provides flexible inference about objects, situations, and goals, while the EMS knowledge base and biomechanical constraints keep the output grounded enough to be physically meaningful. This makes the work a system-architecture contribution with generative implications for a broader family of EMS interfaces.

The evidence is also unusually well matched to the claim. The technical evaluation is not just a benchmark flourish; it directly tests whether the contextual, pose-aware, and EMS-knowledge modules each contribute to better instruction generation. Across 12 tasks, the full system has the lowest distance to expert-authored ground truth, and the breakdowns show why each module matters.

The user study then complements that technical result by asking a more interaction-centered question: can people actually use, interpret, and recover from generated EMS guidance? The answer is mostly yes. Participants completed most tasks, often even under forced-error conditions, and they used repetition, slowing down, and re-prompting as active strategies rather than passively obeying the system. That is important because it frames EMS guidance as collaborative and interpretable rather than fully autonomous.

The paper is also careful enough to avoid the biggest overclaim trap. It does not show a universally deployable EMS assistant. The authors explicitly acknowledge that the system is slow, depends on cloud models that can hallucinate, and was evaluated in a controlled lab setup with pre-cached outputs and fixed starting poses. They also note enduring EMS constraints such as manual calibration, electrode placement, difficulty reaching deeper muscles, and limited support for precise complex motions. So the right expert reading is that this paper establishes a credible new direction for general-purpose, context-aware EMS assistance, not that it solves physical assistance broadly. That balance between ambition and restraint is a major reason the contribution feels award-level.

What Changed

Canon before

Prior EMS-based assistance systems rely on fixed, non-contextual programming of muscle stimulation instructions, limiting their flexibility and generalizability across different physical tasks and contexts.

Departure from common sense

This paper breaks the common-sense assumption that EMS assistance must be fixed and task-specific by generating EMS instructions dynamically based on multimodal AI reasoning about the user's context, pose, and environment.

Actual novelty

The paper introduces a generative EMS architecture that combines multimodal AI (vision and language models) with an EMS knowledge base and biomechanical constraints to generate muscle-stimulation instructions contextually, enabling flexible and previously unexplored EMS-based physical assistance without task-specific programming.
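
To make the constraining step concrete, here is a minimal sketch of how model-proposed movements might be filtered against a knowledge base and clamped to joint limits. It is not the authors' implementation: the `EMS_KNOWLEDGE_BASE` table, the `Instruction` type, the `constrain` function, and every angle value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical knowledge base: movements the EMS hardware is assumed to actuate,
# each with an assumed joint-angle range in degrees (illustrative values only).
EMS_KNOWLEDGE_BASE = {
    ("wrist", "flexion"): (0.0, 60.0),
    ("wrist", "extension"): (0.0, 50.0),
    ("elbow", "flexion"): (0.0, 140.0),
}


@dataclass(frozen=True)
class Instruction:
    joint: str
    movement: str
    target_angle: float  # degrees


def constrain(instructions):
    """Drop movements missing from the knowledge base; clamp angles to joint limits."""
    kept = []
    for ins in instructions:
        limits = EMS_KNOWLEDGE_BASE.get((ins.joint, ins.movement))
        if limits is None:
            continue  # not actuatable under the assumed knowledge base: discard
        low, high = limits
        clamped = min(max(ins.target_angle, low), high)
        kept.append(Instruction(ins.joint, ins.movement, clamped))
    return kept


# Stand-in for what a multimodal model might propose (made-up values).
proposed = [
    Instruction("wrist", "flexion", 95.0),   # beyond the assumed limit
    Instruction("elbow", "flexion", 90.0),   # already feasible
    Instruction("spine", "rotation", 20.0),  # unknown to the knowledge base
]
constrained = constrain(proposed)
```

In this sketch the model stays free to propose anything, and feasibility is enforced afterwards by a deterministic filter, which mirrors the division of labor the review describes.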

Evidence

The paper supports its core claim with both technical and user evaluation. The ablation study compares the full system against module removals and a naïve baseline across 12 physical tasks, showing the full system has the lowest distance to ground truth. The user study then shows participants can complete tasks, detect injected errors, recover via re-prompting, and understand the generated EMS instructions. The paper also states explicit limitations around latency, hallucinations, scope, and EMS practicality.

“, pose, location, surroundings). The resulting system is more general—enabling unprecedented EMS interactions (e.g., opening a pill bottle) yet also replicating existing systems (e.g., Affordance++) without task-specific programming. It uses computer-vision/large-language-models to generate EMS-instructions, constraining these to a muscle-stimulation knowledge-base & joint-limits”

actual novelty · Abstract · confidence 0.99

“ Examples of interactive EMS-assistance include: sign-language [76], eyes-free navigation [98], piano-playing [73, 97], and demonstrating movement instructions to assist users in operating objects that users have not used before [63]—in Affordance++ [63], EMS is used to help users discover actions, e”

departure from common sense · 1 Introduction · confidence 0.98

“ reduced latency, both of which constrain our current approach. In light of this limitation, it is important to underscore that our contribution is not advancing multimodal-AI but using it to advance interactive-EMS. Latency. We run a state-of-the-art model (~175 billion parameters [6]) with general-reasoning capabilities, but it does not run locally nor fast”

limitation · 8.2 Limitations · confidence 0.99

“In this evaluation, we break down the contributions of the modules through an ablation study, where we compare instructions generated by one version of our system (e.g., complete system, ablation of a module, and a Naïve-VLM with our EMS-knowledge base) against a set of ground-truth EMS-instructions.”

validation scope · 5 Technical evaluation via an ablation study · confidence 0.98
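
The ablation logic quoted above can be sketched as scoring each system variant against a ground-truth instruction set and picking the lowest distance. The paper's actual distance metric is not specified in this review, so the sketch assumes a simple mean absolute difference in per-joint target angles. The variant names and all numbers are made up for illustration and are not results from the paper.

```python
from statistics import mean


def instruction_distance(generated, ground_truth):
    """Mean absolute angle difference over the joints in the ground truth.

    Assumption: a joint missing from the generated output counts as 0 degrees.
    """
    return mean(
        abs(generated.get(joint, 0.0) - angle)
        for joint, angle in ground_truth.items()
    )


# Hypothetical expert-authored target for one task (degrees).
ground_truth = {"wrist": 40.0, "elbow": 90.0}

# Hypothetical outputs from three variants; values are illustrative only.
variants = {
    "full_system": {"wrist": 45.0, "elbow": 85.0},
    "no_pose": {"wrist": 60.0, "elbow": 70.0},
    "naive_vlm": {"wrist": 10.0},
}

scores = {
    name: instruction_distance(output, ground_truth)
    for name, output in variants.items()
}
best = min(scores, key=scores.get)  # variant closest to the ground truth
```

A single-number distance like this is easy to compare across variants, but it also shows the limit the authors raise: several different trajectories can reach the same goal, so a larger distance does not always mean a worse instruction.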

Limits

Method limits

The evaluation is strong but bounded: the ablation study uses distance-to-ground-truth metrics, and the authors note that the human body is complex, multiple motion trajectories can achieve the same goal, and the real-time aspects were disabled in evaluation. The user study also used pre-cached outputs and fixed starting poses.

Deployment limits

Deployment is constrained by LLM/VLM hallucinations and latency, dependence on internet API calls, EMS hardware requirements, manual electrode placement and calibration, and the limited practicality of EMS for deeper muscles or precise movements. The system is also limited to tasks that fit movement-based assistance and its sensing assumptions.

Boundary conditions

The approach is best suited to tasks that require movement-based assistance, can tolerate latency, and do not demand fast reactions or highly precise complex motions. It depends on wearable EMS hardware, pose/context sensing, and user willingness to confirm or re-prompt. The paper explicitly notes that some examples were chosen because they fit the reasoning capabilities of the models and the capabilities of EMS.
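
The dependence on users confirming or re-prompting can be pictured as a small interaction loop. This is a hedged sketch, not the paper's pipeline: `assist`, `generate`, `confirm`, the retry limit, and the canned strings are all hypothetical stand-ins for the model call and the user's decision.

```python
def assist(request, generate, confirm, max_attempts=3):
    """Generate instructions, ask the user to confirm, and re-prompt on rejection.

    Returns (instructions, attempts_used), or (None, max_attempts) if the user
    never confirms.
    """
    prompt = request
    for attempt in range(1, max_attempts + 1):
        instructions = generate(prompt)
        if confirm(instructions):
            return instructions, attempt
        # Fold the rejected suggestion back into the next prompt.
        prompt = f"{request} (previous suggestion rejected: {instructions})"
    return None, max_attempts


# Stubs standing in for the model and the user (illustrative only).
canned = iter(["shake the can", "tilt and press the nozzle"])


def generate(prompt):
    return next(canned)


def confirm(instructions):
    return "press" in instructions


result, attempts = assist("help me use this cooking spray", generate, confirm)
```

Keeping confirmation outside the generator matches the review's reading of the user study, where participants treated the guidance as something to check and redirect rather than obey.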

Position in field

This work shifts EMS assistance from fixed, task-specific stimulation programs toward a general-purpose, context-aware generation pipeline. In CHI terms, it is a system-level advance that combines multimodal AI with embodied constraints to broaden the design space of EMS interfaces while remaining grounded in empirical evaluation.

Abstract

Electrical-muscle-stimulation (EMS) can support physical-assistance (e.g., shaking a spray-can before painting). However, EMS-assistance is highly-specialized because it is (1) fixed (e.g., one program for shaking spray-cans, another for opening windows); and (2) non-contextual (e.g., a spray-can for cooking dispenses cooking-oil, not paint—shaking it is unnecessary). Instead, we explore a different approach where muscle-stimulation instructions are generated considering the user’s context (e.g., pose, location, surroundings). The resulting system is more general—enabling unprecedented EMS-interactions (e.g., opening a pill-bottle) yet also replicating existing systems (e.g., Affordance++) without task-specific programming. It uses computer-vision/large-language-models to generate EMS-instructions, constraining these to a muscle-stimulation knowledge-base & joint-limits. In our user-study, we found participants successfully completed physical-tasks while guided by generative-EMS, even when EMS-instructions were (purposely) erroneous. Participants understood generated-gestures and, even during forced-errors, understood partial-instructions, identified errors, and re-prompted the system. We believe our concept marks a shift toward more general-purpose EMS-interfaces.