CHI '26 · Best paper · full-paper review · confidence high

Cocoa: Co-Planning and Co-Execution with AI Agents

K. J. Kevin Feng, Kevin Pu, Matt Latzke, Tal August, Pao Siangliulue, Jonathan Bragg, Daniel S Weld, Amy X. Zhang, Joseph Chee Chang


Cocoa is a strong CHI systems contribution because it turns a broad critique of rigid agent workflows into a concrete interaction design: editable shared plans, explicit delegation, and notebook-like execution inside documents. The evaluations suggest this design improves steerability without obvious usability loss, though the evidence is still bounded to short-term research settings and a CS-heavy participant pool.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score but to show what kind of claim the paper makes, and whether its evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: generative knowledge · typical · 35/268
Novelty type: artifact · typical · 20/268
Abstraction level: artifact · typical · 19/268
Generalization target: task class · typical · 63/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: strong · typical · 158/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268
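
The counts above are easier to compare as shares of the review set. The following is a minimal sketch, assuming each "N/268" figure is the number of reviews in the set that carry the same label as this paper on that axis; the page does not define the counts explicitly, and the names `REVIEW_SET_SIZE`, `axis_counts`, and `share` are illustrative only.

```python
# Assumption: each count is the number of reviews (out of 268) that share
# this paper's label on the given axis. The page does not state this outright.
REVIEW_SET_SIZE = 268

axis_counts = {
    "Knowledge form: generative knowledge": 35,
    "Novelty type: artifact": 20,
    "Abstraction level: artifact": 19,
    "Generalization target: task class": 63,
    "Validation mode: mixed methods": 136,
    "Evidence strength: strong": 158,
    "Claim alignment: strong": 231,
    "Overclaim risk: medium": 210,
}


def share(count: int, total: int = REVIEW_SET_SIZE) -> float:
    """Return the count as a percentage of the review set, to one decimal."""
    return round(100 * count / total, 1)


shares = {axis: share(count) for axis, count in axis_counts.items()}

if __name__ == "__main__":
    for axis, pct in shares.items():
        print(f"{axis}: {pct}%")
```

Under that reading, the contribution-shape labels cover roughly 7% to 51% of the set, while the evidence-profile labels cover roughly 59% to 86%.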

Review Summary

Cocoa’s main contribution is not merely that it adds planning to an AI assistant, but that it treats planning and execution as continuously revisable collaborative objects. That is a meaningful departure from the dominant chatbot pattern and from systems that isolate planning before execution. By embedding interactive plans directly in a document editor, the paper gives users a shared representation of task structure, role assignment, and progress, which makes delegation visible and editable rather than implicit. This is a compelling HCI move because it translates mixed-initiative ideals into concrete interface mechanisms: users can rewrite steps, reassign them, inspect intermediate outputs, and move back and forth between planning and execution as the task unfolds.

The evidence base is also relatively solid for a CHI artifact paper. The formative study motivates the design space, the implementation section clearly states what Cocoa is, the lab study compares it against a strong status-quo chat baseline, and the deployment adds ecological validity. The reported results support the claim that Cocoa improves steerability while preserving ease of use, and the qualitative framing helps explain why: users can intervene earlier and more locally, instead of relying on coarse prompt-and-wait cycles.

That said, the paper should still be read as a strong demonstration in a specific domain rather than a universal recipe. The baseline does not isolate every design factor, the deployment is short, and the participant pool is concentrated in CS and adjacent research cultures. The authors are appropriately candid about these limits, including document linearity, lack of multimodality, and no code execution. Overall, this is an excellent best-paper-level contribution because it offers both a polished system and a reusable interaction thesis for future human-agent workflow design.

What Changed

Canon before

Many existing AI agent systems either treat planning and execution as rigidly separate stages or limit human interaction to fixing faults in "autonomous" workflows. Both patterns leave little flexibility for delegation and iteration during collaborative research tasks.

Departure from common sense

This paper rejects the assumption that planning and execution in human-agent systems should be cleanly separated or that human involvement should mainly be reactive correction. Instead, it argues for interleaving co-planning and co-execution so users and agents can jointly edit plans, assign responsibility step by step, and revise workflows after partial results.

Actual novelty

The paper introduces Cocoa, a document-editor-based system for scientific research that uses interactive plans to support mixed-initiative co-planning and co-execution. Its novelty is the combination of editable shared plans, explicit assignment of steps to either user or agent, notebook-like stepwise execution, and direct refinement of intermediate outputs within one integrated workflow.
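
The interaction pattern described above can be illustrated with a small data-structure sketch. This is not Cocoa's implementation: the class names (`Step`, `InteractivePlan`), the method names, and the `toy_agent` stand-in are all hypothetical, chosen only to show editable shared plans, per-step assignment to user or agent, stepwise execution, and direct refinement of intermediate outputs.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One plan step (hypothetical model, not Cocoa's actual schema)."""
    description: str
    assignee: str = "agent"        # either "agent" or "user"
    output: Optional[str] = None   # set once the step has been run or refined


@dataclass
class InteractivePlan:
    """A shared plan that both parties can edit before and after execution."""
    steps: List[Step] = field(default_factory=list)

    def reassign(self, index: int, assignee: str) -> None:
        """Explicit delegation: hand a step to the user or the agent."""
        if assignee not in ("agent", "user"):
            raise ValueError("assignee must be 'agent' or 'user'")
        self.steps[index].assignee = assignee

    def edit_step(self, index: int, description: str) -> None:
        """Co-planning: rewrite a step; any earlier output is now stale."""
        self.steps[index].description = description
        self.steps[index].output = None

    def insert_step(self, index: int, step: Step) -> None:
        """Interleaving: add a step even after earlier steps have run."""
        self.steps.insert(index, step)

    def run_step(self, index: int, agent: Callable[[str], str],
                 user_input: Optional[str] = None) -> str:
        """Co-execution: run one step at a time, like a notebook cell."""
        step = self.steps[index]
        if step.assignee == "agent":
            step.output = agent(step.description)
        else:
            if user_input is None:
                raise ValueError("user-assigned steps need user_input")
            step.output = user_input
        return step.output

    def refine_output(self, index: int, new_output: str) -> None:
        """Direct refinement of an intermediate output."""
        self.steps[index].output = new_output


def toy_agent(description: str) -> str:
    """Stand-in for a real agent call; it only echoes the step text."""
    return f"[agent draft] {description}"


def demo() -> InteractivePlan:
    plan = InteractivePlan([
        Step("Find papers on mixed-initiative interfaces"),
        Step("Summarize common themes"),
    ])
    plan.run_step(0, toy_agent)          # agent executes the first step
    plan.reassign(1, "user")             # user takes over the summary step
    plan.insert_step(1, Step("Filter papers to the last five years"))
    plan.run_step(1, toy_agent)          # plan changed after partial execution
    plan.run_step(2, toy_agent, user_input="Themes: control, transparency")
    return plan
```

The key design point the sketch tries to capture is that assignment and output live on each step rather than on the plan as a whole, so delegation and revision can happen locally and at any point in the workflow.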

Evidence

Evidence comes from a formative study with 9 researchers, a system implementation section describing Cocoa’s interactive-plan design, a within-subjects lab study with 16 researchers against a chat baseline, and a 7-day field deployment with 7 participants. The lab results report significantly greater steerability and more active co-execution behavior, while the deployment shows how explicit delegation and interleaving were used in day-to-day research work.

“4 Cocoa: System Walkthrough and Implementation We present Cocoa, a system that embeds an AI agent into a document editor and provide an affordance we call interactive plans that allow users to collaboratively plan (co-plan) with the agent—the agent proposes an initial plan of execution to tackle a user request that the user can edit to their liking.”

actual novelty · 4 Cocoa: System Walkthrough and Implementation · confidence 0.98

“On the other hand, researchers have developed systems with user-guided workflows, where the user is in full control of task planning, while the agent executes scoped subtasks [1, 33, 47, 99]. Whether agent- or user-guided, these systems encode a particular configuration of human-agent collaboration—there are few affordances to flexibly delegate agency by altering who is guidin”

departure from common sense · 1 Introduction · confidence 0.96

“blating parts of our design. For example, our baseline could have been an interactive planning interface that did not support stepwise execution. In future work, we can expand our participant pool and run more studies with ablations of Cocoa to pinpoint which particular affordances participants valued most. Further, the underlying agent used in Cocoa also had some technical limitations”

limitation · 9 Limitations and Future Work · confidence 0.99

“A lab study (n = 16) found that Cocoa enabled steerability without sacrificing ease-of-use, and a week-long field deployment (n = 7) showed how researchers collaborated with Cocoa to accomplish real-world tasks. Figure 1: On the left: an illustration of co-planning: the agent proposes a plan of action in response to a user request.”

validation scope · Abstract · confidence 0.97

Limits

Method limits

The evaluation is strong for an HCI systems paper but still bounded: the lab study involved 16 researchers in 90-minute sessions on research-document tasks, and the field deployment lasted only 7 days with 7 participants. The comparison baseline was a chatbot-style interface rather than a fuller ablation set, so the contribution of individual Cocoa affordances is not fully isolated.

Deployment limits

Deployment evidence is limited to a short 7-day study with researchers in CS and CS-adjacent domains. This constrains claims about long-term adoption, broader workplace integration, and transfer to non-research or non-CS settings.

Boundary conditions

Cocoa is designed for scientific research workflows centered on documents, literature-grounded tasks, and iterative planning/execution. Its usefulness depends on contexts where users benefit from explicit delegation and can inspect or edit intermediate outputs. The implementation also depends on a GPT-4o-based agent and Semantic Scholar-backed literature tools, so findings may not directly transfer to multimodal, code-heavy, or non-document-centered domains.

Position in field

The paper contributes a concrete interaction pattern for human-agent collaboration that sits between rigid user-led and agent-led workflows. Rather than only advocating more oversight, it operationalizes mixed initiative through interactive plans, explicit delegation, and notebook-like execution inside a document environment, positioning Cocoa as both a system contribution and a design argument for more fluid human-agent workflow control.

Abstract

As AI agents take on increasingly long-running tasks involving sophisticated planning and execution, there is a corresponding need for novel interaction designs that enable deeper human-agent collaboration. However, most prior works leverage human interaction to fix "autonomous" workflows that have yet to become fully autonomous or rigidly treat planning and execution as separate stages. Based on a formative study with 9 researchers using AI to support their work, we propose a design that affords greater flexibility in collaboration, so that users can 1) delegate agency to the user or agent via a collaborative plan where individual steps can be assigned; and 2) interleave planning and execution so that plans can adjust after partial execution. We introduce Cocoa, a system that takes design inspiration from computational notebooks to support complex research tasks. A lab study (n=16) found that Cocoa enabled steerability without sacrificing ease-of-use, and a week-long field deployment (n=7) showed how researchers collaborated with Cocoa to accomplish real-world tasks.