CHI '26 · Honorable mention · full-paper review · confidence low

Exploring the Future of AI in Clinical Collaboration: A Study on Tumor Board Case Preparation

Jiachen Li, Amanda K. Hall, Ruican Zhong, Selin Everett, Alyssa Unell, Hanwen Xu, Matthias Blondeel, Jonathan Carlson, Katie Claveau, Thulasee Jose, Tristan Naumann, David C. Rhew, Naiteek Sangani, Frank Tuan, Jim Weinstein, Varun Mishra, Elizabeth D Mynatt, Scott Saponas, Hao Qiu, Leonardo Schettini, Sam Preston, Aiden Gu, Naoto Usuyama, Zelalem Gero, Cliff Wong, Noel Christopher Codella, Hoifung Poon, Shrey Jain, Matthew Lungren, Eric Horvitz

This looks like a strong CHI health-AI paper in topic and framing, but the evidence packet available here is thin. The abstract suggests a meaningful empirical contribution: a mixed-methods study of oncologists using two AI systems to prepare cases for multidisciplinary tumor boards (MTBs), with the nontrivial finding that transparency features did not fix trust calibration.

Axes Lens

Typical contribution shape, rare evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether its evidence pattern is unusual or baseline within this 268-review set.

Contribution shape

Knowledge form: descriptive knowledge typical · 92/268
Novelty type: empirical finding typical · 68/268
Abstraction level: practice typical · 85/268
Generalization target: organizational context typical · 20/268
Validation mode: mixed methods typical · 136/268

Evidence profile

Evidence strength: weak less common · 5/268
Claim alignment: weak less common · 5/268
Overclaim risk: high less common · 5/268

Review Summary

Based on the metadata and abstract, the paper’s contribution is best read as an empirical and design-oriented study of AI in a genuinely high-stakes clinical workflow rather than as a novel model or interface primitive. The interesting part is not simply that clinicians used AI, but that the authors compared an off-the-shelf assistant with a task-specific multi-agent system and then examined prompts, outputs, and clinician perceptions in the context of tumor board preparation. The abstract’s most important claim is also the most field-relevant: source links and agent-trajectories, which are often treated as intuitive trust-calibration mechanisms, did not align trust with actual system capability. That is a useful corrective to a common HCI assumption that transparency automatically produces calibrated reliance. At the same time, the evidence packet provided here does not include the paper text, so I cannot verify the exact study protocol, the coding scheme, or whether the conclusions are as strong as the abstract implies. The sample is modest at 16 oncologists, and the abstract itself signals a bounded comparison between two systems on one clinical task. The likely contribution is therefore strongest as descriptive knowledge about clinician-AI interaction in MTB preparation and as a cautionary finding about trust calibration, with more limited support for generalization beyond similar organizational clinical contexts. The honorable mention shows the paper was recognized, but the current evidence is still insufficient for a high-confidence technical appraisal of validity or novelty beyond the abstract-level framing.

What Changed

Canon before

AI support for clinical collaboration and tumor board preparation is typically framed as note summarization or decision support, with limited evidence about high-stakes MTB preparation workflows.

Departure from common sense

The paper’s core departure is that adding trust-calibration features does not necessarily improve alignment: even with source links and agent-trajectories, clinicians can remain overconfident in summaries and skeptical of recommendations. That is a useful challenge to the common assumption that more transparency straightforwardly fixes trust.

Actual novelty

The novelty appears to be a mixed-methods comparison of two AI systems for MTB case preparation, including a task-specific multi-agent system (Healthcare Agent Orchestrator, HAO), and an analysis of oncologists’ prompts, the AI responses, and oncologists’ perceptions. The contribution is less a new algorithm than an empirical and design-oriented account of how clinicians interact with high-stakes AI support.

Evidence

From the abstract alone, the paper studies 16 oncologists using two AI systems for MTB case preparation and reports willingness to adopt HAO, overconfidence in summaries, skepticism toward recommendations, and failure of trust-calibration strategies. However, the provided evidence spans do not include the paper text, so the review is grounded primarily in metadata and abstract-level claims.

actual novelty · Abstract · confidence 0.90

departure from common sense · Abstract · confidence 0.92

limitation · Stage-A candidate claims note · confidence 0.88

validation scope · Abstract · confidence 0.95

Limits

Method limits

The available packet does not include the paper’s methods, measures, or analysis details, so the exact rigor of the mixed-methods design cannot be assessed here. Any stronger methodological judgment would require the full text sections on study design, coding, and analysis.

Deployment limits

The abstract indicates the work concerns high-stakes clinical collaboration, so deployment is likely constrained by clinical workflow integration, safety, and the need for careful oversight. The evidence provided does not specify settings, institutions, or implementation requirements.

Boundary conditions

The findings are bounded by a small sample of 16 oncologists and by the specific comparison between Copilot and HAO in MTB preparation. Generalization to other tasks, other systems, or clinical collaboration contexts unlike the one studied should be treated cautiously.

Position in field

This sits at the intersection of HCI for health, clinical decision support, and human-AI trust calibration. Its likely value is as an empirical cautionary study showing that transparency mechanisms may not be sufficient in high-stakes clinical settings.

Abstract

Multidisciplinary tumor boards (MTBs) bring specialists together to identify therapies for complex cancer cases, but preparing for them is time-intensive. Clinicians must extract key details from extensive records and evaluate treatment options. While large language models (LLMs) show promise in medicine for basic tasks like summarizing notes, little is known about their role in high-stakes tasks like MTB preparation. We conducted a mixed-methods study with 16 oncologists using two AI systems to prepare patient cases for MTB: an off-the-shelf assistant (Copilot) and a task-specific multi-agent system (Healthcare Agent Orchestrator, HAO). We analyzed oncologist prompts, AI responses, and oncologists' perception of AI. Participants showed greater willingness to adopt HAO but were often overconfident in AI summaries and skeptical of AI-recommended therapies. Trust calibration strategies, such as source links and agent-trajectories, failed to align trust with system capabilities. We conclude with how AI systems should be built to support clinicians in high-stakes tasks.