CHI '26 · Honorable mention · full-paper review · confidence medium-high

Dialogues with AI Reduce Beliefs in Misinformation but Build No Lasting Discernment Skills

Anku Rani, Valdemar Danry, Paul Pu Liang, Andrew Lippman, Pattie Maes

This is a strong CHI paper because it does more than show that AI can help people answer misinformation questions correctly in the moment. Its longitudinal design makes the central tension legible: immediate assistance improves performance, but the benefit does not translate into lasting unassisted skill, which is exactly the kind of nuanced human-AI finding CHI values.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: causal knowledge · typical · 31/268
Novelty type: empirical finding · typical · 68/268
Abstraction level: task · typical · 36/268
Generalization target: user population · typical · 75/268
Validation mode: mixed methods · typical · 136/268

Evidence profile

Evidence strength: strong · typical · 158/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268

Review Summary

This paper’s value is that it reframes a familiar human-AI optimism story into a more disciplined longitudinal claim. The intuitive expectation is that if an AI dialogue partner helps users classify misinformation more accurately, then repeated exposure should also teach them to do that task better on their own. The paper argues otherwise, and the evidence summary supports that reversal: AI assistance yields immediate gains during use, but unassisted performance on unseen items declines by week 4 relative to week 0. That is a meaningful departure from common sense because it separates short-term correctness from durable discernment skill.

The novelty is not just the headline result; it is also the study structure that makes the distinction observable. A month-long, three-phase design, combined with conversation analysis, lets the authors ask whether the AI is acting as a tutor that transfers skill or as a crutch that improves in-the-moment judgments without building independent capability. The paper’s own framing of dependency versus learning transfer is therefore well aligned with the evidence.

At the same time, the limitations matter and are not incidental. The validated item set is relatively small, the follow-up is only four weeks, and the study lacks a no-AI control condition, so the work is best read as strong evidence about this task family and this interaction regime rather than a universal statement about all AI-assisted misinformation training. In CHI terms, that makes it a solid empirical finding with practical implications: designers should not assume that conversational AI assistance automatically produces lasting discernment, even when it improves immediate accuracy. The paper is strongest when interpreted as a caution against overclaiming training effects from assistance alone.

What Changed

Canon before

Prior work suggests AI dialogue can reduce belief in false information, but it is unclear whether such interactions build durable discernment skill rather than only improving immediate judgments.

Departure from common sense

The paper’s core result cuts against the intuitive expectation that if AI dialogue helps people judge misinformation correctly in the moment, it should also train them to do better later without assistance. Instead, the paper reports immediate gains during AI use but a later decline in unassisted performance.

Actual novelty

The paper’s main novelty is a month-long, three-phase longitudinal design that separates immediate AI-assisted accuracy from later unassisted discernment, plus conversation analysis intended to distinguish learning transfer from dependence. That combination supports a more specific claim than simple in-session accuracy improvement.

Evidence

The evidence supports a longitudinal claim about misinformation discernment under AI assistance: 67 participants completed a month-long study, AI assistance improved in-session accuracy, and unassisted performance on unseen items declined by week 4. The paper also reports conversation-strategy analysis and explicitly discusses dependence versus learning transfer. The main limitation is that the study uses a relatively small validated item set and a bounded follow-up window.

“Given the growing prevalence of fake information, including increasingly realistic AI-generated news, there is an ur”

actual novelty · Introduction / Contributions · confidence 0.96

“Given the growing prevalence of fake information, including increasingly realistic AI-generated news, there is an ur”

departure from common sense · Abstract · confidence 0.98

“Algorithm appreciation: People prefer algorithmic to human judgment. Organizational Behavior and Human Decision Processes 151 (2019), 90–103. [49] Christopher Manning and Hinrich Schutze. 1999. Foundations of statistical natural language processing”

limitation · 11 Limitations and Future Work · confidence 0.99

“Given the growing prevalence of fake information, including increasingly realistic AI-generated news, there is an ur”

validation scope · Study Design / Participants · confidence 0.97
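
The longitudinal comparison summarized in the Evidence section (unassisted accuracy at week 0, accuracy during AI-assisted sessions, unassisted accuracy at week 4) can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the record layout, the phase labels, and the function names are all hypothetical, and the toy records are invented for illustration only. This summary does not state whether the reported changes are percentage points or relative changes, so the sketch computes both.

```python
from statistics import mean

# Hypothetical toy records, NOT data from the paper.
# Each record: (participant_id, phase, answered_correctly)
records = [
    ("p1", "week0_unassisted", True),
    ("p1", "week0_unassisted", True),
    ("p2", "week0_unassisted", True),
    ("p2", "week0_unassisted", False),
    ("p1", "assisted", True),
    ("p1", "assisted", True),
    ("p2", "assisted", True),
    ("p2", "assisted", True),
    ("p1", "week4_unassisted", True),
    ("p1", "week4_unassisted", False),
    ("p2", "week4_unassisted", True),
    ("p2", "week4_unassisted", False),
]


def accuracy_by_phase(rows):
    """Return the fraction of correct classifications for each phase."""
    by_phase = {}
    for _participant, phase, correct in rows:
        by_phase.setdefault(phase, []).append(1.0 if correct else 0.0)
    return {phase: mean(values) for phase, values in by_phase.items()}


def change(before, after):
    """Return (percentage-point change, relative change in percent)."""
    if before == 0:
        raise ValueError("relative change is undefined when the baseline is 0")
    points = (after - before) * 100
    relative = (after - before) / before * 100
    return points, relative


acc = accuracy_by_phase(records)
points, relative = change(acc["week0_unassisted"], acc["week4_unassisted"])
print(f"week 0: {acc['week0_unassisted']:.2f}, week 4: {acc['week4_unassisted']:.2f}")
print(f"change: {points:.1f} points, {relative:.1f}% relative")
```

Keeping the two unassisted phases separate from the assisted phase is what allows in-session gains and later unassisted decline to be reported independently. A real analysis would also need a significance test and per-participant modeling, which this sketch omits.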

Limits

Method limits

The study is limited by a relatively small validated item set, a single month of follow-up, and the absence of a no-AI control condition, so alternative explanations for the week-4 decline cannot be ruled out.

Deployment limits

The findings speak to misinformation-detection tasks with headline-image pairs and AI-assisted dialogue in a controlled study setting; they do not by themselves establish effects for broader real-world misinformation ecosystems or longer-term deployment.

Boundary conditions

The reported effects are bounded by the specific participant sample, the curated news-item set, and the four-week observation window. The paper’s own framing suggests the key boundary is the transition from AI-assisted judgment to later unassisted discernment.

Position in field

This paper sits at the intersection of human-AI interaction and misinformation detection, contributing evidence that AI can improve immediate judgments while failing to produce durable discernment skill. It is positioned as a cautionary result about over-reliance rather than a purely optimistic training intervention.

Abstract

Given the growing prevalence of fake information, including increasingly realistic AI-generated news, there is an urgent need to train people to better evaluate and detect misinformation. While interactions with AI have been shown to durably reduce people's beliefs in false information, it is unclear whether these interactions also teach people the skills to discern false information themselves. We conducted a month-long study where 67 participants classified news headline-image pairs as real or fake, discussed their assessments with an AI system, followed by an unassisted evaluation of unseen news items to measure accuracy before, during, and after AI assistance. While AI assistance produced immediate improvements during AI-assisted sessions (+21% average), participants' unassisted performance on new items declined significantly by 15.3% in week 4 compared to week 0. These results indicate that while AI may help immediately, it may ultimately degrade long-term misinformation detection abilities.