CHI '26 · Honorable mention · full-paper review · confidence medium-high

Can LLM-Simulated Practice and Feedback Upskill Human Counselors? A Randomized Study with 90+ Novice Counselors

Ryan Louie, Raj Sanjay Shah, Ifdita Hasan Orney, Juan Pablo Pacheco, Emma Brunskill, Diyi Yang

This is a strong CHI paper because it does not stop at showing that LLM patient simulation is feasible; it tests whether the training actually changes novice counselor behavior. The key result is nuanced and important: practice alone is not enough, and feedback appears to be the mechanism that drives improvement.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: causal knowledge · typical · 31/268
Novelty type: empirical finding · typical · 68/268
Abstraction level: practice · typical · 85/268
Generalization target: user population · typical · 75/268
Validation mode: controlled experiment · typical · 47/268

Evidence profile

Evidence strength: strong · typical · 158/268
Claim alignment: strong · typical · 231/268
Overclaim risk: medium · typical · 210/268

Review Summary

This paper’s main contribution is not a new counseling interface in the narrow sense, but a credible causal evaluation of whether LLM-simulated practice can actually upskill novice counselors. That distinction matters in CHI: many systems are compelling as prototypes, but far fewer are tested with randomized evidence against a meaningful comparison condition. Here, the authors compare practice alone versus practice plus feedback in a study with 94 novice counselors, and the results are clear enough to matter.

The feedback condition improved client-centered microskills such as reflections and questions, while practice alone did not improve those skills. Even more striking, empathy declined over time in the practice-only group and was significantly worse than in the feedback group. That is a useful departure from common intuition that more practice by itself should help; the paper shows that simulated interaction without structured feedback may be insufficient and can even leave learners stuck in a solution-oriented mode. The novelty is therefore empirical and methodological: the paper provides randomized evidence about the training value of LLM simulation, rather than only arguing that the simulation is realistic or usable.

At the same time, the validation scope is appropriately bounded. The demonstrated effects are on client-centered microskills in short, text-based, immediate pre/post training sessions with novice counselors. The paper itself notes that it measures immediate skill acquisition rather than long-term retention or transfer to real clinical encounters, so the results should not be overgeneralized to actual therapy practice, other modalities, or broader counselor populations. Within those limits, the study is strong and well aligned with its claims, and it makes a persuasive case that LLM-based training systems need structured feedback to produce meaningful skill development.

What Changed

Canon before

Prior CHI and HCI work on LLM-based training for counseling has emphasized simulated interactions, usability, and output quality, but not rigorous evidence that such systems improve novice counselor skill development.

Departure from common sense

A simulated patient alone is not enough to upskill novice counselors; in this study, practice without feedback not only failed to improve performance but was associated with worse empathy over time than practice plus feedback.

Actual novelty

The paper’s novelty is a randomized evaluation of an LLM-simulated counseling practice-and-feedback system with novice counselors, directly testing whether the system improves skill development rather than only whether the simulation is plausible or usable.

Evidence

The paper reports a randomized study with 94 novice counselors comparing practice alone versus practice with feedback. The feedback condition improved client-centered microskills such as reflections and questions, while practice alone showed no improvement. Empathy declined in the practice-alone group and was significantly worse than in the feedback group. The paper also states that prior work had not evaluated whether LLM-simulated training systems promote novice skill improvements.

“Despite increasing interest in using LLMs in mental health, to our knowledge, this is the first study to conduct a large-scale evaluation (N = 94) of an LLM-based training system for developing core skills in novice counselors”

actual novelty · Introduction gap statement · confidence 0.74

“We developed an LLM-simulated practice and feedback system and conducted a randomized study with 94 novice counselors, comparing practice alone versus practice with feedback”

departure from common sense · Abstract + Results summary (practice-only empathy decline) · confidence 0.80

“Regardless of the measurement approach employed, our pre-post randomized study focused primarily on assessing immediate skill acquisition rather than long-term retention or transfer to real-world clinical encounters with actual patients”

limitation · Limitations and Future Work · confidence 0.78

“The efficacy of LLM-based practice and feedback training was demonstrated only for client-centered microskills, which represent foundational communication techniques that may serve as a base for therapeutic practice”

validation scope · Scope statement in discussion/limitations · confidence 0.72

Limits

Method limits

The evidence is based on a pre-post randomized study focused on immediate skill acquisition, with behavioral scoring of client-centered microskills and qualitative reflections. The paper itself frames the assessment as immediate rather than long-term, so the causal claims should not be extended beyond the measured training interval.
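To make the shape of that pre-post, two-arm comparison concrete, here is a minimal sketch of the quantity such a design estimates: the mean change in one arm minus the mean change in the other. This is an illustration only, not the paper's analysis. The function names are hypothetical, the scores are made-up toy values rather than study data, and the review does not state which statistical model the authors used.

```python
from statistics import mean


def change_scores(pre, post):
    """Per-participant change (post minus pre) for one study arm."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    return [after - before for before, after in zip(pre, post)]


def difference_in_change(pre_a, post_a, pre_b, post_b):
    """Mean change in arm A minus mean change in arm B."""
    return mean(change_scores(pre_a, post_a)) - mean(change_scores(pre_b, post_b))


# Toy skill scores, invented for illustration. NOT data from the paper.
feedback_pre, feedback_post = [2, 3, 2, 4], [4, 4, 3, 5]
practice_pre, practice_post = [3, 2, 4, 3], [3, 2, 3, 3]

effect = difference_in_change(
    feedback_pre, feedback_post, practice_pre, practice_post
)
print(f"difference in mean change (feedback - practice-only): {effect}")
```

Because both measurements are taken within the same session window, an estimate of this kind speaks only to immediate change; it carries no information about retention or transfer.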

Deployment limits

The validated setting is novice counselors in a controlled training context using simulated practice and structured feedback. The findings do not establish effectiveness for experienced counselors, real clinical encounters, other therapeutic modalities, or deployment without feedback scaffolding.

Boundary conditions

The demonstrated effects are bounded to client-centered microskills and short-term training outcomes in text-based counseling practice. The paper notes that the study assessed immediate skill acquisition rather than long-term retention or transfer to real-world clinical encounters with actual patients.

Position in field

This work moves beyond prior LLM counseling-training papers that emphasized simulation quality or usability by providing randomized evidence that feedback is critical for skill gains. It positions LLM simulation as a potentially scalable training tool, but only when paired with structured feedback.

Abstract

The growing demand for accessible mental health support requires training more counselors, yet existing approaches remain resource-intensive and difficult to scale. LLMs can realistically simulate patients and generate actionable feedback for training, but their actual impact on novice counselor skill development remains unknown. We developed an LLM-simulated practice and feedback system and conducted a randomized study with 94 novice counselors, comparing practice alone versus practice with feedback. We evaluated behavioral performance, self-efficacy, and qualitative reflections. Results showed the practice-and-feedback group improved in client-centered microskills (reflections, questions), while the practice-alone group showed no improvements. For empathy, the practice-alone group declined over time and performed significantly worse than the feedback group. Qualitative interviews reinforced these findings: feedback helped participants adopt a client-centered listening approach, while practice-alone participants remained solution-oriented. These results suggest LLM-based training systems can promote effective skill development, and combining simulated practice with structured feedback is critical for meaningful improvement.