CHI '26 · Best paper · full-paper review · confidence high

CoBRA: Programming Cognitive Bias in Social Agents Using Classic Social Science Experiments

Xuan Liu, HaoYang Shang, Haojian Jin

CoBRA is a strong toolkit contribution because it replaces vague persona prompting with an explicit closed-loop control layer grounded in classic experiments. The paper’s main value is not a new psychological theory, but a reproducible way to measure and regulate observable bias behavior across models, with clear scope limits around internals, multimodality, and compositional control. Its strongest evidence comes from showing that ordinary persona prompting is unstable, while CoBRA provides measurable and more portable behavioral control.

Axes Lens

Rare contribution shape, typical evidence profile. The point here is not a score. It is to show what kind of claim the paper makes, and whether the evidence pattern is unusual or baseline in this 268-review set.

Contribution shape

Knowledge form: generative knowledge (typical · 35/268)
Novelty type: tool (typical · 14/268)
Abstraction level: artifact (typical · 19/268)
Generalization target: task class (typical · 63/268)
Validation mode: mixed methods (typical · 136/268)

Evidence profile

Evidence strength: strong (typical · 158/268)
Claim alignment: strong (typical · 231/268)
Overclaim risk: low (typical · 53/268)

Review Summary

CoBRA is best understood as a methodological toolkit contribution that turns agent specification from an informal prompting practice into an explicit control problem. The paper starts from a widely used assumption in LLM-based social simulation: if researchers describe an agent with a rich persona, role, or biography, the model will reliably enact the intended behavioral tendencies. The pilot study directly challenges that assumption by showing cross-model inconsistency and failures of simple expected distinctions, such as economist personas not reliably showing reduced framing susceptibility. That empirical setup gives the paper a strong argumentative foundation, because the proposed system is motivated by a concrete failure mode rather than by abstract dissatisfaction with prompting.

The actual novelty is the closed-loop architecture that uses classic social science experiments as calibration tasks. CoBRA measures observable bias behavior through the Cognitive Bias Index and then adjusts the agent through prompt-based, representation-level, or fine-tuning interventions until the target bias level is reached. This is a meaningful advance because it offers explicit, quantitative, and more reproducible behavioral specification rather than relying on latent role-playing ability alone. The evaluation claims are also reasonably well aligned with the evidence provided in the focused sections: the paper states that CoBRA improves reproducibility across models and enables explicit quantitative control, and the statistical appendix supports strong cross-model variance reduction and robustness across temperatures, with some nuance around reasoning-mode differences for prompt control.

Just as important, the paper is careful about scope. It explicitly says CoBRA is a behavioral control layer, not a detector of internal cognitive mechanisms, and it acknowledges practical limits: multimodal social phenomena are not yet covered, compositional bias control is unexplored, and activation/parameter interventions require access to model internals. Those limitations keep the contribution credible. Overall, this is a high-value HCI systems/toolkit paper because it contributes a reusable infrastructure for programmable social-agent behavior while resisting the temptation to overclaim psychological realism or universal generality.

What Changed

Canon before

The dominant practice in LLM-based social simulation specifies agent behavior implicitly through natural language descriptions of roles, personalities, and social attributes, relying on LLMs' strong role-playing and perceptual abilities. This widely assumed approach presumes that such implicit descriptions yield consistent, nuanced, and predictable agent behavior across models and experiment settings.

Departure from common sense

Implicit natural language specifications for agent behavior do not reliably yield consistent or expected cognitive biases across different foundation models. For example, agents specified as 'Economists' do not consistently show reduced susceptibility to framing effects versus common agents, and behavior varies unpredictably across models, contradicting assumptions that implicit role descriptions suffice for consistent social simulation behavior.

Actual novelty

The paper introduces CoBRA, a toolkit that operationalizes validated classic social science experiments as a closed-loop control system to quantitatively measure and regulate cognitive biases in LLM-based social agents. It offers a model-agnostic, explicit, and reproducible behavioral specification method via the Cognitive Bias Index and a Behavioral Regulation Engine using input prompt engineering, activation-level vector manipulation, and lightweight fine-tuning to control agent bias levels precisely.
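
The measure-then-regulate loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `simulated_agent`, the framing-style index (risky-choice rate under a loss frame minus the rate under a gain frame), the single `control` knob, and the proportional update rule are hypothetical stand-ins, not CoBRA's actual Cognitive Bias Index or Behavioral Regulation Engine.

```python
import random


def simulated_agent(frame: str, control: float, rng: random.Random) -> str:
    """Hypothetical agent answering a framing task with 'risky' or 'safe'.

    `control` in [0, 1] stands in for any intervention strength (prompt,
    activation, or fine-tuning). This is a toy response model, not an LLM.
    """
    shift = 0.4 * control  # assumed linear effect of the control knob
    p_risky = 0.5 + shift if frame == "loss" else 0.5 - shift
    return "risky" if rng.random() < p_risky else "safe"


def measure_bias_index(control: float, trials: int, rng: random.Random) -> float:
    """Assumed index: P(risky | loss frame) - P(risky | gain frame)."""
    risky = {"loss": 0, "gain": 0}
    for frame in risky:
        for _ in range(trials):
            if simulated_agent(frame, control, rng) == "risky":
                risky[frame] += 1
    return (risky["loss"] - risky["gain"]) / trials


def regulate(target: float, tol: float = 0.03, gain: float = 0.5,
             trials: int = 4000, max_steps: int = 30, seed: int = 0):
    """Closed loop: measure the index, compare with the target, adjust, repeat."""
    rng = random.Random(seed)
    control = 0.0
    measured = 0.0
    for step in range(max_steps):
        measured = measure_bias_index(control, trials, rng)
        error = target - measured
        if abs(error) <= tol:
            return control, measured, step
        # Proportional update, clamped to the valid range of the knob.
        control = min(1.0, max(0.0, control + gain * error))
    return control, measured, max_steps


control, measured, steps = regulate(target=0.4)
print(f"control={control:.2f} measured={measured:.2f} steps={steps}")
```

The point of the sketch is the shape of the loop: the target is stated as a number on a measured index, so the same target can in principle be pursued on any model that can run the measurement task, whichever intervention sits behind the knob.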

Evidence

The paper combines a pilot study showing that implicit persona descriptions produce inconsistent framing-effect behavior across four models, a benchmark showing CoBRA improves reproducibility and quantitative control, and a demonstration that bias settings can be transferred into a social contagion task. The strongest evidence is for behavioral controllability and cross-model consistency, while the paper also explicitly states scope limits around internal mechanisms, multimodal settings, compositional bias control, and model accessibility.

“To realize the above design goal, CoBRA uses a closed-loop system that treats cognitive biases as the primary control abstraction and continuously measures and regulates them via classic social science experiments.”

actual novelty · 4.2 Closed-loop System · confidence 0.99

“We make the following key observations. First, the same specification produced inconsistent behavior across models. As shown in Fig. 3, for all specification variants (Economist, Common, and Blank), agents powered by different models exhibited significantly different response patterns (Chi-square test, p < 0.01). For ”

departure from common sense · 3.2 Results · confidence 0.98
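
For readers unfamiliar with the test named in this quote, a chi-square test of independence on a model-by-response table can be sketched as follows. The counts are invented for illustration and are not the paper's data; the four rows stand for four hypothetical models given the same persona prompt. To stay dependency-free, the sketch compares the statistic with the tabulated critical value for 3 degrees of freedom at the 0.01 level rather than computing a p-value.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if response were independent of the model.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented counts: rows are four hypothetical models, columns are
# (chose safe option, chose risky option) under the same persona prompt.
counts = [
    [70, 30],
    [45, 55],
    [20, 80],
    [60, 40],
]

stat, dof = chi_square_statistic(counts)
CRITICAL_DF3_P01 = 11.345  # tabulated chi-square critical value, df=3, alpha=0.01
print(f"chi2={stat:.2f}, dof={dof}, significant={stat > CRITICAL_DF3_P01}")
```

A statistic above the critical value means the response distribution differs across models more than chance would explain, which is the pattern the quoted observation reports.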

“ We plan deployment studies with sociologists as the next step to assess CoBRA’s usability [54]. Unexplored multi-modal Simulation. We introduce CoBRA for LLM-based agents, where LLMs are utilized for language-based social simulation. We prioritize precision and reproducibility in social simulations. As a result, limits the social phenomena we can model (e”

limitation · 9 Limitations & Future Work · confidence 0.99

“008 data points D.3.3 Statistical Results. Across all paradigm pairs, CoBRA methods exhibit significantly reduced cross-model variance relative to both baselines. As shown in Table 7, one-way ANOVA tests yield highly significant effects (p < 0.001) for every paradigm pair, with large F-statistics indicating strong between-group separation.”

validation scope · D.3 Statistical Validation of Cross-Model Reproducibility · confidence 0.97
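
The one-way ANOVA referred to in this quote compares between-group and within-group variability. A minimal sketch of the F statistic is shown below, assuming three hypothetical method groups and invented deviation scores in arbitrary units; neither the grouping nor the numbers come from the paper's Table 7.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented cross-model deviation scores (arbitrary units), one list per
# hypothetical method: a controlled method and two baselines.
controlled = [1.0, 2.0, 3.0]
baseline_a = [5.0, 6.0, 7.0]
baseline_b = [6.0, 7.0, 8.0]

f_stat, df_b, df_w = one_way_anova_f([controlled, baseline_a, baseline_b])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

Turning the F statistic into a p-value requires the F distribution; in practice a library routine such as `scipy.stats.f_oneway` returns both values in one call.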

Limits

Method limits

The Behavioral Regulation Engine’s activation-level and parameter-level interventions require access to model internals, which limits them to open-source models; closed-source APIs support only prompt-based control, with reduced precision. Fine-tuning may overfit despite diverse training data. Control targets single biases, not compositional or interacting ones.

Deployment limits

Deployment is limited to language-based LLM agents; multimodal phenomena are unaddressed. Usability and integration with domain-expert workflows remain to be studied. Closed-source models offer no fine-grained control beyond the prompt level. Ethical safeguards are essential because programmable biases can be misused.

Boundary conditions

Behavioral control operates on measurable response patterns in controlled social science paradigms; it does not guarantee changes to internal cognitive mechanisms or human-like reasoning. Transfer across cognitive biases or more complex social phenomena requires further extension. Control effectiveness depends on testbed design and model capabilities.

Position in field

CoBRA advances agent specification for LLM-based social simulations by addressing a key gap: the lack of explicit, reproducible, and quantitative control over cognitive biases. It challenges established reliance on implicit natural language role descriptions and introduces a principled, experimentally grounded control framework. The methods align with current AI alignment and HCI interests in predictable, interpretable system behavior, marking a significant methodological milestone for social simulation and LLM behavior engineering.

Abstract

This paper introduces CoBRA, a novel toolkit for systematically specifying agent behavior in LLM-based social simulation. We found that conventional approaches that specify agent behavior through implicit natural-language descriptions often do not yield consistent behavior across models, and the resulting behavior does not capture the nuances of the descriptions. In contrast, CoBRA introduces a model-agnostic way to control agent behavior that lets researchers explicitly specify desired nuances and obtain consistent behavior across models. At the heart of CoBRA is a novel closed-loop system primitive with two components: (1) Cognitive Bias Index that measures the demonstrated cognitive bias of a social agent, by quantifying the agent’s reactions in a set of validated classic social science experiments; (2) Behavioral Regulation Engine that aligns the agent’s behavior to exhibit controlled cognitive bias. Through CoBRA, we show how to operationalize validated social science knowledge (i.e., classical experiments) as reusable “gym” environments for AI, an approach that may generalize to richer social and affective simulations beyond bias alone.