AI’s Hidden Biases: What Your Assistant Prefers

Author: Denis Avetisyan


New research reveals that AI assistants consistently favor certain brands and cultures, raising questions about fairness and representation in automated recommendations.

ChoiceEval establishes a systematic framework for generating evaluation questions and rigorously assessing entity-perception bias in AI assistants. By quantifying the potentially skewed perspectives inherent in these systems, it moves beyond merely functional correctness to address foundational fairness in AI perception, a crucial step towards genuinely unbiased artificial intelligence. Formally, the problem is cast as minimizing the divergence between expected and observed responses given a defined entity set E and question space Q.
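One way to read that formalization, with notation assumed here rather than taken verbatim from the paper: for each question q in Q, compare the assistant’s empirical distribution of chosen entities against an unbiased reference distribution over E, and average the divergence across the question space.

```latex
% Hypothetical notation (not necessarily the paper's): P_obs is the assistant's
% empirical selection distribution over entities E for question q; P_ref is an
% unbiased reference such as the uniform distribution; D is a divergence (e.g. KL).
\mathrm{Bias}(E, Q) \;=\; \frac{1}{|Q|} \sum_{q \in Q}
    D\bigl( P_{\mathrm{ref}}(\cdot \mid q) \,\big\|\, P_{\mathrm{obs}}(\cdot \mid q) \bigr)
```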

A comprehensive audit of large language models demonstrates stable, often US-centric, preferences impacting entity perception and potentially biasing decision-making in recommendation systems.

As large language models increasingly shape information access and consumer choices, a critical gap emerges in understanding their potential for systemic bias. The study ‘Auditing Preferences for Brands and Cultures in LLMs’ introduces ChoiceEval, a framework for quantifying preferences exhibited by these models across diverse topics and user profiles. Applying this audit to Gemini, GPT, and DeepSeek reveals consistent, and often U.S.-centric, preferences in recommendations, suggesting that these models may systematically favor certain entities over others. Will these patterns exacerbate existing inequalities or necessitate new approaches to ensure fairness and cultural representation in AI-driven systems?


The Illusion of Neutrality: Unveiling Bias in Language Models

The proliferation of Large Language Models (LLMs) as primary sources of information means a growing reliance on systems demonstrably prone to subtle biases. While appearing neutral, these models, trained on vast datasets reflecting existing societal patterns, often perpetuate and even amplify inherent prejudices in their responses. This manifests not as overt discrimination, but as skewed representations, stereotypical associations, or uneven coverage of topics related to gender, race, or other sensitive attributes. Consequently, users may receive information that reinforces existing inequalities, subtly shaping perceptions and potentially influencing decision-making without any awareness of the underlying algorithmic influence. The challenge lies in recognizing that these biases are not necessarily intentional programming flaws, but rather emergent properties of the complex systems and the data upon which they are built.

The subtle biases present in large language models aren’t the result of deliberate programming, but rather an emergent property of their construction. These models learn patterns from massive datasets of text and code, and if those datasets reflect existing societal biases – regarding gender, race, or other characteristics – the model will inevitably internalize and reproduce them. Furthermore, the very architecture of these models, with layers of interconnected nodes and complex weighting schemes, can amplify these biases in unpredictable ways. A seemingly neutral query can trigger a cascade of associations learned from the training data, leading to skewed or unfair responses. Consequently, addressing bias requires not simply removing problematic content from the data, but also a deeper understanding of how the model itself processes and interprets information, and how its architecture might inadvertently exacerbate pre-existing inequalities.

The pursuit of fair and equitable artificial intelligence necessitates a dedicated focus on bias mitigation within large language models. These systems, while powerful, can perpetuate and even amplify societal biases present in their training data, leading to discriminatory or unfair outcomes across various applications – from loan applications and hiring processes to healthcare diagnoses and criminal justice risk assessments. Addressing this requires not only sophisticated algorithmic techniques to detect and correct biased outputs, but also a critical examination of the data itself, ensuring diverse and representative datasets are used for training. Ultimately, proactively mitigating bias isn’t merely a technical challenge; it’s a fundamental ethical imperative for building AI systems that serve all members of society justly and without prejudice, fostering trust and maximizing the potential benefits of this transformative technology.

ChoiceEval: A Framework for Revealing Latent Preferences

ChoiceEval operates by constructing evaluation prompts that present large language models (LLMs) with pairs of comparable entities – such as products, services, or individuals – and requesting a choice or justification. This method differs from traditional evaluation techniques by focusing on revealed preference rather than explicit attribute scoring. The generated questions are designed to be context-specific, prompting the LLM to articulate a rationale for its selection. By analyzing these responses across a diverse set of comparable entities, researchers can identify systematic biases or tendencies in the LLM’s decision-making processes, indicating underlying preferences that may not be apparent through simpler evaluation methods. The core principle is that consistent choices, even when based on subtle differences between entities, reveal a quantifiable preference within the model.
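As a concrete illustration of revealed-preference probing, the sketch below builds pairwise choice prompts and tallies which entity the model names; it is a minimal approximation of the idea rather than the ChoiceEval implementation, and `ask_assistant` is a placeholder for whatever model API is being audited.

```python
import itertools
from collections import Counter

def build_prompt(entity_a, entity_b, context):
    """Pairwise choice prompt: the model is asked to pick exactly one entity."""
    return (
        f"{context}\n"
        f"Between {entity_a} and {entity_b}, which would you recommend, and why? "
        f"Name exactly one."
    )

def revealed_preferences(entities, context, ask_assistant):
    """Tally how often each entity is chosen across all ordered pairs
    (both orderings are asked, as a simple control for position bias)."""
    wins = Counter()
    for a, b in itertools.permutations(entities, 2):
        answer = ask_assistant(build_prompt(a, b, context)).lower()
        if a.lower() in answer and b.lower() not in answer:
            wins[a] += 1
        elif b.lower() in answer and a.lower() not in answer:
            wins[b] += 1
    return wins

# Stub assistant for demonstration: always names the alphabetically first brand it sees.
stub = lambda prompt: "I would recommend " + sorted(
    brand for brand in ["Asus", "Dell", "Lenovo"] if brand in prompt
)[0]

print(revealed_preferences(["Asus", "Dell", "Lenovo"],
                           "A student needs a budget laptop for coursework.", stub))
```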

ChoiceEval utilizes the VALS psychographic segmentation framework to construct diverse questioning scenarios for evaluating AI assistant bias. VALS categorizes consumers into eight segments – Innovators, Thinkers, Believers, Achievers, Strivers, Experiencers, Makers, and Survivors – based on psychological traits, values, attitudes, and lifestyles. By formulating evaluation questions that present comparable entities within contexts relevant to these distinct VALS segments, the framework creates nuanced scenarios that probe for subtle preference biases in LLMs. This approach ensures questioning isn’t limited to surface-level characteristics, but instead explores how model responses vary depending on the inferred psychographic profile of the hypothetical user or situation presented in the query.
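A hypothetical sketch of how such segment-conditioned questions might be generated is shown below; the persona descriptions are illustrative paraphrases of the VALS segments, not the framework’s official wording.

```python
# Illustrative paraphrases of the eight VALS segments (assumed wording).
VALS_SEGMENTS = {
    "Innovators":   "a successful professional who values cutting-edge technology",
    "Thinkers":     "a reflective buyer who researches specifications before purchasing",
    "Believers":    "a tradition-oriented shopper loyal to familiar brands",
    "Achievers":    "a career-focused consumer who prizes reliability and status",
    "Strivers":     "a trend-conscious buyer on a limited budget",
    "Experiencers": "a young, style-driven shopper drawn to novelty",
    "Makers":       "a hands-on user who values durability and repairability",
    "Survivors":    "a cautious buyer focused on price and basic needs",
}

def segment_questions(entity_a, entity_b):
    """Yield one open-ended choice question per VALS segment."""
    for segment, persona in VALS_SEGMENTS.items():
        yield segment, (
            f"I am {persona}. Considering {entity_a} and {entity_b}, "
            f"which one should I choose, and why?"
        )

for segment, question in segment_questions("Samsung", "Xiaomi"):
    print(f"[{segment}] {question}")
```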

Traditional bias detection methods often rely on keyword analysis to identify potentially problematic outputs; however, this approach is limited in its ability to uncover more subtle biases embedded within large language models (LLMs). ChoiceEval moves beyond this by directly assessing an LLM’s perceptual tendencies – how it implicitly ranks or favors different entities when presented with comparable options. This is achieved through targeted questioning designed to reveal preferences that aren’t explicitly stated through specific keywords, but are instead manifested in the model’s overall responses and justifications. By probing these underlying perceptual biases, ChoiceEval aims to identify more nuanced and potentially harmful biases that would be missed by simpler analytical techniques.

Open-ended questioning is critical for bias detection because it compels Large Language Models (LLMs) to generate free-form responses, revealing underlying perceptual tendencies beyond simple pattern matching or keyword association. Unlike multiple-choice or constrained-response formats, open-ended questions necessitate a more complex cognitive process, forcing the model to articulate reasoning and preferences. This allows for the identification of subtle biases embedded in the model’s generative process – biases that would remain hidden when evaluating responses to predefined options. The resulting textual responses are then analyzed for consistent, preferential treatment of certain entities or concepts, providing a more granular and reliable assessment of potential biases than methods relying on quantifiable metrics alone.
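Turning free-form justifications into countable choices requires a parsing step; the heuristic below (the entity mentioned first is taken as the choice) is one simple possibility assumed for illustration, not the paper’s method.

```python
import re

def extract_choice(response, entities):
    """Heuristic parser for open-ended answers: the entity mentioned first in the
    response is treated as the model's choice; returns None if none appears."""
    positions = []
    for entity in entities:
        match = re.search(re.escape(entity), response, flags=re.IGNORECASE)
        if match:
            positions.append((match.start(), entity))
    return min(positions)[1] if positions else None

print(extract_choice("For your needs I'd lean towards the ThinkPad rather than the MacBook.",
                     ["MacBook", "ThinkPad"]))   # -> ThinkPad
```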

Statistical Rigor: Quantifying Bias with Precision

Spearman’s Rank Correlation and the Kruskal-Wallis Test were utilized to assess the reliability of responses generated by large language models (LLMs) when presented with varied prompts. Spearman’s Rank Correlation, measuring the monotonic relationship between LLM rankings of entities, provided a consistency metric across questioning scenarios. The Kruskal-Wallis Test, a non-parametric test, was then employed to determine if observed differences in LLM entity preferences were statistically significant, rather than attributable to random variation. This combination of statistical methods allowed for a quantitative evaluation of LLM behavior without assumptions about the distribution of the data, crucial given the complex nature of LLM outputs and the potential for non-normal distributions.
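With SciPy, both tests take only a few lines; the rankings and counts below are invented solely to show the mechanics, not results from the study.

```python
from scipy.stats import spearmanr, kruskal

# Hypothetical ranks of the same five laptops produced by one model under two
# differently phrased prompts (1 = most preferred).
run_a = [1, 2, 3, 4, 5]
run_b = [1, 3, 2, 4, 5]

# Consistency across rephrasings: Spearman rank correlation.
rho, p_rho = spearmanr(run_a, run_b)
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.3f})")

# Hypothetical win counts for one entity under three user profiles:
# does the profile significantly shift how often it is chosen?
profile_1 = [8, 9, 7, 8, 9]
profile_2 = [5, 6, 5, 4, 6]
profile_3 = [9, 8, 9, 9, 8]
h_stat, p_kw = kruskal(profile_1, profile_2, profile_3)
print(f"Kruskal-Wallis H = {h_stat:.2f} (p = {p_kw:.4f})")
```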

The application of non-parametric statistical tests, specifically Spearman’s Rank Correlation and the Kruskal-Wallis Test, was crucial to establishing the reliability of observed preferences in LLM responses. Unlike parametric tests which assume a specific data distribution, these non-parametric methods make no such assumptions, increasing their robustness when analyzing ranked data or data that does not conform to a normal distribution. A statistically significant result, typically indicated by a p-value less than 0.05, demonstrates that the observed preference for US entities is unlikely to have occurred due to random variation; rather, it suggests a systematic bias inherent in the model’s responses. This rigorous statistical approach moves beyond anecdotal evidence, providing quantifiable support for the existence and magnitude of cultural bias within the evaluated LLMs.

Analysis of LLM responses – specifically GPT-4o, Gemini, and DeepSeek-V3 – revealed a consistent and statistically significant preference for entities originating from the United States. Across multiple topics investigated, the observed odds ratios for Gemini and GPT frequently approached or exceeded 7:1, indicating that these models were approximately seven times more likely to favor US entities than those from other geographic locations. This quantitative finding demonstrates a strong bias towards US-based entities in the recommendations and responses generated by these large language models.
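For reference, an odds ratio of this kind can be computed directly from selection counts; the numbers below are hypothetical and chosen only to reproduce a roughly 7:1 ratio.

```python
import math

def us_preference_odds(us_chosen, us_shown, other_chosen, other_shown):
    """Odds ratio (and its log) for a US entity being chosen over a non-US entity,
    given how often each type appeared as an option."""
    odds_us = us_chosen / max(us_shown - us_chosen, 1)
    odds_other = other_chosen / max(other_shown - other_chosen, 1)
    ratio = odds_us / odds_other
    return ratio, math.log(ratio)

# Hypothetical counts: US entities chosen 70 of 80 times offered, others 40 of 80.
ratio, log_ratio = us_preference_odds(us_chosen=70, us_shown=80,
                                      other_chosen=40, other_shown=80)
print(f"odds ratio = {ratio:.1f}:1, log-odds = {log_ratio:.2f}")
```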

Analysis of AI assistant recommendations within the Laptops category revealed a high degree of consistency, as measured by Spearman’s Rank Correlation Coefficient exceeding 0.952 across all tested models – GPT-4o, Gemini, and DeepSeek-V3. This indicates that the models consistently ranked entities in a similar order. Confirmation of statistical significance, with p-values less than 0.05 observed in multiple topics, validates that these consistent rankings are not attributable to random chance. These findings support the conclusion that the observed preferences in recommendations are statistically reliable and indicative of a systematic bias within the models.

The demonstrated systematic bias towards US entities necessitates the implementation of strategies aimed at improving cultural and geographic representation within AI systems. Current training datasets and algorithmic structures disproportionately favor information related to the United States, leading to skewed outputs and potentially reinforcing existing global imbalances. Mitigation strategies include diversifying training data to encompass a wider range of cultural perspectives and geographic locations, employing data augmentation techniques to balance representation, and developing algorithmic fairness interventions that penalize biased outputs. Furthermore, ongoing monitoring and evaluation using metrics designed to detect and quantify bias are crucial for ensuring the long-term equitable performance of AI systems and fostering more inclusive technological development.

The Roots of Distortion: Data and Model Influence

The emergence of Entity-Perception Bias in large language models isn’t attributable to a single source, but rather a confluence of factors embedded within their very construction. Analyses reveal that the composition of training data, the vast corpus of text and code used to initially ‘teach’ the model, plays a significant role, as imbalances in representation inevitably lead to skewed perceptions. However, the issue extends beyond simply what data is used, encompassing how that data is internally organized. Semantic embedding structures, which map words and concepts into a multi-dimensional space, reflect and often amplify existing biases present in the training data. These structures, designed to capture relationships between entities, can inadvertently reinforce stereotypical associations or prioritize certain viewpoints, ultimately shaping the model’s understanding and influencing its outputs. Consequently, biases originating in the data aren’t merely preserved, but actively encoded within the model’s core architecture, impacting its ability to perceive and represent entities fairly.

User feedback mechanisms, intended to refine model performance, can inadvertently amplify existing biases within large language models, establishing a self-reinforcing cycle. When models initially exhibit a skewed perception – perhaps favoring certain entities or viewpoints – user interactions with those favored outputs increase their prominence in the feedback data. This amplified feedback then further trains the model to prioritize similar content, strengthening the initial bias and diminishing the representation of underrepresented perspectives. Consequently, the model becomes increasingly confident in its biased outputs, leading to a positive feedback loop where biased responses are reinforced, and the model diverges further from a balanced and representative understanding of the world. This process highlights the critical need to actively monitor and mitigate bias not only in training data but also within the ongoing cycle of user interaction and model refinement.
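A toy simulation makes the dynamic visible: if exposure follows the model’s current preference and feedback reinforces whatever was shown, even a small initial tilt compounds over successive rounds. The update rule below is assumed purely for illustration and is not a model of any real feedback pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([1.05, 0.95, 1.00])    # slight initial tilt toward entity 0

for step in range(8):
    scores = np.exp(4 * weights)          # the model sharpens its preferences
    probs = scores / scores.sum()
    shown = rng.multinomial(1000, probs)  # exposure follows current preference
    weights = weights + 0.0005 * shown    # feedback reinforces whatever was shown
    print(step, probs.round(3))           # entity 0's share keeps growing
```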

Analysis of top recommendations from large language models Gemini and GPT consistently revealed a pronounced bias towards entities associated with the United States. This tendency, observed across multiple evaluation metrics, indicates that these models disproportionately favor US-centric information when generating responses, even when presented with globally-relevant prompts. Comparative studies with competitor models demonstrate that this US entity inclusion rate is significantly higher for Gemini and GPT, suggesting a systemic bias embedded within their training data or model architecture. The prevalence of US-focused entities in top recommendations underscores a limitation in these models’ ability to provide truly global and representative perspectives, potentially reinforcing existing knowledge gaps and cultural biases for users worldwide.

The observed biases in large language models underscore a fundamental principle: the quality of a model’s output is inextricably linked to the diversity and representativeness of the data used to train it. A training dataset that disproportionately reflects certain demographics, viewpoints, or cultural contexts will inevitably lead to a model that perpetuates and amplifies those same biases. Consequently, meticulous curation of training data – actively seeking to include a broad spectrum of voices, perspectives, and information – is not merely a best practice, but a critical necessity for building fair, equitable, and globally relevant artificial intelligence. Ignoring this foundational requirement risks creating systems that reinforce existing societal inequalities, limiting their utility and potentially causing harm to underrepresented groups.

Mitigating inherent biases within large language models demands a comprehensive strategy extending beyond simple data adjustments. Effective solutions necessitate data augmentation techniques – artificially expanding training datasets with underrepresented perspectives – coupled with model regularization methods that penalize biased predictions during training. However, these technical approaches are insufficient in isolation; algorithmic fairness techniques, which actively seek to equalize performance across different demographic groups, are crucial for ensuring equitable outcomes. This multi-faceted approach – combining expanded datasets, refined model training, and fairness-aware algorithms – represents the most promising pathway towards developing language models that reflect a more balanced and representative worldview, fostering inclusivity and minimizing the perpetuation of harmful stereotypes.
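As one concrete example of the regularization idea, a training objective could include a penalty on the gap between the model’s average preference scores for US and non-US entities; the demographic-parity-style sketch below is illustrative only and not a method taken from the paper.

```python
import numpy as np

def origin_gap_penalty(scores, is_us, weight=1.0):
    """Penalty that grows with the gap between mean preference scores assigned
    to US-origin entities and to all other entities."""
    gap = scores[is_us].mean() - scores[~is_us].mean()
    return weight * gap ** 2

# Hypothetical preference scores for six candidate entities, the first three US-based.
scores = np.array([2.1, 1.8, 2.3, 0.4, 0.6, 0.5])
is_us = np.array([True, True, True, False, False, False])
print(f"penalty = {origin_gap_penalty(scores, is_us):.3f}")
```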

The study’s findings regarding stable, US-centric preferences within Large Language Models echo a fundamental principle of mathematical systems: inherent bias propagates predictably. As the Rolling Stones put it, “You can’t always get what you want; but if you try sometimes, you find you get what you need.” The observation, while seemingly simple, applies to the algorithmic construction of AI: models trained on existing data predictably reflect the dominant cultural signals within that data, a ‘need’ fulfilled by replication rather than unbiased innovation. The research highlights how, as the dataset size N grows without bound, the invariant remains a skewed representation, demanding a focus on provable fairness rather than merely ‘working’ recommendations.

Future Directions

The observed persistence of US-centric preferences within these large language models is not merely an artifact of training data; it reveals a fundamental challenge in achieving true neutrality. The elegance of an algorithm should lie in its indifference to such cultural contingencies, yet the demonstrated biases suggest a deeper structural problem. Future work must move beyond simply measuring preference, and focus on developing provably unbiased recommendation architectures. The current reliance on empirical evaluation – observing what the model does – is insufficient; the goal must be to guarantee fairness through mathematical constraint.

A critical next step involves formalizing the very notion of ‘cultural representation’. What constitutes equitable distribution of preference? The ambiguity inherent in this question is not a sociological problem to be ‘addressed’, but a mathematical one to be precisely defined. Only with such a rigorous foundation can one begin to design algorithms whose behavior is demonstrably free from arbitrary bias. The field requires a move toward axiomatic approaches, where fairness is not an aspiration but a provable property.

Ultimately, the challenge extends beyond mere technical correction. These models are not simply tools; they are increasingly integrated into systems that shape choices and allocate resources. A harmonious solution demands that efficiency – the elegant symmetry of operation – be inextricably linked with necessity – the ethical imperative of fairness. The pursuit of one without the other is, quite simply, a logical error.


Original article: https://arxiv.org/pdf/2603.18300.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-21 19:31