The Feedback Loop of Bias: How Predictive Policing Amplifies Racial Disparities

Author: Denis Avetisyan


New research reveals that AI-powered predictive policing systems, even with attempts at data correction, can worsen existing biases and lead to significantly unequal outcomes.

Detection rates in Baltimore between 2017 and 2019 reveal a disproportionate spike in identified individuals from Black neighborhoods in 2019, attributable to the concentration of algorithmic patrolling within those communities, a phenomenon in which generative adversarial networks (GANs) learned and reinforced existing patrol patterns.

A GAN-based simulation across multiple cities demonstrates that algorithmic bias in predictive policing persists despite data debiasing, necessitating broader policy and resource adjustments.

Despite the growing reliance on data-driven strategies, predictive policing systems risk perpetuating and amplifying existing societal biases. This is explored in ‘Unmasking Algorithmic Bias in Predictive Policing: A GAN-Based Simulation Framework with Multi-City Temporal Analysis’, which presents a reproducible framework, coupling Generative Adversarial Networks with crime and census data from Baltimore and Chicago, to quantify bias propagation throughout the enforcement pipeline. The analysis reveals substantial and temporally variable disparities in detection rates, with algorithmic approaches demonstrably exacerbating racial imbalances even after attempts at data debiasing. Given these findings, can policy interventions effectively mitigate structural bias embedded within predictive policing technologies and ensure equitable outcomes for all communities?


The Illusion of Objectivity: Data’s Inherent Bias

Predictive policing technologies, despite their potential for streamlining law enforcement, are intrinsically linked to the quality of the data upon which they are built. These platforms analyze historical crime records to forecast future incidents, but those very records are not objective snapshots of criminal activity. Instead, they are a product of decades – even centuries – of societal biases embedded within the criminal justice system. Factors such as disproportionate surveillance in marginalized communities, socioeconomic disparities influencing reporting rates, and subjective decision-making by law enforcement all contribute to a skewed representation of crime. Consequently, algorithms trained on this data do not simply identify areas with higher crime rates; they often identify areas with more policing, effectively reinforcing existing patterns of inequity and potentially leading to a self-fulfilling prophecy of increased arrests in already over-policed neighborhoods.

The challenge of ‘dirty data’ extends far beyond simple inaccuracies; it represents a systemic flaw in the foundation of predictive modeling. Historical datasets used to train algorithms often reflect pre-existing biases embedded within societal structures and law enforcement practices, resulting in contaminated inputs. Consequently, these models don’t objectively predict future crime; instead, they learn and replicate patterns of discrimination, effectively automating and amplifying inequity. This process creates a feedback loop where biased data leads to biased predictions, which then influence real-world policing strategies, disproportionately impacting already marginalized communities and reinforcing cycles of disadvantage. The implications are significant, demonstrating that seemingly neutral algorithms can, in fact, perpetuate and worsen existing social harms.

The implementation of predictive policing systems, without careful consideration of underlying data biases, carries a substantial risk of deepening existing societal inequalities. Analyses of these systems reveal significant Disparate Impact Ratios (DIR), quantifying the disproportionate effect on specific groups; for example, data from Baltimore in 2018 showed a DIR of 0.079, meaning one group was impacted at roughly 8% of the rate of another, while a year later the same city exhibited a staggering DIR of 15,714, indicating a dramatically skewed impact. These ratios demonstrate that the systems aren’t simply identifying crime; they are reflecting – and then amplifying – pre-existing biases embedded within historical law enforcement data, potentially creating self-fulfilling prophecies and perpetuating cycles of inequity through automated enforcement.

The efficacy of any predictive model is inextricably linked to the quality and representativeness of the data upon which it is built; therefore, a thorough assessment of input data limitations is paramount before implementation. Datasets reflecting historical patterns, even those meticulously collected, are rarely neutral; they often encode existing societal biases related to reporting practices, enforcement priorities, and systemic inequities. Failing to acknowledge and mitigate these inherent limitations doesn’t simply introduce error – it actively risks automating and amplifying discrimination, leading to outcomes where predictive systems disproportionately target already marginalized communities. Consequently, responsible application of predictive modeling demands not only statistical rigor but also a critical understanding of the social context embedded within the data itself, ensuring fairness and just outcomes are prioritized alongside predictive accuracy.

Analysis of monthly Gini coefficients reveals that algorithmically directed patrols consistently exhibit greater inequality (G = 0.43-0.62) compared to reported patrol patterns (0.12-0.36).

Synthetic Data: A Pathway to Rebalancing the Scales

Conditional Tabular Generative Adversarial Networks (CTGANs) represent a technique for mitigating the ‘Dirty Data Problem’ by programmatically creating new data instances. These networks learn the underlying statistical relationships within a given dataset and then generate synthetic tabular data that mimics those relationships. Crucially, CTGANs allow for conditional generation, meaning the synthetic data can be created to specifically address imbalances in the original training set. This rebalancing is achieved by targeting the generation of data points representing underrepresented groups or scenarios, effectively augmenting the dataset with instances designed to correct existing biases and improve model performance across all represented demographics.
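As a concrete sketch of how such conditional generation is typically wired up, using the open-source `ctgan` package rather than the paper's exact pipeline (the file name, column names, and sample counts below are hypothetical):

```python
# Minimal sketch of CTGAN-based rebalancing, assuming the open-source
# `ctgan` package (pip install ctgan). File and column names are hypothetical.
import pandas as pd
from ctgan import CTGAN

data = pd.read_csv("incidents.csv")          # e.g. columns: neighborhood, race, hour
discrete_columns = ["neighborhood", "race"]  # categorical columns CTGAN must model

model = CTGAN(epochs=300)
model.fit(data, discrete_columns)

# Conditionally generate rows for the underrepresented group, then append
# them so the training distribution is rebalanced before model fitting.
synthetic = model.sample(
    5000,
    condition_column="race",
    condition_value="underrepresented_group",
)
rebalanced = pd.concat([data, synthetic], ignore_index=True)
```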

CTGAN Debiasing addresses data imbalances by generating synthetic data instances specifically designed to increase the representation of historically underrepresented groups or scenarios. This technique moves beyond simply augmenting existing data; it actively constructs new data points that mirror the characteristics of minority classes, effectively rebalancing the training dataset. The goal is to reduce the influence of pre-existing biases present in the original data, leading to predictive models with improved fairness and reduced discriminatory outcomes. It is not a complete solution, however: in this study, applying CTGAN shifted the Disparate Impact Ratio from 0.51 to 3.11, overshooting parity (DIR = 1.0) and reversing the direction of the disparity rather than eliminating it.

Synthetic data generated via Conditional Tabular GANs (CTGAN) is not applied indiscriminately; the methodology focuses on augmenting the training dataset with instances specifically designed to increase the representation of historically underrepresented groups and scenarios. Evaluation of the CTGAN intervention revealed a shift in the Disparate Impact Ratio from 0.51 to 3.11: the initial disparity was reversed, but the debiased model still departs substantially from parity (DIR = 1.0), only in the opposite direction, demonstrating that a single application of synthetic data generation does not eliminate bias.

Traditional predictive modeling often relies on accepting existing datasets, which frequently contain inherent biases reflecting historical or systemic inequalities. This approach perpetuates those biases in model outputs. However, techniques like synthetic data generation enable a proactive shift from passive acceptance to active construction of datasets. By generating data specifically designed to address underrepresentation and mitigate bias, model developers can create a more equitable foundation for their algorithms. This allows for the deliberate creation of training sets that better reflect desired fairness criteria, rather than simply replicating existing skewed distributions and their associated harms.

Training with CTGAN rebalancing improves detection rates for Black individuals by 1.49 percentage points in the Baltimore 2019 dataset, but simultaneously decreases detection rates for White individuals by 5.11 percentage points, thereby reversing the original disparity instead of achieving equitable detection across racial groups.

Beyond Accuracy: Quantifying Fairness in Prediction

While overall prediction accuracy provides a general assessment of a predictive policing system’s performance, it fails to reveal disparities in outcomes across different demographic groups. Traditional metrics do not account for whether a system disproportionately flags individuals from specific communities, even if the overall error rate appears low. Consequently, fairness assessments require nuanced measures such as the Disparate Impact Ratio, which quantifies the likelihood of a positive prediction for one group compared to another, and the Gini Coefficient, which measures the inequality in prediction rates across groups. The Disparate Impact Ratio is calculated as the ratio of positive prediction rates between groups, with a value of 1 indicating equal rates and deviations suggesting potential bias. The Gini Coefficient, ranging from 0 to 1, provides a single-value measure of inequality, where higher values indicate greater disparity in prediction outcomes.
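To make these definitions concrete, here is a minimal sketch of both metrics; the function names and the binary-group framing are illustrative assumptions, not the paper's code:

```python
# Minimal sketch of the two fairness metrics described above.
import numpy as np

def disparate_impact_ratio(pred, group):
    """Ratio of positive-prediction rates between two groups (1.0 = parity)."""
    pred, group = np.asarray(pred), np.asarray(group)
    return pred[group == 1].mean() / pred[group == 0].mean()

def gini(x):
    """Gini coefficient of a non-negative array (0 = equality, 1 = maximal inequality)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    lorenz = np.cumsum(x) / x.sum()          # cumulative share of predictions
    return (n + 1 - 2 * lorenz.sum()) / n

# Toy example: per-neighborhood detection rates.
print(gini(np.array([0.02, 0.03, 0.15, 0.40])))
```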

The Bias Amplification Score (BAS) is a composite metric designed to evaluate fairness in predictive systems by simultaneously assessing both directional disparity and inequality. It functions by penalizing configurations that exhibit statistically significant differences in outcomes across protected groups and demonstrate high levels of internal inequality within those groups. Specifically, the BAS considers both the difference in positive rates between groups – quantifying directional disparity – and the Gini coefficient, which measures the distribution of predictions within each group. A higher BAS indicates a greater degree of unfairness, as it reflects both a systematic bias and a lack of equitable distribution of predictions. This holistic approach moves beyond single-dimensional fairness measures to provide a more comprehensive evaluation of predictive system behavior.
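The paper's exact BAS formula is not reproduced here; the sketch below shows one plausible composite consistent with the description, with the weighting parameter `alpha` as a purely hypothetical choice:

```python
# One plausible composite consistent with the description above; the exact
# BAS formula and the weight `alpha` are hypothetical, not from the paper.
def bias_amplification_score(rate_gap, gini_coeff, alpha=0.5):
    """Penalizes directional disparity (|gap in positive rates|) and inequality (Gini)."""
    return alpha * abs(rate_gap) + (1 - alpha) * gini_coeff
```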

Ordinary Least Squares (OLS) regression analysis demonstrates a quantifiable relationship between neighborhood demographic composition and predictive policing detection rates. Specifically, a Pearson correlation coefficient of 0.83 was observed between the percentage of White residents in a neighborhood and the detection rate, indicating a strong positive correlation. Conversely, a Pearson correlation of -0.81 was found between the percentage of Black residents and the detection rate, signifying a strong negative correlation. These findings suggest that detection rates are significantly associated with racial demographics at the neighborhood level, highlighting potential disparities in policing practices that warrant further investigation.
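A minimal sketch of this kind of analysis, using synthetic stand-in data (the generated values are illustrative, not the study's measurements):

```python
# Sketch of the neighborhood-level correlation and OLS analysis; the data
# below is synthetic, not the study's measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pct_white = rng.uniform(0, 100, 279)                 # % White residents per neighborhood
detection = 0.3 * pct_white + rng.normal(0, 8, 279)  # toy detection rates

r, p = stats.pearsonr(pct_white, detection)            # correlation + significance
slope, intercept = np.polyfit(pct_white, detection, 1)  # OLS fit of the same relation
print(f"r = {r:.2f}, OLS slope = {slope:.3f}")
```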

Simulations varying the Citizen Reporting Rate (CRR) are utilized to assess the stability of debiasing methodologies. By systematically altering the CRR (the proportion of incidents reported by citizens versus other sources), we evaluate whether fairness metrics remain consistent across diverse data conditions. Results demonstrate that debiasing techniques which perform well under a standard CRR continue to mitigate disparities even with significantly adjusted reporting rates, indicating robustness to variations in data collection processes. Specifically, analyses across CRR values ranging from 10% to 90% show minimal degradation in fairness scores, measured by the Disparate Impact Ratio and Gini Coefficient, for debiased models compared to unbiased baselines, confirming their reliability in real-world deployments where reporting rates are often unknown or fluctuate.
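A hedged sketch of such a sweep; `simulate_detection` is a toy stand-in for the paper's full enforcement simulation, and all rates below are hypothetical:

```python
# Robustness sweep over the Citizen Reporting Rate (CRR).
import numpy as np

rng = np.random.default_rng(42)

def simulate_detection(crr, n=10_000):
    """Toy model: detections arise from citizen reports (rate=crr) or patrols."""
    group = rng.integers(0, 2, n)                     # 0/1 demographic label
    patrol_exposure = np.where(group == 1, 0.6, 0.3)  # biased patrol placement
    reported = rng.random(n) < crr
    patrolled = rng.random(n) < patrol_exposure
    return reported | patrolled, group

for crr in np.linspace(0.10, 0.90, 9):
    detected, group = simulate_detection(crr)
    dir_ = detected[group == 1].mean() / detected[group == 0].mean()
    print(f"CRR={crr:.2f}  DIR={dir_:.2f}")
```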

Sensitivity analysis reveals that officer count most significantly impacts the Disparate Impact Ratio (DIR) when varied jointly with patrol radius (400-1500 ft) and citizen reporting probability (0.30-0.80).

The Runaway Feedback Loop: Perpetuating Bias Through Enforcement

Despite efforts to remove pre-existing biases from datasets used in predictive policing, a troubling phenomenon known as the ‘Runaway Feedback Loop’ persists. Increased police presence in specific neighborhoods, intended to deter crime, inevitably leads to a higher rate of detected incidents – not necessarily because more crime is occurring, but because more officers are present to observe it. This increased detection rate is then misinterpreted as evidence of higher crime in those areas, justifying continued – and potentially escalated – policing. The cycle reinforces itself, creating a self-fulfilling prophecy where biased enforcement patterns become embedded within the data and disproportionately impact already marginalized communities. This dynamic demonstrates that even with seemingly objective data, systemic inequities can be perpetuated, and that continuous monitoring and adaptive strategies are required to mitigate them.

The probability of detecting crime isn’t solely determined by actual incident rates, but is significantly shaped by police presence, as demonstrated by the Noisy-OR Contact Model. This model simulates detection as a probabilistic function of both an incident occurring and a patrol officer being nearby – essentially, the more officers in an area, the higher the chance of any incident being reported, regardless of underlying crime levels. Consequently, increased policing doesn’t necessarily reflect a genuine surge in criminal activity, but rather a heightened ability to detect it; this creates a feedback loop where areas with greater police attention appear to have higher crime rates, justifying continued – and potentially disproportionate – resource allocation. The model illustrates that even with constant underlying crime, patrol officer proximity exerts a powerful influence on reported statistics, emphasizing the importance of accounting for this dynamic when analyzing crime data and deploying resources.
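A minimal sketch of a Noisy-OR contact model under these assumptions (independent detection channels; all probabilities below are hypothetical):

```python
# Noisy-OR contact model: detection requires an incident plus at least one
# successful "cause" (a citizen report or any of k nearby officers observing).
def detection_probability(p_incident, p_citizen_report, p_officer_detect, n_officers):
    """P(detected) = P(incident) * [1 - (1 - p_citizen)(1 - p_officer)^k]."""
    p_any_channel = 1 - (1 - p_citizen_report) * (1 - p_officer_detect) ** n_officers
    return p_incident * p_any_channel

# Same underlying incident rate, different patrol density:
print(detection_probability(0.05, 0.40, 0.20, n_officers=1))  # lightly patrolled
print(detection_probability(0.05, 0.40, 0.20, n_officers=5))  # heavily patrolled
```

With identical underlying crime, the heavily patrolled area reports markedly more detections, which is precisely the distortion the model is meant to expose.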

Generative Adversarial Network (GAN)-based Spatial Patrol Models represent a promising avenue for optimizing law enforcement resource allocation, yet their implementation demands careful consideration to avoid perpetuating existing biases. These models, trained on historical crime data, learn to predict future hotspots and deploy patrols accordingly; however, if the initial data reflects biased policing practices – areas subject to greater surveillance naturally exhibiting higher reported crime rates – the GAN will learn and amplify these patterns. Consequently, the model may recommend increased patrols to already over-policed neighborhoods, leading to a self-fulfilling prophecy of heightened detection rates and reinforcing the initial bias. Without continuous monitoring of key disparity metrics and adaptive strategies to mitigate these effects, ostensibly objective algorithms risk inadvertently creating a runaway feedback loop, exacerbating inequities rather than addressing the root causes of crime.
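The dynamic can be reproduced in a few lines; the numbers below are illustrative, not the paper's model:

```python
# Toy runaway feedback loop: detections scale with patrol presence, and the
# next round of patrols follows detections, so a tiny difference compounds.
import numpy as np

true_crime = np.array([0.06, 0.05])  # nearly identical underlying crime rates
patrols = np.array([0.5, 0.5])       # start from an even allocation

for _ in range(30):
    detected = true_crime * patrols       # more patrols -> more detections
    patrols = detected / detected.sum()   # reallocate toward detections

print(np.round(patrols, 3))  # ~[0.996, 0.004]: patrols collapse onto area 0
```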

Effective mitigation of algorithmic bias in policing necessitates ongoing scrutiny using metrics such as the Temporal Instability of the Disparate Impact Ratio, which quantifies how quickly biased outcomes shift over time. Recent simulations, conducted across multiple cities, demonstrate the potential for substantial and rapidly escalating inequities; for example, analyses of Baltimore data from 2017 to 2019 revealed disparities ranging from near-zero to exceeding 15,000. This wide range underscores the instability of biased outcomes and the critical need for adaptive strategies – interventions that dynamically adjust to prevent the re-emergence of inequity. Continuous monitoring, coupled with these responsive adjustments, is paramount to ensuring that predictive policing tools do not inadvertently reinforce existing societal biases and perpetuate cycles of disproportionate enforcement.
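A sketch of such a temporal check, assuming a hypothetical list of monthly (predictions, group labels) pairs:

```python
# Temporal-instability check: compute the DIR separately for each month
# and report its spread. `monthly_data` is a hypothetical input format.
import numpy as np

def dir_instability(monthly_data):
    dirs = []
    for pred, group in monthly_data:
        pred, group = np.asarray(pred), np.asarray(group)
        dirs.append(pred[group == 1].mean() / pred[group == 0].mean())
    dirs = np.array(dirs)
    return dirs.min(), dirs.max(), dirs.std()  # range and volatility of the DIR
```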

Detection rates exhibit a strong negative correlation with the percentage of Black residents (r = -0.81) and a strong positive correlation with the percentage of White residents (r = +0.83) across the 279 neighborhoods analyzed.

The study meticulously details how predictive policing systems, even those employing sophisticated generative adversarial networks, can exacerbate existing societal biases. This amplification isn’t merely a statistical anomaly; it’s a logical consequence of feeding flawed data into complex algorithms. As Henri Poincaré observed, “Mathematics is the art of giving reasons.” The research demonstrates a clear lack of mathematical rigor in the application of these systems – a failure to prove the absence of bias despite demonstrations of disparate impact. The authors convincingly show that simply altering the data isn’t enough; a provably fair system requires accompanying resource allocation and policy changes to address the underlying societal issues that generate the biased data in the first place. A ‘working’ system, in this context, is demonstrably insufficient; only a correct one, backed by mathematical proof, will suffice.

What’s Next?

The presented work, while demonstrating the propensity of generative adversarial networks to exacerbate racial disparities within predictive policing systems, merely scratches the surface of a far deeper mathematical inevitability. The amplification of bias isn’t a flaw in the algorithm, but a logical consequence of feeding imperfect data – data reflecting existing societal inequities – into a system designed for pattern replication. Simple ‘debiasing’ of the input data, as explored, is akin to treating a symptom while ignoring the disease. It alters the numbers, but not the underlying structural problems.

Future efforts must move beyond empirical mitigation and embrace formal verification. The field requires provably fair algorithms – systems where fairness isn’t measured post-hoc, but guaranteed by construction. This necessitates a shift in focus from statistical parity to mathematical equivalence: ensuring that the algorithm’s behavior is identical across all demographic groups, not just yielding similar outcomes. The challenge lies not in building ‘smarter’ algorithms, but in building algorithms that are, fundamentally, just.

In the chaos of data, only mathematical discipline endures. The pursuit of algorithmic fairness is not an engineering problem; it is a moral imperative demanding rigorous, axiomatic foundations. Until the field prioritizes provability over performance, and justice over efficiency, these systems will remain reflections of our imperfections, amplified by the cold logic of code.


Original article: https://arxiv.org/pdf/2603.18987.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
