Balancing Accuracy and Fairness: The Power of Weighted Samples

Author: Denis Avetisyan


A new study reveals that intelligently adjusting the importance of training data points can significantly improve fairness in machine learning models, but success hinges on defining the right priorities.

Despite achieving identical training accuracy, decision boundaries can differ significantly in their fairness, demonstrating that optimization for overall performance does not guarantee equitable predictions.

Evolving sample weights using genetic algorithms offers a tunable approach to mitigating bias, with trade-offs dependent on optimization objectives such as demographic parity and predictive performance.

Machine learning models, despite their predictive power, can perpetuate societal biases present in training data, creating unfair outcomes for marginalized groups. This paper, ‘Evolved Sample Weights for Bias Mitigation: Effectiveness Depends on Optimization Objectives’, investigates a reweighting approach using a Genetic Algorithm to mitigate such bias while maintaining predictive performance. Results demonstrate that evolved sample weights can improve trade-offs between fairness and accuracy, but crucially, the extent of this improvement is heavily influenced by the specific optimization objectives employed. Does this suggest a need for careful consideration of fairness metrics when designing bias mitigation strategies, or can a universally effective optimization approach be identified?


The Inevitable Bias: Why “Objective” Algorithms Fail

Machine learning models, despite their demonstrated capabilities, are susceptible to reflecting and even amplifying existing societal biases. This occurs because these models learn from data, and if that data contains historical prejudices – regarding race, gender, socioeconomic status, or other sensitive attributes – the model will inevitably internalize those patterns. Consequently, seemingly objective algorithms can produce discriminatory outcomes in areas like loan applications, hiring processes, and even criminal justice risk assessments. The bias isn’t necessarily intentional; it arises from the statistical relationships learned within the training data, highlighting the critical need for careful data curation, algorithmic auditing, and the development of techniques to mitigate these unfair outcomes and ensure equitable application of artificial intelligence.

The pursuit of highly accurate machine learning models, while a primary goal in artificial intelligence, is increasingly recognized as incomplete without concurrent consideration of fairness. Simply maximizing predictive power can inadvertently perpetuate and even amplify existing societal biases, leading to discriminatory outcomes across various demographic groups. Responsible AI development necessitates a paradigm shift, moving beyond solely performance-based metrics to incorporate measures of equitable treatment. This requires actively identifying and mitigating biases embedded within training data and algorithmic design, ensuring that models do not systematically disadvantage particular populations. Ignoring fairness considerations not only raises ethical concerns but also undermines the long-term viability and public trust in these powerful technologies, highlighting the imperative for a holistic approach that prioritizes both accuracy and equitable outcomes.

Conventional optimization algorithms frequently falter when tasked with simultaneously maximizing predictive performance and ensuring equitable outcomes in machine learning models. These methods often treat fairness as a post-hoc constraint or a secondary objective, leading to compromises that either diminish accuracy or fail to adequately mitigate bias. The inherent tension between these competing goals demands the application of multi-objective optimization techniques, which explicitly frame the problem as a search for Pareto-optimal solutions – those where improvement in one objective cannot be achieved without sacrificing the other. This approach allows for a nuanced exploration of the trade-off space, enabling developers to select models that best align with specific ethical and performance requirements. Recent advances in areas like evolutionary algorithms and Bayesian optimization are proving particularly effective in navigating this complex landscape, offering the potential to build AI systems that are both powerful and just.

The hypervolume of Pareto fronts demonstrates performance trade-offs between accuracy and demographic parity across different datasets, with each point representing a single experimental run.

Reweighting: A Quick Fix, But Don’t Get Complacent

Reweighting techniques modify the impact of individual data points during model training by altering their contribution to the loss function. This is achieved by assigning varying weights to each sample; instances from under-represented groups, or those subject to unfair treatment based on protected attributes, receive higher weights. Conversely, over-represented samples may receive lower weights. This adjustment effectively increases the model’s sensitivity to the minority or disadvantaged group during optimization, encouraging it to learn patterns that might otherwise be overshadowed by the majority class. The weighted loss function, therefore, becomes $L_{weighted} = \sum_{i=1}^{n} w_i L(y_i, \hat{y}_i)$, where $w_i$ represents the weight assigned to the $i$-th data point, and $L$ is the base loss function.
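As a minimal sketch of this weighted aggregation (assuming NumPy; the function and variable names are illustrative, not from the paper), the per-sample losses are simply scaled by their weights before summation:

```python
import numpy as np

def weighted_loss(sample_losses: np.ndarray, weights: np.ndarray) -> float:
    """Aggregate per-sample losses L(y_i, y_hat_i) using sample weights w_i."""
    return float(np.sum(weights * sample_losses))

# Example: per-sample binary cross-entropy, with the last two points up-weighted
y_true = np.array([1, 0, 1, 1])
y_prob = np.array([0.9, 0.2, 0.6, 0.4])
per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
w = np.array([1.0, 1.0, 2.0, 2.0])  # e.g. up-weight under-represented samples
print(weighted_loss(per_sample, w))
```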

Reweighting techniques address bias in machine learning models by modulating the influence of individual data points during training. This is achieved by assigning higher weights to samples from under-represented or disadvantaged groups, effectively increasing their contribution to the loss function and encouraging the model to learn more robust representations for these groups. The goal is to improve fairness metrics – such as equal opportunity or demographic parity – without substantially decreasing the model’s overall predictive accuracy. Successful implementation requires careful consideration of the weighting scheme to avoid overfitting to the reweighted data or introducing new biases, and often involves iterative refinement of the weights based on observed performance on both overall accuracy and fairness criteria.
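For illustration, a scikit-learn classifier accepts such weights directly through its sample_weight argument, and demographic parity difference can be checked with a few lines of NumPy; the synthetic data and the group-based weighting rule below are assumptions made for this sketch, not the paper’s setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
group = rng.integers(0, 2, size=500)          # sensitive attribute A
y = (X[:, 0] + 0.5 * group + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Illustrative rule: up-weight samples from group 0
weights = np.where(group == 0, 2.0, 1.0)

clf = LogisticRegression().fit(X, y, sample_weight=weights)
pred = clf.predict(X)

def demographic_parity_difference(y_pred, a):
    """|P(y_hat = 1 | A = 0) - P(y_hat = 1 | A = 1)|"""
    return abs(y_pred[a == 0].mean() - y_pred[a == 1].mean())

print("accuracy:", (pred == y).mean())
print("DPD:", demographic_parity_difference(pred, group))
```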

Determining optimal reweighting factors presents a significant computational challenge due to the high-dimensional nature of the weight space and the interdependence between weights assigned to different data points. Naive approaches, such as grid search, become computationally infeasible with large datasets. Consequently, gradient-based optimization methods, including stochastic gradient descent and its variants, are frequently employed, but require careful tuning of learning rates and regularization parameters to avoid overfitting or instability. Furthermore, the objective function itself can be non-convex, leading to local optima; algorithms like Expectation-Maximization (EM) or more advanced techniques from constrained optimization may be necessary to achieve satisfactory results. The complexity is further increased when multiple fairness metrics are considered simultaneously, necessitating multi-objective optimization strategies.

Genetic Algorithms: Evolving Fairness, One Generation at a Time

Non-dominated Sorting Genetic Algorithm II (NSGA-II) is employed to generate a Pareto Front, which visually represents the trade-off between competing objectives, specifically model accuracy and fairness metrics. Each solution on the Pareto Front represents an optimal balance; improving performance on one objective necessarily degrades performance on the other. This front isn’t a single optimal solution, but rather a set of solutions where no single solution dominates another across all objectives. The resulting Pareto Front allows stakeholders to examine the range of possible outcomes and select a solution that best aligns with their specific priorities and risk tolerance, considering the inherent trade-offs between accuracy and fairness.
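As a sketch of how such a front can be generated in practice, the snippet below evolves one weight per training sample with an off-the-shelf NSGA-II implementation, assuming pymoo (0.6+) and scikit-learn are installed; the synthetic data, population size, generation budget, and the (error, demographic parity difference) objective pair are illustrative choices rather than the paper’s exact configuration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from pymoo.core.problem import ElementwiseProblem
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
a = rng.integers(0, 2, size=300)                       # sensitive attribute
y = (X[:, 0] + 0.6 * a + rng.normal(scale=0.5, size=300) > 0).astype(int)

class SampleWeightProblem(ElementwiseProblem):
    """Decision variables: one weight per training sample, bounded in [0, 1]."""
    def __init__(self):
        super().__init__(n_var=len(y), n_obj=2, xl=0.0, xu=1.0)

    def _evaluate(self, w, out, *args, **kwargs):
        clf = LogisticRegression(max_iter=200).fit(X, y, sample_weight=w + 1e-6)
        pred = clf.predict(X)
        error = 1.0 - (pred == y).mean()                      # objective 1: error
        dpd = abs(pred[a == 0].mean() - pred[a == 1].mean())  # objective 2: DPD
        out["F"] = [error, dpd]

res = minimize(SampleWeightProblem(), NSGA2(pop_size=40), ("n_gen", 30), seed=1, verbose=False)
print(res.F)   # non-dominated (error, DPD) pairs approximating the Pareto front
```

Each evaluation retrains the classifier under a candidate weight vector, so the population size and generation budget directly control the computational cost of approximating the front.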

A Pareto Front represents the set of non-dominated solutions in a multi-objective optimization problem. Each solution on the front achieves a balance between competing objectives; improvement in one objective necessarily results in the degradation of at least one other. This trade-off is inherent to the problem formulation and is visually represented by the front itself. Consequently, decision-makers are not presented with a single “best” solution, but rather a range of options, enabling selection based on specific priorities and acceptable compromises between objectives. The Pareto Front, therefore, facilitates informed decision-making by explicitly showcasing the consequences of optimizing for different goals.
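To make non-dominance concrete, the following NumPy sketch filters a set of two-objective points down to its Pareto front, assuming both objectives are minimized; the function name and example values are illustrative:

```python
import numpy as np

def pareto_front(points: np.ndarray) -> np.ndarray:
    """Return the non-dominated points, with both objectives minimized.

    A point p is dominated if some other point is <= p in every objective
    and strictly < in at least one.
    """
    keep = np.ones(len(points), dtype=bool)
    for i, p in enumerate(points):
        dominated = np.any(np.all(points <= p, axis=1) & np.any(points < p, axis=1))
        keep[i] = not dominated
    return points[keep]

# (error, unfairness) pairs; lower is better for both objectives
pts = np.array([[0.10, 0.30], [0.12, 0.20], [0.20, 0.05], [0.15, 0.25]])
print(pareto_front(pts))   # the last point is dominated by (0.12, 0.20) and is removed
```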

Hypervolume is a key performance indicator used to assess the quality of Pareto fronts generated by multi-objective optimization algorithms. It calculates the volume of the objective space dominated by the solutions in the Pareto front, with a higher hypervolume indicating a better-performing front. Our research demonstrates that employing a Genetic Algorithm (GA) to evolve sample weights consistently yields Pareto fronts with improved hypervolume scores when compared to strategies utilizing deterministic or equal weighting. Specifically, the GA-evolved weights resulted in statistically significant hypervolume gains across multiple datasets, indicating enhanced performance in balancing competing objectives.
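For two minimization objectives, the hypervolume of a front with respect to a reference point reduces to a sum of rectangles, as in this small sketch (the function name, example front, and reference point are all illustrative):

```python
import numpy as np

def hypervolume_2d(front: np.ndarray, ref: np.ndarray) -> float:
    """Area dominated by a 2-D Pareto front (both objectives minimized)
    relative to a reference point that is worse in both objectives.
    Assumes the front contains only non-dominated points."""
    pts = front[np.argsort(front[:, 0])]       # sort by the first objective
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)   # rectangle contributed by this solution
        prev_f2 = f2
    return hv

front = np.array([[0.10, 0.30], [0.12, 0.20], [0.20, 0.05]])
print(hypervolume_2d(front, ref=np.array([1.0, 1.0])))   # 0.838
```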

Evaluations across multiple datasets demonstrate the superior performance of evolved weights (EW) compared to deterministic and equal weighting strategies. Specifically, the EW approach resulted in statistically significant hypervolume improvements across nine datasets when optimizing for the (Accuracy, Demographic Parity Difference) objective pair. Furthermore, eight datasets exhibited significant hypervolume gains when using EW to optimize the (ROC, Demographic Parity Difference) objective pair. These results consistently indicate that utilizing a Genetic Algorithm to evolve sample weights effectively improves the quality of generated Pareto fronts, as measured by hypervolume, across diverse datasets and objective combinations.

The hypervolume, represented by the shaded region, quantifies the performance of a Pareto front defined by three solutions relative to a reference point.

Deterministic Reweighting: Simplicity at a Cost

Rather than relying on iterative optimization techniques like genetic algorithms, deterministic reweighting establishes sample weights through direct analysis of dataset characteristics. This approach assesses inherent qualities within the data – such as feature distributions or label frequencies – to assign importance to each sample. By quantifying these characteristics, the method creates a weighting scheme that prioritizes under-represented or challenging examples without the computational burden of searching for optimal weights. Consequently, deterministic weights offer a streamlined path to bias mitigation, directly translating observable data properties into a quantifiable system for improved model generalization and performance.
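One widely used deterministic scheme of this kind, shown here purely as an illustrative sketch in the style of Kamiran and Calders’ reweighing rather than as the paper’s exact formulation, sets each sample’s weight to the ratio of the expected to the observed frequency of its (group, label) combination, $w(a, y) = P(A=a)P(Y=y)/P(A=a, Y=y)$:

```python
import numpy as np

def reweighing_weights(a: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Deterministic weights w(a, y) = P(A=a) * P(Y=y) / P(A=a, Y=y).

    (Group, label) combinations that occur less often than independence
    would predict receive weights greater than 1."""
    w = np.empty(len(y), dtype=float)
    for g in np.unique(a):
        for lbl in np.unique(y):
            mask = (a == g) & (y == lbl)
            if mask.any():
                w[mask] = (a == g).mean() * (y == lbl).mean() / mask.mean()
    return w

# Group 1 is rarely labelled positive, so those samples receive weight 2.0
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y = np.array([1, 1, 1, 0, 0, 0, 0, 1])
print(reweighing_weights(a, y))
```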

Traditional methods of dataset reweighting often rely on complex optimization algorithms, such as Genetic Algorithms, to determine the optimal weights for each data point – a process demanding significant computational resources and tuning. Deterministic Weights present a compelling alternative by bypassing this iterative optimization entirely. Instead of searching for the best weights, this approach derives them directly from quantifiable characteristics of the dataset itself, effectively translating data properties into weighting factors. This direct calculation not only dramatically reduces computational cost, eliminating the need for lengthy and potentially unstable genetic searches, but also streamlines the reweighting process into a more predictable and manageable procedure. The result is a simpler, faster, and more efficient method for addressing dataset bias, offering a pragmatic solution without sacrificing the core benefit of reweighting.

Although optimization-based reweighting methods offer greater adaptability in complex scenarios, Deterministic Weights present a compelling alternative by prioritizing stability and computational efficiency. This approach forgoes the iterative search for ideal weights – a process often demanding significant resources – in favor of directly calculating weights based on readily available dataset characteristics. The resulting system, while potentially less capable of fine-tuning to highly specific biases, delivers a robust solution for mitigating common forms of dataset bias, particularly in situations where speed and simplicity are paramount. This trade-off between flexibility and efficiency positions Deterministic Weights as a practical tool for researchers and practitioners seeking a reliable and easily implementable bias reduction technique, especially when dealing with large datasets or limited computational power.

The pursuit of fairness metrics, as explored in this work regarding evolved sample weights, feels predictably Sisyphean. The paper highlights how optimizing for one definition of ‘fairness’ (demographic parity, for instance) can inadvertently degrade performance, and vice versa. It’s a familiar dance; each attempt to simplify the complexities of biased data introduces further abstraction. As Henri Poincaré observed, “Mathematics is the art of giving reasons.” But even the most elegant mathematical formulation of fairness cannot account for the chaotic realities of production data. The Genetic Algorithm may refine the weights, but it merely reshuffles the deck – the underlying biases, and the inevitable trade-offs, remain. This isn’t progress; it’s merely a sophisticated form of technical debt.

What’s Next?

The pursuit of fairness through reweighting schemes feels less like progress and more like shifting the burden. This work, while demonstrating a capacity to navigate the performance-fairness trade-off, merely clarifies which compromises are being made explicit. The bug tracker, in time, will fill with cases where ‘demographic parity’ simply masked a new, subtler form of error. The optimization objective, it turns out, is always the convenient fiction.

Future iterations will undoubtedly focus on automated metric selection, attempting to algorithmically define ‘fairness’. This feels…optimistic. The real problem isn’t computational; it’s philosophical. A Genetic Algorithm can evolve weights, but it cannot resolve the inherent contradictions in expecting a model to simultaneously maximize accuracy and satisfy abstract notions of equity.

The field will continue to refine these techniques, chasing diminishing returns. It doesn’t deploy – it lets go, hoping the resulting wreckage is manageable. The next step isn’t a better algorithm; it’s a more honest accounting of what’s being sacrificed at the altar of optimization.


Original article: https://arxiv.org/pdf/2511.20909.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
