Author: Denis Avetisyan
A new approach to detecting payment fraud leverages the power of multiple institutions without sharing sensitive customer data.

Federated learning with NVIDIA FLARE enables high-performance fraud detection while preserving privacy, even with non-IID data and offering model interpretability through Shapley values.
Rising financial losses from fraud increasingly clash with the stringent data privacy regulations that hinder centralized detection systems. This paper, ‘Privacy-Preserving Federated Fraud Detection in Payment Transactions with NVIDIA FLARE’, addresses this challenge by demonstrating a multi-institutional proof-of-concept for federated learning applied to payment fraud. Our results show that a federated deep learning model, trained using the NVIDIA FLARE framework, achieves near-centralized performance (F1-score of 0.903) while preserving data sovereignty and exhibiting strong interpretability via Shapley values. Could this approach unlock a new era of collaborative, privacy-preserving fraud prevention in the financial sector?
The Illusion of Centralized Control
Conventional fraud detection systems, often built on rule-based engines or early machine learning algorithms, are increasingly challenged by the modern financial landscape. These systems typically operate on data confined within a single institution, creating isolated ‘silos’ that hinder a holistic view of fraudulent activity. Simultaneously, financial transactions have become extraordinarily complex, involving multiple parties, diverse payment methods, and intricate international flows. This combination of data fragmentation and escalating complexity significantly reduces the effectiveness of traditional models, as they struggle to identify patterns that span institutional boundaries or account for the nuances of modern financial schemes. Consequently, a growing proportion of fraudulent transactions evade detection, necessitating a shift towards more adaptable and collaborative approaches.
The sharing of sensitive financial data amongst institutions remains a considerable obstacle in the fight against fraud, largely due to a complex interplay of regulatory restrictions and competitive anxieties. Stringent privacy regulations, such as GDPR and CCPA, impose significant limitations on cross-institutional data transfer, requiring extensive anonymization or secure multi-party computation techniques which can be both costly and computationally intensive. Beyond legal constraints, financial institutions understandably hesitate to share data that could reveal competitive advantages – insights into customer behavior, transaction patterns, or emerging fraud schemes. This reluctance creates isolated data silos, hindering the development of comprehensive fraud detection models that benefit from the collective intelligence of the entire financial ecosystem. Consequently, collaborative efforts often rely on limited data sharing or the exchange of aggregated statistics, which may not fully capture the nuances of fraudulent activities and impede the effectiveness of preventative measures.
The efficacy of centralized machine learning models in fraud detection is fundamentally challenged by the widespread issue of Non-IID (Non-Independent and Identically Distributed) data across financial institutions. Each bank or credit provider inherently possesses a unique customer base and transaction profile, resulting in datasets where the statistical properties – the ‘distribution’ of data – differ significantly. This means a model trained on the combined data will likely perform suboptimally on any single institution’s data, as it struggles to generalize beyond the biases present in the overall distribution. Simply pooling data doesn’t solve the problem; it exacerbates it by introducing conflicting patterns and skewing the model’s ability to accurately identify fraudulent activities specific to each institution’s clientele. Consequently, strategies that account for these inherent data heterogeneities – such as federated learning or transfer learning – are crucial for building robust and reliable fraud detection systems in a collaborative environment.
Effectively combating modern financial fraud demands a paradigm shift beyond conventional methods, necessitating innovative techniques that simultaneously safeguard data privacy and maintain high detection accuracy. Current strategies often falter due to the sensitive nature of financial data and increasingly stringent regulations; therefore, research is actively exploring federated learning and differential privacy as potential solutions. These approaches allow models to be trained on decentralized datasets without directly exchanging raw information, preserving confidentiality while still leveraging the collective intelligence of multiple institutions. The development of homomorphic encryption and secure multi-party computation further enhances these capabilities, enabling complex analytical operations on encrypted data. Ultimately, the future of fraud detection hinges on the ability to balance the need for robust analytical power with an unwavering commitment to data protection, fostering trust and collaboration within the financial ecosystem.

Distributed Intelligence: A Necessary Compromise
Federated Learning (FL) enables model training on a distributed network of devices or servers holding local data samples, without exchanging those data samples. This is achieved by training models locally on each device, then aggregating model updates – such as gradients or model weights – rather than the raw data itself. The central server receives these updates, applies an aggregation algorithm, and distributes the updated global model back to the participating devices. This approach addresses data privacy concerns and regulatory requirements by minimizing data transfer and keeping sensitive information localized. FL is particularly applicable in scenarios where data is inherently decentralized, such as mobile devices, healthcare institutions, or financial networks, and where data sharing is restricted due to privacy or legal constraints.
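The round described above can be sketched in a few lines. This is a minimal toy illustration of one federated round, not the paper's implementation: the "model" is just a weight vector, each client's local training is a single gradient step on a made-up quadratic loss, and the client names and data are invented for the example.

```python
# Toy sketch of one federated round: clients train locally, the server
# aggregates only model weights (never raw data). The model, loss, and
# client data below are illustrative assumptions.

def local_update(weights, data, lr=0.1):
    """One local gradient step on f(w) = 0.5 * (w - mean(data))**2,
    a stand-in for a client's private training."""
    mean = sum(data) / len(data)
    return [w - lr * (w - mean) for w in weights]

def fedavg(client_weights, client_sizes):
    """Server-side aggregation: average of client models, weighted by
    each client's number of local samples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# One communication round with three simulated institutions.
global_model = [0.0, 0.0]
clients = {"bank_a": [1.0, 2.0, 3.0], "bank_b": [4.0], "bank_c": [2.0, 2.0]}

updates = [local_update(global_model, data) for data in clients.values()]
sizes = [len(data) for data in clients.values()]
global_model = fedavg(updates, sizes)
```

Only `updates` (model weights) and `sizes` cross the network; the raw transaction lists in `clients` never leave their owner.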
NVIDIA FLARE is a Python-based federated learning framework designed to simplify the complexities of distributed model training. It provides a comprehensive suite of tools for managing federated learning workflows, including client and server simulation, data partitioning, and secure aggregation protocols. FLARE supports various machine learning frameworks, such as PyTorch and TensorFlow, and offers customizable components for defining client behaviors and aggregation strategies. The framework facilitates the evaluation of different federated learning algorithms and provides features for monitoring performance metrics, such as model accuracy, communication costs, and client participation rates. Furthermore, FLARE exposes a Python API (the NVFlare API) and integrates with the broader NVIDIA ecosystem for enhanced scalability and deployment.
Several optimization algorithms address the challenges of model aggregation in federated learning. FedAvg (Federated Averaging) computes the weighted average of model updates from participating clients, offering a baseline approach to global model construction. FedProx builds upon FedAvg by introducing a proximal term to the local objective function, mitigating the impact of heterogeneous data distributions across clients and improving convergence. FedOpt generalizes server-side aggregation further by applying adaptive optimizers, such as Adam or Yogi, to the aggregated updates, enabling faster and more stable convergence, particularly in scenarios with highly non-IID data. These algorithms differ in their computational complexity and convergence properties, influencing their suitability for diverse federated learning deployments.
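The effect of FedProx's proximal term can be seen on a toy one-dimensional problem. This is a sketch under simplifying assumptions (a quadratic local loss and a single skewed client, both invented for illustration): the extra `mu * (w - w_global)` gradient term keeps the local model anchored near the global one instead of drifting fully to the client's own optimum.

```python
# Sketch of FedProx's proximal term on a toy quadratic local loss.
# The coefficient mu penalizes local weights drifting away from the
# global model, taming client heterogeneity. Loss and data are
# illustrative, not from the paper.

def fedprox_step(w_local, w_global, data, lr=0.1, mu=0.5):
    """One local SGD step on f(w) = 0.5*(w - mean(data))**2 plus the
    FedProx proximal term (mu/2)*(w - w_global)**2."""
    mean = sum(data) / len(data)
    grad = (w_local - mean) + mu * (w_local - w_global)
    return w_local - lr * grad

w_global = 0.0
w = w_global
for _ in range(100):                      # local epochs on a skewed client
    w = fedprox_step(w, w_global, data=[10.0], mu=0.5)
# Converges to 20/3 ~= 6.667: pulled toward the client optimum (10.0)
# but anchored toward w_global by the proximal term.
```

With `mu = 0` this reduces to plain local SGD and `w` would run all the way to 10.0; larger `mu` pulls the fixed point closer to the global model.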
MLflow is a critical component in federated learning workflows due to its capabilities in tracking experiment parameters, metrics, and artifacts across distributed training processes. Specifically, MLflow’s tracking component records hyperparameters used in each federated learning round, such as learning rate, batch size, and the number of participating clients. It also logs key performance indicators like model accuracy, loss, and communication costs for each client and the globally aggregated model. Furthermore, MLflow facilitates model versioning, allowing for easy comparison of different federated learning strategies – including variations in aggregation algorithms like FedAvg, FedProx, or FedOpt – and enables reproducibility of experiments. The platform’s visualization tools provide insights into the convergence behavior of models and identify potential issues with client drift or data heterogeneity, contributing to robust analysis of federated learning performance and optimization.

The Illusion of Anonymity: Adding Noise to the System
Differential privacy addresses data privacy concerns in federated learning by intentionally adding statistical noise during the model training process. This noise, calibrated to the sensitivity of the data and the training algorithm, obscures individual contributions while preserving the utility of the overall model. The core principle involves ensuring that the output of an analysis remains essentially unchanged if any single individual’s data is removed from the dataset. This is formally achieved by limiting the influence of any single data point on the learning process, thereby preventing the inference of private information. The amount of noise added is carefully controlled by a privacy parameter, ε, which quantifies the level of privacy protection; lower values of ε indicate stronger privacy guarantees but potentially reduced model accuracy.
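The two core operations behind this guarantee, bounding each example's influence by clipping its gradient, then masking it with calibrated noise, can be sketched with the standard Gaussian mechanism. This is a minimal illustration, not the paper's pipeline; the gradient values, clipping bound, and noise multiplier are invented for the example.

```python
import math
import random

# Sketch of DP-SGD's per-round privatization: clip each per-example
# gradient to bound its L2 sensitivity, sum, then add Gaussian noise
# scaled by the clipping bound. All values are illustrative.

def clip_gradient(grad, clip_norm=1.0):
    """Scale the gradient down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def privatize(grads, clip_norm=1.0, noise_multiplier=1.1, seed=0):
    """Clip each per-example gradient, sum them, and add Gaussian noise
    with std = noise_multiplier * clip_norm (the Gaussian mechanism)."""
    rng = random.Random(seed)
    clipped = [clip_gradient(g, clip_norm) for g in grads]
    dim = len(grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    return [s + rng.gauss(0.0, sigma) for s in summed]

per_example_grads = [[3.0, 4.0], [0.3, 0.4], [-1.0, 0.0]]
noisy_update = privatize(per_example_grads)
```

Clipping caps any single transaction's contribution at `clip_norm`, so the added noise (whose scale depends only on that cap) can hide whether any one record was present; raising `noise_multiplier` strengthens privacy (lower ε) at the cost of a noisier update.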
Rényi Differential Privacy (RDP) is a formal privacy definition and accounting method used to track the cumulative privacy loss incurred during iterative machine learning processes, such as federated learning. Unlike basic ε-differential privacy, RDP allows for tighter privacy bounds, particularly when composing multiple privacy mechanisms. It achieves this by tracking privacy loss using Rényi divergence, a measure of statistical distinguishability between adjacent datasets. The RDP accountant computes a privacy budget based on the sensitivity of each query and accumulates this loss over the entire training process. This accumulated loss, expressed as a function of α and ε, then provides a quantifiable guarantee of privacy, enabling precise tracking and management of privacy risk throughout the model’s lifecycle.
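For the Gaussian mechanism the RDP bookkeeping is simple enough to sketch directly: with sensitivity 1 and noise standard deviation σ, the Rényi divergence at order α is α / (2σ²), RDP composes additively across queries, and a standard conversion yields a final (ε, δ)-DP guarantee. The sketch below uses illustrative parameter values and deliberately omits privacy amplification by subsampling, which production accountants (e.g., in DP-SGD implementations) include to get much tighter bounds.

```python
import math

# Sketch of RDP accounting for T compositions of the Gaussian
# mechanism (sensitivity 1, noise std sigma). Omits subsampling
# amplification; parameter values are illustrative only.

def rdp_gaussian(alpha, sigma):
    """RDP epsilon at order alpha for one Gaussian-mechanism query."""
    return alpha / (2.0 * sigma ** 2)

def rdp_to_dp(rdp_eps, alpha, delta):
    """Standard conversion from RDP(alpha, rdp_eps) to (eps, delta)-DP:
    eps = rdp_eps + log(1/delta) / (alpha - 1)."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)

def privacy_spent(steps, sigma, delta, alphas=range(2, 64)):
    """Accumulate RDP additively over `steps` queries, then report the
    tightest (eps, delta) bound over a grid of Renyi orders."""
    return min(
        rdp_to_dp(steps * rdp_gaussian(a, sigma), a, delta)
        for a in alphas
    )

eps = privacy_spent(steps=1000, sigma=20.0, delta=1e-5)
```

The minimization over orders α is what makes RDP accounting tighter than naive ε-DP composition: each order gives a valid bound, and the accountant is free to report the best one.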
Combining Federated Learning (FL) with Differential Privacy (DP) enables collaborative model training across multiple institutions without directly sharing sensitive data. FL allows each institution to train a model locally on its own dataset, then share only model updates – such as gradients – with a central server. DP adds calibrated noise to these updates before they are shared, ensuring that the contribution of any single data point is obscured. This process limits the ability to infer information about individual records, thereby satisfying requirements of regulations like GDPR and CCPA. The noise level is carefully controlled to balance privacy protection with model utility, allowing for statistically valid results while minimizing the risk of data leakage. This approach facilitates data collaboration in scenarios where data sharing is legally restricted or practically infeasible.
Practical deployments of federated learning incorporating differential privacy have validated the potential for privacy-preserving fraud detection. Recent trials within financial institutions have shown that models trained on decentralized datasets, with added noise calibrated using Rényi Differential Privacy, can achieve comparable accuracy to centrally trained models, while demonstrably limiting the risk of individual data leakage. These implementations utilize techniques such as clipping gradients and adding Gaussian noise to the model updates, effectively obscuring the contribution of any single data point. Results from these deployments indicate that a balance can be achieved between model utility and privacy guarantees, enabling collaborative fraud detection without compromising data confidentiality and satisfying regulatory requirements such as GDPR.
The Quest for Explainability: Beyond Black Boxes
Deep neural networks formed the core of the fraud detection system within this federated learning framework due to their proven capacity to model complex, non-linear relationships inherent in financial transactions. These networks, comprised of multiple layers of interconnected nodes, automatically learn hierarchical representations of the data, effectively identifying subtle patterns indicative of fraudulent activity. The architecture’s flexibility allowed it to accommodate the high dimensionality and varied feature types commonly found in transaction datasets, while its ability to generalize from learned patterns minimized false positives and improved overall detection accuracy. By distributing the training process across multiple decentralized datasets – without directly exchanging sensitive transaction information – the system leveraged the collective intelligence of the network to build a robust and scalable fraud detection model.
Fraud detection systems frequently grapple with a significant class imbalance, where legitimate transactions vastly outnumber fraudulent ones; this disparity can severely hinder model performance, as algorithms often prioritize the majority class. To counter this, the implementation of Focal Loss proves highly effective. This loss function strategically down-weights the contribution of easily classified examples – primarily the numerous legitimate transactions – and focuses learning on the hard-to-classify instances, namely the rare fraudulent activities. By concentrating on these critical cases, Focal Loss enables the model to develop a more nuanced understanding of fraudulent patterns, ultimately leading to a substantial improvement in detection accuracy and a reduction in false negatives – a vital outcome in financial security applications.
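The mechanism is visible in the formula itself, FL(p_t) = -α_t (1 - p_t)^γ log(p_t): when the model is already confident and correct, (1 - p_t)^γ collapses the loss toward zero. The sketch below uses common default hyperparameters (α = 0.25, γ = 2.0) rather than the paper's settings, and the probabilities are invented for illustration.

```python
import math

# Minimal sketch of binary focal loss. The (1 - p_t)**gamma factor
# down-weights easy, well-classified examples so training focuses on
# rare, hard fraud cases. Hyperparameters are common defaults, not
# taken from the paper.

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """p: predicted fraud probability; y: true label (1 = fraud)."""
    p_t = p if y == 1 else 1.0 - p          # prob assigned to the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct legitimate transaction contributes thousands of
# times less loss than a confidently missed fraud.
easy = focal_loss(p=0.05, y=0)   # correct non-fraud, p_t = 0.95
hard = focal_loss(p=0.05, y=1)   # missed fraud,      p_t = 0.05
```

Setting `gamma = 0` recovers ordinary (class-weighted) cross-entropy; increasing it shifts ever more of the gradient budget toward the minority fraud class.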
Shapley Value Analysis offers a rigorous approach to understanding the reasoning behind a model’s predictions, moving beyond simple feature importance rankings. Rooted in cooperative game theory, this method considers all possible combinations of features to determine each feature’s marginal contribution to the prediction. By calculating the average marginal contribution across all combinations, it provides a fair and consistent attribution score for each feature – quantifying how much each feature genuinely influenced the outcome for a specific instance. This granular level of explanation is particularly valuable in fraud detection, where understanding why a transaction was flagged as fraudulent is crucial for both investigators and customers, fostering increased trust and allowing for more informed decision-making. The technique doesn’t simply identify important features overall, but rather how those features interact to drive predictions on a case-by-case basis, offering a powerful tool for model debugging, fairness assessment, and ultimately, building more reliable and transparent fraud detection systems.
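The definition above can be computed exactly for a tiny feature set. The scoring function below is a made-up stand-in for a fraud model (the paper's model and features are not reproduced here), chosen so that one feature's contribution depends on another's presence; note the exact computation enumerates all subsets and is exponential in the number of features, which is why practical tools such as SHAP approximate it.

```python
import itertools
import math

# Exact Shapley attribution over three illustrative transaction
# features. value() is a hypothetical scoring function, not the
# paper's model: country_mismatch matters mostly alongside amount.

FEATURES = ["amount", "country_mismatch", "night_time"]

def value(subset):
    """Fraud score when only the features in `subset` are 'known'."""
    s = set(subset)
    score = 0.0
    if "amount" in s:
        score += 0.3
    if "night_time" in s:
        score += 0.1
    if "country_mismatch" in s:
        score += 0.4 if "amount" in s else 0.1   # interaction effect
    return score

def shapley(feature):
    """Average marginal contribution of `feature` over all subsets of
    the remaining features, with the classic Shapley weights."""
    n = len(FEATURES)
    others = [f for f in FEATURES if f != feature]
    total = 0.0
    for k in range(n):
        for subset in itertools.combinations(others, k):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            total += weight * (value(subset + (feature,)) - value(subset))
    return total

attributions = {f: shapley(f) for f in FEATURES}
```

Two properties worth checking by hand: the attributions sum exactly to `value(FEATURES) - value(())` (efficiency), and the amount/country_mismatch interaction is split between those two features rather than assigned arbitrarily to either one.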
The implementation of federated learning in this fraud detection system yielded a mean F1-score of 0.903, a result remarkably close to the performance of a traditionally centralized learning approach, which achieved 0.925. This near-equivalence is particularly significant given the privacy-preserving nature of federated learning. Furthermore, the system demonstrated a substantial 40% improvement in fraud detection capability when compared to models trained solely on local data – a score of 0.643 – highlighting the benefits of collaborative learning without direct data sharing. These findings suggest that federated learning offers a viable and effective solution for maintaining high accuracy in fraud detection while simultaneously addressing data privacy concerns.

The pursuit of decentralized fraud detection, as outlined in this work, feels predictably hopeful. It attempts to build a system resilient to data silos while maintaining privacy – a noble goal, yet one destined for iterative patching. As John von Neumann observed, “There is no possibility of absolute certainty.” The architecture isn’t about pristine algorithms; it’s the compromise that survives deployment. The paper’s exploration of non-IID data and Shapley values is merely acknowledging the inevitable: real-world data is messy, and fairness metrics will always be a moving target. Everything optimized will one day be optimized back, and this framework, while promising, will eventually require resuscitation.
What’s Next?
The demonstration of near-centralized performance in federated fraud detection, as presented, feels less like a revolution and more like an expensive proof-of-concept. The paper rightly highlights the challenges of non-IID data, a polite way of saying production data is always messier than the lab. Expect the next wave of effort to focus not on clever algorithms, but on the unglamorous work of data harmonization and drift mitigation. Shapley values offer a compelling narrative for interpretability, but translating those explanations into actionable insights for fraud analysts, consistently, will prove far more difficult.
The current framework, while functional, introduces a substantial operational overhead. Each new institution onboarded isn’t simply adding data; it’s adding a new failure domain, a new set of network dependencies, and a new source of potential model skew. The cost of maintaining this distributed system, of guaranteeing differential privacy against increasingly sophisticated attacks, and of auditing the entire process will likely eclipse the initial development expense.
If code looks perfect, no one has deployed it yet. The real test will be years down the line, when the initial enthusiasm has faded and the system is battling concept drift, adversarial attacks, and the relentless pressure to reduce latency. The promise of privacy-preserving machine learning is alluring, but the long-term sustainability of these architectures remains, predictably, an open question.
Original article: https://arxiv.org/pdf/2603.13617.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
See also:
- Spotting the Loops in Autonomous Systems
- Seeing Through the Lies: A New Approach to Detecting Image Forgeries
- Staying Ahead of the Fakes: A New Approach to Detecting AI-Generated Images
- The Glitch in the Machine: Spotting AI-Generated Images Beyond the Obvious
2026-03-17 16:07