Shielding AI: How Data and Activation Functions Impact Model Resilience

Author: Denis Avetisyan


A new review explores the crucial interplay between activation functions, data distribution, and adversarial robustness in both centralized and federated machine learning.

The distribution of data across ten clients within a federated learning environment, either independent and identically distributed (IID) or non-IID, demonstrates how partitioning a dataset like CIFAR-10, comprised of images spanning ten distinct classes, fundamentally shapes the learning process and subsequent system behavior.

This paper examines the effects of non-IID data and activation functions on deep neural network robustness, proposing a data sharing strategy to mitigate performance degradation in federated learning scenarios.

Despite advances in adversarial training, the robustness of machine learning models remains sensitive to both architectural choices and data heterogeneity. This is explored in ‘Studying Various Activation Functions and Non-IID Data for Machine Learning Model Robustness’, which investigates the impact of ten activation functions and data distribution (independent and identically distributed vs. non-IID) on model resilience against adversarial attacks in centralized and federated learning environments. The study demonstrates that while ReLU often performs best in centralized settings, performance degrades significantly with non-IID data in federated learning, a challenge mitigated by a proposed data sharing strategy. Could strategically sharing data unlock more robust and reliable machine learning models for real-world applications facing increasingly complex and decentralized datasets?


The Fragile Foundation of Distributed Intelligence

Conventional machine learning algorithms are fundamentally built upon the assumption of independent and identically distributed (IID) data – meaning each data point is drawn from the same probability distribution and is unrelated to others. However, this premise rarely aligns with the complexities of real-world data landscapes. Consider scenarios involving user data across diverse demographics, sensor readings from varied environments, or medical records reflecting differing patient populations; these datasets inherently exhibit variations in distribution. This departure from IID introduces biases and inconsistencies that can severely degrade model performance, leading to inaccurate predictions and limited generalization capabilities. Consequently, addressing this non-IID challenge is crucial for deploying robust and reliable machine learning solutions in practical applications, necessitating the development of techniques that can effectively handle data heterogeneity.

Federated Learning represents a paradigm shift in machine learning, moving away from centralized datasets to enable model training directly on distributed devices – such as smartphones or IoT sensors – thereby preserving data privacy and reducing communication costs. However, this decentralized approach introduces significant statistical challenges stemming from non-Independent and Identically Distributed (non-IID) data. Unlike traditional machine learning which assumes a uniform data distribution, real-world decentralized data often exhibits substantial variations across devices; one user’s data may heavily favor a specific class, while another’s presents a drastically different pattern. This statistical heterogeneity can lead to model divergence, where locally trained models perform well on their respective devices but fail to generalize effectively when aggregated into a global model, ultimately hindering the overall performance and reliability of the federated learning system. Addressing these non-IID challenges is therefore crucial for realizing the full potential of federated learning in practical applications.

Statistical heterogeneity, arising from non-IID data, presents a core challenge to decentralized machine learning systems. When data distributions vary significantly across devices – for instance, differing user behaviors or sensor placements – a model trained on this fragmented data struggles to generalize effectively. This discrepancy leads to a phenomenon where a model performs well on the data it was trained on, but poorly on unseen data from other devices. The severity of this impact is directly related to the degree of statistical heterogeneity; greater variance in data distributions results in larger performance drops and diminished model accuracy. Addressing this requires advanced techniques that account for these differing distributions, such as personalized model training or sophisticated aggregation strategies, to ensure robust and reliable performance across the entire decentralized network.
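To make the contrast concrete, the sketch below partitions a labelled dataset across ten clients either uniformly (IID) or with label skew drawn from a Dirichlet distribution, a common way to simulate non-IID federated data. The `labels` array and the `alpha` concentration parameter are illustrative assumptions; the paper's exact partitioning scheme may differ.

```python
import numpy as np

def partition(labels, num_clients=10, iid=True, alpha=0.5, seed=0):
    """Split example indices across clients, either IID or with Dirichlet label skew."""
    rng = np.random.default_rng(seed)
    if iid:
        # IID: shuffle all indices and deal them out evenly.
        idx = rng.permutation(len(labels))
        return np.array_split(idx, num_clients)
    # Non-IID: divide each class according to a Dirichlet(alpha) draw,
    # so a small alpha concentrates each class on only a few clients.
    clients = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx_c = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        splits = (np.cumsum(proportions)[:-1] * len(idx_c)).astype(int)
        for client, part in zip(clients, np.split(idx_c, splits)):
            client.extend(part.tolist())
    return [np.array(c) for c in clients]

# Example: synthetic labels with ten classes, mirroring CIFAR-10's class count.
labels = np.repeat(np.arange(10), 100)
iid_parts = partition(labels, iid=True)
noniid_parts = partition(labels, iid=False, alpha=0.1)
```

Lower values of `alpha` produce more extreme skew, letting the degree of statistical heterogeneity be varied systematically in experiments.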

Federated adversarial training augments each client's local training set to improve generalization and robustness.

Fortifying Models Against Subtle Perturbations

Adversarial Training improves model robustness by augmenting the training dataset with intentionally perturbed inputs designed to mislead the model. These perturbations, while often imperceptible to humans, can cause standard machine learning models to make incorrect predictions. By exposing the model to these “adversarial examples” during training, the model learns to identify and mitigate the effects of such inputs, increasing its resilience to both malicious attacks and naturally occurring noisy data. This process effectively shifts the decision boundary of the model, making it less susceptible to small changes in the input space and improving generalization performance on unseen, potentially corrupted, data.

Adversarial training enhances model robustness by exposing the learning process to intentionally modified input data. This is achieved by generating perturbed examples using algorithms such as the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini & Wagner (C&W) attacks, and DeepFool. These algorithms introduce small, often imperceptible, changes to the input data designed to mislead the model. Training on these perturbed examples forces the model to learn features that are less sensitive to these subtle alterations, effectively improving its ability to correctly classify inputs even when subjected to adversarial noise. The resulting model demonstrates increased resilience to attacks exploiting vulnerabilities present in models trained solely on clean data.
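As an illustration, a minimal FGSM perturbation can be written in a few lines of PyTorch; the $\epsilon$ value, model, and loss below are placeholders rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=8 / 255):
    """Generate FGSM adversarial examples: one signed-gradient step of size epsilon."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    grad = torch.autograd.grad(loss, images)[0]
    # Step in the direction that increases the loss, then clamp to the valid pixel range.
    adv = images + epsilon * grad.sign()
    return torch.clamp(adv, 0.0, 1.0).detach()
```

PGD, C&W, and DeepFool follow the same idea of gradient-guided perturbation but iterate or optimise the perturbation more carefully, which generally yields stronger attacks.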

Adversarial training improves model generalization by exposing the model to inputs designed to maximize loss, effectively smoothing the decision boundary and reducing overfitting to the training data. Standard training optimizes performance on clean examples, but often fails to account for the manifold structure of the input space, leaving models vulnerable to small, intentionally crafted perturbations. By incorporating adversarial examples into the training set, the model learns to correctly classify inputs even when subjected to these perturbations, mitigating vulnerabilities that arise from the model’s reliance on potentially fragile features and enhancing performance on out-of-distribution data.

The efficacy of adversarial training is directly contingent on the selection of appropriate attack methods used during the training process. Different attack algorithms, such as the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), Carlini & Wagner (C&W), and DeepFool, generate adversarial perturbations with varying characteristics in terms of magnitude, transferability, and attack surface. Training solely with perturbations generated by a weaker attack, like FGSM with a small $\epsilon$ value, may not provide sufficient defense against stronger, more sophisticated attacks like C&W. Conversely, using excessively strong attacks can lead to overfitting to the specific perturbation structure, reducing performance on clean examples. Therefore, a comprehensive adversarial training strategy often involves utilizing a diverse set of attack methods, potentially with adaptive or mixed strategies, to ensure robust generalization across a wider range of potential threats.
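A sketch of PGD-based adversarial training is shown below; the step size, number of steps, and optimizer handling are illustrative assumptions, not the paper's exact hyperparameters.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, epsilon=8 / 255, alpha=2 / 255, steps=10):
    """Iterative signed-gradient steps, projected back into the L-infinity ball of radius epsilon."""
    adv = images + torch.empty_like(images).uniform_(-epsilon, epsilon)
    adv = torch.clamp(adv, 0.0, 1.0)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        adv = adv.detach() + alpha * grad.sign()
        adv = torch.clamp(adv, images - epsilon, images + epsilon)  # project to the epsilon ball
        adv = torch.clamp(adv, 0.0, 1.0)                            # keep valid pixel values
    return adv.detach()

def adversarial_training_step(model, optimizer, images, labels):
    """One training step on PGD-perturbed inputs instead of clean ones."""
    model.eval()                      # freeze batch-norm statistics while crafting the attack
    adv = pgd_attack(model, images, labels)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(adv), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A mixed strategy would simply alternate or combine attack generators inside `adversarial_training_step`, trading computation for broader coverage of the threat model.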

L∞ Projected Gradient Descent attacks with a perturbation budget of 8/255 generated adversarial images that a ResNet-18 network misclassified, even though it had correctly identified the unperturbed originals.

The Architectural Underpinnings: ResNet-18

ResNet-18 is frequently utilized as a starting point for research into adversarial training due to its relatively small size and established performance on image classification tasks. This allows for quicker experimentation and reduces computational costs compared to larger, more complex architectures. The model’s 18 layers provide sufficient depth to demonstrate the effects of adversarial perturbations while remaining manageable for iterative training procedures. Furthermore, the availability of pre-trained ResNet-18 models simplifies the implementation of adversarial training pipelines, offering a readily available baseline for comparison and analysis of different defense mechanisms against adversarial attacks.

Residual connections, a key component of ResNet-18, address the vanishing gradient problem encountered when training very deep neural networks. These connections allow the gradient to flow more directly through the network during backpropagation, enabling the training of networks with significantly more layers. By adding the input of a layer to its output – effectively creating a “shortcut” – the network can learn residual mappings instead of attempting to learn the complete underlying function directly. This simplifies the optimization process and improves both the speed of learning and the ability of the network to generalize to unseen data, as deeper networks can represent more complex functions without suffering from performance degradation due to gradient issues.

The fundamental building block of the ResNet-18 architecture, known as the residual block, utilizes non-linear activation functions following each linear transformation to enable the modeling of complex, non-linear relationships within input data. Specifically, ReLU (Rectified Linear Unit) is commonly employed as the activation function, introducing non-linearity through the function $f(x) = \max(0, x)$. This non-linearity is critical; without it, multiple linear layers would simply collapse into a single linear transformation, limiting the network’s capacity to learn intricate patterns. The application of these activation functions after each convolutional layer and batch normalization step allows ResNet-18 to approximate any continuous function, improving its ability to represent and classify data effectively.
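A minimal residual block in the style of ResNet-18 might look like the following PyTorch sketch; the channel width and the choice of activation are illustrative, since the study sweeps ten different activation functions.

```python
import torch.nn as nn

class BasicBlock(nn.Module):
    """Two 3x3 convolutions with a shortcut connection, as in ResNet-18."""
    def __init__(self, channels, activation=nn.ReLU):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = activation()

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + x)  # shortcut: add the input back before the final activation
```

Swapping `activation` for another non-linearity (e.g. a smooth alternative to ReLU) is all it takes to study how the activation choice affects robustness.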

The CIFAR-10 dataset, consisting of 60,000 32×32 color images in 10 classes, is widely used for benchmarking ResNet-18 performance due to its established difficulty and manageable size. It’s split into 50,000 training images and 10,000 test images, providing a standardized evaluation protocol. Reported accuracy on CIFAR-10 allows for direct comparison of different adversarial training methods and hyperparameter configurations applied to the ResNet-18 architecture. The dataset’s relatively low resolution and limited number of classes necessitate careful consideration of overfitting, making it a robust test for generalization capabilities.
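Loading CIFAR-10 with its standard 50,000/10,000 split is straightforward with torchvision; the transform here is a minimal assumption and omits any augmentation or normalization the study may apply.

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()  # scales pixels to [0, 1]
train_set = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=to_tensor)
# 50,000 training and 10,000 test images across ten classes.
```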

The modified ResNet-18 architecture replaces the initial 7×7 convolutional kernel with a 3×3 version and removes down-sampling operations within residual blocks to preserve feature map dimensions.
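A common way to approximate this adaptation, assuming the torchvision implementation, is to swap the 7×7 stem for a 3×3 convolution and replace the stem's max-pool with an identity so 32×32 CIFAR-10 inputs keep more spatial resolution; the authors' exact changes to down-sampling inside the residual blocks may go further than this sketch.

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights=None, num_classes=10)
# Replace the 7x7/stride-2 stem with a 3x3/stride-1 convolution suited to 32x32 inputs.
model.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
# Drop the stem's down-sampling max-pool so early feature maps keep their size.
model.maxpool = nn.Identity()
```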

Measuring the Impact: Robustness and Limitations

Adversarial training represents a powerful defense against adversarial attacks, demonstrably enhancing a model’s robust accuracy. This technique intentionally exposes the learning algorithm to subtly altered inputs – known as adversarial examples – during the training process. By learning to correctly classify these perturbed samples, the model develops a greater resilience to malicious inputs designed to mislead it. The core principle is to minimize the loss function not just on clean data, but also on these carefully crafted adversarial examples, effectively smoothing the decision boundary and reducing the model’s susceptibility to even slight input variations. Consequently, models trained with this approach exhibit a significantly improved ability to maintain accurate predictions when confronted with real-world noise or intentionally deceptive inputs, providing a crucial step toward deploying reliable machine learning systems in security-sensitive applications.
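Formally, this corresponds to the standard min-max formulation of adversarial training, where the inner maximisation searches for the worst-case perturbation within an $\epsilon$-ball and the outer minimisation fits the model parameters $\theta$ to those worst cases:

$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[ \max_{\|\delta\|_{\infty}\le\epsilon} \mathcal{L}\big(f_{\theta}(x+\delta),\, y\big) \Big]$$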

The proposed adversarial training methodology demonstrates a marked improvement in defending against Fast Gradient Sign Method (FGSM) attacks, achieving a robust accuracy of 67.96%. This figure represents a substantial advancement over previously established benchmarks; notably, Lin et al. reported a robust accuracy of only 18.41% when subjected to the same attack vector. This considerable gain indicates the effectiveness of the current approach in fortifying machine learning models against deliberately crafted, adversarial inputs designed to induce misclassification, and suggests a significant step forward in the development of more resilient artificial intelligence systems.

The study reveals a marked advancement in defending against the DeepFool adversarial attack, achieving a robust accuracy of 83.0%. This represents a significant leap forward when contrasted with previously published results from Lin et al., which reported a robust accuracy of only 47.0% against the same attack. DeepFool poses a particular challenge due to its ability to craft minimal perturbations that are difficult for models to detect; therefore, this improved performance suggests a substantial strengthening of the model’s resilience. The findings indicate that the proposed approach effectively addresses vulnerabilities exploited by DeepFool, providing a considerably more secure and reliable system against this sophisticated form of adversarial manipulation.

A key achievement of this research lies in its ability to simultaneously bolster a model’s resilience against adversarial attacks and preserve its performance on standard, unaltered data. The methodology successfully attains a natural accuracy of 92.0% when evaluating the model on clean examples, indicating minimal compromise to its core functionality while enhancing its security. This is a critical finding, as many defenses against adversarial examples often incur a substantial drop in accuracy on legitimate inputs, rendering them impractical for real-world applications; this work demonstrates a pathway to robust defenses without sacrificing reliable performance in typical scenarios.

A common challenge in enhancing model robustness against adversarial attacks is the potential decrease in performance on standard, clean data. However, strategies such as employing soft labels and data augmentation techniques offer promising solutions to this trade-off. Soft labels, which assign probabilistic targets instead of hard, definitive classifications, encourage the model to learn more generalized representations, reducing overconfidence and improving adaptability. Simultaneously, data augmentation – specifically, the introduction of Gaussian noise – effectively expands the training dataset with subtly perturbed examples. This not only simulates potential adversarial attacks during training, thereby bolstering defenses, but also acts as a regularizer, preventing overfitting and preserving natural accuracy. The combined effect allows models to achieve both heightened robustness and sustained performance on unaltered data, representing a significant advancement in the field of adversarial machine learning.
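As a sketch of how those two ingredients combine in practice, the snippet below applies label smoothing (a simple form of soft labels) and additive Gaussian noise during training; the smoothing factor and noise scale are illustrative assumptions rather than values taken from the paper.

```python
import torch
import torch.nn.functional as F

def soft_label_noisy_step(model, optimizer, images, labels,
                          smoothing=0.1, noise_std=0.05):
    """One training step with label smoothing and Gaussian-noise augmentation."""
    # Gaussian noise augmentation: perturb inputs, then clamp to the valid pixel range.
    noisy = torch.clamp(images + noise_std * torch.randn_like(images), 0.0, 1.0)
    optimizer.zero_grad()
    # Label smoothing turns hard targets into soft, probabilistic ones.
    loss = F.cross_entropy(model(noisy), labels, label_smoothing=smoothing)
    loss.backward()
    optimizer.step()
    return loss.item()
```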

A ResNet-18 model trained with centralized adversarial training achieves both high natural (clean data) and robust (adversarially perturbed data with ϵ=8/255) accuracy on the CIFAR-10 test set.

The pursuit of robust machine learning models, as detailed in this study of activation functions and data distributions, echoes a fundamental principle of systems: their inevitable confrontation with decay. This research, focused on mitigating adversarial attacks and the challenges of non-IID data, attempts to engineer graceful aging into these complex networks. It is reminiscent of David Hilbert’s assertion, “We must be able to answer the question: What are the ultimate constituents of reality?”, for in this context the ‘reality’ is the model’s performance, and the constituents are the very components examined within: activation functions and data sharing strategies. The study acknowledges that models aren’t static entities, but systems evolving within the medium of data, constantly tested and refined against the entropy of adversarial inputs and distribution shifts.

What Lies Ahead?

The pursuit of robustness in deep neural networks, as illuminated by this work, reveals a fundamental truth: defenses are rarely absolute, merely shifts in the landscape of vulnerability. Each layer of protection, each carefully chosen activation function, introduces a new form of technical debt. The system remembers its defenses, and that memory manifests as potential failure modes, perhaps subtly, perhaps catastrophically, in response to unforeseen inputs. The exploration of non-IID data in federated learning highlights this acutely; data heterogeneity isn’t a bug, it’s the natural state, and attempts to homogenize carry a cost – a loss of information, a smoothing of the true distribution.

Future efforts will likely focus not on eliminating adversarial vulnerability – an asymptotic goal – but on gracefully accommodating it. Systems that acknowledge their inherent fragility, that incorporate mechanisms for self-diagnosis and adaptive repair, will prove more resilient in the long run. The proposed data-sharing strategy represents a step in this direction, but further refinement is needed to balance privacy concerns with the need for distributional alignment.

Ultimately, the question isn’t whether a model can be made impervious to attack, but how long it can delay the inevitable entropy. Time isn’t the metric of progress here; it’s the medium in which all defenses decay. The challenge lies in building systems that age gracefully, acknowledging their limitations, and adapting to the ever-shifting currents of adversarial pressure.


Original article: https://arxiv.org/pdf/2512.04264.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
