Evolving Defenses: Neuroevolution Tackles Adversarial Attacks

Author: Denis Avetisyan


A new approach uses genetic algorithms to directly design convolutional neural networks that are more resilient to carefully crafted inputs designed to fool them.

The progression of optimal solutions across successive generations reveals a clear inflection point, demarcated by the <span class="katex-eq" data-katex-display="false">\tau_{auth}</span> threshold, signifying a notable shift in the evolutionary trajectory.

This paper introduces NERO-Net, a neuroevolutionary method that optimizes CNN architectures for adversarial robustness by incorporating robustness metrics into the fitness function.

Despite advances in neural network training, evolved architectures often remain vulnerable to adversarial perturbations, hindering deployment in safety-critical applications. This paper introduces ‘NERO-Net: A Neuroevolutionary Approach for the Design of Adversarially Robust CNNs’, a novel neuroevolutionary method that directly optimizes convolutional neural network architectures for inherent robustness to adversarial attacks. By prioritizing post-attack accuracy within the fitness function, without relying on adversarial training during evolution, NERO-Net discovers architectures that are resilient on both clean and adversarial inputs. Demonstrated on CIFAR-10, the resulting networks achieve promising robustness, raising the question of whether neuroevolution can become a primary tool for designing intrinsically secure deep learning systems.


The Fragility of Perception: Adversarial Examples and Modern Networks

Despite significant progress in artificial intelligence, Artificial Neural Networks exhibit a surprising vulnerability to what are known as Adversarial Examples. These are subtly modified inputs – images, sounds, or text – designed to intentionally mislead the network. While appearing normal to human perception, these crafted inputs contain carefully calculated perturbations that can cause a neural network to misclassify data with high confidence. This isn’t a matter of simply overwhelming the system with noise; the alterations are often imperceptible, making detection exceedingly difficult. The existence of Adversarial Examples highlights a fundamental difference in how humans and machines perceive information, raising critical security concerns for applications ranging from self-driving cars and facial recognition to medical diagnosis and fraud detection, as even minor manipulations can have substantial consequences.

The subtle yet potent threat of adversarial examples highlights a critical vulnerability in modern Artificial Neural Networks. These maliciously crafted inputs, designed to appear identical to legitimate data for human observers, can reliably fool even highly accurate systems. A slightly altered pixel in an image, a nearly inaudible modification to an audio file, or a carefully chosen synonym in a text can trigger a complete misclassification – causing a self-driving car to misinterpret a stop sign, a medical diagnosis system to overlook a critical anomaly, or a fraud detection algorithm to approve a fraudulent transaction. This susceptibility isn’t merely a theoretical concern; it poses significant security risks in any application where reliable performance is paramount, demanding a proactive shift toward robust and resilient network designs.

Existing network security protocols and defensive algorithms often prove inadequate when confronted with adversarial examples, highlighting a critical gap in modern cybersecurity. These traditional methods, frequently relying on the assumption of well-behaved inputs, fail to account for the subtle, intentionally crafted perturbations that can fool even highly accurate Artificial Neural Networks. Consequently, systems employing these defenses remain vulnerable to attacks that bypass standard security measures, necessitating the development of novel strategies focused on input validation, adversarial training, and robust feature extraction. The pursuit of genuinely robust networks demands a paradigm shift – moving beyond simply detecting malicious inputs to actively neutralizing their impact, ensuring reliable performance even under adversarial conditions.

Mapping the Attack Surface: Common Adversarial Strategies

The Fast Gradient Sign Method (FGSM) is a single-step adversarial attack used to efficiently estimate a model’s vulnerability to adversarial perturbations. It operates by calculating the gradient of the loss function with respect to the input image and then adding a small perturbation in the direction of the sign of that gradient. The perturbation is constrained by the L_\infty norm, limiting the maximum change to any single pixel. Specifically, the perturbation is calculated as \eta = \epsilon \cdot \mathrm{sign}(\nabla_{x} J(x, y)), where \epsilon is a small scalar controlling the perturbation magnitude, \nabla_{x} denotes the gradient with respect to the input x, and J(x, y) is the loss function. While computationally inexpensive, FGSM’s single-step nature makes it relatively weak compared to iterative attacks, but it serves as a useful initial test of adversarial robustness.
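The FGSM update fits in a few lines of plain Python. The weight vector and logistic loss below are invented purely to make the sketch self-contained; in practice the gradient comes from backpropagation through the network.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(x, grad, eps):
    """FGSM: shift every input coordinate by eps in the direction of the
    loss gradient's sign, so the L-infinity size of the perturbation is
    exactly eps."""
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Toy model: logistic loss J(x, y) = -log(sigmoid(y * <w, x>)), whose
# gradient with respect to the input is -y * sigmoid(-y * <w, x>) * w.
w = [0.5, -1.0, 2.0]
x = [0.2, 0.4, 0.1]
y = 1.0
margin = y * sum(wi * xi for wi, xi in zip(w, x))
grad_x = [-y * wi / (1.0 + math.exp(margin)) for wi in w]

x_adv = fgsm_perturb(x, grad_x, eps=0.03)  # each coordinate moves by exactly 0.03
```

Because every nonzero gradient coordinate contributes exactly ±ε, the perturbation saturates the L_\infty budget in a single step.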

Projected Gradient Descent (PGD) attacks represent an iterative approach to adversarial example generation, improving upon single-step methods by repeatedly refining the input perturbation. Instead of calculating a single, maximum-norm perturbation like the Fast Gradient Sign Method (FGSM), PGD performs multiple small steps, each calculated using the gradient of the loss function with respect to the input. After each step, the perturbation is projected back onto an L_p ball – commonly the L_\infty or L_2 norm – to ensure the perturbation remains within defined bounds. This iterative process allows PGD to find smaller, more subtle perturbations that are often more effective at causing misclassification while being less easily detectable by defenses designed to identify large, obvious perturbations. The number of iterations and the step size are key parameters influencing the attack’s strength and the resulting adversarial example’s transferability.
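The step-then-project loop can be sketched as follows; the quadratic toy loss is invented for illustration, standing in for a network's loss surface.

```python
def pgd_linf(x0, grad_fn, eps, alpha, steps):
    """Projected gradient descent under an L-infinity budget: take small
    gradient-sign steps of size alpha, projecting back onto the eps-ball
    around x0 after each one so no coordinate ever drifts more than eps."""
    sign = lambda v: (v > 0) - (v < 0)
    x = list(x0)
    for _ in range(steps):
        grad = grad_fn(x)
        x = [xi + alpha * sign(gi) for xi, gi in zip(x, grad)]              # ascent step
        x = [min(max(xi, x0i - eps), x0i + eps) for xi, x0i in zip(x, x0)]  # projection
    return x

# Toy loss J(x) = sum(x_i^2), gradient 2*x: PGD pushes each coordinate
# to the boundary of the eps-ball and the projection holds it there.
x0 = [0.5, -0.2]
x_adv = pgd_linf(x0, grad_fn=lambda x: [2.0 * xi for xi in x],
                 eps=0.1, alpha=0.02, steps=10)
```

Here `alpha` and `steps` are the two parameters the paragraph above identifies as controlling attack strength: smaller steps with more iterations explore the ε-ball more finely.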

AutoAttack is an evaluation framework designed to comprehensively assess the robustness of machine learning models against adversarial examples. It distinguishes itself by employing an ensemble of diverse attack algorithms, including L_\infty-based methods like PGD, L_2 attacks, and boundary attacks, applied in a cascading manner. This multi-attack approach addresses limitations inherent in evaluating with a single attack strategy, as a model resilient to one attack may still be vulnerable to another. AutoAttack automates the process of performing these attacks with carefully tuned parameters, providing a more reliable and rigorous assessment of a model’s true resilience than single-attack evaluations, and is often used as a benchmark for reporting robust accuracy.

Neuroevolution: Cultivating Robustness Through Architectural Design

Neuroevolution provides an automated approach to Convolutional Neural Network (CNN) architecture design, differing from manual or gradient-based methods. This technique utilizes evolutionary algorithms to search for optimal network structures, iteratively refining populations of networks based on a defined fitness function. The potential for discovering architectures resistant to adversarial attacks arises because neuroevolution is not constrained by pre-defined architectural biases or the data distribution used in traditional training. By exploring a wider range of possible structures, neuroevolution can identify networks with inherent robustness characteristics not easily achieved through conventional methods, offering a pathway to improved generalization and resilience against maliciously crafted inputs.
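The core selection pressure behind such a search can be illustrated with a minimal truncation-selection loop. This is a deliberately simplified sketch: NERO-Net's actual operators (grammar-based mutations over layer blocks, inherited from Fast-DENSER) are far richer, and the toy genome and fitness below are invented for illustration only.

```python
import random

def evolve(pop, mutate, fitness, generations=40, rng=None):
    """Minimal truncation-selection evolutionary loop: each generation
    keeps the fittest half unchanged (so the best genome is never lost)
    and refills the population with mutated copies of survivors."""
    rng = rng or random.Random(0)
    for _ in range(generations):
        pop = sorted(pop, key=fitness, reverse=True)
        survivors = pop[: len(pop) // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(len(pop) - len(survivors))]
        pop = survivors + children
    return max(pop, key=fitness)

# Toy 'architecture' genome: three layer widths; fitness peaks at width 8.
rng = random.Random(1)
pop = [[rng.randint(1, 16) for _ in range(3)] for _ in range(20)]
fitness = lambda g: -sum((gi - 8) ** 2 for gi in g)
start_best = max(map(fitness, pop))
best = evolve(pop, lambda g, r: [max(1, gi + r.choice([-1, 0, 1])) for gi in g],
              fitness)
```

Because the top half survives unmutated, the best fitness is monotonically non-decreasing across generations, which is the property that lets the search accumulate robustness-improving structural changes.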

The development of robust convolutional neural networks (CNNs) can be achieved through neuroevolution by optimizing a specifically designed Fitness Function. This function evaluates network performance based on both accuracy on clean, unperturbed data and accuracy against adversarial examples. A key metric used is the Harmonic Robustness Score, which balances these two performance aspects. By maximizing this score during the evolutionary process, NERO-Net effectively prioritizes the creation of networks that maintain high accuracy in both standard and adversarial conditions, resulting in a demonstrable improvement in robustness without requiring adversarial training techniques.
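Assuming the Harmonic Robustness Score is, as its name suggests, the harmonic mean of clean and adversarial accuracy (the paper's exact formulation may differ), it can be computed as:

```python
def harmonic_robustness_score(clean_acc, adv_acc):
    """Harmonic mean of clean and adversarial accuracy.

    The harmonic mean punishes imbalance: a network with 90% clean
    accuracy that collapses to 0% under attack scores 0, whereas the
    arithmetic mean would still report 45%.
    """
    if clean_acc + adv_acc == 0:
        return 0.0
    return 2.0 * clean_acc * adv_acc / (clean_acc + adv_acc)

# A balanced network outscores an unbalanced one with the same mean accuracy:
balanced   = harmonic_robustness_score(0.60, 0.60)  # 0.60
unbalanced = harmonic_robustness_score(0.90, 0.30)  # 0.45
```

This asymmetry is what steers the evolutionary search away from architectures that trade all their robustness for clean accuracy, or vice versa.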

The NERO-Net framework builds upon the Fast-DENSER neuroevolution algorithm, extending it to target adversarial robustness. Specifically, NERO-Net achieves an adversarial accuracy of 33.25% when evaluated against the Fast Gradient Sign Method (FGSM) attack under L∞ perturbations. Critically, this level of performance is obtained without employing adversarial training, demonstrating the potential of neuroevolution to discover inherently robust architectures through the evolutionary search over network structure alone.

Architectural Building Blocks for Resilience: A Modular Approach

Convolutional Neural Networks (CNNs) commonly employ repeating units, termed Macro-Nodes, to build hierarchical feature representations. Each Macro-Node consists of a sequence of convolutional layers, activation functions, and potentially other operations like batch normalization. This modular design allows the network to learn features at increasing levels of abstraction; initial layers within a Macro-Node detect low-level features like edges and corners, while subsequent layers combine these into more complex patterns. By stacking multiple Macro-Nodes, the CNN progressively extracts higher-level, more semantically meaningful features from the input data, enabling robust performance across varied inputs and conditions. The depth of these stacked Macro-Nodes is a key determinant of the network’s capacity to model complex relationships within the data.

Skip connections, also known as residual connections, directly add the input of a layer to its output, creating an alternate pathway for gradient flow during backpropagation. This bypass allows gradients to propagate more effectively through deep networks, alleviating the vanishing gradient problem – a key obstacle in training very deep convolutional neural networks. By providing a direct route for gradients, skip connections reduce the impact of successive multiplications in backpropagation, enabling more stable and efficient learning. Furthermore, this architecture improves network robustness by allowing the network to learn identity mappings, facilitating the training of deeper and more complex models without significant degradation in performance.
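The vanishing-gradient argument can be made concrete with a toy one-dimensional "network"; all functions and constants below are illustrative, not drawn from the paper.

```python
import math

def block(x, w):
    """A toy one-dimensional 'layer': tanh of a scaled input."""
    return math.tanh(w * x)

def residual_block(x, w):
    """The same layer with a skip connection: its input is added back."""
    return block(x, w) + x

def deep(layer, x, w, depth):
    """Stack `depth` copies of a layer."""
    for _ in range(depth):
        x = layer(x, w)
    return x

def input_grad(fn, x, h=1e-6):
    """Central finite-difference derivative of fn at x."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

# With a small weight, the gradient through 20 plain layers collapses
# (each layer scales it by roughly w = 0.1), while the skip connections
# keep it above 1 via the identity path.
g_plain = input_grad(lambda x: deep(block, x, 0.1, 20), 0.5)
g_skip  = input_grad(lambda x: deep(residual_block, x, 0.1, 20), 0.5)
```

Each plain layer multiplies the gradient by roughly 0.1, giving a factor near 10⁻²⁰ over twenty layers, while each residual layer contributes a factor of 1 plus a small term, so the signal survives.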

Transition Blocks within a Convolutional Neural Network (CNN) systematically reduce spatial resolution, typically through operations like strided convolutions or pooling layers. This reduction in spatial dimensions directly lowers the computational demands of subsequent layers, enabling processing of larger inputs with constrained resources. While decreasing resolution, these blocks retain crucial feature information, providing a balance between computational efficiency and representational capacity. This controlled dimensionality reduction also contributes to network resilience by reducing sensitivity to minor input variations and preventing overfitting, effectively creating a more robust feature hierarchy.
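A strided 2×2 average pool, the simplest such downsampling operation, can be sketched in plain Python (real transition blocks may instead use strided convolutions or max pooling):

```python
def avg_pool_2x2(fmap):
    """2x2 average pooling with stride 2: halves each spatial dimension
    of a 2D feature map while preserving the mean activation per region."""
    h, w = len(fmap), len(fmap[0])
    return [
        [(fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
         for j in range(0, w - 1, 2)]
        for i in range(0, h - 1, 2)
    ]

fm = [[float(4 * i + j) for j in range(4)] for i in range(4)]
pooled = avg_pool_2x2(fm)  # 2x2 map: [[2.5, 4.5], [10.5, 12.5]]
```

Averaging over each 2×2 window is also what makes the output insensitive to single-pixel changes: a ±δ perturbation of one input moves the pooled value by only δ/4.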

The revised implementation allows layer 3 to directly receive input from layer 1, bypassing the requirement for layer 2 as an intermediary, as illustrated by the removal of the dashed red connection.

A Paradigm Shift: Designing for Intrinsic Resilience

NERO-Net represents a significant departure from conventional approaches to artificial intelligence robustness, offering a proactive strategy for building resilient neural networks. Instead of reacting to adversarial attacks after they occur, this system automatically discovers network architectures intrinsically resistant to manipulation. Through an evolutionary process, NERO-Net optimizes not just for accuracy on clean data, but also for sustained performance under adversarial conditions. The resulting architecture, with approximately 26.72 million parameters, demonstrates a compelling balance between complexity and resilience, achieving an adversarial accuracy of 40.40% under targeted attacks while maintaining a clean accuracy of 84.08% after adversarial training, suggesting a future where AI systems are designed to withstand, rather than succumb to, malicious inputs.

Conventional artificial intelligence security often relies on reactive defenses – patching vulnerabilities after attacks are discovered or developing systems to detect and mitigate threats as they arise. However, a fundamentally different strategy centers on proactively building resilience directly into the network’s architecture. This approach, exemplified by NERO-Net, shifts the focus from responding to attacks to preventing them by designing networks inherently capable of withstanding adversarial perturbations. Rather than adding layers of protection on top of a potentially fragile system, the core structure is evolved to be robust, reducing reliance on post-hoc defenses and creating a more stable and reliable artificial intelligence. This intrinsic resilience represents a paradigm shift, offering a path toward AI systems less susceptible to manipulation and more dependable in real-world applications.

The evolved NERO-Net architecture demonstrates a compelling balance between robustness and performance. Following adversarial training – a process designed to expose and correct vulnerabilities – the network achieved an adversarial accuracy of 40.40%, indicating a significantly improved ability to correctly classify intentionally perturbed images. This resilience was attained while maintaining a clean accuracy of 84.08% on standard, unperturbed images, down from 93.47% before adversarial training. The trade-off is therefore measurable but moderate: robustness does not hollow out general performance. Notably, the network’s architecture comprises 26.72 million parameters, a considerable increase over the 3.37 million parameters of the NSGA-Net model, highlighting the trade-off between model size and achieved robustness.

The development of NERO-Net exemplifies a holistic design philosophy, mirroring the interconnectedness of complex systems. Just as a single component cannot be optimized in isolation, the pursuit of adversarial robustness demands consideration of the entire network architecture. Grace Hopper observed, “It’s easier to ask forgiveness than it is to get permission.” This sentiment resonates with NERO-Net’s approach; rather than rigidly adhering to pre-defined structures, the neuroevolutionary process explores a vast design space, iteratively refining networks based on a fitness function directly tied to robustness. The system’s architecture, therefore, isn’t imposed but emerges from the evolutionary search, demonstrating that structure fundamentally dictates behavior and successful adversarial defense.

Future Directions

The demonstration of NERO-Net’s capacity to evolve robust convolutional networks offers a compelling, if predictable, confirmation: directly optimizing for a desired characteristic – in this instance, adversarial resilience – yields networks possessing that characteristic. However, this success merely re-frames the central challenge. The fitness function, while effective, remains a proxy for true robustness – a simplification of a complex, high-dimensional landscape. Future work must confront the inherent limitations of such proxies, acknowledging that improved performance on current adversarial attacks does not guarantee resilience against unforeseen, more sophisticated perturbations.

A critical consideration lies in the architecture itself. The evolutionary process, while capable of discovering robust topologies, operates within the constraints of the chosen representation. It is tempting to assume that the ‘best’ network will emerge solely from the search process, but this neglects the profound influence of the initial scaffolding. A more holistic approach demands co-evolution of both network structure and the search algorithm itself, allowing for adaptation at multiple levels.

Ultimately, the pursuit of adversarial robustness is not simply a technical exercise. It is an exploration of the fundamental principles governing intelligence and perception. The current paradigm, focused on brittle, easily deceived networks, hints at a deeper flaw. Perhaps true robustness arises not from complex defenses, but from a simpler, more elegant understanding of the underlying structure of the data itself.


Original article: https://arxiv.org/pdf/2603.25517.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-03-29 20:30