Hidden Signals: Injecting Backdoors into Wireless Communication with Deep Learning

Author: Denis Avetisyan


Researchers demonstrate a novel attack that subtly manipulates radio frequency signals to compromise deep learning-based modulation classification systems.

Models subjected to legitimate use, physical compromise, and digital attacks demonstrate vulnerabilities across all operational domains, highlighting the interconnectedness of security considerations in modern systems.

A physical backdoor attack leveraging power amplifier distortions enables covert control of wireless communication networks.

While deep learning has greatly advanced radio frequency (RF) signal classification, its vulnerability to subtle adversarial manipulation remains a critical concern. This paper, ‘Physical Backdoor Attack Against Deep Learning-Based Modulation Classification’, introduces a novel attack that exploits power amplifier nonlinearities to embed a physical backdoor into modulation classification models. By manipulating RF signal amplitudes during training, the authors demonstrate successful misclassification once the backdoor is activated at inference time, even under varying noise conditions and against existing defense mechanisms. Could this physical layer attack represent a significant, and largely overlooked, threat to the security of wireless communication systems?


The Escalating Threat to Signal Integrity

Modulation classification, a cornerstone of spectrum monitoring and cognitive radio systems, faces a growing threat from increasingly sophisticated adversarial attacks. Traditionally, these systems rely on identifying signal modulation types – such as QAM or PSK – to ensure efficient spectrum usage and avoid interference. However, malicious actors are now capable of crafting subtle signal manipulations designed to mislead classifiers, either by disguising the true modulation or injecting false information. This vulnerability extends beyond simple jamming; attackers can potentially create hidden backdoors or disrupt communication networks by subtly altering signals to appear benign while carrying malicious payloads. The escalating sophistication of these attacks necessitates a shift towards more robust and resilient modulation classification techniques capable of detecting and mitigating adversarial interference, safeguarding the integrity of wireless communication systems.

Historically, systems designed to identify wireless signals – a process crucial for spectrum monitoring and defense – operated under the assumption of a predictable, ‘clean’ radio frequency environment. This reliance on benign signal conditions creates a critical vulnerability; malicious actors can subtly manipulate these signals, introducing carefully crafted noise or modifications that bypass traditional detection methods. These adversarial attacks aren’t simply about jamming a signal, but rather about injecting hidden commands or creating backdoors within the communication itself. Because existing algorithms aren’t designed to recognize these intentionally deceptive signals, they can be fooled into misclassifying the modulation scheme, granting unauthorized access, or allowing malicious code to be transmitted undetected. This fundamental flaw highlights the need for robust, security-aware signal processing techniques capable of functioning reliably even in the presence of sophisticated interference and deliberate manipulation.

The growing adoption of deep learning in automatic modulation classification, while enhancing performance, simultaneously introduces novel vulnerabilities exploitable by malicious actors. These systems, trained on datasets of known signals, can be deceived by subtly crafted adversarial examples – intentionally modified signals designed to be misclassified. Unlike traditional methods, which rely on signal characteristics, deep learning models are susceptible to perturbations imperceptible to humans but capable of triggering incorrect classifications, potentially opening backdoors or disrupting communication. Furthermore, the complexity of these models makes it difficult to ascertain their internal workings, hindering the detection of such manipulations and demanding robust defense strategies focused on adversarial training and input validation to ensure reliable spectrum monitoring and secure wireless communication.

The escalating vulnerabilities in modulation classification present a tangible and growing threat to the integrity of wireless communication systems. Compromised signal identification isn’t merely an inconvenience; it opens pathways for malicious actors to inject false data, disrupt critical infrastructure, or intercept sensitive information. Beyond direct attacks, subtle manipulations can create hidden backdoors, allowing persistent and undetectable access to networks. As reliance on wireless technology expands across sectors – from finance and healthcare to transportation and national security – the potential consequences of these vulnerabilities become increasingly severe, demanding proactive defense mechanisms and robust security protocols to ensure both the confidentiality and operational reliability of communication channels.

Classification accuracy decreases with decreasing signal-to-noise ratio (SNR) for both the legitimate model $f_{\theta}$ and the backdoored model $f_{\theta^{*}}$ at an input backoff of 3 dB.

Unveiling the Mechanisms of Adversarial Interference

Modulation classifiers, despite their increasing deployment in communication systems, are vulnerable to adversarial attacks that exploit the inherent sensitivity of machine learning models. Techniques such as the Fast Gradient Sign Method (FGSM) and the Carlini-Wagner Attack (CWA) introduce carefully crafted, imperceptible perturbations to valid signals. These perturbations, calculated using the signal’s gradient with respect to the classifier’s loss function, are sufficient to cause misclassification. The effectiveness of these attacks stems from the high dimensionality of the signal space and the non-linear nature of the classification models, allowing for subtle changes to have a disproportionate impact on the output. Consequently, even small, intentionally designed signal distortions can reliably fool the classifier, compromising its accuracy and potentially enabling malicious activity.
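To make the mechanics concrete, here is a minimal FGSM sketch in PyTorch, assuming a trained classifier over RadioML-style I/Q tensors; the function name and the epsilon budget are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.01):
    """One-step FGSM: nudge each input along the sign of the loss gradient.

    x   : batch of I/Q frames, e.g. shape (N, 1, 2, 128) for RadioML data.
    eps : perturbation budget; kept small so the distortion stays subtle.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction that maximally increases the classifier's loss.
    return (x + eps * x.grad.sign()).detach()
```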

Backdoor attacks represent a significant threat to modulation classifiers by subtly manipulating the training data. These attacks involve embedding hidden triggers within the training set, allowing an attacker to reliably control the classifier’s output when the trigger is present in a subsequently transmitted signal. Recent evaluations demonstrate a high Attack Success Rate (ASR) of approximately 95%, indicating a strong ability to induce misclassification. Critically, this level of control can be achieved with a relatively low poisoning ratio of only 5%, meaning only 5% of the training data needs to be maliciously modified to compromise the system’s integrity.
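As a rough illustration of dirty-label poisoning at the roughly 5% ratio reported above, the sketch below stamps an arbitrary trigger function onto a random subset of a NumPy training set and relabels those samples; `trigger_fn` and the other names are hypothetical.

```python
import numpy as np

def poison_dataset(X, y, trigger_fn, target_label, ratio=0.05, seed=0):
    """Dirty-label poisoning: stamp a trigger on a small fraction of the
    training set and relabel those samples with the attacker's target class.

    ratio=0.05 mirrors the ~5% poisoning ratio cited in the text.
    """
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=int(ratio * len(X)), replace=False)
    X_p, y_p = X.copy(), y.copy()
    X_p[idx] = trigger_fn(X_p[idx])   # embed the backdoor trigger
    y_p[idx] = target_label           # force the attacker-chosen label
    return X_p, y_p
```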

Physical backdoor attacks manifest by introducing covert triggers during the over-the-air transmission of signals. These attacks specifically target the radio frequency (RF) chain, commonly exploiting non-linearities within the Power Amplifier (PA) and signal clipping mechanisms. By carefully crafting input signals, an attacker can induce predictable distortions that act as the trigger. These distortions, imperceptible to standard signal analysis, are then learned by the modulation classifier during training, associating the physical distortion with a specific, attacker-chosen class. Subsequent signals exhibiting the same distortion are consistently misclassified, enabling targeted control over system behavior. This approach differs from digital poisoning as it operates on the physical layer, making detection more challenging and bypassing many software-based security measures.
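The paper's trigger exploits PA compression; the sketch below uses a Rapp-style soft limiter, a standard textbook PA approximation, as a stand-in for that nonlinearity on a complex baseband signal. The parameterization here, backoff relative to peak amplitude and a smoothness factor, is an assumption for illustration rather than the authors' exact setup.

```python
import numpy as np

def pa_distort(iq, backoff_db=3.0, smoothness=2.0):
    """Illustrative Rapp-model soft limiter standing in for the PA trigger.

    Scaling the signal up (reducing the input backoff) pushes it into the
    amplifier's compression region, producing the amplitude distortion a
    backdoored classifier can learn to associate with the target class.
    """
    a = np.abs(iq)
    sat = a.max() * 10 ** (-backoff_db / 20)        # saturation amplitude
    gain = (1.0 + (a / sat) ** (2 * smoothness)) ** (-1 / (2 * smoothness))
    return iq * gain  # compress amplitudes, leave phase untouched
```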

Conventional security measures for modulation classification systems often rely on detecting anomalies in signal characteristics or validating data integrity at specific points. Backdoor attacks circumvent these defenses by subtly manipulating the training data itself, embedding triggers that remain dormant during normal operation. This means standard anomaly detection may not flag the poisoned data, and the system will function normally until the trigger is presented. Consequently, the malicious behavior – misclassification dictated by the attacker – occurs after initial security checks, making immediate detection exceptionally difficult. The lack of readily apparent signal distortions or deviations from expected parameters allows these attacks to compromise system integrity without triggering existing safeguards, creating a sustained vulnerability.

Simulated attacks demonstrate that the attack success rate (ASR) increases with signal-to-noise ratio (SNR) when operating at a backoff of 3 dB.

Strategies for Fortifying Signal Classification

STRIP (STRong Intentional Perturbation) employs Shannon Entropy as a metric to identify inputs carrying backdoor triggers. The technique superimposes a suspect input with a set of clean samples and observes the entropy of the classifier’s predictions across these perturbed copies. For benign inputs, heavy superimposition scrambles the output, yielding high-entropy predictions; for backdoored inputs, the embedded trigger tends to dominate classification, so predictions remain confident and entropy stays anomalously low. Samples whose prediction entropy deviates significantly below the distribution expected for clean data are flagged as suspicious. The core principle relies on the assumption that a backdoor trigger is robust to perturbation, making prediction entropy a viable indicator of malicious intent.
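A minimal sketch of the STRIP test follows, assuming `predict` returns softmax probabilities for a batch of signals; the blending weight, copy count, and names are illustrative.

```python
import numpy as np

def strip_entropy(predict, x, held_out, n=32, alpha=0.5):
    """STRIP-style check: blend a suspect sample with random clean samples
    and measure the Shannon entropy of the classifier's predictions.

    A trigger that dominates classification keeps predictions confident
    (low entropy) even under heavy superimposition; clean inputs do not.
    """
    rng = np.random.default_rng(0)
    blends = np.stack([alpha * x + (1 - alpha) * held_out[i]
                       for i in rng.choice(len(held_out), n, replace=False)])
    probs = predict(blends)                        # shape (n, num_classes)
    ent = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return ent.mean()                              # low mean => suspicious
```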

Activation Clustering employs dimensionality reduction via Principal Component Analysis (PCA) followed by K-means Clustering to detect backdoors within neural networks. PCA reduces the high-dimensional space of internal model activations to a lower-dimensional representation, preserving the most significant variance. Subsequently, K-means Clustering groups these reduced activation patterns. Backdoors often manifest as distinct clusters due to the specific, anomalous features they introduce during input processing. By identifying these outlier clusters, which deviate from the typical activation patterns of benign inputs, the presence of a backdoor can be inferred without requiring prior knowledge of the trigger itself. This approach analyzes the internal representations learned by the model, focusing on the behavioral differences induced by malicious inputs.
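A compact scikit-learn sketch of this pipeline is shown below, assuming `acts` is an (N, D) array of penultimate-layer activations collected for samples of one predicted class; the component count and threshold logic are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def activation_clusters(acts, n_components=10):
    """Activation Clustering sketch: project penultimate-layer activations
    with PCA, then split them into two clusters with K-means.

    For a poisoned class, triggered samples tend to form a small, well
    separated cluster; a lopsided cluster size ratio is the red flag.
    """
    reduced = PCA(n_components=n_components).fit_transform(acts)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(reduced)
    sizes = np.bincount(labels)
    return labels, sizes.min() / sizes.sum()   # small fraction => suspect
```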

Neural Cleanse is a technique used to identify and mitigate backdoor vulnerabilities in neural networks by reconstructing the minimal input trigger that activates the malicious behavior. The method operates by formulating an optimization problem that searches for the smallest perturbation to a clean input that causes the model to misclassify it according to the backdoor’s target label. This reconstructed trigger, often visually interpretable as a pattern or texture, provides insight into the backdoor’s mechanism and allows for targeted removal, either by retraining the model with filtered data or by directly modifying the model’s weights to counteract the trigger’s influence. The efficacy of Neural Cleanse relies on the assumption that backdoors are activated by relatively simple, low-dimensional triggers.
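The sketch below captures the core Neural Cleanse optimization, adapted here for signal-shaped PyTorch tensors: learn a mask and pattern that flip a batch of clean samples to the target label, while an L1 penalty keeps the mask, and hence the reconstructed trigger, minimal. Hyperparameters and names are illustrative.

```python
import torch
import torch.nn.functional as F

def reconstruct_trigger(model, x_clean, target, steps=500, lam=1e-3, lr=0.1):
    """Neural Cleanse-style trigger recovery: optimize a mask and pattern so
    that masked inputs are pushed toward the target label, with an L1
    penalty favoring the smallest trigger that does the job.
    """
    mask = torch.zeros_like(x_clean[:1], requires_grad=True)
    pattern = torch.zeros_like(x_clean[:1], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    y = torch.full((len(x_clean),), target, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)                    # constrain mask to [0, 1]
        x_trig = (1 - m) * x_clean + m * pattern   # stamp candidate trigger
        loss = F.cross_entropy(model(x_trig), y) + lam * m.abs().sum()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), pattern.detach()
```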

Evaluations of defenses such as STRIP, which utilize Shannon Entropy for backdoor detection, have revealed limited efficacy in distinguishing between benign and maliciously triggered inputs. Specifically, analysis using the Silhouette Score – a metric measuring the separation of clusters – demonstrates a significant overlap in the entropy distributions of clean and triggered samples, registering approximately 0.07. This low score indicates that STRIP frequently misclassifies triggered inputs as clean, and vice versa, suggesting a substantial failure rate in accurately identifying backdoored signals and highlighting the need for improved detection methodologies.
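For reference, this is how such a separation score could be computed with scikit-learn; the entropy arrays here are placeholders drawn from overlapping distributions, purely to mimic the reported near-zero separation, not data from the paper.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Placeholder entropy values for clean vs. triggered samples. Heavily
# overlapping distributions yield a silhouette score near 0, echoing the
# ~0.07 reported for STRIP in this setting.
ent_clean = np.random.default_rng(0).normal(2.0, 0.5, 500)
ent_trig = np.random.default_rng(1).normal(1.9, 0.5, 500)
X = np.concatenate([ent_clean, ent_trig]).reshape(-1, 1)
labels = np.array([0] * 500 + [1] * 500)
print(silhouette_score(X, labels))  # near 0 => clusters barely separated
```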

The application of techniques such as STRIP, Activation Clustering, and Neural Cleanse represents a developing area of research aimed at improving the resilience of modulation classifiers. These methods address vulnerabilities to sophisticated attacks, including those employing backdoored inputs or hidden trigger patterns. While current implementations, like STRIP, exhibit limitations in reliably distinguishing between benign and malicious samples – as indicated by low Silhouette Scores – the underlying principles demonstrate potential for enhancing classifier robustness. Ongoing research focuses on refining these techniques and developing novel approaches to effectively detect and mitigate adversarial manipulations targeting modulation classification systems.

Towards Resilient Communication in Hostile Environments

Convolutional Neural Networks (CNNs), notably the VT-CNN2 architecture, have emerged as a powerful tool for automatic modulation classification, even when faced with extremely challenging signal conditions. Unlike traditional methods reliant on hand-crafted features, CNNs learn directly from the raw signal data, enabling them to identify subtle patterns indicative of different modulation schemes. This capability is particularly crucial in low Signal-to-Noise Ratio (SNR) environments, where signals are weak and easily obscured by noise; VT-CNN2, for example, consistently achieves high classification accuracy even as SNR drops significantly. The network’s convolutional layers effectively extract relevant features, while its deep architecture allows for complex non-linear relationships to be modeled, ultimately providing a robust solution for identifying modulation types in adverse wireless communication scenarios.
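For orientation, here is a PyTorch sketch of the widely cited VT-CNN2 layout for 2x128 I/Q frames; layer sizes follow the common reference implementation of the architecture, though details may differ from the exact model evaluated in the paper.

```python
import torch.nn as nn

class VTCNN2(nn.Module):
    """Sketch of the VT-CNN2 architecture: two convolutional layers over
    raw 2x128 I/Q frames followed by two dense layers."""
    def __init__(self, n_classes=11, dropout=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.ZeroPad2d((2, 2, 0, 0)),              # pad along time axis
            nn.Conv2d(1, 256, kernel_size=(1, 3)), nn.ReLU(), nn.Dropout(dropout),
            nn.ZeroPad2d((2, 2, 0, 0)),
            nn.Conv2d(256, 80, kernel_size=(2, 3)), nn.ReLU(), nn.Dropout(dropout),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(80 * 1 * 132, 256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, n_classes),
        )

    def forward(self, x):                            # x: (N, 1, 2, 128)
        return self.classifier(self.features(x))
```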

Despite the demonstrated effectiveness of Convolutional Neural Networks (CNNs) in challenging wireless environments, these systems are not invulnerable. Recent research reveals CNNs, even high-performing architectures like VT-CNN2, can be compromised through carefully crafted adversarial attacks and the subtle insertion of backdoors. These attacks, designed to mislead the classification process, maintain a high success rate across a broad range of signal-to-noise ratios, indicating a significant vulnerability. This susceptibility underscores the critical need for layered defense strategies – combining robust network architectures with proactive security mechanisms – to ensure the reliability and integrity of modulation classification in the face of increasingly sophisticated threats. Simply relying on a single line of defense is insufficient; a multi-faceted approach is necessary to protect against both overt attacks and hidden compromises.

Evaluations reveal a consistently high attack success rate, exceeding 92%, across a practical range of signal-to-noise ratios from -8 dB to 10 dB. This finding underscores the attacks’ effectiveness not merely in ideal conditions, but within the fluctuating and often degraded environments characteristic of real-world wireless communication. Maintaining such a high success rate despite varying channel conditions demonstrates a significant resilience, indicating these attacks are less susceptible to being mitigated by typical signal propagation effects or ambient noise. The consistency of the attacks’ performance across this SNR spectrum highlights a critical vulnerability requiring layered defensive strategies to ensure reliable modulation classification in challenging operational scenarios.
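A small sketch of how the per-SNR attack success rate behind such a result could be tabulated, assuming per-sample SNR labels as in RadioML-style datasets; the function and variable names are hypothetical.

```python
import numpy as np

def asr_by_snr(predict, X_trig, y_true, target, snr):
    """Attack success rate per SNR bin: how often a triggered sample whose
    true class is not the target gets classified as the target."""
    preds = predict(X_trig)
    out = {}
    for s in np.unique(snr):
        m = (snr == s) & (y_true != target)   # exclude target-class samples
        out[s] = float((preds[m] == target).mean()) if m.any() else float("nan")
    return out
```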

The convergence of resilient Convolutional Neural Network (CNN) architectures and sophisticated defense strategies promises a substantial uplift in the security and dependability of modern wireless communication. Current modulation classification systems, while effective, remain vulnerable to targeted attacks and subtle backdoors; integrating robust CNN designs – those inherently resistant to noise and interference – with layers of proactive defense mechanisms creates a far more secure paradigm. This combined approach doesn’t merely detect malicious signals, but actively mitigates their impact, ensuring accurate signal identification even amidst adversarial interference and hidden threats. The resulting systems are poised to maintain reliable communication links in increasingly complex and hostile radio frequency environments.

A critical advancement in wireless communication security lies in the ability to maintain accurate modulation classification despite deliberate attempts at disruption. Recent research demonstrates a system capable of reliably identifying signal modulation types, such as QPSK or 8PSK, even when subjected to malicious interference or subtly implanted ‘backdoor’ attacks. This resilience is achieved not through simply detecting overt threats, but by building inherent robustness into the classification process itself, allowing the system to discern legitimate signals from adversarial manipulations across a broad range of signal-to-noise ratios. The implications are significant; such a capability safeguards communication integrity by preventing attackers from injecting false data or concealing their presence, ultimately bolstering the dependability of wireless networks in increasingly hostile environments.

The pursuit of robust systems, as demonstrated in this work on physical backdoor attacks, reveals a fundamental truth: complexity breeds vulnerability. The researchers cleverly exploit power amplifier distortions to introduce a subtle, physical-layer manipulation, bypassing defenses focused on the digital domain. This echoes Hilbert’s assertion, “We must be able to answer the question: What are the ultimate limits of our knowledge?” – for even sophisticated deep learning models, seemingly secure, possess inherent limits when confronted with attacks that operate outside conventional boundaries. If the system looks clever, it’s probably fragile; here, the elegance of the attack lies in its simplicity and physical grounding, a reminder that architecture dictates behavior, even at the radio frequency level.

Beyond the Signal

The demonstrated susceptibility of deep learning-based modulation classification to physical-layer manipulation highlights a fundamental truth: a system is only as secure as its weakest link, and that link often resides not in the algorithmic complexity, but in the analog world it interfaces with. To address this vulnerability is not merely to build a better detector, but to fundamentally reconsider the entire signal chain as a holistic entity. One cannot simply replace the amplifier without understanding the implications for the classifier – the distortion becomes a conduit, a subtle language spoken between attacker and machine.

Future work must move beyond treating distortion as noise, and instead view it as a potential communication channel. The resilience shown against standard defenses suggests that adversarial training, while useful, is a local fix to a systemic problem. A more robust solution likely lies in designing classifiers intrinsically invariant to specific amplifier characteristics, or even incorporating physical-layer models directly into the learning process. This is not simply an engineering challenge, but a question of architectural philosophy – designing for robustness requires acknowledging the inherent imperfections of the physical world.

Ultimately, the exploration of such attacks forces a reckoning: the pursuit of increasingly complex machine learning models may, paradoxically, create more vulnerabilities if not coupled with a deeper understanding of the underlying physical realities. The elegance of a solution will not be found in added complexity, but in a return to fundamental principles – simplicity, clarity, and a holistic view of the system as an interconnected whole.


Original article: https://arxiv.org/pdf/2603.25304.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
