Unmasking Bitcoin: A New Approach to Transaction Tracing

Author: Denis Avetisyan


Researchers have developed novel methods for identifying the origins of Bitcoin transactions by analyzing network traffic patterns.

Collaborative analysis reveals how differing perspectives, when integrated, can refine understanding and expose previously hidden facets of a complex problem.

This review details NTSSL and NTSSL+, semi-supervised learning techniques that improve the accuracy of Bitcoin transaction deanonymization through transaction clustering and node fingerprinting.

Despite the pseudonymity of Bitcoin, complete user privacy remains elusive due to inherent network characteristics. This paper, ‘Deanonymizing Bitcoin Transactions via Network Traffic Analysis with Semi-supervised Learning’, introduces novel methods, NTSSL and NTSSL+, that leverage semi-supervised learning and transaction clustering to enhance the accuracy of Bitcoin node deanonymization. Experimental results demonstrate a 1.6× performance improvement over existing machine learning approaches by integrating network traffic analysis with cross-layer collaborative techniques. Could these advances ultimately necessitate more robust privacy-preserving technologies within decentralized cryptocurrency systems?


The Illusion of Privacy: Bitcoin’s Transparent Ledger

Despite often being portrayed as a tool for financial secrecy, Bitcoin’s fundamental design inherently compromises user privacy. Every transaction is recorded on the blockchain, a publicly accessible and immutable ledger. While identities aren’t directly linked to addresses, the complete transaction history – detailing the amount sent and received – is visible to anyone. This transparency allows for sophisticated tracking of funds, as all Bitcoin activity can be mapped and analyzed. Consequently, the system doesn’t offer true anonymity, but rather a pseudonymity that relies on obscuring the link between real-world identities and Bitcoin addresses – a layer of privacy increasingly vulnerable to advanced investigative techniques.

The initial promise of Bitcoin as a truly anonymous financial system is facing increasing scrutiny due to the development of advanced deanonymization techniques. While Bitcoin transactions aren’t directly linked to real-world identities, sophisticated analysis can often reveal connections between seemingly disparate transactions and, ultimately, to the users controlling them. Researchers employ methods like transaction clustering, identifying common inputs and outputs, and analyzing network propagation delays to trace the flow of funds. Furthermore, linking Bitcoin addresses to information gleaned from exchanges – where users are required to provide identifying details – and correlating on-chain activity with off-chain data, such as IP addresses, significantly erodes the perceived anonymity. These evolving techniques demonstrate that Bitcoin’s public ledger, while offering pseudonymity, does not guarantee true anonymity, and users should be aware of the growing potential for their transactions to be linked back to their identities.

Sophisticated analysis of Bitcoin transactions doesn’t target individual keys, but rather identifies patterns within the network to de-anonymize users. Researchers observe how coins are mixed, spent, and clustered – recognizing that most individuals don’t randomly spend their Bitcoin, but instead exhibit traceable behaviors. By mapping transaction graphs and correlating them with network timings – such as when and from where transactions originate – analysts can link seemingly disparate addresses to a single entity. This is further refined by recognizing ‘change’ outputs – the small amounts returned to the user after a transaction – which often reveal connections between multiple transactions controlled by the same person. Even metadata from internet connections, combined with transaction timing, can narrow down the possible physical locations of users, creating a surprisingly detailed picture of their Bitcoin activity despite the pseudonymous nature of the cryptocurrency.

Layered Attacks: Peeling Back the Pseudonymity

Transaction layer deanonymization relies on the principle of clustering Bitcoin transactions to reveal common ownership. This method analyzes the transaction graph, identifying inputs and outputs that are controlled by the same entity. Techniques include identifying change addresses – where outputs of a transaction are sent back to the sender’s control – and tracking coinjoins, which attempt to obscure transaction origins by mixing multiple inputs. While coinjoins increase complexity, heuristic analysis can still link inputs and outputs based on transaction amounts, timing, and network patterns. Successful deanonymization at this layer allows for the creation of a “wallet graph” connecting multiple addresses under common control, despite the pseudonymous nature of Bitcoin.
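The clustering idea can be made concrete with a minimal sketch of the common-input heuristic, which treats all input addresses of one transaction as controlled by the same entity. This is an illustration under stated assumptions, not the paper's implementation: the transaction format (a dict with an `inputs` list) and the address labels are hypothetical, and the heuristic is deliberately naive, so coinjoins would cause it to merge unrelated owners.

```python
def cluster_addresses(transactions):
    """Group addresses with a union-find over co-spent inputs.

    `transactions` is a hypothetical format: a list of dicts, each with
    an "inputs" list of address strings.
    """
    parent = {}

    def find(addr):
        # Register unseen addresses as their own cluster root.
        parent.setdefault(addr, addr)
        while parent[addr] != addr:
            parent[addr] = parent[parent[addr]]  # path halving
            addr = parent[addr]
        return addr

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            find(addr)
        # Assumption: every input of one transaction has the same owner.
        for other in inputs[1:]:
            union(inputs[0], other)

    clusters = {}
    for addr in list(parent):
        clusters.setdefault(find(addr), set()).add(addr)
    return sorted(sorted(members) for members in clusters.values())


example_txs = [
    {"inputs": ["A", "B"]},  # A and B are spent together
    {"inputs": ["B", "C"]},  # B links C into the same cluster
    {"inputs": ["D"]},       # D is never co-spent
]
print(cluster_addresses(example_txs))
```

Because address B appears in two transactions, A, B, and C end up in one cluster while D stays alone. A change-address heuristic would add further links to the same structure.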

Network Layer Deanonymization involves correlating Bitcoin transaction data with the Internet Protocol (IP) addresses of nodes participating in the network. This is achieved by monitoring the Bitcoin network for transaction broadcasts and associating those broadcasts with the IP address of the relaying node. Because Bitcoin transactions are not inherently private at the network level, an attacker observing network traffic can link a specific IP address to a Bitcoin address involved in a transaction. While IP addresses do not directly reveal user identity, they can be used in conjunction with other data sources, such as Internet Service Provider (ISP) records or website logs, to potentially deanonymize Bitcoin users. This technique differs from transaction clustering, which focuses solely on on-chain transaction patterns, by adding an off-chain network-level component to the analysis.
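As a rough illustration of the broadcast-to-IP association described above, the sketch below attributes each transaction to the peer that announced it first. It is a deliberately naive estimator, not the method of any cited work: the observation tuples are a hypothetical format, the IP addresses come from documentation ranges, and randomized relay delays can easily mislead a first-seen rule.

```python
def first_relayer(observations):
    """Map each transaction ID to the IP that announced it earliest.

    `observations` is a hypothetical log format: an iterable of
    (txid, ip, timestamp_seconds) tuples collected by a listening node.
    """
    earliest = {}
    for txid, ip, timestamp in observations:
        # Keep only the earliest announcement seen for each transaction.
        if txid not in earliest or timestamp < earliest[txid][1]:
            earliest[txid] = (ip, timestamp)
    return {txid: ip for txid, (ip, _) in earliest.items()}


observed = [
    ("tx1", "198.51.100.7", 10.40),
    ("tx1", "192.0.2.15", 10.12),   # earliest announcement of tx1
    ("tx2", "203.0.113.9", 11.05),
    ("tx1", "203.0.113.9", 10.90),
]
print(first_relayer(observed))
```

The guess is only as good as the observer's coverage: if the true originator is not among the monitored connections, the earliest observed relayer is simply the wrong node.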

Research conducted by Biryukov et al. and Apostolaki et al. has shown a significant increase in the effectiveness of Bitcoin transaction tracking when combining transaction clustering with network-level analysis. Biryukov’s work demonstrated that linking transaction outputs to originating IP addresses, even with a small percentage of successfully correlated data, drastically improves the ability to identify the same user across multiple transactions. Apostolaki et al. further refined these techniques by exploiting timing correlations between network connections and transaction broadcasts. These combined attacks leverage the weaknesses of both layers – transaction patterns and network behavior – to de-anonymize Bitcoin users more reliably than either technique applied in isolation, posing a greater risk to user privacy.

NTSSL: A New Paradigm, or Just More Sophisticated Surveillance?

NTSSL employs a semi-supervised learning approach to deanonymize network traffic by combining the strengths of both unsupervised and supervised machine learning techniques. This methodology initially utilizes unsupervised anomaly detection algorithms – including Isolation Forest, One-Class SVM, and Auto Encoders – to identify unusual patterns within network transactions without requiring pre-labeled data. Subsequently, the system incorporates a supervised transaction classification stage using the XGBoost algorithm, trained on labeled transaction data to improve the accuracy of identifying transaction origins. This hybrid approach allows NTSSL to leverage the benefits of both methods – the ability to detect novel, previously unseen anonymization techniques via unsupervised learning, and the precision of supervised learning when classifying known transaction types – resulting in a more robust and effective deanonymization system compared to purely unsupervised or supervised methods.

In practice, the two stages run in sequence. The unsupervised detectors (Isolation Forest, One-Class SVM, and Auto Encoders) first flag transactions whose network traffic deviates from typical patterns, without requiring pre-labeled examples. These potentially anomalous transactions are then fed into the XGBoost classification model, which is trained on labeled data and leverages the features identified during the anomaly detection phase to categorize them. The first stage narrows the search to suspicious activity, and the second stage supplies the precise categorization, which is why the combination identifies transaction origins better than methods relying solely on either approach.

Anomaly detection, the initial phase of the NTSSL methodology, employs unsupervised machine learning techniques to identify unusual patterns within network transaction data. Specifically, algorithms such as Isolation Forest, which isolates anomalies by randomly partitioning the data space, One-Class Support Vector Machines (SVM), which define a boundary around normal data and flag outliers, and Auto Encoders, a type of neural network trained to reconstruct input data and identify deviations, are utilized. These methods operate without requiring pre-labeled data, instead learning the characteristics of typical transactions to highlight those that significantly deviate, thereby flagging potentially deanonymizing activity for further analysis within the NTSSL framework.
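To show how random partitioning isolates anomalies, here is a small from-scratch Isolation Forest in plain Python. It is a teaching sketch rather than the NTSSL implementation: the two features (inter-arrival gap and message size) and all data values are hypothetical, and a real pipeline would more likely use an established library.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_tree([r for r in rows if r[dim] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[dim] >= split], depth + 1, max_depth, rng),
    }


def path_length(tree, row, depth=0):
    """Depth at which `row` lands, adjusted for unsplit leaf size."""
    if "size" in tree:
        return depth + c_factor(tree["size"])
    branch = "left" if row[tree["dim"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Return a score in (0, 1) per row; higher means easier to isolate."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2 ** (-sum(path_length(t, r) for t in trees) / (n_trees * norm))
            for r in rows]


# Hypothetical features: (inter-arrival gap in seconds, message size in bytes).
data_rng = random.Random(1)
rows = [(data_rng.gauss(0.05, 0.01), data_rng.gauss(300, 20)) for _ in range(50)]
rows.append((0.5, 900.0))  # one clearly unusual observation
scores = anomaly_scores(rows)
print("most anomalous row index:", scores.index(max(scores)))
```

Points that sit far from the bulk of the data are separated after only a few random splits, so their average path length is short and their score is high. In a two-stage design, only the high-scoring rows would be passed on to the supervised classifier.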

NTSSL+ represents an advancement in deanonymization techniques through the integration of transaction clustering, utilizing data from both network and transaction layers. This combined approach yields a recall rate of up to 72.7% and a precision rate approaching 60% when identifying the origins of Bitcoin transactions. Performance benchmarks demonstrate that NTSSL+ surpasses the capabilities of existing methods, such as PERIMETER, in accurately linking transactions to their initiating sources. The inclusion of transaction clustering allows for a more nuanced analysis, improving the ability to discern patterns and identify previously obscured connections within the Bitcoin network.

NTSSL achieves an F1-score of 0.74 when all network connections are intercepted and analyzed. The enhanced NTSSL+ variant builds on this baseline, improving performance over NTSSL by 20% to 40% under the same 100% connection interception conditions. With complete network visibility held constant, this indicates that NTSSL+ identifies and classifies network traffic origins more accurately than the original NTSSL methodology.
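For readers less familiar with these metrics, the following sketch shows how precision, recall, and F1-score follow from raw counts of correct and incorrect attributions. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall, and F1 from attribution counts."""
    predicted = true_pos + false_pos   # transactions attributed to the node
    actual = true_pos + false_neg      # transactions the node really originated
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct attributions, 2 false alarms, 4 misses.
precision, recall, f1 = precision_recall_f1(true_pos=8, false_pos=2, false_neg=4)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the weaker of the two, which is why a method with high recall but modest precision can still post a middling F1-score.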

NTSSL+ demonstrates a significant performance advantage over purely unsupervised learning methodologies when operating with limited network visibility. Specifically, at a 25% level of connection control – meaning only 25% of network traffic is intercepted and analyzed – NTSSL+ achieves a 1.6-fold increase in deanonymization accuracy as compared to standard unsupervised learning techniques. This improvement indicates that the integration of semi-supervised learning, combining anomaly detection with supervised transaction classification using XGBoost, effectively leverages available data even with constrained network observation, resulting in substantially enhanced identification of Bitcoin transaction origins.

NTSSL and NTSSL+ consistently outperform PERIMETER across recall, false positive rate, precision, and F1-score as the number of intercepted connections increases, demonstrating particularly strong recall performance.

A Perpetual Arms Race: Privacy Measures and the Inevitable Countermeasures

A suite of technologies works to shield the origins of transactions and bolster user privacy within blockchain systems. Tools like Tor and Virtual Private Networks (VPNs) mask the user’s IP address, creating an initial layer of anonymity. Network-layer relay protocols, such as Erlay and Dandelion++, further obfuscate transaction pathways; Erlay streamlines transaction propagation, reducing identifiable network patterns, while Dandelion++ employs a ‘stem’ and ‘fluff’ model, first relaying a transaction along a randomized path of nodes (the stem) and only then broadcasting it to the wider network (the fluff). These systems don’t operate in isolation, but rather aim to create multiple layers of indirection, making it significantly more difficult to trace a transaction back to its originating node and, ultimately, the user.
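The stem and fluff idea can be sketched as a short simulation. This is a simplification under stated assumptions: the published Dandelion++ design assigns relay or diffuse roles per node per epoch, whereas the sketch flips a coin at every hop, and the `successor` mapping, the hop cap, and the ring topology are all hypothetical.

```python
import random


def dandelion_stem(origin, successor, fluff_probability, rng, max_hops=50):
    """Return the stem path; its last node is where fluff (broadcast) begins.

    `successor` is a hypothetical mapping from each node to the single
    peer it forwards stem-phase transactions to.
    """
    path = [origin]
    node = origin
    for _ in range(max_hops):
        # Simplification: decide at each hop whether to start diffusing.
        if rng.random() < fluff_probability:
            break
        node = successor[node]
        path.append(node)
    return path


# Hypothetical five-node ring: each node forwards to the next one.
ring = {i: (i + 1) % 5 for i in range(5)}
stem = dandelion_stem(0, ring, fluff_probability=0.1, rng=random.Random(42))
print("stem path:", stem, "-> fluff starts at node", stem[-1])
```

An observer who sees the broadcast begin at the last node of the stem learns little about the first node, which is the property that makes first-seen attribution harder.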

Peer-to-peer (P2P) encryption represents a crucial advancement in bolstering transaction privacy within decentralized networks. By directly encrypting communications between nodes – the computers participating in the network – it circumvents the need for centralized intermediaries that could potentially monitor or log data. This direct encryption ensures that only the communicating nodes can decipher the information exchanged, shielding transaction details from eavesdropping and unauthorized access. Unlike systems where data might be briefly exposed during routing, P2P encryption establishes secure, end-to-end communication channels, effectively minimizing the potential for data breaches and enhancing the confidentiality of each transaction. The implementation of P2P encryption, therefore, acts as a foundational element in creating a more secure and private decentralized ecosystem.

Despite the increasing sophistication of privacy-enhancing technologies, complete anonymity remains an elusive goal. Current defenses, such as Tor and VPNs, are not impervious to advanced deanonymization techniques, including traffic analysis, timing attacks, and correlation methods. Adversaries are continually developing new strategies to link transactions to their origins, necessitating a proactive and iterative approach to security. Consequently, ongoing research and development are crucial; simply deploying existing countermeasures is insufficient. The landscape demands continuous refinement of these tools, alongside the exploration of novel cryptographic protocols and network architectures, to maintain a meaningful level of privacy in the face of evolving threats and increasingly powerful surveillance capabilities.

The pursuit of robust digital privacy necessitates a shift toward synergistic defenses, rather than isolated solutions. Current countermeasures – such as Tor, VPNs, and P2P encryption – offer valuable but incomplete protection, each vulnerable to increasingly sophisticated deanonymization techniques. Consequently, future research is prioritizing the integration of these existing technologies, aiming to create layered privacy systems where the failure of one component doesn’t compromise the entire network. Beyond combination, exploration into entirely novel approaches is crucial; this includes investigating advanced cryptographic methods, decentralized mixing protocols, and perhaps even harnessing the potential of zero-knowledge proofs to obscure transaction details without revealing underlying data. This continuous cycle of innovation and integration is essential to maintain a meaningful level of privacy in an evolving digital landscape.

Diffusion’s transaction propagation exhibits differing random delays between outbound and inbound connections, indicating asynchronous communication patterns.

What’s Next?

The presented methods, while achieving incremental gains in transaction attribution, merely relocate the problem. Increased accuracy in node fingerprinting does not address the fundamental tension: any system designed to reveal origins will inevitably be subverted by those motivated to obscure them. The field chases increasingly sophisticated obfuscation techniques, then more sophisticated deanonymization: a perpetually escalating arms race. It’s a predictable pattern.

Future work will undoubtedly explore adversarial learning, attempting to build systems resilient to counter-strategies. However, this assumes an optimal adversary – a simplification. The true challenge isn’t defeating a rational, well-resourced attacker, but the countless, low-effort attempts at privacy preservation that collectively erode the utility of attribution. The signal will continue to degrade.

Ultimately, the focus should shift from attempting to ‘solve’ anonymity – a statistically impossible task – to quantifying and managing the inherent uncertainty. The question isn’t whether transactions can be deanonymized, but at what cost, and with what level of confidence. The pursuit of absolute certainty is a known error. Perhaps the real innovation lies not in better deanonymization, but in accepting the limitations of attribution, and designing systems that function effectively despite it.


Original article: https://arxiv.org/pdf/2603.17261.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2026-03-20 04:46