Author: Denis Avetisyan
A new foundation model leverages the power of Transformers to understand and analyze high-volume transaction data, opening doors for improved fraud detection and personalized recommendations.

TREASURE is a Transformer-based model designed to generate powerful embeddings from sequential transaction data, outperforming existing approaches on standalone tasks and downstream applications.
Modern commerce generates vast volumes of transactional data, yet effectively modeling this information for nuanced insights remains a significant challenge. This paper introduces TREASURE: A Transformer-Based Foundation Model for High-Volume Transaction Understanding, a novel foundation model designed to capture both consumer behavior and network signals within transaction data. TREASURE demonstrably outperforms existing systems, achieving up to 111% improvement in anomaly detection and 104% enhancement in recommendation accuracy when used as an embedding provider. Will this approach unlock a new era of personalized financial services and proactive fraud prevention?
The Erosion of Temporal Context in Sequential Data
Recurrent neural networks, specifically Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) architectures, were initially designed to process sequential data by maintaining an internal state representing past information. However, these models often encounter difficulties when discerning relationships between data points separated by many steps – a phenomenon known as the vanishing gradient problem. As information propagates through numerous recurrent connections, the gradient used to update the network’s weights can diminish exponentially, effectively erasing earlier inputs from the internal state. This limitation hinders the ability of LSTMs and GRUs to model truly long-range dependencies, impacting their performance in tasks requiring the understanding of complex patterns spanning extended sequences, such as natural language processing or time series forecasting. Consequently, critical contextual information can be lost, leading to inaccurate predictions and a reduced capacity to capture the full complexity inherent in sequential data.
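The exponential shrinkage can be illustrated with a toy calculation: if each backward step through a recurrent connection scales the gradient by a factor below one (the 0.5 here is an arbitrary stand-in for the per-step Jacobian norm, not a property of any real network), fifty steps reduce it to numerical noise.

```python
# Toy illustration of the vanishing gradient: backpropagating through T
# recurrent steps multiplies the gradient by a per-step factor; values
# below 1 shrink it exponentially (here 0.5 ** 50).
grad = 1.0
recurrent_factor = 0.5  # stand-in for the per-step Jacobian norm
for _ in range(50):
    grad *= recurrent_factor
print(grad)  # ~8.9e-16: the earliest inputs' influence has effectively vanished
```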
The escalating volume and intricacy of modern transactional datasets – encompassing everything from financial transactions to user behavior streams – present a formidable challenge to conventional predictive modeling techniques. Existing approaches, while effective on smaller, simpler datasets, often encounter limitations in scalability and computational efficiency when confronted with these massive dataflows. The sheer number of variables and interactions within these datasets demands exponentially increasing processing power and memory, frequently exceeding the capabilities of standard hardware and software configurations. This computational burden not only restricts the ability to train complex models but also impedes real-time or near real-time prediction, ultimately diminishing the practical utility and predictive power derived from the data. Consequently, innovative methodologies are needed to effectively harness the information contained within these complex transactional environments.

TREASURE: A Foundation Model Built on the Sands of Time
TREASURE utilizes the Transformer architecture, originally developed for natural language processing, to analyze sequential tabular data such as transaction histories. This approach allows the model to process each transaction within a sequence while considering the context of preceding transactions, unlike traditional tabular models that treat each record independently. The self-attention mechanism within the Transformer enables TREASURE to weigh the importance of different transactions in the sequence when making predictions, effectively capturing temporal dependencies and intricate relationships within the transaction data. This capability is particularly beneficial for tasks requiring an understanding of cardholder behavior over time, such as fraud detection or risk assessment, where the order and context of transactions are critical signals.
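A minimal sketch of the mechanism described above, with randomly initialized projection matrices and hypothetical shapes (nothing here reflects TREASURE's actual dimensions): a causal mask ensures each transaction's representation is updated by attending only over itself and the transactions that precede it.

```python
import numpy as np

def causal_self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention with a causal mask: each
    transaction attends only to itself and earlier transactions."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)              # (T, T) pairwise relevance
    mask = np.triu(np.ones_like(scores), k=1)  # 1s above the diagonal
    scores = np.where(mask == 1, -1e9, scores) # block attention to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                         # context-aware representations

rng = np.random.default_rng(0)
T, d = 5, 8                                    # 5 transactions, 8 features each
x = rng.normal(size=(T, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
out = causal_self_attention(x, wq, wk, wv)
print(out.shape)  # (5, 8)
```

Note that the first transaction can only attend to itself, so its output is simply its value projection; later positions blend progressively more history.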
TREASURE is pre-trained on extensive datasets of transactional data, establishing a generalized understanding of payment patterns and cardholder behavior. This pre-training process allows for transfer learning, where the model can be adapted to a variety of downstream tasks – such as fraud detection, risk scoring, or anomaly detection – with significantly reduced requirements for task-specific labeled data. By leveraging the knowledge acquired during pre-training, TREASURE minimizes the need for extensive fine-tuning, accelerating model deployment and reducing computational costs associated with training from scratch for each new application.
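As an illustration of this transfer-learning workflow (not TREASURE's actual API), the pre-trained model can be treated as a frozen embedding function with only a small task head trained on top; the encoder, pooling, and fraud head below are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 16  # hypothetical embedding width

def pretrained_encoder(transactions):
    """Stand-in for a frozen pre-trained encoder: maps a variable-length
    transaction sequence to one fixed-size embedding (mean pooling here)."""
    return transactions.mean(axis=0)

# Downstream adaptation: only this small linear head is trained;
# the encoder's weights stay frozen.
head_w = rng.normal(size=D) * 0.01
head_b = 0.0

def fraud_score(sequence):
    z = pretrained_encoder(sequence) @ head_w + head_b
    return 1.0 / (1.0 + np.exp(-z))  # probability-like anomaly score

score = fraud_score(rng.normal(size=(7, D)))  # a sequence of 7 transactions
print(0.0 < score < 1.0)  # True
```

Because only the head's parameters are updated, adaptation to a new task needs far less labeled data and compute than training from scratch.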
TREASURE integrates both static and dynamic transaction attributes into its model architecture to comprehensively represent cardholder behavior. Static attributes encompass fixed cardholder characteristics and merchant details, such as card type, merchant category code, and location. Dynamic attributes consist of time-varying transaction features including purchase amount, transaction time, and frequency. By combining these attribute types, TREASURE captures a complete picture of each transaction and the overall behavioral patterns of cardholders within the payment network, enabling more accurate modeling and analysis of financial activity.
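One plausible way to combine the two attribute types into a single per-transaction input vector is sketched below; the field names, vocabulary sizes, and normalizations are illustrative assumptions, not the paper's actual feature set.

```python
import numpy as np

rng = np.random.default_rng(1)
EMB_DIM = 4

# Hypothetical embedding tables for two static categorical fields.
card_type_emb = rng.normal(size=(3, EMB_DIM))   # e.g. debit / credit / prepaid
mcc_emb = rng.normal(size=(100, EMB_DIM))       # merchant category codes

def encode_transaction(card_type_id, mcc_id, amount, hour_of_day):
    """Concatenate embeddings of static categorical attributes with
    normalized dynamic numeric features into one input vector."""
    static = np.concatenate([card_type_emb[card_type_id], mcc_emb[mcc_id]])
    dynamic = np.array([np.log1p(amount), hour_of_day / 24.0])
    return np.concatenate([static, dynamic])

vec = encode_transaction(card_type_id=1, mcc_id=42, amount=19.99, hour_of_day=14)
print(vec.shape)  # (10,)
```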

The Architecture of Efficiency: Sculpting TREASURE for Scalability
TREASURE utilizes next-transaction prediction during training to improve its ability to model sequential user behavior. This technique involves presenting the model with a sequence of transactions and tasking it with predicting the subsequent transaction. By learning to anticipate future actions based on past behavior, the model develops a stronger understanding of user patterns and dependencies within transactional data. This approach allows TREASURE to move beyond treating each transaction in isolation and instead recognize the temporal relationships that influence user choices, ultimately enhancing its predictive accuracy and ability to generalize to new sequences.
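Mechanically, the objective reduces to a one-step shift between inputs and targets, analogous to next-token prediction in language modeling; the merchant-category codes below are arbitrary examples.

```python
import numpy as np

def next_transaction_pairs(sequence):
    """Inputs are transactions 0..T-2; the target at each step is the
    transaction that immediately follows it."""
    return sequence[:-1], sequence[1:]

# Toy sequence of merchant-category codes for one cardholder.
seq = np.array([5411, 5812, 5411, 4111, 5999])
x, y = next_transaction_pairs(seq)
print(x)  # [5411 5812 5411 4111]
print(y)  # [5812 5411 4111 5999]
```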
TREASURE utilizes negative sampling to mitigate the computational burden associated with high-cardinality categorical features during training. This technique addresses the expense of calculating probabilities across a large number of possible values for each feature by approximating the softmax function. Instead of computing the probability for every possible value, negative sampling randomly selects a small number of negative examples – incorrect values – for each positive example. The model then learns to discriminate between the positive example and these sampled negatives, significantly reducing the computational complexity from $O(V)$ to $O(K)$, where $V$ represents the vocabulary size and $K$ is the number of negative samples. This approach maintains model accuracy while substantially decreasing training time and memory requirements when dealing with features containing numerous unique values.
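A minimal sketch of negative sampling for one prediction step, assuming uniform sampling of negatives (real implementations often use a frequency-based proposal distribution, and typically exclude the positive id); the shapes and sampling scheme are illustrative, not TREASURE's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def negative_sampling_loss(hidden, output_emb, pos_id, num_neg, rng):
    """Score the true next value against num_neg sampled incorrect values,
    replacing the full softmax's O(V) cost with O(K)."""
    V = output_emb.shape[0]
    # Uniform sampling for simplicity; the positive id is not excluded here.
    neg_ids = rng.choice(V, size=num_neg)
    pos_score = hidden @ output_emb[pos_id]
    neg_scores = output_emb[neg_ids] @ hidden
    # Logistic loss: raise the positive score, lower the sampled negatives.
    return -np.log(sigmoid(pos_score)) - np.log(sigmoid(-neg_scores)).sum()

rng = np.random.default_rng(3)
V, d, K = 10_000, 32, 5                # vocabulary size, hidden dim, negatives
output_emb = rng.normal(size=(V, d)) * 0.1
hidden = rng.normal(size=d)
loss = negative_sampling_loss(hidden, output_emb, pos_id=17, num_neg=K, rng=rng)
print(loss > 0)  # True
```

Only K + 1 rows of the output embedding table are touched per step, which is where the O(V) to O(K) saving comes from.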
TREASURE utilizes a loss aggregation strategy to manage the contributions of multiple loss terms during training. This strategy involves weighted summation of individual losses – including those for prediction accuracy, feature representation, and regularization – to optimize overall model performance. The weights are dynamically adjusted based on the relative magnitudes and importance of each loss term, preventing any single term from dominating the learning process and causing instability. This balanced aggregation ensures consistent convergence and facilitates effective learning across the entire model, particularly when dealing with complex datasets and high-dimensional feature spaces. Specifically, the aggregated loss $L_{agg}$ is calculated as $L_{agg} = \sum_{i=1}^{n} w_i L_i$, where $w_i$ represents the weight for the $i$-th loss term $L_i$ and $n$ is the total number of loss terms.

Unlocking the Narrative Within: TREASURE’s Impact on Financial Intelligence
TREASURE demonstrably elevates the accuracy of abnormal behavior detection within financial transactions, offering a significant step forward in fraud prevention and overall security. The system achieves this through detailed transaction embeddings that surface subtle anomalies which would otherwise go unnoticed. Evaluations report a 111% improvement over the currently deployed fraud detection system, meaning fewer false positives and, crucially, a greater capacity to identify and stop fraudulent activity before financial loss occurs. This heightened accuracy translates directly into stronger protection for both financial institutions and their customers.
TREASURE’s capacity to construct detailed embeddings from transactional data significantly enhances merchant recommendation systems. These embeddings, numerical representations of spending habits and preferences, allow for a far more nuanced understanding of customer behavior than traditional methods. Consequently, businesses can deliver highly relevant and personalized merchant suggestions, fostering increased customer engagement and ultimately driving revenue growth. Reported testing shows a 104% increase in recommendation accuracy when using TREASURE-generated embeddings, indicating a powerful tool for businesses seeking to optimize customer experiences through targeted offerings.
TREASURE fundamentally alters how transactional data is understood by constructing detailed, contextualized representations of each customer’s financial history. Rather than treating transactions as isolated events, the model weaves together patterns of spending, timing, and merchant interactions to create a holistic profile. This nuanced understanding enables significantly more accurate risk assessment, moving beyond simple fraud detection to identify subtle indicators of financial vulnerability or changing customer behavior. Consequently, financial institutions can offer truly personalized services, such as tailored credit lines, proactive financial advice, or customized reward programs, fostering stronger customer relationships and promoting financial well-being. The ability to interpret the story behind the transactions, rather than just the transactions themselves, unlocks a new level of insight and value.

The pursuit of robust transaction understanding, as exemplified by TREASURE, acknowledges an inherent truth about all systems: they are not static entities but rather processes subject to the relentless flow of time. The model’s capacity to generate embeddings for downstream tasks, specifically anomaly detection, highlights a pragmatic acceptance of eventual decay. As Grace Hopper famously stated, “It’s easier to ask forgiveness than it is to get permission.” This sentiment mirrors the approach taken by TREASURE; rather than striving for perfect, immutable predictions, it focuses on adaptable representations capable of responding to the inevitable shifts within transaction data, offering a flexible, rather than rigid, defense against emerging anomalies.
What Lies Ahead?
The introduction of TREASURE, a foundation model sculpted for the specific gravity of transaction data, feels less like an arrival and more like a well-marked starting point. The model’s efficacy as both a direct solver and an embedding provider confirms a suspicion long held: that structured sequential data, even in its high-volume form, yields to the representational power of Transformer networks. However, performance gains, however substantial, are merely deferred costs. Each simplification inherent in embedding creation, each reduction of dimensionality, each abstracted feature, creates a future debt: a loss of fidelity against which all downstream tasks must contend.
The true measure of this work will not be its current benchmark scores, but its resistance to decay. Transactional landscapes are not static; they shift with economic tides, regulatory pressures, and the relentless innovation of fraud. A model trained on today’s patterns will inevitably encounter anomalies it was never designed to recognize. The challenge, then, lies not in achieving peak performance, but in building systems that age gracefully, that can accommodate novelty without catastrophic failure.
Future work must address the question of systemic memory. How can a foundation model retain a historical understanding of transactional behavior, allowing it to differentiate between genuine shifts and malicious deviations? Furthermore, exploring the interplay between model size and data provenance, understanding where the training data came from and what biases it carries, will be critical. The accumulation of knowledge, after all, is only valuable if that knowledge remains relevant, and reliably reflects the world it purports to represent.
Original article: https://arxiv.org/pdf/2511.19693.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/