Author: Denis Avetisyan
Researchers have developed a self-supervised learning method that leverages the frequency of user actions to build more accurate models of online behavior.

Abacus aligns pretraining with counting signals from event sequences, improving user modeling for display advertising and accelerating convergence in sequential recommendation.
Modeling user behavior in display advertising remains challenging due to sparse positive signals and the inherent stochasticity of user actions. This paper introduces Abacus: Self-Supervised Event Counting-Aligned Distributional Pretraining for Sequential User Modeling, a novel approach that leverages self-supervised learning to align pretraining with the statistical properties of user event sequences. By predicting the empirical distribution of event counts, Abacus enhances deep sequential models, accelerating convergence and improving performance on downstream tasks – achieving up to a 6.1% AUC increase. Could this counting-aligned pretraining paradigm unlock more robust and interpretable user models across diverse sequential recommendation systems?
The Illusion of Measurement: Beyond Simple Counts
The foundation of much display advertising rests on counting features – simple metrics like the number of times a user visits a website, clicks on an ad, or makes a purchase. While easily quantifiable, these approaches offer a remarkably limited understanding of user behavior. They treat individuals as collections of aggregated statistics, obscuring the nuances of why a user interacts with content. A user who views ten product pages but buys nothing is treated similarly to one who views one and immediately completes a transaction, masking potentially critical differences in intent and engagement. This reliance on surface-level counts fails to capture the context, sequence, and motivations driving user actions, ultimately hindering the development of truly personalized and effective advertising strategies.
Traditional user modeling techniques often treat each user action as an isolated event, overlooking the crucial context of how those actions unfold over time. This sequential disregard significantly limits the potential for truly personalized experiences; understanding that a user viewed a product after reading a specific review, or abandoned a shopping cart following a particular promotional offer, provides invaluable insight. These patterns reveal intent and preference far beyond simple click counts or demographic data. Consequently, systems reliant on aggregated statistics struggle to anticipate needs or offer relevant recommendations, instead delivering generalized content that fails to resonate with individual user journeys. Capturing the order and relationships between interactions is therefore paramount to building models that accurately reflect user behavior and enable meaningful personalization.
Truly effective user modeling transcends the limitations of simply counting behaviors; it demands an investigation into the underlying motivations driving those actions. Rather than focusing solely on what a user does – the pages visited, the items clicked – advanced models seek to understand why. This necessitates incorporating contextual information, such as the user’s immediate goals, past experiences, and even subtle cues within the interaction itself. By shifting the emphasis from aggregated statistics to a more nuanced understanding of intent, these models can predict future behavior with greater accuracy and deliver genuinely personalized experiences. This approach moves beyond superficial personalization – offering relevant products based on past purchases – toward anticipating user needs and proactively providing value, ultimately fostering stronger engagement and loyalty.
The Ghosts in the Machine: Learning from Unlabeled Sequences
Self-Supervised Learning (SSL) addresses the scarcity and cost associated with labeled datasets by leveraging the inherent structure within unlabeled data to create predictive tasks. Traditional supervised learning requires manually annotated data, which is often a bottleneck in many applications. SSL circumvents this by formulating pretext tasks – problems designed to be solved using the unlabeled data itself – forcing the model to learn meaningful representations of sequential patterns. These representations are then transferable to downstream tasks with limited labeled data, effectively reducing the reliance on expensive annotation efforts and enabling the utilization of vast amounts of readily available raw data. The core principle involves predicting portions of the input sequence from other parts, thus creating a learning signal without external labels.
Next Event Prediction and Next K Events Histogram Prediction are self-supervised learning tasks designed to leverage unlabeled sequential data, specifically raw user behavior logs. In Next Event Prediction, the model is trained to predict the immediately following event in a user’s sequence, given the preceding events as context. Next K Events Histogram Prediction extends this by training the model to predict a distribution over the next K events, effectively learning which events are most likely to occur within a defined future window. Both methods eliminate the need for manually labeled datasets by constructing predictive tasks directly from the inherent structure of the sequential data, allowing models to learn representations of user behavior through prediction rather than classification or regression against external labels.
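To make these pretext tasks concrete, the sketch below derives both targets from a single user's event log: the next event id for Next Event Prediction, and the empirical histogram of the following K events for Next K Events Histogram Prediction. This is a minimal illustration assuming integer-coded event types and NumPy; the exact target construction used by Abacus is not reproduced here.

```python
import numpy as np

def make_ssl_targets(events, k, num_event_types):
    """Build self-supervised targets from one user's chronologically ordered event log.

    Returns, for each prefix position t:
      - next_event[t]: the id of the event at position t+1 (next-event prediction)
      - next_k_hist[t]: normalized histogram of the next (up to) k event types
        (next-K-events histogram prediction)
    """
    events = np.asarray(events)
    T = len(events)
    next_event = events[1:]                       # targets for positions 0..T-2
    next_k_hist = np.zeros((T - 1, num_event_types))
    for t in range(T - 1):
        window = events[t + 1: t + 1 + k]         # the next (up to) k events
        counts = np.bincount(window, minlength=num_event_types)
        next_k_hist[t] = counts / counts.sum()    # empirical distribution
    return next_event, next_k_hist

# Example: 5 event types, look-ahead window of k = 3
ne, hist = make_ssl_targets([0, 2, 2, 1, 4, 2, 0], k=3, num_event_types=5)
```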
User embeddings generated through self-supervised learning methods represent users as dense vectors in a high-dimensional space, capturing patterns of sequential behavior. These embeddings are created by training models to predict future events or event distributions based on past user interactions, effectively encoding individual user journeys into a numerical representation. The robustness of these embeddings stems from their derivation from extensive, unlabeled user logs, allowing for the capture of subtle behavioral nuances that might be missed with limited labeled data. Consequently, these embeddings can be utilized for various downstream tasks such as user clustering, recommendation systems, and anomaly detection, offering a more comprehensive understanding of user behavior than traditional methods.
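In practice, a pretrained sequence encoder can be reduced to a single fixed-length user vector for such downstream tasks, for instance by mean-pooling its per-event hidden states. The snippet below is a generic PyTorch sketch under that assumption; the toy encoder and pooling choice are illustrative, not the paper's API.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Toy stand-in for a pretrained sequence encoder (embedding + GRU)."""
    def __init__(self, num_event_types=5, dim=16):
        super().__init__()
        self.emb = nn.Embedding(num_event_types, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x):                      # x: (batch, T) of event ids
        out, _ = self.gru(self.emb(x))         # (batch, T, dim)
        return out

@torch.no_grad()
def user_embedding(encoder, event_ids):
    """Mean-pool per-event hidden states into a single user vector."""
    x = torch.as_tensor(event_ids).unsqueeze(0)    # (1, T)
    return encoder(x).mean(dim=1).squeeze(0)       # (dim,)

vec = user_embedding(TinyEncoder(), [0, 2, 2, 1, 4])
```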
The Alchemist’s Dream: Aligning Counting and Sequential Modeling
Abacus introduces a pretext task designed to integrate the strengths of both counting-based methods and sequential encoding architectures. Traditional counting methods excel at capturing aggregate statistics within a sequence, but lack the ability to model the sequential relationships between events. Conversely, powerful sequential encoders, such as recurrent neural networks or transformers, effectively capture sequential dependencies but often require large labeled datasets to learn accurate representations of event frequencies. The Abacus pretext task addresses this limitation by requiring the model to predict the distribution of event types within a sequence, thereby forcing it to simultaneously learn both aggregate statistics – the overall counts of each event type – and the sequential patterns that govern their occurrences. This combined learning approach aims to improve the efficiency and generalization capabilities of sequential models, particularly in scenarios with limited labeled data.
The Abacus model utilizes prediction of the Event-Type Distribution as a core training objective, compelling the model to develop representations sensitive to both the overall statistical composition of event sequences and the sequential relationships within them. This distribution, representing the relative frequency of each event type, provides a global statistical summary. Simultaneously, the prediction task requires the model to consider the order and context of events to accurately estimate these frequencies, thereby implicitly encoding sequential information. This dual focus ensures that learned representations are not solely based on aggregate counts, but also reflect the dynamic patterns and dependencies present in the sequential data.
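One straightforward way to realize this objective is to attach a projection head that outputs a probability vector over event types and penalize its divergence from the empirical histogram of upcoming events. The PyTorch sketch below uses a KL-divergence loss for illustration; the paper's exact head architecture and loss formulation are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistributionHead(nn.Module):
    """Maps an encoder state to a predicted event-type distribution."""
    def __init__(self, hidden_dim, num_event_types):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_event_types)

    def forward(self, h):
        return F.log_softmax(self.proj(h), dim=-1)   # log-probabilities

def distribution_loss(log_pred, target_hist):
    """KL divergence between the empirical next-K histogram and the prediction."""
    return F.kl_div(log_pred, target_hist, reduction="batchmean")

# Example: batch of 8 encoder states, 5 event types, uniform histogram targets
head = DistributionHead(hidden_dim=32, num_event_types=5)
h = torch.randn(8, 32)
target = torch.full((8, 5), 0.2)
loss = distribution_loss(head(h), target)
```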
The implementation of data augmentation techniques, specifically Random Permutation and Segment Event Masking, improves model performance by increasing the diversity of training data. Random Permutation randomly shuffles the order of events within a sequence, forcing the model to learn position-invariant representations and enhancing its ability to generalize to different event orderings. Segment Event Masking randomly masks contiguous segments of events, compelling the model to infer missing information and become more robust to incomplete or noisy input sequences. These techniques collectively reduce overfitting and improve the model’s capacity to generalize to unseen data, leading to increased robustness and predictive accuracy.
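The sketch below illustrates both augmentations on integer-coded event sequences; the mask token id and maximum segment length are illustrative choices rather than the paper's settings.

```python
import random

MASK_TOKEN = -1  # illustrative placeholder id for masked events

def random_permutation(events):
    """Shuffle event order to encourage position-invariant representations."""
    events = list(events)
    random.shuffle(events)
    return events

def segment_event_mask(events, max_segment=3):
    """Mask one contiguous segment so the model must infer the missing events."""
    events = list(events)
    if not events:
        return events
    seg_len = random.randint(1, min(max_segment, len(events)))
    start = random.randint(0, len(events) - seg_len)
    for i in range(start, start + seg_len):
        events[i] = MASK_TOKEN
    return events

# Example: compose both augmentations on one sequence
augmented = segment_event_mask(random_permutation([0, 2, 2, 1, 4, 2, 0]))
```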
The Echo of Prediction: Validation and Impact on Purchase Prediction
Rigorous experimentation across two distinct datasets – the publicly available Taobao Dataset and a large, private e-commerce dataset – confirms that the Abacus model substantially enhances purchase prediction accuracy. These evaluations showcase Abacus’s ability to generalize beyond specific platforms and datasets, suggesting a robust approach to modeling user behavior. The model consistently outperformed baseline methods in predicting future purchases, indicating its potential for practical application in optimizing recommendation systems and targeted marketing campaigns. This improved accuracy translates directly to better user engagement and potentially increased revenue for e-commerce businesses, demonstrating the real-world impact of incorporating self-supervised learning into user modeling.
Integrating Abacus’s multi-task learning framework, which leverages techniques like masked modeling and Barlow Twins, with established architectures such as Transformer and GRU encoder networks substantially boosts purchase prediction accuracy. Evaluations conducted on a large, private dataset reveal an improvement of up to +6.1% in Area Under the Curve (AUC). This indicates that the self-supervised learning components within Abacus effectively enhance the model’s ability to discern subtle patterns in user behavior, leading to more reliable predictions. The observed gains suggest a promising pathway for augmenting traditional user modeling techniques with contemporary self-supervised learning approaches, ultimately creating more insightful and effective predictive systems.
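At a high level, such a multi-task setup can be pictured as a weighted sum of the individual pretraining losses. The sketch below uses the standard Barlow Twins redundancy-reduction term; the loss weights, and the assumption that the distribution, masked-modeling, and Barlow Twins terms are simply summed, are illustrative choices rather than the paper's configuration.

```python
import torch

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Standard Barlow Twins loss on two views' embeddings of shape (batch, dim)."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    c = (z1.T @ z2) / z1.shape[0]                        # cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()       # pull correlations to 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # push redundancy to 0
    return on_diag + lam * off_diag

def pretraining_loss(hist_loss, masked_loss, bt_loss, w=(1.0, 1.0, 0.1)):
    """Illustrative weighted sum of distribution, masked-modeling and Barlow Twins terms."""
    return w[0] * hist_loss + w[1] * masked_loss + w[2] * bt_loss

# Example with dummy per-task losses and two augmented views of a batch
z1, z2 = torch.randn(8, 16), torch.randn(8, 16)
total = pretraining_loss(torch.tensor(0.7), torch.tensor(1.2), barlow_twins_loss(z1, z2))
```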
Analysis reveals that Abacus delivers notably faster convergence during model training than a GRU encoder initialized from scratch, a phenomenon visually demonstrated in Figure 2. This expedited learning suggests that the self-supervised pretraining phase equips the model with a robust initial understanding of user behavior, so fewer training iterations are required to reach optimal performance, streamlining development and potentially reducing computational costs. The observed synergy between self-supervised learning and established architectures underscores a promising pathway for building more efficient and accurate user models, offering a substantial improvement over training from random initialization.
The pursuit of increasingly granular user modeling, as exemplified by Abacus and its counting-aligned pretraining, reveals a fundamental truth about complex systems. It isn’t simply about capturing more data points, but recognizing the inherent fragility woven into interconnectedness. As Claude Shannon observed, “The most important thing in communication is to convey the right message, not the most information.” Abacus attempts to distill signal from the noise of user event sequences, but each added layer of complexity, each attempt to perfectly predict behavior, increases the potential for cascading failure. The system, striving for ever-finer resolution, unwittingly prophesies its own eventual entanglement in the unpredictable currents of user action. It splits the signal, but not its fate.
What’s Next?
The pursuit of distributional pretraining, even when ‘aligned’ with seemingly concrete signals like event counts, remains a negotiation with entropy. Abacus demonstrates acceleration, yet acceleration merely delays the inevitable drift toward unforeseen states. The model’s efficacy is predicated on the assumption that past events adequately predict future ones – a comfortable fiction. The architecture doesn’t solve sequential user modeling; it postpones the reckoning with its inherent indeterminacy.
Future work will undoubtedly explore scaling – larger datasets, deeper networks. But such endeavors address symptoms, not causes. The more pressing question concerns the very ontology of the ‘user’ being modeled. Is this a stable entity, or merely a transient configuration of impulses? Counting events captures behavior, but says little about the underlying generative process, which remains, stubbornly, a black box. A guarantee of predictive power is simply a contract with probability, and the terms are always subject to revision.
The illusion of stability caches well, but the system will, inevitably, encounter novel user behaviors – ‘black swans’ beyond the scope of current training. The real challenge isn’t building a more accurate model, but designing systems that gracefully degrade – that embrace chaos not as failure, but as nature’s syntax.
Original article: https://arxiv.org/pdf/2512.16581.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/