Predicting What You’ll Do Next on Social Media

Author: Denis Avetisyan

A new approach accurately forecasts user behavior, from everyday actions to the surprisingly rare, on platforms like Bluesky.

A database analysis of 10,000 messages revealed a concentrated region of highly influential content-messages garnering over ten votes where a single action consistently dominated with a winner percentage exceeding 90% across all identified clusters-suggesting a predictable pattern of consensus within the system.

This paper details a hybrid model combining lookup strategies, tabular data, and neural networks to achieve state-of-the-art performance in social media action prediction and rare event classification.

Predicting the full spectrum of user behavior on social media remains a challenge, as most approaches prioritize frequent actions while overlooking rarer, yet potentially impactful, engagements. This paper, ‘Social-Media Based Personas Challenge: Hybrid Prediction of Common and Rare User Actions on Bluesky’, introduces a novel hybrid methodology combining lookup strategies, tabular models, and neural networks to accurately forecast both common and infrequent user actions. Achieving state-of-the-art results in the SocialSim challenge, our approach demonstrates that tailored modeling-recognizing fundamental differences between action types-is crucial for effective social media behavior prediction. Can these findings pave the way for more nuanced and responsive platform design and content recommendation systems?

Predicting the Inevitable: Why Social Signals Remain Elusive

The promise of truly personalized experiences on social media platforms hinges on accurately anticipating user actions, but current predictive models consistently fall short due to the inherent complexity of human behavior. While algorithms can identify broad trends, discerning individual preferences and forecasting nuanced reactions proves remarkably difficult. These systems often rely on simplified representations of user activity, overlooking the subtle interplay of factors that drive engagement – from the specific content consumed to the context of social interactions. Consequently, predictions are frequently inaccurate, leading to irrelevant recommendations, ineffective advertising, and a diminished user experience. Improving these forecasts requires moving beyond superficial patterns and embracing more sophisticated approaches that capture the richness and unpredictability of human social dynamics.

Current predictive models on social media often stumble when attempting to anticipate user behavior due to an oversimplification of the factors at play. These approaches frequently treat interactions as isolated events, neglecting the complex web of individual preferences, evolving relationships, and contextual cues that shape online activity. Consequently, predictions based solely on superficial patterns – such as frequently shared content or broad demographic data – consistently fall short of accuracy. A user’s history isn’t simply a collection of likes and shares; it’s a dynamic record of changing interests, nuanced opinions, and the subtle influences of their social network. Failing to account for these intricacies results in a skewed understanding of user intent and, ultimately, unreliable predictions that hinder truly personalized experiences.

A central difficulty in forecasting social media activity stems from the need to simultaneously interpret what is being communicated and when it is being shared. Existing models often prioritize message content – the words, images, and links – while underestimating the significance of interaction timing. However, the precise moment a user responds, shares, or comments provides crucial insight into their engagement and intent. Capturing these temporal dynamics requires sophisticated analytical techniques capable of discerning patterns hidden within sequences of interactions. Effectively representing both the semantic content of messages and the precise timing of user responses is not merely a technical hurdle, but a fundamental step towards building predictive models that accurately reflect the complexities of human social behavior and ultimately deliver truly personalized online experiences.

LightGBM and Persona Clusters: Pragmatic Approaches to Prediction

A LightGBM model is utilized for the prediction of frequently occurring user actions due to its computational efficiency when processing tabular data and features derived through engineering. This gradient boosting framework demonstrates a macro-F1 score of 0.78, indicating a balanced performance across all classes when identifying these common actions. The model’s effectiveness stems from its ability to rapidly iterate on decision trees, optimizing for predictive accuracy while minimizing computational cost associated with high-dimensional, structured datasets.

Prediction models are customized through the utilization of Persona Clusters to account for individual user tendencies. These clusters, derived from user behavior, enable the creation of tailored models that move beyond generalized predictions. By segmenting the user base, the models can learn and apply patterns specific to each persona, resulting in improved prediction accuracy compared to a single, universal model. This approach allows for a more nuanced understanding of user behavior and facilitates more relevant and effective predictions.

A dedicated Rare Action Classification model addresses the challenge of predicting infrequent user actions that are not well-represented in training data. This model is specifically designed to identify nuanced behavioral patterns associated with these rare events, utilizing techniques suited for imbalanced datasets. By isolating these actions, the model can improve prediction accuracy beyond what is achievable with a generalized model trained on the majority class of frequent actions, allowing for a more comprehensive understanding of user behavior and enabling targeted interventions or personalized experiences.

Fusing Time and Text: A Hybrid Architecture for Contextual Understanding

The Hybrid Neural Architecture represents user interactions by integrating two distinct feature sets: Temporal Features and Textual Features. Temporal Features capture the timing and sequence of user actions, including inter-message times and session durations, providing context regarding when interactions occur. Textual Features are derived from the content of user messages via techniques like word embeddings and transformer networks, representing what is being communicated. These features are not processed in isolation; rather, they are combined to create a holistic representation that considers both the content and timing of user behavior. This combined representation enables the model to discern patterns that might be missed when analyzing either feature set independently, improving the accuracy of interaction understanding and prediction.

Cross-Attention Fusion is implemented to enable direct interaction between temporal feature representations and textual message content. Specifically, the model employs attention mechanisms to allow each token in the message sequence to attend to the entire sequence of temporal features – capturing dependencies between message content and the timing of interactions. This process generates context vectors that incorporate temporal information directly into the textual representation, allowing the model to weigh the importance of different temporal patterns when interpreting message semantics. The resulting fused representation is then used for downstream classification tasks, improving the model’s ability to recognize patterns that rely on both what was said and when it was said.

Rare action prediction tasks are often hampered by significant class imbalance, where the majority of interactions represent common actions and only a small fraction represent the target rare actions. To mitigate this, we implemented Focal Loss, a dynamically scaled cross-entropy loss that down-weights the contribution of easily classified examples – primarily the common actions – and focuses training on hard, misclassified examples, specifically the rare actions. The Focal Loss function includes a modulating factor $ (1 – p_t)^{\gamma} $, where $p_t$ is the model’s estimated probability for the correct class and $\gamma$ is a focusing parameter. By reducing the loss contribution from well-classified examples, Focal Loss increases the model’s sensitivity to rare actions, resulting in improved performance metrics for this critical, imbalanced class.

The model’s ability to integrate temporal data with message content significantly improves rare action classification performance. By considering the timing of messages alongside their textual content, the architecture moves beyond solely semantic understanding to incorporate contextual awareness of user behavior. This fusion of temporal and textual features enables the identification of subtle patterns indicative of infrequent but critical actions, resulting in a macro-F1 score of 0.56 for rare action classification – a metric indicating a balanced precision and recall for these less frequent events.

From Prediction to Proactivity: Shaping the Conversational Flow

The system functions by first anticipating a user’s subsequent action within a conversation, and then strategically uses this prediction to shape the generated reply. Rather than simply responding to the immediate input, the model considers the likely conversational trajectory, allowing it to craft responses that are not only relevant but also proactively address anticipated needs or questions. This is achieved through integration with GPT-4.1-mini, a powerful language model that translates the predicted actions into coherent and contextually appropriate text. By effectively ‘thinking ahead’, the system moves beyond superficial interactions, fostering a more dynamic and engaging conversational experience for the user, and ultimately improving the perceived intelligence and helpfulness of the AI.

The capacity to anticipate a user’s subsequent action fundamentally alters how conversational AI can construct replies. Rather than simply responding to the immediate input, the system predicts likely user responses and tailors its output accordingly, fostering a sense of genuine interaction. This proactive approach moves beyond generic answers, enabling the model to generate more personalized and engaging dialogue. By considering the conversational trajectory, the system can maintain context more effectively, offer relevant suggestions, and even proactively address potential user needs, ultimately creating a more fluid and satisfying user experience. The result is a conversation that feels less like an exchange with a machine and more like a natural back-and-forth between individuals.

Rigorous testing of the developed system utilized the Bluesky Dataset, revealing a significant performance advantage over conventional transformer models. The hybrid approach, integrating predicted actions into the reply generation process, achieved a 16-point improvement in macro-F1 score, a key metric for evaluating classification accuracy and balance across different classes. This substantial gain underscores the effectiveness of incorporating action prediction to enhance contextual understanding and response relevance. The results demonstrate that the model not only generates replies, but does so with a demonstrably higher degree of accuracy and nuanced comprehension of the conversational context, paving the way for more natural and engaging interactions.

The system’s ability to maintain conversational coherence is demonstrably high, as evidenced by an average cosine similarity of 0.83 when evaluating text generated across 1000 distinct conversations. This metric assesses the alignment between the generated response and the preceding conversational context, effectively quantifying semantic relevance. A score approaching 0.83 indicates a strong correlation, suggesting the model consistently produces replies that are not only grammatically sound but also logically connected to the ongoing dialogue. This level of consistency is crucial for creating truly engaging and natural-feeling interactions, moving beyond simple keyword matching towards genuine understanding and response.

The pursuit of predicting user behavior, as demonstrated in this hybrid approach to modeling actions on Bluesky, feels less like engineering and more like applied archeology. One digs through layers of temporal features and model architectures hoping to unearth a pattern before the platform shifts again. As Brian Kernighan observed, “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.” This rings true; the elegance of combining lookup strategies, tabular models, and neural networks will inevitably succumb to the chaotic reality of production data and the ever-shifting sands of user trends. Tests offer a fleeting illusion of control, but certainty remains stubbornly out of reach.

What’s Next?

The pursuit of predictive accuracy invariably reveals the brittleness of ‘persona.’ This work demonstrates a functional synthesis – lookup tables, tabular models, and neural networks aligned to forecast user behavior. However, the very act of prediction introduces a feedback loop; successful targeting alters the landscape, rendering yesterday’s insights obsolete. Everything optimized will one day be optimized back, a perpetual churn in the feature space.

The challenge, then, isn’t merely improving classification metrics for rare actions, but acknowledging their inherent ephemerality. A model that anticipates novelty must also account for its own influence on that novelty. Future iterations will likely grapple with continual learning paradigms, models that actively decay old assumptions and prioritize adaptation over static representation. The architecture isn’t a diagram; it’s a compromise that survived deployment – and even that survival is temporary.

The current focus on individual user prediction may also obscure a more fundamental dynamic: collective behavior. It’s not simply about anticipating what a user will do, but why – and how those motivations shift within a network. The field will likely need to move beyond feature engineering and towards a more holistic understanding of social systems – a task where the logs are always more truthful than the theory.

Original article: https://arxiv.org/pdf/2511.17241.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

Predicting the Inevitable: Why Social Signals Remain Elusive

LightGBM and Persona Clusters: Pragmatic Approaches to Prediction

Fusing Time and Text: A Hybrid Architecture for Contextual Understanding

From Prediction to Proactivity: Shaping the Conversational Flow

What’s Next?

See also: