Author: Denis Avetisyan
A new AI framework fuses social media sentiment with stock data to identify coordinated efforts to artificially inflate or deflate share prices.

This paper introduces AIMM, an AI-driven multimodal system for detecting social-media-influenced stock market manipulation through risk scoring based on natural language processing and time series analysis.
Traditional financial surveillance struggles to identify coordinated market manipulation originating from online sources. This paper introduces AIMM: An AI-Driven Multimodal Framework for Detecting Social-Media-Influenced Stock Market Manipulation, a novel system fusing Reddit activity, bot indicators, and market data into a daily risk score. Preliminary results demonstrate AIMM’s ability to flag potentially manipulative events, including a 22-day advance warning prior to the January 2021 GameStop surge. The authors also release a labeled dataset to support further research. Can this multimodal approach ultimately provide regulators and investors with the tools needed to proactively detect and mitigate the growing threat of social media-driven market manipulation?
The Illusion of Control: Why Market Surveillance Always Lags
Conventional techniques for uncovering market manipulation frequently depend on readily exploitable metrics, inadvertently providing openings for resourceful actors. These methods, often centered on volume spikes or price fluctuations, prove susceptible to strategic, yet subtle, interventions designed to mimic genuine market forces. Sophisticated manipulators stay just under these thresholds, employing techniques like layering orders or coordinating activity across multiple accounts to create the appearance of legitimate trading while subtly influencing asset prices. Consequently, regulatory bodies and financial institutions face a continuous challenge in distinguishing between authentic market dynamics and carefully orchestrated attempts at deception, highlighting the limitations of relying solely on historically used indicators.
The proliferation of social media has fundamentally altered the landscape of financial markets, creating a torrent of data that overwhelms traditional surveillance methods. No longer can regulators or analysts rely solely on tracking easily identifiable keywords or hashtags to detect manipulation; the speed and complexity of online conversations necessitate automated systems capable of nuanced sentiment analysis and contextual understanding. These systems must move beyond simple positive or negative scoring to discern sarcasm, irony, and coded language often employed to influence market behavior. Effectively monitoring this data requires advanced natural language processing techniques, capable of identifying subtle shifts in online narratives and correlating them with actual trading activity – a task far exceeding the capabilities of manual review or rudimentary keyword searches.
Effective detection of market manipulation increasingly depends on integrating traditionally disparate data sources into a unified analytical framework. Rather than focusing on isolated indicators, a comprehensive risk assessment necessitates the fusion of real-time market data – including trade volumes and price fluctuations – with social sentiment extracted from platforms like Twitter and Reddit. Crucially, this must be coupled with detailed account activity, examining network connections, posting frequency, and behavioral patterns that deviate from established norms. This holistic approach allows for the identification of coordinated campaigns and subtle forms of influence that would otherwise remain hidden, moving beyond simple anomaly detection to reveal manipulative intent and its impact on market stability. The resulting risk profiles enable regulators and exchanges to proactively address emerging threats and maintain investor confidence.

AIMM: Stitching Together the Fragments of Deception
AIMM is designed to detect potentially manipulative events by consolidating and analyzing data from three primary sources: market data, social media activity, and individual account behavior. Market data, including trade volumes and price fluctuations, provides a baseline for identifying unusual activity. Social media signals are incorporated to assess public sentiment and detect potentially misleading narratives. Account behavior, such as posting frequency, network connections, and trading patterns, is monitored to identify coordinated or suspicious actions. By integrating these data streams, AIMM aims to provide a holistic view of market activity and flag instances indicative of manipulative practices that might otherwise go unnoticed.
AIMM employs sentiment analysis to assess public opinion regarding financial instruments by utilizing a dual-methodology approach. Lexicon-based methods, specifically VADER (Valence Aware Dictionary and sEntiment Reasoner), assign sentiment scores based on pre-defined word lists and grammatical structures. Complementing this, AIMM integrates transformer models such as FinBERT, a BERT model fine-tuned on financial text, to capture contextual nuances and more accurately interpret sentiment expressed in financial discussions. This combination allows for a robust evaluation of public perception, identifying potentially manipulative narratives by analyzing the emotional tone of online content related to specific assets.
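How the two sentiment signals are merged is not spelled out here, so the following is only a sketch under stated assumptions. It assumes the lexicon method yields a compound score in [-1, 1] (as VADER does) and the transformer yields positive and negative class probabilities (as FinBERT-style classifiers do). The function name `blend_sentiment` and the equal default weighting are hypothetical.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def blend_sentiment(
    lexicon_compound: float,
    p_positive: float,
    p_negative: float,
    lexicon_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Blend a lexicon score and transformer probabilities into one score in [-1, 1]."""
    lexicon = clamp(lexicon_compound, -1.0, 1.0)
    # Collapse class probabilities to a signed score: +1 fully positive, -1 fully negative.
    transformer = clamp(p_positive - p_negative, -1.0, 1.0)
    w = clamp(lexicon_weight, 0.0, 1.0)
    return w * lexicon + (1.0 - w) * transformer


# Illustrative inputs only; these are not outputs of real models.
blended = blend_sentiment(lexicon_compound=0.8, p_positive=0.7, p_negative=0.1)
```

Keeping the blend as a separate function means either scorer can be swapped out without touching the other.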
Coordination detection within the AIMM system employs TF-IDF vectorization to identify accounts exhibiting similar messaging patterns, indicative of potential manipulation. This process converts textual content from each account into a numerical vector, weighting terms based on their frequency within a specific account and their rarity across the entire dataset. These vectors are then compared using cosine similarity; accounts with highly similar vectors suggest coordinated behavior. Specifically, TF-IDF prioritizes terms that are frequent in a given account’s posts but infrequent across all posts, highlighting unique, potentially orchestrated messaging. A high degree of similarity between multiple accounts’ TF-IDF vectors signals a statistically significant likelihood of coordinated activity, serving as a key input for overall manipulation risk assessment.
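The idea can be sketched in plain Python. This is a minimal illustration, not AIMM's implementation: the smoothed IDF formula, the whitespace tokenizer, the 0.8 similarity threshold, and the account names are all assumptions chosen for the example.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Map each account to a sparse TF-IDF vector (term -> weight)."""
    tokenized = {acct: text.lower().split() for acct, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: number of accounts whose text contains each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for acct, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        vectors[acct] = {
            # Term frequency times a smoothed inverse document frequency.
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def coordinated_pairs(docs, threshold=0.8):
    """Return account pairs whose similarity meets the (assumed) threshold."""
    vectors = tfidf_vectors(docs)
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        similarity = cosine(vectors[a], vectors[b])
        if similarity >= threshold:
            pairs.append((a, b, similarity))
    return pairs


# Synthetic posts for illustration only.
posts = {
    "acct_a": "buy xyz now squeeze incoming diamond hands",
    "acct_b": "buy xyz now squeeze incoming diamond hands today",
    "acct_c": "earnings call was dull and guidance looks flat",
}
pairs = coordinated_pairs(posts)
```

Here only `acct_a` and `acct_b` are paired, since they share nearly identical wording, while `acct_c` has no terms in common with either. Pairwise comparison is quadratic in the number of accounts, so a real system would need blocking or approximate nearest-neighbor search.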
The AIMM Risk Score (AMRS) is a normalized scalar ranging from 0 to 100, generated by combining weighted scores from multiple signal analyses. These analyses include sentiment scores derived from both lexicon-based and transformer-model natural language processing, as well as metrics quantifying coordinated account behavior identified through TF-IDF vectorization. The weighting of each component within the AMRS calculation is determined through a supervised learning model trained on historical manipulation events; higher AMRS values indicate an increased probability of manipulative activity. The score is recomputed daily, giving a current assessment of the risk associated with specific financial instruments or accounts.
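A weighted combination of this kind can be sketched as follows. The component names and the fixed weights are hypothetical placeholders; they are not the learned weights described above, and each component is assumed to be pre-scaled to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class DailySignals:
    """Per-ticker daily component scores, each assumed to lie in [0, 1]."""
    sentiment_extremity: float
    coordination: float
    bot_activity: float
    market_anomaly: float


# Hypothetical weights for illustration; a trained model would supply these.
WEIGHTS = {
    "sentiment_extremity": 0.3,
    "coordination": 0.3,
    "bot_activity": 0.2,
    "market_anomaly": 0.2,
}


def risk_score(signals: DailySignals, weights=WEIGHTS) -> float:
    """Weighted average of the clamped components, rescaled to 0..100."""
    total_weight = sum(weights.values())
    weighted = sum(
        weight * min(max(getattr(signals, name), 0.0), 1.0)
        for name, weight in weights.items()
    )
    return 100.0 * weighted / total_weight


# Synthetic day: strong coordination, moderate sentiment and market anomaly.
score = risk_score(DailySignals(0.5, 1.0, 0.0, 0.5))
```

Dividing by the total weight keeps the score within 0 to 100 even if the weights do not sum to one.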

Validation: Ground Truth in a World of Noise
The AIMM-GT dataset serves as the primary benchmark for evaluating AIMM’s performance. This dataset is a curated and labeled collection of actual manipulation events occurring in online financial forums. It provides ground truth for assessing AIMM’s ability to correctly identify instances of market manipulation, enabling quantitative measurement of its precision and recall. The labeling process involves expert analysis to confirm the presence and nature of manipulative activity, ensuring the dataset’s reliability for performance evaluation and model refinement. The size and composition of the AIMM-GT dataset are continually updated to reflect evolving manipulation tactics and maintain the relevance of the evaluation process.
AIMM’s data pipeline relies on external APIs to gather information necessary for manipulation event detection. Historical stock price data is obtained through the Yahoo Finance API, providing the time-series data used to assess market behavior. Previously, the system incorporated data from the Pushshift API, which allowed access to Reddit comments and submissions; this data was used to identify potential discussion-based manipulation attempts. The integrity and availability of data from these external sources are critical to AIMM’s operational performance, and the system is designed to accommodate changes or disruptions in data feeds.
AIMM employs bot detection techniques to mitigate the impact of automated accounts on manipulation signal identification. These methods analyze user behavior, including posting frequency, account age, and content patterns, to identify and filter out bot-like activity. The rationale is that artificially inflated activity from bots can distort the assessment of genuine manipulative intent, leading to false positives or reduced precision. By excluding these automated accounts, AIMM aims to improve the accuracy of manipulation detection by focusing on signals originating from authentic user interactions and market activity.
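A simple heuristic along these lines might look like the sketch below. The three features mirror those named above, but the field names, the scaling constants (`max_rate`, `young_days`), the equal averaging, and the 0.5 cutoff are assumptions for illustration rather than AIMM's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    age_days: int
    posts_per_day: float
    duplicate_ratio: float  # share of posts repeating earlier text, 0..1


def bot_likelihood(acct: Account, max_rate: float = 50.0, young_days: int = 30) -> float:
    """Average three bounded indicators into a 0..1 bot-likeness score."""
    rate = min(acct.posts_per_day / max_rate, 1.0)          # very high posting frequency
    youth = max(0.0, 1.0 - acct.age_days / young_days)      # very new account
    repetition = min(max(acct.duplicate_ratio, 0.0), 1.0)   # repetitive content
    return (rate + youth + repetition) / 3.0


def filter_humans(accounts, cutoff: float = 0.5):
    """Keep accounts scoring below the (assumed) bot cutoff."""
    return [a for a in accounts if bot_likelihood(a) < cutoff]


# Synthetic accounts for illustration only.
accounts = [
    Account("spam_2021", age_days=2, posts_per_day=80, duplicate_ratio=0.9),
    Account("long_time_user", age_days=900, posts_per_day=3, duplicate_ratio=0.05),
]
kept = [a.name for a in filter_humans(accounts)]
```

A hard cutoff discards information, so the same score could instead be passed along as one of the inputs to the risk score.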
The AIMM system utilizes a modular architecture to facilitate the integration of diverse data streams and analytical methods. This design allows for the seamless addition of new financial data sources beyond the current Yahoo Finance API and, previously, the Pushshift API, as well as the implementation of updated or alternative analytical techniques without requiring substantial code refactoring. Each component, including data ingestion, feature extraction, and classification, is treated as an independent module with well-defined interfaces, promoting flexibility and scalability. This adaptability is critical for maintaining the system’s efficacy in response to evolving market dynamics and the availability of novel data types or analytical approaches.
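One way to picture such a modular layout is with small interfaces that each component satisfies. The sketch below is an assumption about how the modules could be wired, not a description of AIMM's code; `DataSource`, `FeatureExtractor`, `InMemorySource`, `PostCount`, and `Pipeline` are hypothetical names.

```python
from typing import Dict, List, Protocol

Record = Dict[str, object]


class DataSource(Protocol):
    """Anything that can return raw records for a ticker."""
    def fetch(self, ticker: str) -> List[Record]: ...


class FeatureExtractor(Protocol):
    """Anything that turns raw records into named numeric features."""
    def extract(self, records: List[Record]) -> Dict[str, float]: ...


class InMemorySource:
    """Stand-in source serving pre-loaded records (no network access)."""
    def __init__(self, records: List[Record]) -> None:
        self._records = records

    def fetch(self, ticker: str) -> List[Record]:
        return [r for r in self._records if r["ticker"] == ticker]


class PostCount:
    """Example extractor: counts records marked as posts."""
    def extract(self, records: List[Record]) -> Dict[str, float]:
        return {"post_count": float(sum(1 for r in records if r.get("kind") == "post"))}


class Pipeline:
    """Runs every source, then every extractor, and merges the features."""
    def __init__(self, sources: List[DataSource], extractors: List[FeatureExtractor]) -> None:
        self.sources = sources
        self.extractors = extractors

    def run(self, ticker: str) -> Dict[str, float]:
        records = [r for source in self.sources for r in source.fetch(ticker)]
        features: Dict[str, float] = {}
        for extractor in self.extractors:
            features.update(extractor.extract(records))
        return features


pipeline = Pipeline(
    sources=[InMemorySource([
        {"ticker": "XYZ", "kind": "post"},
        {"ticker": "XYZ", "kind": "post"},
        {"ticker": "ABC", "kind": "post"},
    ])],
    extractors=[PostCount()],
)
features = pipeline.run("XYZ")
```

Replacing a retired feed then means writing one new class with a `fetch` method, leaving the extractors and the pipeline untouched.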
Evaluation of AIMM on a limited test set of three manipulation events yielded perfect classification performance. Specifically, AIMM achieved a precision of 1.0 and a recall of 1.0, meaning all three events were flagged with no false positives. With only three events, these figures are best read as an indication of AIMM’s potential for early and accurate detection rather than as a reliable estimate of its sensitivity and specificity.
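To see why a three-event test set deserves caution, it helps to compute precision and recall directly. The event labels below are synthetic placeholders, not entries from AIMM-GT.

```python
def precision_recall(predicted: set, actual: set):
    """Return (precision, recall) for flagged events versus labeled events."""
    true_pos = len(predicted & actual)
    false_pos = len(predicted - actual)
    false_neg = len(actual - predicted)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Synthetic labels: three known events, all flagged, no false alarms.
actual_events = {"event_1", "event_2", "event_3"}
perfect = precision_recall({"event_1", "event_2", "event_3"}, actual_events)

# A single additional false alarm is enough to move precision noticeably.
one_false_alarm = precision_recall(
    {"event_1", "event_2", "event_3", "event_4"}, actual_events
)
```

With three positives, one false alarm drops precision from 1.0 to 0.75 and one missed event drops recall to about 0.67, so each metric can only take a handful of values.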

Beyond Detection: A Proactive Defense, Not Just a Post-Mortem
The proactive identification of coordinated manipulation campaigns represents a significant step towards safeguarding financial markets and investor interests. Automated systems like AIMM move beyond simply reacting to market anomalies; instead, they offer regulators the opportunity to intervene before abusive practices fully materialize. By pinpointing coordinated efforts to artificially inflate or deflate asset prices, these systems enable timely investigations and preventative measures, such as temporary trading halts or increased scrutiny of suspicious actors. This shift from reactive enforcement to proactive monitoring promises not only to reduce financial losses for individual investors, but also to bolster overall market integrity and public trust, fostering a more stable and equitable investment landscape.
A significant advancement lies in the potential to seamlessly incorporate AIMM’s risk scoring directly into existing financial infrastructure. This integration allows trading platforms and investment firms to augment their standard due diligence protocols with a dynamic, data-driven assessment of manipulation risk. Rather than reacting to detected schemes, institutions can proactively flag potentially problematic assets or trading patterns, enabling informed decision-making before market distortions occur. The system doesn’t dictate investment strategy, but instead functions as an early warning system, providing a crucial additional layer of scrutiny that complements existing analytical tools and helps protect against financial losses stemming from coordinated manipulation campaigns. This preemptive capability represents a shift towards more resilient and transparent market operations.
The analytical power of AIMM is poised to grow significantly through the incorporation of broader data streams beyond traditional market data. Current development prioritizes integrating alternative data sources, such as real-time news articles, regulatory filings like 10-K reports, and sentiment analysis derived from corporate communications. This expansion will allow the system to identify subtle precursors to manipulation that might otherwise be missed, providing a more holistic view of potential market abuse. By cross-referencing social media signals with information contained in official company disclosures and media coverage, AIMM aims to improve the accuracy of its risk scoring and further extend its lead time in detecting manipulative activities, ultimately bolstering investor protection and market integrity.
Continued development of AIMM prioritizes refining its analytical core to address increasingly complex deception. Future iterations will move beyond current machine learning models, exploring techniques like deep reinforcement learning and graph neural networks to identify nuanced manipulation patterns. These advancements aim to detect not just overt schemes, but also subtle, coordinated activity that previously evaded detection, such as the strategic dissemination of misinformation across multiple platforms or the exploitation of emerging market dynamics. By focusing on adaptability and the capacity to learn from evolving tactics, the system intends to stay ahead of manipulators and ensure ongoing protection against market abuse, even as strategies become more sophisticated and difficult to discern.
AIMM exhibits a noteworthy proactive capability, identifying pre-market social signals indicative of potential manipulation campaigns with a reported lead time of nine days. This temporal advantage is crucial, as it allows for intervention before manipulative activity significantly impacts market prices and investor confidence. By analyzing the velocity and sentiment of online discussions, AIMM can pinpoint coordinated efforts to artificially inflate or deflate asset values, providing a critical early warning system for regulators and financial institutions. The nine-day window represents a substantial improvement over traditional reactive monitoring methods, which typically respond to market anomalies after they have already occurred, and offers the potential to mitigate financial harm and maintain market integrity.
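Lead time can be measured as the gap between the first daily score that crosses an alert threshold and the date of the event itself. The sketch below assumes that definition; the threshold of 60 and all dates and scores are synthetic, chosen only to show the arithmetic.

```python
from datetime import date
from typing import Dict, Optional


def lead_time_days(
    daily_scores: Dict[date, float],
    event_day: date,
    threshold: float = 60.0,  # hypothetical alert threshold on the 0..100 score
) -> Optional[int]:
    """Days between the earliest alert on or before the event and the event."""
    alerts = sorted(
        day for day, score in daily_scores.items()
        if score >= threshold and day <= event_day
    )
    return (event_day - alerts[0]).days if alerts else None


# Synthetic score history for illustration only.
scores = {
    date(2024, 3, 1): 20.0,
    date(2024, 3, 2): 65.0,
    date(2024, 3, 5): 80.0,
}
lead = lead_time_days(scores, event_day=date(2024, 3, 12))
```

Taking the earliest alert makes the measure sensitive to a single spurious spike long before the event, so a stricter variant might require several consecutive days above the threshold.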

The pursuit of automated financial surveillance, as detailed in this framework, feels predictably optimistic. AIMM attempts to fuse social media sentiment with market data, generating a risk score to flag manipulation, which is a noble goal, certainly. However, the system’s reliance on identifying ‘potentially manipulative activity’ highlights a fundamental truth: these models aren’t oracles. As John Maynard Keynes observed, “The difficulty lies not so much in developing new ideas as in escaping from old ones.” Each new signal processed, each algorithm refined, simply adds another layer to the inevitable technical debt. One can anticipate the moment when a clever actor finds a way to game the risk score, forcing another round of ‘innovation’ to chase a moving target. The system may detect manipulation, but it won’t prevent it, only delay the inevitable adaptation.
The Road Ahead
This pursuit of quantifiable manipulation – fusing the ephemeral chatter of Reddit with the cold logic of market data – feels predictably optimistic. The system, as presented, identifies potential activity. One suspects production environments will rapidly demonstrate the difference between statistical anomaly and actual coordinated crime. Anything labeled ‘scalable’ hasn’t encountered a determined adversary, or a sufficiently large data pipeline failure. The current focus on Reddit is also… quaint. As if manipulation schemes announce themselves on a single platform. The next iteration will undoubtedly involve scraping every obscure forum and encrypted messaging app, chasing shadows until the signal is lost in noise.
The construction of a ‘risk score’ is, of course, the siren song of all financial modeling. Assign a number, and suddenly complexity yields to control. But markets are rarely rational, and human behavior is notoriously difficult to predict. A high score will likely trigger more false positives than actual interventions. Better one monolithic risk model, honestly assessed, than a hundred lying microservices each claiming to understand the ‘true’ intent of market actors.
Ultimately, this work adds another layer to the increasingly elaborate game of cat and mouse. The manipulators will adapt, finding new vectors, new platforms, new ways to obfuscate their intentions. The system will chase, endlessly refining its algorithms. And the logs will fill with exceptions. Such is progress.
Original article: https://arxiv.org/pdf/2512.16103.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2025-12-19 08:45