Author: Denis Avetisyan
Researchers have developed a novel framework that allows large language models to perform portfolio optimization using only anonymized financial data, mitigating risks of bias and memorization.

This work introduces BlindTrade, an anonymization-first LLM-GNN-RL framework for robust and interpretable financial trading in volatile markets.
Despite the promise of large language models (LLMs) in financial trading, ensuring genuine understanding of market dynamics, rather than mere memorization of ticker associations, remains a critical challenge. This work, ‘Can Blindfolded LLMs Still Trade? An Anonymization-First Framework for Portfolio Optimization’, introduces BlindTrade, a novel framework that anonymizes financial data and employs an LLM-GNN-RL pipeline to mitigate memorization and survivorship biases. Experiments demonstrate a Sharpe ratio of 1.40 ± 0.22, alongside evidence of robust signal legitimacy and a performance dependency on market volatility: the system excels in turbulent conditions but weakens in strong bull trends. Can this anonymization-first approach unlock truly interpretable and reliable LLM-driven portfolio management across diverse market regimes?
The Limitations of Static Financial Modeling
Conventional financial modeling frequently operates on the premise of stable relationships, utilizing static features like historical averages or pre-defined risk factors. However, this approach often fails to capture the intricate and ever-shifting dependencies inherent in modern markets. Analyses built on these static foundations can miss crucial signals arising from non-linear interactions between assets, or the influence of external factors not explicitly included in the model. The result is a potentially incomplete picture of risk and return, where subtle but significant relationships are overlooked in favor of simplified, easily quantifiable metrics. This limitation becomes especially pronounced in complex systems where emergent behaviors – patterns arising from the interactions of many components – dictate market dynamics, rendering static analyses increasingly unreliable for accurate prediction and effective portfolio management.
Modern financial markets are characterized by intricate webs of interconnectedness, where the relationships between assets are rarely static and consistently shift in response to global events, investor sentiment, and even seemingly unrelated economic indicators. Traditional analytical techniques, built on assumptions of linear correlation and consistent behavior, often fail to adequately capture these evolving dependencies. A dynamic approach, leveraging techniques like time-varying correlation matrices and network analysis, becomes essential to model the changing landscape of asset relationships. These methods allow for a more granular understanding of how risk propagates through the system and can identify previously hidden vulnerabilities or opportunities, ultimately leading to more robust and adaptive investment strategies. Ignoring this dynamic interplay risks mispricing assets and underestimating the true extent of systemic risk.
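The time-varying correlation idea mentioned above can be sketched in a few lines: instead of one correlation estimate over the full sample, the dependency between two return series is re-estimated over a sliding window so it is allowed to drift. This is a minimal illustration, not the paper's method.

```python
# Time-varying correlation sketch: Pearson correlation over a sliding
# window, so the asset-to-asset dependency can shift through time.
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def rolling_correlation(r1, r2, window):
    """Correlation of the two return series over each trailing window."""
    return [pearson(r1[i - window:i], r2[i - window:i])
            for i in range(window, len(r1) + 1)]
```

A network-analysis layer would then threshold these rolling correlations into edges of a dynamic asset graph, one snapshot per window.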
Financial modeling frequently suffers from insidious biases, most notably survivorship bias, which systematically distorts performance evaluations. This occurs because analyses often focus solely on entities that have survived a particular period – successful companies or funds – while ignoring those that have failed or ceased to exist. Consequently, reported returns are artificially inflated, presenting an overly optimistic view of past performance and leading to unreliable strategic projections. The omission of failed cases creates a skewed dataset, obscuring the true risk landscape and potentially encouraging investment in strategies with a higher probability of failure than indicated by the incomplete data. Addressing this requires diligent efforts to incorporate data from defunct entities, a challenging task given the inherent difficulty in obtaining comprehensive historical records for those no longer operating.
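A toy example makes the size of the distortion concrete: averaging returns only over assets that survived to the end of the sample inflates the apparent historical performance. The figures below are invented for illustration.

```python
# Survivorship-bias illustration: the "survivors only" average looks
# healthy, while the full universe (including delisted assets) is negative.
def mean_return(assets):
    return sum(a["ret"] for a in assets) / len(assets)

universe = [
    {"name": "A", "ret": 0.12, "alive": True},
    {"name": "B", "ret": 0.08, "alive": True},
    {"name": "C", "ret": -0.60, "alive": False},  # delisted after losses
    {"name": "D", "ret": -0.45, "alive": False},  # fund wound down
]

survivors_only = mean_return([a for a in universe if a["alive"]])  # +10%
full_universe = mean_return(universe)                              # -21.25%
```

The survivor-only figure is positive while the true universe average is sharply negative, which is precisely why backtests must include defunct entities.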

BlindTrade: An Anonymization-First Intelligent System
BlindTrade utilizes a multi-component framework to generate alpha through the integration of three core technologies: Large Language Models (LLMs) functioning as agents for data analysis and insight generation, Graph Neural Networks (GNNs) to model relationships between entities and propagate information, and a Reinforcement Learning (RL) policy to optimize trading strategies based on the outputs of the LLM and GNN components. The LLM agents process financial data and news, the GNN constructs a knowledge graph representing interconnected market information, and the RL policy learns to navigate this graph to identify and execute profitable trades. This combined approach aims to leverage the strengths of each technology – the LLM’s reasoning capabilities, the GNN’s relational understanding, and the RL policy’s decision-making process – to achieve superior investment performance.
Anonymization within the BlindTrade framework is implemented to address the risks of memorization and data leakage inherent in training machine learning models on sensitive financial data. Specifically, ticker symbols are systematically replaced with unique, randomly generated identifiers prior to model training. This process prevents the model from directly associating specific tickers with predictive signals, thereby reducing the potential for information leakage and protecting the confidentiality of the underlying data. The mapping between anonymized identifiers and actual tickers is maintained separately and applied only during the final output stage to ensure the model generalizes based on financial characteristics rather than specific stock names, improving robustness and mitigating potential regulatory concerns.
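The anonymization step described above can be sketched as a reversible mapping built before training: tickers become opaque identifiers, and the reverse map is held aside for the output stage. The `ASSET_` identifier scheme is illustrative, not taken from the paper.

```python
# Anonymization sketch: replace tickers with opaque identifiers before
# training; keep the reverse mapping separate for de-anonymizing outputs.
import random

def anonymize(tickers, seed=0):
    """Map tickers to unique random identifiers; return both directions."""
    rng = random.Random(seed)
    # sample() guarantees the identifiers are unique
    codes = rng.sample(range(10**6), len(tickers))
    forward = {t: f"ASSET_{c:06d}" for t, c in zip(tickers, codes)}
    reverse = {v: k for k, v in forward.items()}
    return forward, reverse
```

Only the `forward` map is visible to the model during training; `reverse` is applied once, at the final output stage.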
Reasoning Embeddings within the BlindTrade system function as numerical representations of insights generated by the Large Language Model (LLM) agents. These embeddings are created by mapping the semantic content of LLM-produced rationales – explanations for predicted price movements – into a high-dimensional vector space. This vectorization allows for the quantification of complex financial reasoning and enables the construction of a knowledge graph where nodes represent concepts or entities (e.g., companies, sectors, events) and edges represent the relationships between them, as determined by the similarity of their corresponding Reasoning Embeddings. The resulting graph facilitates the identification of patterns, dependencies, and latent connections within financial data, supporting more informed investment decisions.
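The edge-construction idea, connecting nodes whose reasoning embeddings are similar, can be illustrated with cosine similarity and a threshold. The vectors and the 0.9 cutoff below are hypothetical, chosen only to make the mechanism visible.

```python
# Knowledge-graph edge sketch: connect entities whose reasoning-embedding
# vectors have cosine similarity above a threshold.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def similarity_edges(embeddings, threshold=0.9):
    """Undirected edges between all entity pairs above the threshold."""
    names = list(embeddings)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if cosine(embeddings[a], embeddings[b]) >= threshold]
```

In the real system the vectors would come from an embedding model applied to the LLM rationales rather than being written by hand.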

Inferring Inter-Stock Relationships with Graph Networks
The Graph Neural Network (GNN) constructs a market representation by combining two data sources: Sector Connections, which define relationships between industry sectors based on co-movement, and Reasoning Embeddings generated by a Large Language Model (LLM). The LLM analyzes news and financial reports to produce vector embeddings that capture qualitative reasoning about stock performance. These embeddings are integrated with the Sector Connections within the GNN architecture, allowing the model to learn not only statistical correlations but also contextual relationships. This integration results in a dynamic representation of the market that evolves as new information becomes available and sector relationships shift, providing a more nuanced understanding of inter-stock dependencies than traditional methods.
The Semantic Graph Attention Network (SemGAT) refines the analysis of inter-stock relationships by implementing an attention mechanism that weights the importance of different connections within the graph. This allows the model to prioritize the most relevant relationships when propagating information between stocks, effectively filtering out noise and focusing on impactful linkages. The attention weights are determined dynamically based on the features of connected nodes and the edges between them, enabling the SemGAT to adapt to changing market conditions and identify subtle but significant correlations. This focused approach improves the accuracy of the GNN’s representation of the market and enhances the performance of downstream tasks, such as portfolio optimization.
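The core attention idea behind a graph attention layer can be sketched as a softmax over per-neighbor compatibility scores, so that informative connections dominate the aggregated message. In a real SemGAT the scores come from learned projections of node and edge features; here they are supplied directly as an assumption.

```python
# Graph-attention sketch: softmax-normalize neighbor scores, then take a
# weighted average of neighbor feature vectors.
from math import exp

def attention_weights(scores):
    """Numerically stable softmax over raw compatibility scores."""
    m = max(scores)
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(neighbor_features, scores):
    """Attention-weighted average of the neighbors' feature vectors."""
    w = attention_weights(scores)
    dim = len(neighbor_features[0])
    return [sum(wi, ) if False else sum(wi * f[d] for wi, f in zip(w, neighbor_features))
            for d in range(dim)]
```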
The reinforcement learning (RL) policy leverages the output of the Graph Neural Network (GNN) to calculate portfolio weights, modulated by Intent Variables that define the desired risk profile – defensive, neutral, or aggressive. Operational data indicates a clear correlation between risk posture and portfolio turnover; the defensive strategy exhibits a daily turnover rate of 2.9%, signifying frequent asset adjustments to mitigate potential losses. This is substantially higher than the neutral strategy’s 1.8% daily turnover and considerably exceeds the aggressive strategy’s low turnover rate of 0.4%. These turnover rates demonstrate the RL policy’s dynamic rebalancing behavior, actively managing portfolio composition to align with the specified intent variable.
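The turnover figures quoted above can be reproduced with a standard convention: one-sided daily turnover as half the sum of absolute weight changes between rebalances, averaged over days. The convention is an assumption; the paper's exact definition may differ.

```python
# Turnover sketch: one-sided turnover = 0.5 * sum of |weight changes|
# between consecutive portfolio snapshots, averaged over the history.
def daily_turnover(w_prev, w_next):
    return 0.5 * sum(abs(b - a) for a, b in zip(w_prev, w_next))

def average_turnover(weight_history):
    """Mean one-sided turnover across consecutive rebalances."""
    turns = [daily_turnover(w0, w1)
             for w0, w1 in zip(weight_history, weight_history[1:])]
    return sum(turns) / len(turns)
```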

Validating Predictive Power and Mitigating Bias
The predictive capability of the language model’s signals underwent rigorous evaluation utilizing Information Coefficient (IC) Analysis, a technique based on Spearman Rank Correlation. This analysis measured the correlation between the model’s predictions and subsequent actual returns, yielding an IC of 0.015. This result indicates a statistically significant, albeit modest, predictive power. To establish a baseline for comparison, a randomized dataset was subjected to the same IC Analysis, resulting in a negligible IC of 0.0004. The substantial difference between these values confirms that the observed predictive power isn’t attributable to chance, but rather stems from the information embedded within the language model’s analysis of market data and signals.
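The IC computation itself is compact: Spearman rank correlation between model scores and realized forward returns. A minimal pure-Python version (assuming no tied values, for brevity) is:

```python
# Information Coefficient sketch: Spearman rank correlation between
# prediction scores and subsequent realized returns (no-ties case).
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman_ic(scores, forward_returns):
    rs, rr = ranks(scores), ranks(forward_returns)
    n = len(scores)
    d2 = sum((a - b) ** 2 for a, b in zip(rs, rr))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A perfectly monotone relationship yields an IC of 1.0; production code would use a ties-aware implementation such as `scipy.stats.spearmanr`.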
The framework incorporates rigorous procedures to ensure the reliability of its performance evaluations by directly confronting prevalent backtesting biases. Specifically, meticulous data handling prevents lookahead bias, wherein future information inappropriately influences past predictions; this is achieved through strict temporal partitioning of data and careful feature engineering. Furthermore, the impact of survivorship bias – the tendency for backtests to disproportionately include successful entities while overlooking failures – is mitigated by utilizing comprehensive datasets that account for delisted or defunct assets, thus providing a more realistic assessment of the strategy’s true performance and minimizing the risk of overoptimistic results.
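The temporal-partitioning discipline described above reduces to two rules: features at day t may use only data up to t, with the label drawn from the future, and the train/test split must be chronological, never shuffled. A minimal sketch, with illustrative one-day-return features:

```python
# Lookahead-safe dataset sketch: the feature at day t uses only past
# prices; the label is the NEXT day's return; the split is chronological.
def make_dataset(prices):
    xs, ys = [], []
    for t in range(1, len(prices) - 1):
        xs.append(prices[t] / prices[t - 1] - 1)   # known at time t
        ys.append(prices[t + 1] / prices[t] - 1)   # next-day target
    return xs, ys

def temporal_split(xs, ys, train_frac=0.7):
    """Strict chronological split; no shuffling across the boundary."""
    cut = int(len(xs) * train_frac)
    return (xs[:cut], ys[:cut]), (xs[cut:], ys[cut:])
```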
The framework’s optimization centers on the Differential Sharpe Ratio (DSR) integrated within its reinforcement learning policy, a methodology designed to prioritize returns adjusted for risk and cultivate sustained profitability. Rigorous out-of-sample testing demonstrates the efficacy of this approach, yielding an annualized Sharpe Ratio of 1.40, a notable improvement over the performance of the SPY index. This translates into substantial gains, with the system achieving a cumulative return of 32.22% year-to-date in 2025 and 22.8% across the extended out-of-sample period from 2024 to 2025, a significant outperformance compared to SPY’s 6.1% return during the same timeframe.
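The DSR originates with Moody and Saffell's recurrent reinforcement learning work: exponential moving estimates of the first and second moments of returns yield an incremental, differentiable Sharpe objective usable as a per-step reward. The decay rate `eta` and the warm-up handling below are assumptions of this sketch.

```python
# Differential Sharpe Ratio sketch (after Moody & Saffell): maintain EMAs
# A (mean return) and B (mean squared return); each step's reward is
# D_t = (B*dA - 0.5*A*dB) / (B - A^2)^(3/2).
def dsr_rewards(returns, eta=0.01):
    A, B = 0.0, 0.0
    rewards = []
    for r in returns:
        dA, dB = r - A, r * r - B
        denom = (B - A * A) ** 1.5
        # Variance is zero before any history accumulates; emit 0 then.
        rewards.append((B * dA - 0.5 * A * dB) / denom if denom > 1e-12 else 0.0)
        A += eta * dA
        B += eta * dB
    return rewards
```

Because each reward is a differentiable function of the current return, the RL policy can be trained to maximize risk-adjusted performance step by step rather than only at episode end.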

The pursuit of reliable financial models, as demonstrated by BlindTrade, echoes a fundamental tenet of mathematical rigor. The framework’s emphasis on anonymization to mitigate memorization and bias directly addresses the need for reproducible results, a cornerstone of provable systems. G. H. Hardy aptly stated, “The essence of mathematics lies in its certainty.” BlindTrade’s innovative approach, combining LLMs, GNNs, and reinforcement learning, doesn’t merely aim for profitable trades, but for a demonstrably sound methodology, reducing the influence of spurious correlations and ensuring that observed performance isn’t simply a product of data leakage or lookahead bias. This dedication to a verifiable foundation is paramount in a field demanding both precision and predictability.
Beyond the Veil: Future Directions
The presented framework, while demonstrably effective in mitigating certain pitfalls of applying large language models to financial markets, does not, of course, solve the underlying problem. The elimination of memorization and lookahead bias, achieved through rigorous anonymization, merely shifts the focus to the truly difficult question: can an algorithm, divorced from spurious correlations, genuinely discover profitable strategies? The observed performance gains, while encouraging, remain contingent on the specific market conditions tested; extrapolation to entirely novel regimes demands caution. A proof of generalizability, demonstrating consistent outperformance across diverse asset classes and temporal scales, remains elusive.
Future research must address the inherent limitations of reinforcement learning itself. The reward function, even when carefully constructed, is still a simplification of the complex realities of financial valuation. The pursuit of “intent-driven portfolio management” requires a deeper understanding of how to encode true economic rationale into algorithmic decision-making, moving beyond mere pattern recognition. A formal verification of the framework’s behavior – a mathematical guarantee of stability and convergence – would represent a significant advancement, surpassing the current reliance on empirical validation.
Ultimately, the challenge lies not in building more complex models, but in constructing simpler, more elegant solutions grounded in first principles. The quest for alpha, it seems, will continue to demand not just computational power, but a renewed commitment to mathematical rigor and a healthy skepticism of purely data-driven approaches.
Original article: https://arxiv.org/pdf/2603.17692.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
See also:
- Spotting the Loops in Autonomous Systems
- Seeing Through the Lies: A New Approach to Detecting Image Forgeries
- Staying Ahead of the Fakes: A New Approach to Detecting AI-Generated Images
- Unmasking falsehoods: A New Approach to AI Truthfulness
- Palantir and Tesla: A Tale of Two Stocks
- The Glitch in the Machine: Spotting AI-Generated Images Beyond the Obvious
2026-03-19 11:56