Uncovering Hidden Signals in Finance with AI

Author: Denis Avetisyan


A new approach combines the power of large language models and evolutionary algorithms to identify more reliable and understandable investment strategies.

CogAlpha cultivates investment strategies through a seven-level agent hierarchy that distills initial alphas from OHLCV data, subsequently subjecting each candidate to rigorous quality assessment and predictive power evaluation across five distinct metrics, before iteratively refining and recombining qualified strategies via deeper reasoning facilitated by large language models.

This paper introduces Cognitive Alpha Mining, a framework for discovering interpretable and robust financial alphas using LLM-driven code-based evolution.

Despite advances in deep learning and symbolic regression, discovering consistently profitable and interpretable financial signals, or ‘alphas’, remains a significant challenge due to the high dimensionality and noise inherent in market data. This paper introduces the ‘Cognitive Alpha Mining via LLM-Driven Code-Based Evolution’ framework, which synergistically combines large language models with evolutionary search to represent and refine alpha candidates as executable code. Our approach demonstrably expands the search space and yields alphas with superior predictive accuracy, robustness, and economic interpretability compared to existing methods. Could this alignment of LLM reasoning with evolutionary optimization unlock a new era of automated and explainable alpha discovery in quantitative finance?


The Inevitable Decay of Handcrafted Alpha

For decades, the pursuit of alpha – those elusive signals predicting superior financial returns – was largely a handcrafted endeavor. Analysts, grounded in economic theory and market intuition, would meticulously construct factors believed to indicate mispricing or future performance. This process, while capable of identifying genuine predictive relationships – as exemplified by early successes like the Fama-French factors focusing on size and value – suffered from inherent limitations. Each signal required extensive human effort for design, backtesting, and ongoing maintenance. Moreover, these manually-built alphas often proved brittle, losing predictive power as market conditions shifted and relationships decayed. The reliance on subjective judgment and limited computational power meant that a vast landscape of potentially valuable signals remained unexplored, hindering the scalability and adaptability crucial for sustained success in increasingly complex financial ecosystems.

The Fama-French factors – size and value – represented a pivotal early success in systematic investing, demonstrating that these characteristics could reliably predict stock returns. However, the initial models, while effective, faced inherent limitations when attempting broader application. Scaling these strategies beyond a relatively small universe of stocks proved difficult, as the predictive power diminished with increased portfolio size and transaction costs. More critically, the relationships identified were not static; market dynamics shift, and the factors’ efficacy decayed over time, requiring constant re-evaluation and adjustments. This lack of adaptability underscored the need for more dynamic and automated approaches to alpha generation, capable of identifying and exploiting evolving market signals beyond these initial, manually constructed factors.

Financial markets are no longer governed by static relationships; instead, they exhibit increasingly intricate and rapidly shifting dynamics. Consequently, traditional methods of identifying profitable investment signals – those elusive ‘alphas’ – are struggling to maintain efficacy. The sheer volume of data, coupled with the accelerating pace of information flow and the emergence of novel market microstructures, demands a paradigm shift towards automated alpha discovery. These systems leverage computational power and algorithmic techniques to scan vast datasets, identify subtle patterns, and adapt to changing conditions with a speed and scale unattainable through manual analysis. This automation isn’t simply about efficiency; it’s about survival in a landscape where fleeting opportunities and unforeseen events can quickly erode the value of even the most carefully constructed strategies, necessitating constant recalibration and innovation.

The Tools of Automated Discovery

Machine Learning Alpha Discovery leverages the capacity of algorithms, particularly Neural Networks, to identify non-linear relationships within financial datasets that traditional statistical methods may miss. These models, trained on historical data encompassing price movements, volume, and potentially alternative datasets, can discern complex patterns indicative of future price direction. Unlike rule-based systems requiring explicit definition of trading signals, Machine Learning models infer these relationships automatically, adapting to changing market dynamics. Common architectures employed include Recurrent Neural Networks (RNNs) for time-series analysis and Convolutional Neural Networks (CNNs) for pattern recognition in price charts. The output of these models typically generates predictive signals, often probabilities, used to construct and execute trading strategies. The effectiveness of these strategies is heavily dependent on data quality, feature engineering, and robust backtesting procedures.
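
To make the idea concrete, the sketch below trains a small feed-forward network to map lagged OHLCV-style features to next-period returns and reads its held-out predictions as an alpha signal. The synthetic data, feature set, and model size are illustrative assumptions, not the configuration of any specific system discussed here.

```python
# A minimal sketch, using synthetic data: a small feed-forward network learns a
# mapping from lagged OHLCV-style features to next-period returns, and its
# held-out predictions are read as an alpha signal.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_days = 1000

# Toy stand-ins for engineered OHLCV features (lagged returns, volume change,
# intraday range); a real pipeline would derive these from market data.
features = rng.normal(size=(n_days, 4))
# Next-period return: a weak linear signal in the first feature plus noise.
target = 0.002 * features[:, 0] + rng.normal(0, 0.01, n_days)

# Time-ordered split: train on the earlier period, evaluate on the later one.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.3, shuffle=False)

model = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=1000, random_state=0)
model.fit(X_train, y_train)

# The model's out-of-sample predictions act as the alpha signal; their
# correlation with realized returns (the IC) is one gauge of predictive content.
signal = model.predict(X_test)
ic = np.corrcoef(signal, y_test)[0, 1]
print(f"held-out IC: {ic:.3f}")
```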

Formula-based alpha identification leverages computational methods to discover and refine trading strategies based on quantifiable factors. Techniques like Genetic Programming (GP) automate the process of creating and testing formulas – essentially, evolving mathematical expressions – to identify predictive relationships in historical data. GP operates by creating a population of candidate formulas, evaluating their performance against specified criteria (such as Sharpe ratio or annualized return), and then selectively breeding and mutating the most successful formulas to create new generations. Reinforcement Learning (RL) further extends this by framing strategy development as a sequential decision-making process; an agent learns to optimize trading signals by receiving rewards or penalties based on the outcome of each trade. Both GP and RL require a clearly defined reward function and robust backtesting procedures to prevent the optimization of spurious correlations and ensure generalization to unseen data. The output is a set of trading rules, expressed as formulas, that can be implemented algorithmically.
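
A minimal genetic-programming loop might look like the following: candidate alphas are small expression trees over price and volume series, fitness is the Spearman correlation with next-period returns, and each generation keeps the fittest candidates and mutates them. The operator set, population size, and synthetic data are simplified stand-ins rather than a production configuration.

```python
# Minimal sketch of formula evolution: candidate alphas are small expression
# trees over price/volume series, scored by rank correlation with future returns.
import random
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
n = 500
close = np.cumprod(1 + rng.normal(0, 0.01, n)) * 100
volume = rng.lognormal(10, 0.3, n)
fwd_ret = np.roll(close, -1) / close - 1          # next-period return (target)

FEATURES = {"close": close, "volume": volume}
UNARY = {"rank": lambda x: x.argsort().argsort(),
         "delta": lambda x: np.diff(x, prepend=x[0]),
         "neg": lambda x: -x}

def random_formula(depth=2):
    """Build a random (operator, child) expression tree over the feature set."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(list(FEATURES))
    return (random.choice(list(UNARY)), random_formula(depth - 1))

def evaluate(node):
    if isinstance(node, str):
        return FEATURES[node]
    op, child = node
    return UNARY[op](evaluate(child))

def fitness(node):
    """Spearman IC between the formula's output and next-period returns."""
    ic, _ = spearmanr(evaluate(node)[:-1], fwd_ret[:-1])
    return 0.0 if np.isnan(ic) else abs(ic)

def mutate(node):
    """Either replace the formula outright or wrap it in a new unary operator."""
    if random.random() < 0.5:
        return random_formula(depth=2)
    return (random.choice(list(UNARY)), node)

population = [random_formula() for _ in range(30)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]                    # selection
    population = survivors + [mutate(random.choice(survivors)) for _ in range(20)]

best = max(population, key=fitness)
print("best formula:", best, "IC:", round(fitness(best), 3))
```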

The implementation of machine learning and formula-based alpha discovery techniques necessitates significant computational investment, primarily due to the high dimensionality of typical financial datasets and the iterative nature of model training and backtesting. Furthermore, these methods are susceptible to overfitting, where models perform well on historical data but generalize poorly to unseen data; mitigating this requires careful parameter tuning via techniques like cross-validation, regularization, and the use of out-of-sample testing sets. The complexity of these processes often demands specialized hardware, such as GPUs, and substantial engineering effort to optimize performance and ensure the robustness of derived trading strategies.
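
One standard guard against overfitting is walk-forward evaluation, in which a model is repeatedly fit on earlier data and scored only on later, unseen data. The sketch below illustrates the idea with time-ordered splits on synthetic factor data; a wide gap between in-sample fit and these out-of-sample scores is a red flag for an overfit strategy.

```python
# Minimal sketch of overfitting control via walk-forward (time-ordered) splits.
# The data and model here are synthetic placeholders.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import Ridge

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 8))                       # candidate factor exposures
y = 0.02 * X[:, 0] + rng.normal(0, 0.05, 1000)       # returns with one weak true factor

oos_ics = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    oos_ics.append(np.corrcoef(pred, y[test_idx])[0, 1])

# Consistently positive out-of-sample ICs suggest the signal generalizes.
print("out-of-sample ICs per fold:", np.round(oos_ics, 3))
```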

CogAlpha’s performance varies depending on the fitness threshold used for evaluation.

CogAlpha: A Framework for Evolving Intelligence

LLMAlphaMining represents a novel approach to quantitative finance, utilizing large language models (LLMs) to identify potential alpha factors – sources of excess return. These models are applied to extensive financial datasets, including news articles, regulatory filings, and alternative data sources, to detect patterns and relationships not readily apparent through traditional methods. The core principle relies on the LLM’s ability to process unstructured textual data and extract meaningful signals indicative of future asset price movements. Unlike traditional factor-based investing which relies on pre-defined metrics, LLMAlphaMining allows for the discovery of previously unknown factors, potentially leading to improved portfolio performance and diversification. Initial results demonstrate the feasibility of this approach, highlighting the LLM’s capacity to generate investment signals with statistical significance.
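
In spirit, the generation step can be pictured as prompting an LLM to emit an alpha as executable code, then running and scoring that code. The sketch below uses a placeholder `call_llm` function with a canned response so it runs end to end; the prompt, execution, and scoring shown are illustrative assumptions, not the paper's exact interface.

```python
# Hypothetical sketch of LLM-driven alpha generation: the model is asked to emit
# an alpha as a small Python function over OHLCV columns, which is then executed
# and scored against forward returns.
import numpy as np
import pandas as pd

PROMPT = """You are a quantitative researcher. Write a Python function
`alpha(df)` that takes a DataFrame with columns open, high, low, close, volume
and returns a pandas Series of daily alpha scores. Return only code."""

def call_llm(prompt: str) -> str:
    # Placeholder: in practice this would call an LLM API. A canned candidate
    # is returned here so the sketch runs end to end.
    return (
        "def alpha(df):\n"
        "    # mean-reversion flavour: negative 5-day return, volume-damped\n"
        "    ret5 = df['close'].pct_change(5)\n"
        "    return -ret5 / (1 + df['volume'].rolling(5).mean() / df['volume'])\n"
    )

def score_candidate(code: str, df: pd.DataFrame, fwd_ret: pd.Series) -> float:
    """Execute the generated code in a scratch namespace and compute its rank IC."""
    namespace: dict = {}
    exec(code, namespace)                     # a real system would sandbox this
    signal = namespace["alpha"](df)
    return signal.corr(fwd_ret, method="spearman")

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "open": 100 + rng.normal(0, 1, 500).cumsum(),
    "high": 101 + rng.normal(0, 1, 500).cumsum(),
    "low": 99 + rng.normal(0, 1, 500).cumsum(),
    "close": 100 + rng.normal(0, 1, 500).cumsum(),
    "volume": rng.lognormal(10, 0.3, 500),
})
fwd_ret = df["close"].pct_change().shift(-1)     # next-period return

candidate = call_llm(PROMPT)
print("candidate rank IC:", round(score_candidate(candidate, df, fwd_ret), 3))
```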

CogAlpha employs a SevenLevelAgentHierarchy to structure the alpha factor discovery process. This hierarchical system decomposes the search for predictive signals into increasingly specific tasks, beginning with broad market analysis and progressing through stages of feature engineering, signal generation, risk assessment, and portfolio construction. Each level within the hierarchy operates as an independent agent, communicating results to higher levels for integration and refinement. This staged approach allows for a more directed and exploratory search compared to unstructured LLM-based methods, facilitating the identification of both well-established and novel alpha signals. The hierarchy enables focused reasoning at each stage, improving the efficiency and interpretability of the generated alphas.
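
A rough analogue of such a staged pipeline is a chain of agent functions, each refining or filtering the previous level's candidates before passing them upward. The level names and checks below are placeholders meant to show the control flow, not the paper's actual seven levels.

```python
# Illustrative sketch of a staged agent pipeline: each level consumes the
# previous level's candidates and either refines or filters them.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Candidate:
    code: str
    notes: List[str] = field(default_factory=list)
    score: float = 0.0

def market_analysis(cands):      # level 1: seed broad ideas
    return cands

def feature_engineering(cands):  # level 2: annotate with usable features
    for c in cands:
        c.notes.append("uses close/volume features")
    return cands

def quality_check(cands):        # level 3: drop candidates failing basic checks
    return [c for c in cands if "lookahead" not in c.code]

def predictive_eval(cands):      # level 4: attach a fitness score (stubbed here)
    for c in cands:
        c.score = 0.1 * len(c.notes)
    return cands

PIPELINE: List[Callable] = [market_analysis, feature_engineering,
                            quality_check, predictive_eval]

def run_hierarchy(seeds: List[Candidate]) -> List[Candidate]:
    cands = seeds
    for level in PIPELINE:
        cands = level(cands)
    return sorted(cands, key=lambda c: c.score, reverse=True)

survivors = run_hierarchy([Candidate("signal = -close.pct_change(5)")])
print([c.code for c in survivors])
```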

CogAlpha employs a technique called ThinkingEvolution to enhance the reasoning capabilities of the underlying large language models (LLMs) during alpha factor generation. This process iteratively refines the LLM’s thought processes through a series of prompting and evaluation cycles, leading to the discovery of alphas that demonstrate greater robustness and interpretability. Quantitative results indicate CogAlpha outperforms 19 benchmark alpha mining methods, suggesting the induced deeper reasoning translates into improved predictive performance and a more reliable signal for investment strategies. The methodology focuses on evolving the LLM’s internal reasoning chain, rather than simply optimizing for surface-level correlations.
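
Schematically, the evolutionary loop gates candidates on a fitness threshold and feeds the survivors, along with their evaluation results, back for refinement. In the sketch below, `refine_with_llm` stands in for the actual prompting and reasoning step, and the toy fitness function exists only to make the loop observable.

```python
# Illustrative loop for LLM-guided refinement: qualified candidates are fed back
# with their evaluation results and asked to be improved or recombined.
from typing import Callable, List, Tuple

def refine_with_llm(code: str, feedback: str) -> str:
    # Placeholder: a real system would prompt the LLM with the candidate code,
    # its metrics, and a request to reason about weaknesses before rewriting it.
    return code + f"\n# refined after feedback: {feedback}"

def evolve(candidates: List[str], fitness: Callable[[str], float],
           generations: int = 3, threshold: float = 0.02) -> List[Tuple[float, str]]:
    pool = candidates
    for _ in range(generations):
        scored = [(fitness(c), c) for c in pool]
        qualified = [(s, c) for s, c in scored if s >= threshold]  # fitness gate
        pool = [refine_with_llm(c, f"IC={s:.3f}, improve robustness")
                for s, c in qualified] or pool
    return sorted(((fitness(c), c) for c in pool), reverse=True)

# Toy fitness: rewards longer refinement histories so the loop is visible.
ranked = evolve(["alpha_v1"], fitness=lambda c: 0.02 + 0.005 * c.count("refined"))
print(ranked[0])
```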

The Rigor of Validation and Measurement

To guarantee the reliability of its generated trading strategies, CogAlpha incorporates a sophisticated MultiAgentQualityChecker. This system doesn’t simply produce alpha code; it subjects each strategy to a rigorous gauntlet of tests performed by multiple independent agents, each designed to assess different aspects of validity and robustness. These agents evaluate the code for logical errors, potential overfitting to historical data, and adherence to specified trading constraints. The system proactively identifies and flags strategies that exhibit weaknesses or inconsistencies, preventing the deployment of potentially flawed algorithms. This multi-layered quality control process ensures that only high-confidence, robust alpha signals are delivered, minimizing risk and maximizing the potential for consistent, positive returns, and contributing to the overall dependability of the platform.
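
Conceptually, such a gate can be pictured as a set of independent checker functions that each inspect a candidate's code from one angle, with acceptance requiring unanimous approval. The specific checks below (syntax validity, a crude look-ahead screen, a toy constraint) are illustrative examples rather than the system's actual agents.

```python
# Illustrative sketch of independent quality-check agents: each inspects a
# candidate alpha's code and the candidate passes only if every agent approves.
import ast
from typing import Callable, List

def syntax_agent(code: str) -> bool:
    """Reject candidates that are not even valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False

def lookahead_agent(code: str) -> bool:
    """Crude screen for future-data leakage via negative shifts."""
    return "shift(-" not in code

def constraint_agent(code: str) -> bool:
    """Example trading constraint: no hard-coded leverage."""
    return "leverage" not in code.lower()

AGENTS: List[Callable[[str], bool]] = [syntax_agent, lookahead_agent, constraint_agent]

def passes_quality_gate(code: str) -> bool:
    return all(agent(code) for agent in AGENTS)

candidate = "signal = -df['close'].pct_change(5)"
print("accepted:", passes_quality_gate(candidate))
```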

A comprehensive evaluation of CogAlpha’s generated alpha strategies relies on a suite of established financial metrics designed to dissect performance characteristics. The system’s success isn’t judged simply on overall gains, but on the quality of those returns. The Information Ratio, the annualized active return over a benchmark divided by the tracking error of that active return, quantifies risk-adjusted outperformance, while Annualized Excess Return measures its magnitude. The Information Coefficient, the correlation between the signal’s predicted returns and subsequently realized returns, gauges predictive accuracy, and the Rank Information Coefficient, its Spearman-rank counterpart, offers a version that is more robust to outliers and scaling. Taken together, these metrics provide a multi-faceted view of CogAlpha’s alpha-generation capability and its potential for sustained, risk-adjusted profitability.
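
For reference, the sketch below computes these metrics in their standard forms on synthetic, signal-aligned data; the naive sign-based portfolio construction is an assumption made purely for illustration.

```python
# Sketch of the standard metric definitions used to evaluate alpha signals,
# on synthetic data where the signal on day t is aligned with the return on t+1.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(4)
signal = rng.normal(size=252)                          # alpha score on day t
fwd_ret = 0.001 * signal + rng.normal(0, 0.01, 252)    # return realized on day t+1
bench_ret = rng.normal(0.0002, 0.01, 252)              # benchmark returns

# Information Coefficient: correlation between the signal and realized returns.
ic = np.corrcoef(signal, fwd_ret)[0, 1]
# Rank IC: the same idea on ranks, robust to outliers and scaling.
rank_ic, _ = spearmanr(signal, fwd_ret)

# Naive portfolio formed from the signal (long/short on its sign).
port_ret = np.sign(signal) * fwd_ret
active_ret = port_ret - bench_ret
ann_excess = active_ret.mean() * 252                   # annualized excess return
# Information Ratio: annualized active return over annualized tracking error.
info_ratio = ann_excess / (active_ret.std() * np.sqrt(252))

print(f"IC={ic:.3f}  RankIC={rank_ic:.3f}  "
      f"AnnExcess={ann_excess:.3%}  IR={info_ratio:.2f}")
```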

Rigorous testing reveals that CogAlpha consistently generates superior alpha strategies when compared to established benchmarks. Performance evaluations, utilizing metrics such as the Information Ratio, Annualized Excess Return, and various Information Coefficient measures, all demonstrate a significant advantage for CogAlpha. This isn’t simply incremental improvement; the system consistently achieves higher scores across all tested metrics, indicating a robust and reliable capacity for identifying profitable trading opportunities. The observed outperformance suggests that CogAlpha’s approach to alpha generation effectively captures market inefficiencies and translates them into quantifiable gains, establishing it as a leading solution in the field of automated investment strategies and offering potential for substantial returns.

The pursuit of quantitative signals, as detailed in this work, echoes a fundamental truth about complex systems. It isn’t about constructing a perfect alpha, but rather cultivating one through iterative refinement – an evolutionary process guided by Large Language Models. As Henri Poincaré observed, “Mathematics is the art of giving reasons.” This research doesn’t seek to define profitability, but to discover the underlying reasons why certain signals emerge, and then evolve those signals toward greater robustness. The framework acknowledges that any initial design is provisional, destined for modification as the system adapts to the ever-shifting landscape of financial data. Every discovered factor is a temporary equilibrium, a point in a continuous drift toward eventual dependency and necessary adaptation.

What Lies Ahead?

The pursuit of algorithmic advantage, framed here as Cognitive Alpha Mining, feels less like engineering and more like tending a garden. Each discovered signal, each profitable factor, is a temporary bloom. The system doesn’t scale; it diversifies, fracturing into a multitude of fragile strategies. Scalability, it appears, is merely the word applied to justify accumulating complexity. The very act of optimization narrows the path, and everything optimized will, eventually, lose flexibility.

The promise of interpretability is a particularly poignant challenge. The LLMs, acting as both explorers and explainers, offer a veneer of understanding, but the ‘why’ behind a profitable signal remains elusive. This isn’t a flaw in the method, but a fundamental characteristic of complex adaptive systems. One wonders if true transparency is even possible, or simply a comforting illusion. The perfect architecture is a myth to keep us sane.

Future work will inevitably focus on meta-strategies: systems that evolve the evolutionary process itself. But this feels like an infinite regress. Perhaps the real frontier lies not in finding more signals, but in understanding the inherent limitations of signal discovery. The edge, after all, is not a destination but a fleeting moment, a temporary reprieve from the inevitable mean reversion.


Original article: https://arxiv.org/pdf/2511.18850.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/


2025-11-25 07:25