Author: Denis Avetisyan
New research shows that generative artificial intelligence, when properly informed, can identify previously unseen factors driving stock performance.
Large language models, utilizing retrieval-augmented generation and prompt engineering, demonstrate potential for alpha generation in cross-sectional equity strategies.
Traditional quantitative investment strategies often struggle to efficiently incorporate insights from unstructured data sources. This research, ‘Generative AI for Stock Selection’, explores the potential of large language models to automate the discovery of predictive features from diverse financial data, including analyst reports, options pricing, and historical market data. The study demonstrates that AI-generated features can consistently deliver competitive, and often superior, risk-adjusted returns compared to traditional methods, with Sharpe ratio improvements up to 91%. Could this approach herald a new era of AI-driven alpha generation, reducing reliance on manual feature engineering and unlocking previously inaccessible investment signals?
The Illusion of Insight: Why Most Financial Models Miss the Forest for the Trees
Historically, financial forecasting has frequently depended on models functioning as ‘black boxes’: intricate systems whose internal logic remains hidden and which, crucially, are fed surprisingly narrow datasets. This reliance on limited feature sets, such as simple moving averages or basic price ratios, restricts a model’s ability to discern subtle yet significant patterns within the vastness of financial data. Consequently, predictive power remains constrained, as these models struggle to account for the multifaceted interactions driving market behavior. The opacity of these systems further exacerbates the issue, making it difficult to identify shortcomings or validate the reasoning behind predictions, ultimately hindering both accuracy and trust in financial forecasts.
The transformation of raw data into actionable insights necessitates a nuanced approach to feature engineering, moving beyond simple data ingestion to actively sculpt variables that reveal underlying relationships. This process isn’t merely about selecting existing data points; it demands the creation of new features – combinations, transformations, and interactions of existing variables – that can effectively capture the complexities inherent in financial systems. Sophisticated techniques, such as polynomial features to model non-linear effects, or the incorporation of domain expertise to build interaction terms, are often crucial. Successfully extracting meaningful signals requires a deep understanding of the data’s generating process and the ability to translate that understanding into variables that empower predictive models to discern genuine patterns from noise, ultimately enhancing their ability to forecast and interpret financial phenomena.
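As a concrete illustration of the kind of construction described above, the sketch below builds a non-linear transformation and an interaction term with pandas. The column names, tickers, and windows are illustrative assumptions, not anything drawn from the paper.

```python
import pandas as pd

def engineer_features(panel: pd.DataFrame) -> pd.DataFrame:
    """Illustrative feature construction on a (date, ticker) panel.
    Column names ('ticker', 'close', 'volume') and windows are hypothetical."""
    out = panel.copy()
    grouped = out.groupby("ticker")
    # Base signal: one-day return.
    out["ret_1d"] = grouped["close"].pct_change()
    # Non-linear transformation: squared return as a crude volatility proxy.
    out["ret_1d_sq"] = out["ret_1d"] ** 2
    # Interaction term: 20-day momentum conditioned on abnormal turnover.
    out["mom_20d"] = grouped["close"].pct_change(20)
    rel_volume = grouped["volume"].transform(lambda v: v / v.rolling(20).mean())
    out["mom_x_volume"] = out["mom_20d"] * rel_volume
    return out
```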
The predictive strength of even the most complex algorithms is fundamentally limited by the quality of the features used to train them. Without diligent feature construction, models frequently identify patterns that appear statistically significant but lack genuine causal relationships – a phenomenon known as spurious correlation. This can lead to overfitting, where the algorithm learns the noise within the training data rather than the underlying signal, resulting in poor generalization to new, unseen data. Consequently, a model might perform exceptionally well on historical data but fail spectacularly when applied to real-world scenarios. Robust feature engineering, therefore, isn’t simply about providing more data; it’s about carefully crafting inputs that accurately represent the relevant relationships, mitigating the risk of misleading patterns and ensuring the model learns true predictive factors.
Automated Feature Discovery: Trading Complexity for a False Sense of Control
DSPy automates prompt engineering for Large Language Models (LLMs) by treating prompt design as a program synthesis problem. The framework utilizes a declarative approach where developers define the desired behavior through a combination of “Programs” – Python functions composing LLM calls – and a search algorithm that iteratively refines these programs based on observed outcomes. This process optimizes feature engineering by automatically discovering effective prompting strategies, including the selection of relevant data and the formulation of instructions, without requiring manual trial and error. DSPy employs techniques such as gradient-based search and evolutionary algorithms to explore the vast prompt space and identify configurations that maximize performance on specified metrics, thereby streamlining the creation of LLM-powered applications.
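To make the idea concrete, here is a minimal sketch of what a DSPy “program” for feature proposal might look like, using the library’s signature and chain-of-thought abstractions. The field names, prompt wording, and choice of optimizer are assumptions for illustration, not the authors’ actual code.

```python
import dspy

class ProposeFeature(dspy.Signature):
    """Propose one candidate predictive feature from a description of available data."""
    data_description = dspy.InputField(desc="schema and summary statistics of the dataset")
    feature_formula = dspy.OutputField(desc="an arithmetic expression over the listed columns")
    rationale = dspy.OutputField(desc="why this feature might predict forward returns")

# A chain-of-thought module generates the feature and its justification.
propose = dspy.ChainOfThought(ProposeFeature)

# In a full pipeline, an optimizer such as dspy.BootstrapFewShot could compile
# `propose` against a task metric (e.g. backtested information coefficient),
# searching over demonstrations and instructions instead of hand-tuned prompts.
```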
Combining DSPy with Retrieval-Augmented Generation (RAG) facilitates the injection of external information into the prompt engineering and model response generation process. RAG systems retrieve relevant data from knowledge sources – such as databases, documents, or APIs – and provide this context to the language model alongside the initial prompt. DSPy then automates the optimization of prompts to effectively utilize this retrieved information, learning how to best integrate the external knowledge into the model’s reasoning. This integration allows models to address tasks requiring up-to-date or specialized data that is not inherent in their pre-training, significantly enhancing their performance and adaptability without requiring model retraining.
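A hedged sketch of how retrieval might be folded into such a program: a DSPy module that calls an arbitrary retriever (here just a callable returning text passages, an assumption on our part) and passes the retrieved context into the generation step.

```python
import dspy

class ProposeFeatureWithContext(dspy.Signature):
    """Propose a predictive feature, grounded in retrieved analyst-report passages."""
    context = dspy.InputField(desc="retrieved passages from external knowledge sources")
    data_description = dspy.InputField(desc="schema of the available market data")
    feature_formula = dspy.OutputField()

class RAGFeatureProposer(dspy.Module):
    def __init__(self, retriever):
        super().__init__()
        # `retriever` is any callable mapping a query string to a list of passages;
        # the concrete index or vector store is deliberately left unspecified.
        self.retriever = retriever
        self.generate = dspy.ChainOfThought(ProposeFeatureWithContext)

    def forward(self, data_description: str, query: str):
        passages = self.retriever(query)
        return self.generate(
            context="\n".join(passages),
            data_description=data_description,
        )
```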
The integration of DSPy with Retrieval-Augmented Generation (RAG) facilitates the automated development of features specifically designed for financial applications, demonstrably outperforming those created manually. DSPy’s programmatic prompt engineering, combined with RAG’s ability to access and incorporate external data, allows for the iterative refinement of feature definitions based on performance metrics. This process yields features exhibiting improved predictive power and robustness, as evidenced by benchmark testing against hand-engineered alternatives on the cross-sectional equity prediction tasks studied here. The automated approach reduces development time and minimizes reliance on domain expertise in prompt engineering, allowing for rapid adaptation to changing market conditions and data availability.
Gradient Boosted Pipelines: A Black Box with Extra Steps
The predictive model utilizes a Gradient Boosted Tabular Pipeline as its central component, processing a set of pre-engineered features to generate trading signals. This pipeline employs a gradient boosting algorithm, iteratively combining weak prediction models – typically decision trees – to create a strong predictive model. The engineered features, derived from a combination of historical price data, fundamental indicators, and alternative data sources, serve as the input to this pipeline. The tabular format allows for efficient processing and scalability, enabling the model to handle large datasets and generate predictions in a timely manner. This approach facilitates the identification of complex, non-linear relationships within the data, improving the accuracy and robustness of the resulting trading strategy.
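The modeling step itself need not be exotic. A minimal version, assuming a scikit-learn histogram gradient-boosting regressor stands in for whatever implementation the authors used, looks roughly like this; the hyperparameters are placeholders.

```python
from sklearn.ensemble import HistGradientBoostingRegressor

# Placeholder hyperparameters; the paper's actual settings are not reproduced here.
gbm = HistGradientBoostingRegressor(
    max_depth=4,         # shallow trees act as weak learners
    learning_rate=0.05,  # shrinkage applied between boosting rounds
    max_iter=500,        # number of boosting iterations
)

# X_train holds the engineered features, y_train the forward-shifted returns
# described below; predictions become the cross-sectional trading signal.
# gbm.fit(X_train, y_train)
# signal = gbm.predict(X_live)
```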
To prevent look-ahead bias in model training, the pipeline exclusively utilizes forward-shifted returns and point-in-time data. Forward-shifting means the prediction target is the return realised after the signal date, so features observed at time t are paired only with returns that occur later. Point-in-time data means each feature is built solely from information actually available on that date; the model is trained on an earlier window and then evaluated on subsequent, unseen data. This methodology strictly adheres to the temporal order of events, preventing the model from inadvertently using future information to make predictions about the past, and thus ensuring the robustness and reliability of the generated signals.
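In pandas terms, the target construction and the temporal split reduce to something like the following; the 21-day horizon, column names, and split date are illustrative.

```python
import pandas as pd

def forward_returns(panel: pd.DataFrame, horizon: int = 21) -> pd.Series:
    """Return realised over the next `horizon` days, aligned to the signal date,
    so no information from after the prediction date leaks into the target."""
    future_close = panel.groupby("ticker")["close"].shift(-horizon)
    return future_close / panel["close"] - 1.0

# Point-in-time evaluation: fit strictly on the earlier window, test on the later one.
# train = panel[panel["date"] < "2020-01-01"]
# test  = panel[panel["date"] >= "2020-01-01"]
```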
Sector neutrality and volatility normalization are implemented to improve the predictive model’s stability and clarity. Sector neutrality adjusts feature weights to remove systematic biases related to industry classifications, reducing exposure to broad market movements within specific sectors. Volatility normalization scales feature values by each asset’s historical volatility, mitigating the impact of high-volatility assets on overall signal generation and ensuring that predictions are not disproportionately influenced by price swings. These techniques collectively enhance the robustness of the model by reducing spurious correlations and improve interpretability by providing more consistent and comparable feature contributions.
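Both adjustments are simple cross-sectional operations. A sketch, assuming a long panel with ‘date’, ‘ticker’, ‘sector’, ‘signal’, and ‘ret’ columns (all hypothetical names):

```python
import pandas as pd

def neutralize_and_scale(panel: pd.DataFrame, window: int = 63) -> pd.Series:
    """Sector-demean the raw signal each day, then scale by trailing volatility.
    Assumes rows are sorted by date within each ticker."""
    out = panel.copy()
    # Sector neutrality: subtract the same-day sector mean of the signal.
    sector_mean = out.groupby(["date", "sector"])["signal"].transform("mean")
    out["signal_sn"] = out["signal"] - sector_mean
    # Volatility normalization: divide by each asset's trailing return volatility.
    trailing_vol = out.groupby("ticker")["ret"].transform(
        lambda r: r.rolling(window).std()
    )
    return out["signal_sn"] / trailing_vol
```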
Backtesting demonstrates the Gradient Boosted Tabular Pipeline consistently generates statistically significant alpha with a net Sharpe Ratio of 1.615 after accounting for transaction costs. This performance represents a 47% improvement compared to baseline strategies employed in the same testing environment. The Sharpe Ratio, a measure of risk-adjusted return, indicates that the pipeline delivers a substantial increase in returns relative to the risk undertaken, confirming its effectiveness in identifying profitable trading opportunities. This consistent alpha generation has been validated through rigorous historical testing procedures.
The Information Coefficient (IC) quantifies the predictive power of a feature with respect to future returns; an average absolute IC of 0.00343, calculated across the top 225 features selected by the pipeline, indicates a statistically significant relationship between these features and subsequent price movements. This value suggests that, on average, these features can explain a portion of the variance in future returns, exceeding the performance typically observed in purely random selections. While the interpretation of IC values depends on the breadth of the universe and the forecast horizon, a value of 0.00343 indicates a measurable, if modest, level of predictive capability within the model’s feature set.
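For reference, the IC on a single date is just the rank correlation between a feature and the subsequent return; the averaging across dates and across the top 225 features follows the paper’s convention, which is not reproduced here.

```python
import pandas as pd
from scipy.stats import spearmanr

def information_coefficient(feature: pd.Series, fwd_ret: pd.Series) -> float:
    """Spearman rank correlation between a feature and forward returns on one date."""
    aligned = pd.concat([feature, fwd_ret], axis=1).dropna()
    ic, _ = spearmanr(aligned.iloc[:, 0], aligned.iloc[:, 1])
    return ic
```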
A Sharpe Ratio of 0.559, calculated across the top 225 AI-generated features, indicates a consistently positive risk-adjusted return. This metric, defined as the average return earned in excess of the risk-free rate per unit of volatility, demonstrates the model’s ability to generate returns relative to the level of risk undertaken. A value of 0.559 is considered respectable within quantitative finance, signifying a robust and stable performance profile. The calculation utilizes historical returns and standard deviation to quantify this relationship, providing a standardized measure for evaluating the effectiveness of the predictive features.
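The Sharpe figures quoted here presumably follow the standard annualized convention; a minimal version, assuming daily returns and 252 trading days per year:

```python
import numpy as np

def annualized_sharpe(daily_returns: np.ndarray, risk_free_daily: float = 0.0) -> float:
    """Mean excess return per unit of volatility, annualized over 252 trading days."""
    excess = np.asarray(daily_returns) - risk_free_daily
    return np.sqrt(252) * excess.mean() / excess.std(ddof=1)
```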
Analysis indicates a low correlation – ranging from 0.07 to 0.14 – between the features generated by the artificial intelligence and those used in existing baseline strategies. This limited overlap suggests the AI is identifying predictive signals not already captured by conventional methods, providing diversification benefits to the overall investment approach. The observed low correlation contributes to a more robust model by reducing redundancy and potentially improving generalization performance across different market conditions.
Beyond the Numbers: The Illusion of Control and the Search for Truly Novel Insights
The established analytical pipeline delivers more than simply enhanced forecasting capabilities; it provides a robust, adaptable system for investigating a multitude of feature interactions. Rather than relying on pre-defined variables, the framework allows researchers and practitioners to methodically assess the predictive power of countless combinations, uncovering potentially valuable signals previously obscured by conventional approaches. This systematic exploration isn’t limited to adding more data; it facilitates the creation of entirely new, complex features – with an average operational complexity exceeding that of standard academic factors – and rigorously tests their efficacy. Consequently, the pipeline functions as a generative engine for investment strategies, enabling continuous refinement and adaptation to evolving market dynamics and offering a pathway towards more nuanced and potentially profitable trading methodologies.
A significant challenge in translating financial research into actionable trading strategies lies in the disconnect between idealized model evaluations and the realities of market friction. This work directly addresses this issue by incorporating transaction costs – including brokerage fees, slippage, and market impact – into the backtesting process. By penalizing strategies based on their actual cost to implement, the evaluation becomes far more representative of real-world performance. This refinement moves beyond simply identifying profitable signals to determining which strategies remain viable after accounting for the expenses associated with capturing those gains, effectively bridging the gap between academic exploration and practical portfolio construction. The result is a more robust and realistic assessment of a strategy’s potential, offering insights that are directly applicable to live trading environments.
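A simple proportional-cost model conveys the idea: charge each rebalance in proportion to turnover and subtract the charge from gross returns. The 10 basis-point rate below is an arbitrary placeholder, not the cost assumption used in the paper.

```python
import pandas as pd

def net_portfolio_returns(weights: pd.DataFrame, asset_returns: pd.DataFrame,
                          cost_bps: float = 10.0) -> pd.Series:
    """Portfolio returns net of proportional transaction costs.
    `weights` and `asset_returns` are date x ticker frames on the same index."""
    gross = (weights.shift(1) * asset_returns).sum(axis=1)   # hold yesterday's weights
    turnover = weights.diff().abs().sum(axis=1)              # total weight traded today
    return gross - turnover * (cost_bps / 1e4)
```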
Continued development will prioritize broadening the scope of this analytical pipeline, initially by incorporating a more extensive range of datasets to enhance its robustness and generalizability. Researchers intend to move beyond current equity-focused analysis and adapt the feature engineering process for application across diverse asset classes, including fixed income, commodities, and foreign exchange. Furthermore, the pipeline’s performance will be rigorously tested under varying market regimes – encompassing both bullish and bearish trends, periods of high and low volatility, and distinct economic cycles – to ensure its adaptability and consistent predictive power, ultimately aiming for a universally applicable framework for financial analysis.
The incorporation of these novel features significantly elevates the potential of cross-sectional analysis in investment strategy development. By moving beyond traditional, simpler factors, analysts can now dissect securities with greater granularity, identifying subtle relationships and mispricings previously obscured. This enhanced analytical capability enables the construction of more nuanced portfolios, potentially capturing alpha from complex interactions between variables and refining risk management through a deeper understanding of asset sensitivities. Consequently, investment strategies can be tailored not only to broad market trends but also to specific characteristics within and across sectors, promising a more sophisticated and potentially rewarding approach to portfolio construction and active management.
The predictive power of financial models often hinges on the sophistication of the features used to represent market data. Recent advancements demonstrate a substantial leap in feature complexity, with generated factors requiring an average of 14.2 computational operations – a figure notably higher than the 2-4 operations characteristic of traditional academic factors. This increased complexity allows for the capture of more nuanced and potentially non-linear relationships within the data, moving beyond simple, easily interpretable signals. While demanding more computational resources, this heightened sophistication suggests a pathway toward uncovering previously hidden predictive patterns and refining investment strategies for improved performance, potentially unlocking alpha that remains inaccessible to simpler models.
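To give a sense of what an operation count of that size means, compare a classic two-operation academic factor with a purely hypothetical multi-step composition of the kind the pipeline produces; neither expression is taken from the paper.

```python
import pandas as pd

def momentum_12m(panel: pd.DataFrame) -> pd.Series:
    # Classic academic-style factor: roughly two operations (shift, divide).
    return panel.groupby("ticker")["close"].pct_change(252)

def hypothetical_complex_factor(panel: pd.DataFrame) -> pd.Series:
    # Invented multi-operation composition, illustrating the kind of nesting
    # a ~14-operation generated feature involves (not a factor from the paper).
    grouped = panel.groupby("ticker")["close"]
    momentum = grouped.pct_change(63)
    volatility = grouped.pct_change().groupby(panel["ticker"]).transform(
        lambda r: r.rolling(21).std()
    )
    risk_adjusted = momentum / volatility
    # Daily cross-sectional percentile rank, centred around zero.
    return risk_adjusted.groupby(panel["date"]).rank(pct=True) - 0.5
```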
The pursuit of automated alpha generation, as detailed in this research, feels predictably optimistic. The article highlights how large language models can uncover diversifying signals in equity markets – a claim that echoes throughout the history of quantitative finance. It’s a clever application of retrieval-augmented generation, certainly, but one can’t help but recall Galileo Galilei’s assertion: “You cannot teach a man anything; you can only help him discover it within himself.” The models aren’t creating alpha; they’re merely surfacing patterns already present, and those patterns, inevitably, will degrade. It’s not a failure of the technology, but a testament to the relentless entropy of financial markets. The elegance of the prompt engineering will, undoubtedly, become tomorrow’s technical debt.
What’s Next?
The enthusiasm for applying large language models to stock selection will, predictably, outpace any actual understanding of why it sometimes works. The current work demonstrates a capacity for automated feature engineering, but it sidesteps the rather inconvenient truth that correlation is not causation, and backtests are, at best, optimistic simulations. The field will likely devolve into an arms race of prompt complexity, with diminishing returns and an ever-increasing reliance on data that, by the time it’s incorporated, has already migrated to other, less-crowded trades.
A more fruitful, if less glamorous, direction lies in rigorous error analysis. It would be useful to determine precisely what kinds of market conditions invalidate these models, and what spurious correlations are most likely to generate false positives. Expect a surge in research attempting to ‘explain’ alpha generated by LLMs – explanations that will invariably rely on post-hoc rationalization.
Ultimately, the challenge isn’t building a system that finds signals, but building one that survives production. Better one well-understood, painstakingly engineered factor than a hundred ‘novel’ features conjured from the ether. The logs will, as always, have the final word.
Original article: https://arxiv.org/pdf/2602.00196.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-02-03 09:10