Author: Denis Avetisyan
A new framework uses artificial intelligence to autonomously discover and implement profitable investment strategies, potentially reshaping quantitative portfolio management.

This paper presents an agentic AI system for systematic factor discovery and portfolio construction, demonstrating superior performance compared to traditional methods.
Traditional factor investing often relies on pre-defined signals, limiting its adaptability to evolving market dynamics. This paper, ‘Beyond Prompting: An Autonomous Framework for Systematic Factor Investing via Agentic AI’, introduces a novel, self-directed framework leveraging agentic AI to autonomously discover and implement profitable investment factors. Applying this methodology to U.S. equities yields a portfolio with an annualized Sharpe ratio of 3.11 and a return of 59.53%, demonstrating a scalable and interpretable paradigm for data-driven investment. Could this approach unlock a new era of automated alpha generation and portfolio construction?
The Factor Zoo: Navigating a Sea of Spurious Correlations
The landscape of quantitative finance is often described as a ‘Factor Zoo’ – a rapidly expanding collection of variables proposed to predict asset returns. This proliferation, however, presents a significant challenge. With countless potential predictors – ranging from common valuation ratios to obscure macroeconomic indicators and even sentiment analysis – the risk of discovering spurious correlations dramatically increases. Researchers, in their pursuit of alpha, often inadvertently ‘overfit’ their models to historical data, identifying patterns that appear statistically significant but lack true predictive power in future market conditions. This data mining creates the illusion of skill, leading to investment strategies that ultimately fail when exposed to unseen data, and highlights the critical need for more rigorous and robust factor selection techniques.
Quantitative investment strategies often founder not due to a lack of data, but because distinguishing genuine predictive power from random noise proves remarkably difficult. Traditional statistical techniques, while valuable, frequently identify correlations that appear significant within historical datasets but fail to hold up when applied to future, unseen data – a phenomenon known as overfitting. This susceptibility to spurious correlations arises from the sheer number of potential factors examined – countless variables, combinations, and transformations – increasing the probability of finding a seemingly robust relationship purely by chance. Consequently, investment strategies built upon these unreliable factors can deliver promising backtested results only to falter in live trading, highlighting the critical need for more rigorous methods to validate predictive signals and ensure long-term investment success.
The sheer volume of potential investment factors presents a significant hurdle for quantitative finance; effectively navigating this expansive search space demands innovative approaches. Traditional methods often falter when faced with countless combinations of variables, increasing the risk of identifying factors that appear predictive in historical data but fail to generalize to future market conditions. Building truly robust factors requires not only statistical rigor to avoid spurious correlations, but also a focus on interpretability – understanding why a factor works is crucial for maintaining confidence and adapting strategies as market dynamics evolve. This necessitates techniques that can efficiently prune the factor zoo, prioritizing factors grounded in economic rationale and demonstrably stable across different time periods and market regimes.

Autonomous Factor Discovery: An Agentic AI Framework
The Agentic AI Framework is a quantitative investment system designed to autonomously identify and evaluate potential investment factors. This system utilizes multiple autonomous agents operating with defined goals related to factor discovery, such as identifying variables correlated with asset returns. Unlike traditional factor analysis relying on pre-defined specifications, the framework employs an iterative process where agents propose, test, and refine factors based on observed performance and economic rationale. The system is intended to move beyond simple statistical correlations, focusing instead on factors with a logical basis grounded in financial theory and supported by empirical data, with the ultimate goal of generating investment signals.
The Agentic AI Framework utilizes the ReAct (Reason + Act) framework, a design pattern for large language models that enables iterative action and reasoning. This involves the agent generating both reasoning traces – textual explanations of its thought process – and actions based on its objectives. Crucially, the observations resulting from these actions are fed back into the reasoning process, allowing the agent to assess the outcome of its prior steps and dynamically adjust its subsequent search strategy. This iterative loop of reasoning, acting, and observing facilitates a more robust and adaptable factor discovery process compared to static or non-interactive methods, enabling the agent to overcome challenges and refine its approach in pursuit of its goals.
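The reason–act–observe loop can be sketched in a few lines. The sketch below is a toy: the `reason`, `act`, and `observe` methods are stubbed with fixed rules, whereas in the paper's framework a large language model generates the reasoning traces and a backtesting engine supplies the observations. All class and method names are illustrative, not from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class ReActAgent:
    """Toy ReAct-style agent: reasoning and acting are stubbed with
    simple rules; an LLM and a backtester would fill these roles."""
    goal: str
    history: list = field(default_factory=list)  # (thought, action, observation)

    def reason(self) -> str:
        # Stub: inspect past observations to decide the next step.
        if not self.history:
            return "No factors tested yet; propose a momentum factor."
        last_obs = self.history[-1][2]
        return f"Last result was {last_obs}; refine the factor definition."

    def act(self, thought: str) -> str:
        # Stub action: in practice this would run a backtest or query data.
        return "propose_factor" if "propose" in thought else "refine_factor"

    def observe(self, action: str) -> str:
        # Stub observation: a real system returns backtest metrics here.
        return {"propose_factor": "Sharpe=0.8", "refine_factor": "Sharpe=1.2"}[action]

    def step(self):
        thought = self.reason()
        action = self.act(thought)
        observation = self.observe(action)
        self.history.append((thought, action, observation))
        return thought, action, observation

agent = ReActAgent(goal="discover a profitable factor")
for _ in range(2):
    agent.step()
print(len(agent.history))  # two full reason-act-observe iterations
```

The essential point is the feedback edge: the observation from step *n* is visible to the reasoning of step *n* + 1, which is what lets the agent revise its search strategy rather than execute a fixed pipeline.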
The Agentic AI Framework incorporates economic rationale as a prioritization mechanism during factor discovery. This is achieved by evaluating potential investment factors based on established financial principles and economic theory. Factors aligning with these principles receive higher weighting, increasing the likelihood of identifying relationships grounded in sound financial logic rather than statistical anomalies. This approach enhances the interpretability of the discovered factors, allowing for a clearer understanding of the underlying drivers of investment performance, and mitigates the risk of implementing strategies based on spurious correlations that may not persist over time.

LightGBM: The Engine for Non-Linear Factor Aggregation
LightGBM (Light Gradient Boosting Machine) functions as the primary predictive model within the system, utilizing a gradient boosting framework to combine numerous input factors into a single, consolidated prediction. This is achieved through the iterative training of decision trees, where each subsequent tree corrects errors made by prior trees. The engine is designed for efficiency, enabling rapid model training and prediction even with high-dimensional datasets. By aggregating multiple factors – representing diverse aspects of the prediction target – LightGBM generates a more robust and accurate model compared to approaches relying on fewer or simpler combinations of input data. The resulting model effectively captures complex interactions between these factors, contributing to improved predictive performance.
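The error-correcting mechanism can be illustrated with a minimal NumPy loop: for squared loss, each new tree (here reduced to a one-split stump) is fit to the residuals of the ensemble built so far. This is a toy sketch of the boosting principle, not LightGBM itself; the function names and hyperparameters are ours.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump on one feature (toy)."""
    best = (np.inf, None)
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[0]:
            best = (sse, (t, left.mean(), right.mean()))
    return best[1]

def boost(x, y, n_rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals
    of the current ensemble, so later trees correct earlier errors."""
    pred = np.zeros_like(y)
    stumps = []
    for _ in range(n_rounds):
        t, lv, rv = fit_stump(x, y - pred)
        pred += lr * np.where(x <= t, lv, rv)
        stumps.append((t, lv, rv))
    return pred, stumps

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = np.sin(x)                    # a non-linear target a single line fits poorly
pred, _ = boost(x, y)
print(round(float(np.mean((y - pred)**2)), 4))  # training MSE driven toward zero
```

A single linear fit approximates `sin(x)` poorly on this range; fifty stumps, each correcting the last, drive the training error close to zero, which is the aggregation behavior the framework relies on.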
Gradient-based One-Side Sampling (GOSS) and Exclusive Feature Bundling (EFB) are key optimization techniques employed within the LightGBM algorithm to enhance both speed and robustness. GOSS reduces the number of training instances used at each boosting iteration: data points with large absolute gradients – the poorly fit, most informative examples – are always retained, while points with small gradients are randomly downsampled and reweighted so that gradient estimates remain approximately unbiased. EFB handles high-dimensional data by identifying mutually exclusive features – those that rarely take non-zero values together – and bundling them into a single feature, drastically reducing the effective feature count and computational cost with little information loss. Together these techniques yield faster training and strong generalization, particularly on sparse datasets with many features.
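The GOSS idea itself is simple to sketch with NumPy. The function name and the `top_rate`/`other_rate` values below are assumptions for illustration, not LightGBM's internal implementation.

```python
import numpy as np

def goss_sample(gradients: np.ndarray, top_rate: float = 0.2, other_rate: float = 0.1):
    """Gradient-based One-Side Sampling (illustrative): keep all
    large-gradient rows, randomly sample the rest, and upweight the
    sampled small-gradient rows to keep gradient sums unbiased."""
    n = len(gradients)
    n_top = int(top_rate * n)
    n_other = int(other_rate * n)
    order = np.argsort(-np.abs(gradients))   # sort by |gradient|, descending
    top_idx = order[:n_top]                  # always kept
    rest = order[n_top:]
    rng = np.random.default_rng(0)
    sampled = rng.choice(rest, size=n_other, replace=False)
    idx = np.concatenate([top_idx, sampled])
    weights = np.ones(len(idx))
    weights[n_top:] = (1.0 - top_rate) / other_rate  # compensate for downsampling
    return idx, weights

grads = np.random.default_rng(1).normal(size=1000)
idx, w = goss_sample(grads)
print(len(idx))  # 300 rows retained out of 1000
```

The reweighting factor `(1 - top_rate) / other_rate` is what lets the algorithm discard 70% of the rows here while keeping the gradient statistics roughly faithful to the full dataset.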
Traditional machine learning models, such as linear regression or shallow decision trees, often struggle to model interactions between factors and exhibit limited expressiveness when faced with complex data distributions. LightGBM, through its gradient-based training and leaf-wise splitting strategy, captures these non-linear relationships by learning intricate decision boundaries: each new tree is fit to the residuals of the current ensemble, and leaf-wise growth deepens the tree wherever the split reduces loss the most. This allows higher-order feature interactions to be represented without an exponential increase in computational cost, while regularization mechanisms such as leaf-count limits and learning-rate shrinkage guard against overfitting.

Demonstrating Robustness and Practicality: Validating Performance
The true test of any algorithmic trading framework lies not in how it performs on historical data, but in its ability to generate consistent returns when faced with previously unseen market conditions. This framework is specifically designed to prioritize generalization, employing rigorous validation techniques that emphasize out-of-sample performance. Unlike systems optimized solely for in-sample data – which often exhibit overfitting and subsequent failure when encountering new data – this approach focuses on identifying strategies that demonstrate robustness and adaptability. By continuously evaluating strategies against independent datasets, the framework effectively filters out those prone to overfitting, ensuring that only those with demonstrable predictive power are retained and deployed, thereby bolstering its long-term viability and potential for sustained alpha generation.
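One standard way to enforce this out-of-sample discipline is walk-forward evaluation: fit on a trailing window, score on the next unseen block, then roll forward. A minimal sketch, with arbitrary window sizes and a helper name of our choosing:

```python
import numpy as np

def walk_forward_splits(n_obs, train_size, test_size):
    """Yield (train_idx, test_idx) windows that never look ahead:
    each model is fit on past data and scored on the next unseen block."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size

splits = list(walk_forward_splits(n_obs=1000, train_size=500, test_size=100))
print(len(splits))  # 5 rolling evaluation windows
```

Because every test block lies strictly after its training window, a strategy that only memorized historical patterns shows up as degraded test-window performance rather than as spurious backtested alpha.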
The system incorporates a dynamic memory update mechanism that enables continuous performance improvement through iterative backtesting. Following each backtesting cycle, the agent analyzes the results and adjusts its internal parameters, effectively “learning” from past performance. This isn’t a static recalibration; instead, the agent refines its decision-making process by subtly modifying weighting factors and strategy selection criteria. The result is a self-improving system capable of adapting to evolving market conditions and consistently optimizing for long-term returns, as evidenced by its sustained performance across multiple out-of-sample evaluations. This adaptive capability distinguishes the framework, allowing it to move beyond initial strategy design and towards a robust, continually refined trading approach.
A robust trading framework must move beyond theoretical profitability and address the practical realities of market execution; therefore, this system explicitly incorporates transaction costs – including commissions, slippage, and bid-ask spreads – into its backtesting and optimization processes. By simulating these costs, the framework avoids identifying strategies that appear profitable on paper but would be rendered unviable by real-world expenses. This meticulous approach ensures that only genuinely robust and economically feasible strategies are selected, increasing the likelihood of sustained profitability when deployed in live trading. The inclusion of these costs provides a more accurate assessment of a strategy’s true performance and allows for realistic evaluation of its potential for generating alpha in a competitive market environment.
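A minimal sketch of cost-aware backtesting subtracts turnover-proportional costs from gross returns. The commission and slippage rates below are assumed placeholders for illustration, not figures from the paper.

```python
import numpy as np

def net_returns(gross_returns, turnover, commission_bps=5, slippage_bps=10):
    """Subtract per-trade costs (assumed rates, in basis points) from
    gross strategy returns; turnover is the fraction of the portfolio
    traded each period."""
    cost_rate = (commission_bps + slippage_bps) / 10_000
    return gross_returns - turnover * cost_rate

gross = np.array([0.010, -0.002, 0.005])
turn = np.array([1.0, 0.5, 0.8])   # 100%, 50%, 80% of the book traded
net = net_returns(gross, turn)
print(net.round(5))
```

Even this crude model changes strategy selection: a high-turnover signal must clear a cost hurdle on every rebalance, so strategies that look profitable gross can rank below slower-trading alternatives net.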
Evaluating investment strategies demands a nuanced approach beyond simply measuring returns; the Sharpe Ratio provides precisely that, quantifying risk-adjusted performance by considering both gains and the volatility required to achieve them. A statistically significant Sharpe Ratio of 3.11, achieved through rigorous out-of-sample testing of this framework, indicates a compelling balance between reward and risk. This value suggests the strategy consistently delivers substantial returns for each unit of risk undertaken, exceeding benchmarks commonly considered indicative of skilled investment management. The framework’s ability to generate such a robust Sharpe Ratio underscores its potential for sustained, profitable performance in dynamic market conditions, offering a valuable tool for investors seeking to maximize returns while maintaining a prudent risk profile.
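The annualized Sharpe ratio is the mean excess return divided by its standard deviation, scaled by the square root of the number of periods per year (252 for daily data). The sketch below computes it on a synthetic return stream; the drift and volatility are illustrative assumptions, not the paper's portfolio returns.

```python
import numpy as np

def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods=252):
    """Annualized Sharpe ratio: mean excess return over its volatility,
    scaled by sqrt(periods per year)."""
    excess = np.asarray(daily_returns) - risk_free_daily
    return np.sqrt(periods) * excess.mean() / excess.std(ddof=1)

rng = np.random.default_rng(42)
# Synthetic daily returns: small positive drift plus noise (illustrative).
r = rng.normal(loc=0.0015, scale=0.008, size=252)
print(round(float(annualized_sharpe(r)), 2))
```

The √252 scaling is why a seemingly modest daily edge compounds into a large annualized figure; it also means the statistic is sensitive to the return frequency used, so like-for-like comparisons matter.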
Rigorous evaluation of the framework reveals a compelling capacity for generating substantial alpha, culminating in an annualized return of 59.53%. This figure isn’t merely a statistical outcome; it signifies the potential for significant gains beyond typical market performance. The consistently positive results, validated through out-of-sample testing, suggest the framework effectively identifies and capitalizes on profitable trading opportunities. Such a high return demonstrates the framework’s capacity to not only navigate market complexities but also to deliver robust, risk-adjusted performance – a critical distinction for practical implementation and sustained profitability.

The Future of Factor Investing: Adaptability and Intelligent Systems
The inherent challenge of factor investing lies in the non-stationary nature of financial markets; relationships that predict future returns are rarely permanent. This framework directly confronts this reality by explicitly modeling and accounting for factor decay – the gradual erosion of a factor’s predictive power over time. Rather than assuming consistent efficacy, the system continuously monitors factor performance, identifying instances where historical relationships weaken or break down. This isn’t simply a reactive adjustment; the model incorporates mechanisms to anticipate and proactively address decay, adjusting factor weights or even dynamically incorporating new factors based on evolving market dynamics. By acknowledging that today’s alpha source may not guarantee tomorrow’s returns, the system strives to maintain a resilient and sustainable investment approach, mitigating the risks associated with relying on static, historical patterns.
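One simple way to monitor factor decay is a rolling information coefficient (IC): the trailing correlation between factor values and subsequent returns. The sketch below is illustrative only; the window length and the synthetic fading signal are our assumptions, not the paper's monitoring mechanism.

```python
import numpy as np

def rolling_ic(factor, forward_returns, window=60):
    """Rolling information coefficient: correlation between a factor and
    next-period returns over a trailing window. An IC drifting toward
    zero is one simple signal of factor decay."""
    n = len(factor)
    ics = np.full(n, np.nan)
    for t in range(window, n):
        ics[t] = np.corrcoef(factor[t - window:t], forward_returns[t - window:t])[0, 1]
    return ics

rng = np.random.default_rng(0)
n = 500
factor = rng.normal(size=n)
strength = np.linspace(0.8, 0.0, n)        # predictive power fades over time
fwd = strength * factor + rng.normal(size=n)
ic = rolling_ic(factor, fwd)
print(round(float(np.nanmean(ic[60:160]) - np.nanmean(ic[-100:])), 2))  # early IC exceeds late IC
```

A decay-aware system would act on this drift, downweighting or retiring the factor as its rolling IC approaches zero rather than holding its weight fixed.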
The core of this investment approach lies in a system designed for perpetual refinement. Unlike traditional strategies reliant on fixed factor definitions, the agent actively monitors market dynamics and adjusts its investment logic accordingly. This isn’t simply reacting to short-term fluctuations, but rather a continuous learning process where the agent identifies evolving relationships between factors and asset returns. By embracing change and updating its understanding of what drives performance, the system strives to overcome the inevitable decay of predictive signals, ultimately aiming for consistent, long-term outperformance and a lasting advantage in the financial markets.
Quantitative finance is undergoing a fundamental transformation, shifting away from reliance on historically fixed factor models towards systems capable of continuous learning and adaptation. Traditional approaches often treat factors – characteristics like value or momentum – as constants, but emerging agentic systems recognize that market relationships are inherently dynamic. This new paradigm leverages computational intelligence to monitor evolving conditions, recalibrate factor weights, and even discover novel predictive signals. The result is not simply a refinement of existing techniques, but a move towards genuinely intelligent investment strategies that can proactively respond to change, potentially offering a sustained competitive advantage in an increasingly complex financial landscape.
The ultimate ambition of this adaptive framework extends beyond simply mitigating factor decay; it envisions investment strategies possessing a genuine capacity for intelligence. These strategies aren’t programmed with fixed rules, but instead leverage continuous learning to interpret the ever-shifting dynamics of financial markets. By actively processing new information and refining its understanding of market behavior, the system aims to anticipate and respond to complexities that would overwhelm traditional, static models. This pursuit of intelligent investment represents a move toward systems capable of not merely reacting to the financial landscape, but proactively navigating its challenges and identifying opportunities, ultimately promising a more resilient and potentially more rewarding approach to investing.

The pursuit of systematic factor investing, as detailed in the framework, echoes a fundamental principle of interconnectedness. The architecture isn’t merely about identifying profitable signals; it’s about understanding how these signals interact within the larger financial ecosystem. This resonates deeply with Michel Foucault’s observation: “Power is not an institution, and not a structure; neither is it a certain strength one possesses; it is a strategy.” The agentic AI, in its autonomous exploration of data, functions as this strategy, shifting and adapting to uncover ‘power’ – in this case, financial alpha – not through brute force, but through a nuanced understanding of relationships. The system’s ability to discover and implement factors autonomously highlights how structure – the framework’s design – dictates behavior, yielding scalable insights beyond traditional quantitative methods.
Where Do We Go From Here?
The pursuit of systematic investment strategies, even with the aid of agentic AI, inevitably encounters the limits of quantifiable information. This work demonstrates a capacity for automated factor discovery, yet the very architecture of such systems introduces new vulnerabilities. Factors identified through optimization – however robust in backtesting – are, at their core, statistical artifacts. The true test lies not in their initial performance, but in their resilience when faced with genuinely novel market regimes – those unseen conditions that expose the fragility of any model built on past data. Systems break along invisible boundaries – if one cannot see them, pain is coming.
Future research must therefore move beyond the refinement of algorithmic efficiency and address the fundamental problem of representation. How does one encode context – the qualitative, often unquantifiable forces driving market behavior – into a system designed for purely numerical analysis? The expansion of agentic capabilities toward true semantic understanding, coupled with methods for stress-testing against distributional shifts, offers a possible path.
Ultimately, the most pressing challenge isn’t discovering more factors, but understanding why factors fail. A focus on meta-learning – systems that learn how to learn, and therefore anticipate their own limitations – may prove more fruitful than perpetually chasing incremental gains. The elegance of a solution often resides not in its complexity, but in its capacity to reveal the underlying simplicity of the problem – and to acknowledge what remains beyond its grasp.
Original article: https://arxiv.org/pdf/2603.14288.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
See also:
- Spotting the Loops in Autonomous Systems
- Seeing Through the Lies: A New Approach to Detecting Image Forgeries
- Staying Ahead of the Fakes: A New Approach to Detecting AI-Generated Images
- The Glitch in the Machine: Spotting AI-Generated Images Beyond the Obvious
- Gold Rate Forecast
- Palantir and Tesla: A Tale of Two Stocks
2026-03-17 09:25