Beyond Prediction: Smarter Portfolio Optimization for Real-World Markets

Author: Denis Avetisyan

A new approach to portfolio management directly links learning objectives to investment decisions, delivering consistently improved performance and resilience.

During the intense market volatility of early 2020, robust optimization and a conservative stochastic programming approach-incorporating transaction costs-yielded nearly identical portfolio allocations, demonstrating that, under extreme stress, the inherent constraints and penalties within the latter already provided sufficient robustness without requiring additional complexity from the former-a finding suggesting diminishing returns from further robustness enhancements in highly constrained optimization problems.

This review demonstrates that aligning learning with decision-focused techniques-specifically the Smart Predict-then-Optimize paradigm-outperforms traditional methods while accounting for transaction costs and market volatility.

While improved return forecasts don’t always translate to better investment decisions, especially considering real-world trading frictions, this paper introduces the ‘Smart Predict–then–Optimize Paradigm for Portfolio Optimization in Real Markets’ to address this limitation. By directly aligning predictive model training with downstream portfolio performance-through a decision-focused learning approach-we demonstrate consistent improvements in risk-adjusted returns and robustness across diverse market conditions. Utilizing U.S. ETF data and incorporating transaction costs, our results highlight the practical value of this paradigm, even during adverse regimes like the 2020 COVID-19 pandemic. Could this decision-focused approach represent a fundamental shift in how predictive models are developed and applied within quantitative finance?

The Illusion of Control: Why Portfolio Optimization Often Fails

Mean-Variance Optimization, a cornerstone of modern portfolio theory, operates on the principle of maximizing expected return for a given level of risk, or minimizing risk for a target return. However, its practical application frequently falls short of theoretical ideals due to fundamental assumptions about asset returns. Specifically, the model often presumes returns follow a normal distribution and that risk can be adequately captured by standard deviation. Real-world financial data rarely conforms to these conditions; asset returns frequently exhibit skewness and kurtosis – ‘fat tails’ – indicating a higher probability of extreme events than the model predicts. Furthermore, the reliance on historical data to estimate expected returns and correlations introduces estimation errors, which can significantly distort portfolio allocations and lead to suboptimal performance, especially during periods of market stress or regime shifts. This inherent sensitivity to input parameters limits the robustness of Mean-Variance Optimization in dynamic and unpredictable financial landscapes.

Traditional portfolio optimization techniques, while mathematically elegant, frequently stumble when confronted with the unpredictable nature of financial markets. These methods often rely on historical data to estimate key parameters – such as expected returns, volatility, and correlations – yet these estimations are inherently prone to error and may not accurately reflect future market behavior. Even minor inaccuracies in these inputs can dramatically alter the optimal portfolio allocation, leading to unexpectedly high levels of risk. Moreover, these approaches typically fail to account for complex, real-world dynamics like transaction costs, liquidity constraints, or the impact of investor behavior, further exacerbating the potential for suboptimal outcomes and increasing the likelihood of significant financial losses. The sensitivity to these estimation errors underscores a fundamental limitation of relying solely on historical data to predict future performance in a constantly evolving financial landscape.

Traditional portfolio optimization techniques, while foundational, often operate under the constraint of static analysis, a limitation that significantly impacts long-term investment success. These methods typically define an optimal asset allocation based on historical data and projected returns at a single point in time, failing to account for the inherent dynamism of financial markets. As economic conditions evolve, correlations between assets shift, and new information emerges, the initially calculated optimal portfolio can quickly become outdated and misaligned with current realities. This inflexibility prevents portfolios from capitalizing on emerging opportunities or adequately mitigating new risks, potentially leading to diminished returns and increased vulnerability during periods of market volatility. Consequently, a static approach struggles to deliver consistent performance over extended investment horizons, highlighting the need for more adaptive strategies that can continuously rebalance and recalibrate in response to changing market dynamics.

Predictive Models: A Necessary, But Insufficient, Step

Recurrent Neural Networks (RNNs) and Transformer architectures represent advanced time series modeling techniques capable of forecasting asset returns and subsequently enhancing portfolio construction. RNNs, particularly Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks, process sequential data by maintaining an internal state, allowing them to capture temporal dependencies. Transformer architectures, utilizing self-attention mechanisms, excel at identifying relationships within the entire time series, even across long distances, without the sequential processing limitations of RNNs. These models ingest historical asset price data, volume, and potentially macroeconomic indicators to learn complex, non-linear patterns. The resulting return forecasts can then be integrated into portfolio optimization algorithms – such as mean-variance optimization or risk parity – to generate portfolios with improved risk-adjusted returns compared to those constructed using simpler forecasting methods or static asset allocations.

Time series models utilize historical data to identify and quantify relationships between past values and future outcomes, enabling improved predictive accuracy. These models move beyond simple linear regression by incorporating techniques like autoregression, moving averages, and exponential smoothing to capture temporal dependencies and seasonality. More advanced architectures, such as Recurrent Neural Networks (RNNs) and Transformers, can model non-linear relationships and long-range dependencies within the data. The ability to recognize these complex patterns – including trends, cycles, and autocorrelations – allows the model to extrapolate future values with greater precision than methods relying solely on static relationships or limited historical windows. Consequently, the models’ predictive power stems from their capacity to learn the underlying data-generating process from observed sequences.

While predictive models can generate forecasts of asset returns, these predictions alone do not constitute a complete investment strategy. An effective framework requires the integration of these forecasts with an optimization process to translate predicted returns into actionable portfolio allocations. This optimization step involves considering factors such as risk tolerance, investment constraints, and transaction costs to determine the portfolio weights that maximize expected returns for a given level of risk. Without this integration, predicted returns remain theoretical and cannot be directly applied to improve portfolio construction or drive investment decisions; a complete system necessitates both predictive capability and a robust optimization engine.

Decision-Focused Learning: Shifting the Focus to Actionable Outcomes

Decision-Focused Learning (DFL) represents a departure from traditional machine learning approaches to portfolio management, which often prioritize prediction accuracy as a proxy for investment performance. Instead, DFL directly optimizes for the quality of decisions-specifically, portfolio allocations-by treating predictions not as ends in themselves, but as inputs to an optimization process. This paradigm shift allows the model to learn the relationship between prediction errors and their impact on portfolio outcomes, effectively learning to make better decisions even with imperfect predictions. Consequently, DFL prioritizes maximizing $\mathbb{E}[R - \lambda C]$ , where R represents portfolio returns, C denotes transaction costs, and λ is a regularization parameter balancing return and cost, directly addressing the core objective of investment management.

Smart Predict-Then-Optimize (SPO) represents an advancement over Decision-Focused Learning (DFL) by directly integrating the portfolio optimization process into the model training phase. Traditional DFL methods typically predict future outcomes and then separately optimize decisions based on those predictions; SPO, however, embeds the optimization problem – including constraints and objectives – as part of the loss function within the learning loop. This creates a closed-loop system where the model learns to predict not just the future, but also the optimal actions to take given its predictions, effectively learning a policy that simultaneously predicts and optimizes. The optimization component directly influences the model’s predictive weights, resulting in predictions specifically tailored for improved decision-making, rather than simply accurate forecasting.

Within the Smart Predict-Then-Optimize (SPO) framework, the utilization of linear predictors – specifically those incorporating technical indicators such as Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), and Simple Moving Average (SMA) – provides multiple benefits. These indicators serve as readily interpretable input features, allowing for direct assessment of the model’s reliance on established trading signals. The linear nature of these predictors simplifies model complexity, contributing to increased robustness and reducing the risk of overfitting, particularly when dealing with limited historical data. Furthermore, the transparency afforded by linear predictors facilitates easier debugging and validation of the model’s decision-making process, improving trust and facilitating practical implementation.

The integration of established portfolio optimization formulations – including mean-variance optimization and risk parity – within the Smart Predict-Then-Optimize (SPO) framework substantially improves the practical relevance of the model. Critically, the SPO+ model explicitly accounts for transaction costs, a significant factor often omitted in theoretical portfolio construction. Empirical results demonstrate that SPO+ consistently yields improved portfolio decision quality and enhanced risk-adjusted returns – specifically, Sharpe ratios and Sortino ratios – when benchmarked against conventional portfolio management techniques that do not incorporate these optimization and cost considerations. This improvement is observed across multiple asset classes and market conditions, validating the model’s robustness and applicability in real-world trading scenarios.

Robustness and Risk Management: Acknowledging the Inevitable Uncertainty

Portfolio optimization traditionally relies on expected returns, a single point estimate vulnerable to market volatility and unforeseen events. Robust Portfolio Optimization, however, acknowledges that future asset returns aren’t fixed values but rather exist within a range of plausible outcomes. This approach shifts the focus from optimizing for a single, potentially inaccurate, expectation to finding portfolios that perform acceptably well across a set of possible scenarios. By explicitly considering this inherent uncertainty, the methodology constructs portfolios that are less sensitive to estimation errors and more resilient to adverse market conditions. Instead of seeking the absolute best outcome under ideal circumstances, it prioritizes consistent performance and minimizes potential losses when reality diverges from initial predictions, offering a pragmatic approach to investment strategy.

Distributionally Robust Optimization (DRO) represents a significant advancement in portfolio management by moving beyond reliance on single-point estimates of asset returns. Instead of assuming a precise distribution, DRO explicitly acknowledges and incorporates uncertainty surrounding these parameters. This is achieved by optimizing portfolio performance not just for a nominal distribution, but across a defined ambiguity set – a range of plausible distributions reflecting potential model misspecification or unforeseen market shifts. By considering the worst-case performance within this ambiguity set, DRO aims to create portfolios that are demonstrably more resilient to distributional uncertainty and less susceptible to unexpected losses. This approach, unlike traditional methods, actively guards against the risks stemming from inaccurate or incomplete knowledge of underlying asset behavior, offering a more conservative and reliable framework for navigating complex financial landscapes.

Conditional Value-at-Risk, or CVaR, represents a critical advancement in risk assessment by focusing not simply on the magnitude of potential losses, but on the expected loss given that a certain threshold has already been breached. Unlike simpler measures like Value-at-Risk (VaR), which only indicates the maximum loss within a given probability, CVaR calculates the average of all losses exceeding that VaR level. This provides a more comprehensive understanding of “tail risk”-the potential for extreme, infrequent losses that can significantly impact a portfolio. $CVaR = E[Loss | Loss > VaR]$ Consequently, CVaR is particularly valuable for risk-averse investors and portfolio managers seeking to mitigate the impact of catastrophic events, offering a more nuanced perspective on downside exposure than traditional risk metrics.

Evaluating investment strategies in theory differs significantly from observing their performance amidst actual market fluctuations; therefore, rigorous backtesting with rolling-window methods is essential for assessing real-world robustness. This technique simulates trading over time, using only past data to make decisions, and continually updates the analysis as new information becomes available. Recent studies utilizing this approach demonstrate the benefits of robust optimization; for instance, the RobustSPO model, parameterized with $\rho = 0.1$ , exhibited a maximum drawdown of less than 10% during the highly volatile COVID-19 pandemic. This performance stands in stark contrast to the over 30% drawdown experienced by a standard Predict-Then-Optimize approach, highlighting the critical role of explicitly accounting for uncertainty in achieving more resilient portfolio outcomes.

The Future of Intelligent Portfolio Allocation: A Convergence of Techniques

Deep Reinforcement Learning (DRL) presents a compelling avenue for the automated management of investment portfolios, particularly in response to ever-shifting market landscapes. Traditional portfolio optimization often relies on pre-defined rules or static models, proving inflexible when faced with unforeseen economic events. DRL, however, utilizes algorithms inspired by behavioral psychology, enabling a system to learn optimal trading strategies through trial and error within a simulated market environment. This allows the portfolio to adapt dynamically, responding to complex market signals and maximizing returns while minimizing risk. The system isn’t programmed with specific instructions; instead, it develops its own strategies based on the rewards – or profits – it receives for successful trades, ultimately leading to a more resilient and potentially higher-performing investment approach compared to conventional methods.

Differentiable allocation represents a significant advancement in portfolio construction by enabling the direct optimization of asset weights through gradient descent. Traditional portfolio optimization often relies on iterative, non-differentiable processes, requiring complex approximations and limiting the potential for efficient refinement. This method, however, formulates the entire allocation process as a single, continuous function, allowing for end-to-end training with standard backpropagation techniques. Consequently, the investment process is streamlined, as models can directly learn optimal weight adjustments based on market feedback, rather than relying on pre-defined rules or heuristics. This approach not only accelerates the optimization process but also facilitates the incorporation of complex constraints and risk preferences, leading to portfolios that are better tailored to specific investor needs and market conditions.

Softmax allocation enhances the robustness and transparency of intelligent portfolio systems by transforming raw investment preferences into probability distributions. Instead of rigidly assigning capital to specific assets, this technique assigns a probability to each asset, reflecting the model’s confidence in its potential performance; the allocation then scales with these probabilities. This probabilistic approach introduces a degree of flexibility, preventing overly concentrated positions and mitigating the impact of any single asset’s volatility-thereby boosting model stability. Furthermore, the resulting probability distribution offers a readily interpretable view of the model’s decision-making process, revealing which assets the system deems most promising and to what degree; this transparency is crucial for building trust and understanding in automated investment strategies. The probabilities, normalized to sum to one, provide a clear and concise representation of the portfolio’s risk exposure and investment rationale.

The convergence of deep reinforcement learning, differentiable allocation, and softmax allocation strategies promises a new era in portfolio management, resulting in systems capable of outperforming traditional methods. Empirical evidence, notably from the SPO+ model, substantiates this potential; it achieved an annualized return of 14.05% coupled with a Sharpe Ratio of 0.785. This performance extends beyond favorable market conditions, as SPO+ consistently delivered superior risk-adjusted returns when benchmarked against a PtO baseline, navigating both the volatility of the COVID-19 crisis and the gains of a sustained bull market. Further illustrating its robustness, the model also boasts a Sortino Ratio of 0.728, signifying a strong capacity to mitigate downside risk and consistently generate positive outcomes for investors.

The pursuit of flawless optimization models feels increasingly…quixotic. This paper’s focus on aligning learning objectives with actual portfolio decisions-decision-focused learning-simply acknowledges a fundamental truth: the market doesn’t care about elegant predictions. It responds to actions. It’s a blunt admission that ‘the good life is one inspired by love and guided by knowledge,’ but in finance, that knowledge must be demonstrably linked to profit, not just theoretical accuracy. The consistent outperformance across varying conditions isn’t a triumph of prediction, it’s a recognition that robustness-withstanding the inevitable chaos-is the only sustainable advantage. They don’t deploy models-they let go, and hope for controlled failure.

What’s Next?

The pursuit of ‘smart’ optimization will inevitably reveal the limitations of ‘smart’ prediction. This work, aligning learning directly with action, demonstrates a performance improvement – a temporary stay of execution, perhaps – but does not erase the fundamental tension. Every model, however decision-focused, is a map, not the territory. Transaction costs, acknowledged here, are merely the most visible friction; the true cost lies in the assumptions baked into the optimization itself.

Future iterations will undoubtedly focus on extending this framework to more complex asset classes and incorporating alternative risk measures. Yet, the real challenge isn’t scaling the algorithm, but acknowledging its inherent fragility. The market, after all, is remarkably efficient at exploiting any consistent edge, however subtly learned. Everything optimized will one day be optimized back, requiring continuous, and ultimately Sisyphean, refinement.

The field will likely move toward meta-optimization – algorithms that learn how to optimize, rather than directly optimizing. But architecture isn’t a diagram; it’s a compromise that survived deployment. The focus shouldn’t be on building the perfect optimizer, but on building systems that can gracefully degrade – and perhaps, occasionally, resuscitate hope – when the inevitable imperfections emerge.

Original article: https://arxiv.org/pdf/2601.04062.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/