Beyond SVI: Machine Learning Uncovers New Volatility Models

Author: Denis Avetisyan

Researchers are using symbolic regression to automatically discover parametrizations of implied volatility, achieving performance comparable to, and sometimes exceeding, that of the widely used SVI model.

Symbolic regression identified six expressions, $f_1, \dots, f_6$, that surpass the performance of standard SVI, demonstrating the potential for simpler, more interpretable models to achieve superior results on the efficient frontier and hinting at the limitations of even sophisticated algorithms when confronted with the inherent complexity of the data.

This work demonstrates that symbolic regression can rediscover existing volatility models and generate novel, no-arbitrage compliant parametrizations from limited market data.

Accurately modeling implied volatility remains a central challenge in financial mathematics, often relying on pre-defined parametric forms. This paper, ‘Discovering parametrizations of implied volatility with symbolic regression’, introduces a data-driven approach using symbolic regression to directly discover analytic formulas for implied variance as a function of strike and maturity. The results demonstrate that these automatically discovered parametrizations achieve competitive accuracy compared to the widely used SVI model, and can even rediscover it from limited data while respecting no-arbitrage constraints. Could this technique unlock more robust and adaptable volatility models, moving beyond reliance on manually constructed formulas?

The Illusion of Control: Modeling Market Volatility

The widespread adoption of parametrizations like SVI (stochastic volatility inspired) for modeling the implied volatility surface rests on mathematical convenience, yet these approaches impose restrictive assumptions on market behavior. Parametric models are computationally efficient, but they struggle to represent the full range of volatility shapes observed in real markets. Because a fixed, small set of parameters defines each fitted curve, derivatives can be significantly mispriced when the market deviates from the shapes the formula can produce. The limitation is most acute during periods of market stress or for assets with non-standard volatility characteristics, and it exposes a fundamental tension between model tractability and the shifting nature of financial reality. Consequently, the search for more adaptable, data-driven alternatives continues to gain momentum within the financial modeling community.
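
For readers who have not seen it written out, the benchmark can be sketched in a few lines. The article does not state the formula; the code below assumes the standard five-parameter "raw" SVI form for the total implied variance of a single maturity slice, $w(k) = a + b\,(\rho\,(k - m) + \sqrt{(k - m)^2 + \sigma^2})$, with $k$ the log-moneyness. The function name and the parameter values are illustrative only.

```python
import numpy as np

def svi_total_variance(k, a, b, rho, m, sigma):
    """Raw SVI total implied variance w(k) for one maturity slice.

    a: overall level, b: wing slope scale, rho: skew (between -1 and 1),
    m: horizontal shift, sigma: smoothness of the smile's vertex.
    """
    k = np.asarray(k, dtype=float)
    return a + b * (rho * (k - m) + np.sqrt((k - m) ** 2 + sigma ** 2))

# Illustrative parameters (assumed, not taken from the paper).
k = np.linspace(-1.0, 1.0, 5)
w = svi_total_variance(k, a=0.04, b=0.1, rho=-0.3, m=0.0, sigma=0.2)
print(w)
```

Five numbers fix the whole slice, which is exactly the trade-off described above: the curve is cheap to calibrate but can only take the shapes this one formula allows.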

The inherent challenges of parametric modeling become strikingly apparent when attempting to represent atypical implied volatility surfaces, specifically those exhibiting C6 and C8 curves. These curves, characterized by steeper gradients and more pronounced skew, deviate significantly from the smooth, well-behaved shapes assumed by standard models. Consequently, financial practitioners often resort to increasingly intricate parameterizations and higher-order calibrations to force a fit. However, this pursuit of accuracy frequently results in models that, while statistically better, lack economic intuition and become exceedingly difficult to interpret or validate. The addition of parameters doesn’t necessarily reflect genuine market phenomena; instead, it can introduce overfitting and obscure underlying relationships, ultimately hindering effective risk management and pricing of complex derivatives.

The precision with which financial models represent implied volatility surfaces directly impacts the accurate pricing of derivative instruments and the effective management of associated risks. When models fail to faithfully capture the full range of observed volatility shapes – particularly complex curves like C6 and C8 – mispricing can occur, leading to potentially significant financial losses. Consequently, there is a growing impetus within the financial industry to move beyond traditional parametric models, which often struggle with these intricacies. This demand fuels research into more adaptable, data-driven methodologies – including techniques like stochastic volatility models and machine learning algorithms – capable of learning directly from market data and better reflecting the nuances of real-world price dynamics, ultimately enhancing both pricing accuracy and risk mitigation strategies.

The discovered parametrization $\hat{w}_{1}^{\text{C8}}$ accurately fits the C8-curves within dataset D2; the fitted curves are nearly indistinguishable from the original data points.

Unveiling Hidden Equations: Symbolic Regression as a Mirror

Symbolic regression distinguishes itself from traditional modeling techniques by eliminating the need for a pre-defined functional form. Instead of fitting data to a chosen equation, symbolic regression utilizes algorithms to directly search for mathematical expressions that best describe the relationships within the data. This is achieved by evolving a population of candidate equations, assessing their performance against the data, and iteratively refining them through genetic programming or similar methods. Consequently, the resulting model is not constrained by initial assumptions about the underlying relationship, allowing for the discovery of potentially novel and more accurate representations directly from the data itself – a process particularly useful when the governing equations are unknown or complex.
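
The search idea can be illustrated with a deliberately tiny toy. The sketch below is not genetic programming and not the paper's method: it swaps the evolutionary search for brute-force enumeration of every expression up to a small depth, built from an assumed operator set (`sqrt`, `square`, `+`, `*`), and keeps the one with the lowest mean squared error. All names here are hypothetical.

```python
import itertools
import numpy as np

# Assumed building blocks; sqrt is applied to |a| so it is defined everywhere.
UNARY = {"sqrt": lambda a: np.sqrt(np.abs(a)), "square": np.square}
BINARY = {"+": np.add, "*": np.multiply}

def enumerate_expressions(k, depth):
    """Yield (text, values on the grid k) for every expression up to `depth`."""
    if depth == 0:
        yield "k", k
        yield "1", np.ones_like(k)
        return
    smaller = list(enumerate_expressions(k, depth - 1))
    yield from smaller
    for name, op in UNARY.items():
        for text, values in smaller:
            yield f"{name}({text})", op(values)
    for name, op in BINARY.items():
        pairs = itertools.combinations_with_replacement(smaller, 2)
        for (ta, va), (tb, vb) in pairs:
            yield f"({ta} {name} {tb})", op(va, vb)

def best_expression(k, target, depth=3):
    """Return the text and error of the lowest-MSE expression found."""
    best_text, best_mse = None, np.inf
    for text, values in enumerate_expressions(k, depth):
        mse = float(np.mean((values - target) ** 2))
        if mse < best_mse:
            best_text, best_mse = text, mse
    return best_text, best_mse

# Synthetic "smile" generated from a known formula, sqrt(k^2 + 1).
k = np.linspace(-1.0, 1.0, 50)
target = np.sqrt(np.square(k) + 1.0)
text, mse = best_expression(k, target)
print(text, mse)
```

Exhaustive enumeration grows far too quickly to be practical beyond toy depths, which is why real tools evolve a population of candidates instead of listing them all.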

Data Set D2, utilized in symbolic regression analysis, consists of implied volatility slice shapes representing market data. This dataset incorporates curves generated using the SVI5 model, as well as the C6 and C8 parametrizations, each defining a distinct form for the volatility smile. The variety within Data Set D2 is crucial; by training symbolic regression algorithms on these diverse volatility shapes, the system can identify underlying mathematical relationships that generalize across different market conditions and potentially reveal novel, competitive parametrizations beyond existing models.

Symbolic regression analysis resulted in the discovery of novel implied volatility parametrizations that demonstrate performance competitive with the established SVI model. Specifically, the automatically discovered expressions achieved a model complexity score of 15, which is equivalent to the complexity of the SVI model itself. Model complexity, in this context, is determined by the number of mathematical operations within the expression. This parity in complexity, alongside comparable performance, indicates that symbolic regression can effectively identify relationships within financial data that are on par with, and potentially distinct from, hand-crafted models like SVI.

PySR is an open-source Python library for symbolic regression, built on a Julia backend (SymbolicRegression.jl). It runs a multi-population evolutionary search over mathematical expressions, tuning the numerical constants inside each candidate with a numerical optimizer and simplifying expressions along the way. Users choose the allowed unary and binary operators, can define custom operators, and can cap expression complexity and set search parameters such as the number and size of populations. Rather than a single answer, PySR returns a set of candidate expressions that trade accuracy against complexity, and it can export them in usable formats such as SymPy or LaTeX. This removes much of the manual effort typically associated with symbolic regression and enables rapid prototyping of candidate models.

Discovered parametrizations $\hat{w}_{1}^{\text{C6}}$ and $\hat{w}_{2}^{\text{C6}}$ closely fit the C6-curves in dataset D2, demonstrating nearly indistinguishable alignment between the model and data as shown in the left plot.

The Boundaries of Reason: No-Arbitrage as a Reality Check

Symbolic Regression, when applied to financial modeling, requires the implementation of No-Arbitrage Constraints to guarantee the generated models adhere to fundamental market principles. These constraints function by preventing the discovery of expressions that could define riskless profit opportunities, which would be economically unsustainable. Without such constraints, the regression process may identify mathematical relationships that, while statistically valid within the training data, violate the law of one price and allow for the creation of portfolios yielding guaranteed returns without any associated risk. Incorporating these constraints ensures the discovered models represent financially plausible relationships consistent with established market dynamics and prevents the exploitation of artificial profit opportunities arising from model inaccuracies.

Symbolic Regression models for financial derivatives must adhere to no-arbitrage constraints, and these are enforced through conditions derived from established financial theory. Durrleman’s Condition, for example, requires a function $g(k)$, built from the total implied variance $w(k)$ and its first two derivatives with respect to log-moneyness $k$, to be non-negative. This corresponds to a non-negative implied risk-neutral density and is the standard test for butterfly arbitrage within a single maturity slice. Similarly, Lee’s Tail Bounds, which follow from his moment formula, limit the behavior of implied volatility at extreme strikes: asymptotically, total implied variance can grow at most linearly in $|k|$, with a slope no greater than 2. By incorporating these conditions, the resulting Symbolic Regression expressions are constrained to produce prices consistent with market principles, which improves the financial validity and reliability of the models.
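
A numerical check of Durrleman’s Condition is short enough to sketch. The code below assumes the standard form $g(k) = \left(1 - \frac{k\,w'(k)}{2\,w(k)}\right)^2 - \frac{w'(k)^2}{4}\left(\frac{1}{w(k)} + \frac{1}{4}\right) + \frac{w''(k)}{2}$ and approximates the derivatives by finite differences on a grid; how the paper evaluates the condition is not stated in this article. The two test curves and the function name are illustrative.

```python
import numpy as np

def durrleman_g(k, w):
    """Approximate g(k) from total variance w sampled on the grid k."""
    dw = np.gradient(w, k)     # first derivative w'(k)
    d2w = np.gradient(dw, k)   # second derivative w''(k)
    return ((1.0 - k * dw / (2.0 * w)) ** 2
            - (dw ** 2 / 4.0) * (1.0 / w + 0.25)
            + d2w / 2.0)

# A well-behaved raw SVI slice (illustrative parameters).
k = np.linspace(-1.5, 1.5, 301)
w_ok = 0.04 + 0.1 * (-0.3 * k + np.sqrt(k ** 2 + 0.2 ** 2))
g_ok = durrleman_g(k, w_ok)

# A strongly concave slice: positive variance, but g dips below zero.
k_small = np.linspace(-0.1, 0.1, 41)
w_bad = 0.04 - 2.0 * k_small ** 2
g_bad = durrleman_g(k_small, w_bad)

print(g_ok.min(), g_bad.min())
```

A negative value of `g` anywhere on the grid flags butterfly arbitrage at that log-moneyness. Finite differences are used so that the same check applies to any discovered expression, not only to formulas with known derivatives.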

Evaluations on test datasets showed that symbolically regressed expressions achieved comparable or lower loss values than the SVI model. Critically, these expressions exhibited an arbitrage indicator value only 3% greater than that of the SVI model, meaning they admit only marginally more theoretical arbitrage than the benchmark. This suggests that the symbolic regression process, while exploring a much wider expression space, retains a high degree of financial realism and produces models that are competitive with established industry standards in accuracy while remaining close to them in arbitrage avoidance.

Integrating a Wing-Arbitrage Penalty into the loss function directly addresses potential arbitrage opportunities arising in the tails of the implied volatility surface. This penalty term operates by increasing the loss value when discovered expressions predict volatility levels that would allow for riskless profit – specifically, when the predicted implied volatility deviates significantly from market consistency in the extreme wings of the surface. By penalizing these arbitrage-inducing predictions, the model is incentivized to generate more stable and financially valid results, ultimately improving the robustness of the symbolic regression process and reducing the likelihood of unrealistic or exploitable volatility predictions.
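
The article does not give the exact form of the penalty, so the following is only a hypothetical sketch of how such a term could look. It assumes the curve is expressed as total implied variance over log-moneyness, uses the slope at each end of a finite grid as a stand-in for the asymptotic wing slope, and penalizes any excess over Lee's bound of 2. The weight `lam` and all function names are assumptions.

```python
import numpy as np

LEE_MAX_SLOPE = 2.0  # Lee's bound on the asymptotic slope of w in |k|

def wing_penalty(k, w_model):
    """Squared excess of the end-of-grid slopes over Lee's bound."""
    left = abs((w_model[1] - w_model[0]) / (k[1] - k[0]))
    right = abs((w_model[-1] - w_model[-2]) / (k[-1] - k[-2]))
    return (max(0.0, left - LEE_MAX_SLOPE) ** 2
            + max(0.0, right - LEE_MAX_SLOPE) ** 2)

def penalized_loss(k, w_model, w_market, lam=10.0):
    """Mean squared error plus a weighted wing penalty (lam is hypothetical)."""
    mse = float(np.mean((w_model - w_market) ** 2))
    return mse + lam * wing_penalty(k, w_model)

k = np.linspace(-3.0, 3.0, 61)
w_market = 0.04 + 0.1 * np.sqrt(k ** 2 + 0.2 ** 2)  # gentle wings, slope near 0.1
w_steep = 0.04 + 3.0 * np.abs(k)                    # wings with slope 3, above the bound

print(penalized_loss(k, w_market, w_market))
print(wing_penalty(k, w_steep))
```

A candidate with admissible wings pays nothing extra, while one whose wings are too steep is pushed down the ranking even if it fits the observed points well.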

The Shifting Sands of Prediction: Beyond Static Reflections

Traditional symbolic regression often seeks a single, universal equation to describe a phenomenon, but real-world relationships frequently shift depending on context. Extending symbolic regression with conditional parametric regression addresses this limitation by allowing model parameters to themselves be functions of categorical variables. This means the discovered equation isn’t fixed; instead, it adapts its behavior based on the specific category of data being analyzed. For instance, a model predicting asset prices might utilize different parameters for ‘high volatility’ versus ‘low volatility’ regimes, effectively creating a family of equations tailored to distinct market conditions. This approach yields more nuanced and adaptive models, capable of capturing complex relationships that static models would miss, and ultimately improving predictive accuracy across diverse datasets.
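
The "one formula, several parameter sets" idea can be sketched without any symbolic search. The toy below fixes the functional form in advance (raw SVI with $m = 0$ and an assumed fixed $\sigma$, which is linear in its remaining parameters) and fits one parameter vector per category by least squares. The maturity labels, parameter values, and function names are all hypothetical, and the symbolic step that would discover the shared form is omitted.

```python
import numpy as np

def features(k, sigma=0.2):
    """Basis for w(k) = a + (b*rho)*k + b*sqrt(k^2 + sigma^2), with m = 0."""
    return np.column_stack([np.ones_like(k), k, np.sqrt(k ** 2 + sigma ** 2)])

def fit_per_category(k, w, category):
    """Fit the shared form separately for each category label."""
    params = {}
    for c in np.unique(category):
        mask = category == c
        coef, *_ = np.linalg.lstsq(features(k[mask]), w[mask], rcond=None)
        params[str(c)] = coef
    return params

# Synthetic data: two maturities sharing one form but not one parameter set.
true = {"1M": np.array([0.01, -0.02, 0.05]), "1Y": np.array([0.04, -0.03, 0.10])}
k_slice = np.linspace(-1.0, 1.0, 21)
k = np.concatenate([k_slice, k_slice])
category = np.array(["1M"] * 21 + ["1Y"] * 21)
w = np.concatenate([features(k_slice) @ true["1M"], features(k_slice) @ true["1Y"]])

params = fit_per_category(k, w, category)
print(params)
```

The structure of the expression is shared across categories, and only the coefficients change, which is what lets a single discovered formula describe slices that look quite different.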

The strength of any discovered relationship hinges on its ability to consistently perform across diverse conditions, and this is significantly enhanced through the use of comprehensive datasets. Employing both Data Set D1 and Data Set D2 allows for a rigorous testing process, moving beyond simple in-sample validation. Data Set D1 serves as the primary training ground for model development, while Data Set D2 functions as an independent validation set, exposing the model to unseen data and revealing its capacity for generalization. This dual-dataset approach minimizes the risk of overfitting – where a model performs well on the training data but poorly on new data – and provides a more reliable assessment of the model’s predictive power. The combination offers a robust safeguard against spurious correlations and bolsters confidence in the underlying relationships, ensuring they are not merely artifacts of the specific data used for initial discovery.

Financial models traditionally rely on static equations, yet market behavior is anything but constant. Recent advancements demonstrate the power of incorporating real-time market data directly into model construction, moving beyond pre-defined relationships. This is particularly impactful when utilizing concepts like Log-Moneyness – a measure of how far in- or out-of-the-money an option is – which captures nuanced aspects of option pricing beyond simple strike prices. By learning from this data, models aren’t simply fitted to past trends; they become adaptive systems capable of evolving with changing market conditions. This dynamic approach allows for a more accurate reflection of complex financial instruments and, crucially, the potential to anticipate shifts in market dynamics that static models would miss, offering a pathway toward more resilient and insightful financial forecasting.
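
As a concrete illustration of the coordinate involved, the sketch below converts quoted strikes and implied volatilities into log-moneyness and total implied variance. It assumes the common convention $k = \log(K/F)$, with $F$ the forward price, and $w = \sigma_{\text{imp}}^2\,T$; the article does not say which convention the paper uses, and the quotes are made up.

```python
import numpy as np

def log_moneyness(strike, forward):
    """k = log(K / F): zero at the forward, negative for low strikes."""
    return np.log(np.asarray(strike, dtype=float) / forward)

def total_variance(implied_vol, maturity):
    """w = sigma_imp^2 * T, with maturity T in years."""
    return np.asarray(implied_vol, dtype=float) ** 2 * maturity

# Made-up quotes for a three-month slice with forward 100.
strikes = np.array([80.0, 100.0, 125.0])
vols = np.array([0.28, 0.20, 0.22])
k = log_moneyness(strikes, forward=100.0)
w = total_variance(vols, maturity=0.25)
print(k, w)
```

Working in these coordinates puts slices from different underlyings and maturities on a comparable scale before any formula is fitted.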

The pursuit of novel parametrizations for implied volatility, as demonstrated in this study, echoes a fundamental truth about knowledge itself. Discovery isn’t a moment of glory, it’s realizing we almost know nothing. As Michel Foucault observed, “Knowledge is not an accumulation of verified facts, but a construction of discourse.” The ability of symbolic regression to not only rediscover the SVI model but also generate competitive alternatives highlights how even seemingly established financial models are, at their core, constructed representations. These representations, while useful, remain vulnerable – much like any theory – to dissolution at the event horizon of new data or analytical approaches. The no-arbitrage constraints simply define the boundaries of acceptable discourse within this particular theoretical space.

What Lies Beyond the Surface?

The successful application of symbolic regression to the problem of implied volatility offers a glimpse into a broader truth: that mathematical models, even those born of rigorous financial theory, are ultimately approximations. The capacity to rediscover, and even surpass, established parametrizations like SVI, hints not at a path to perfect prediction, but at an infinite landscape of equally valid, yet ultimately incomplete, descriptions. The cosmos generously shows its secrets to those willing to accept that not everything is explainable.

Future work will undoubtedly focus on expanding the scope of these techniques – incorporating more complex market dynamics, higher-dimensional datasets, and alternative constraints. Yet, it is crucial to remember that imposing no-arbitrage conditions, while essential, is merely one lens through which to view a profoundly messy reality. The search for a “true” volatility model is likely a fool’s errand. Black holes are nature’s commentary on human hubris.

The real challenge, then, lies not in refining the models themselves, but in developing a more nuanced understanding of their limitations. The ability to generate competing parametrizations, while impressive, carries with it the responsibility to acknowledge that each represents a simplification, a convenient fiction imposed upon a world that stubbornly resists complete comprehension. The future may well belong to those who embrace the inherent uncertainty, rather than those who seek to eliminate it.

Original article: https://arxiv.org/pdf/2603.21892.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
