Splitting the Difference: Smarter Models from Complex AI

Author: Denis Avetisyan


A new approach decomposes the knowledge of large AI models into simpler components, boosting performance in financial applications.

Through the decomposition of complex large language model features—achieved by varying three decoupling variables—specialized small models are distilled, demonstrating a pathway to simplify intricate systems into manageable, focused components.

This research introduces CMM, a framework leveraging feature decomposition and a Hájek-MoE to distill large language model capabilities into smaller models for improved market making.

While large language models demonstrate promising performance in complex tasks like reinforcement learning for market making, their computational cost hinders practical deployment. This paper, ‘Two Heads are Better than One: Distilling Large Language Model Features Into Small Models with Feature Decomposition and Mixture’, addresses this limitation by introducing Cooperative Market Making (CMM), a novel framework that decomposes LLM features into orthogonal components and integrates them via a Hájek-MoE. Through this approach, CMM achieves superior performance on real-world market datasets compared to existing distillation methods and traditional reinforcement learning strategies. Can this feature decomposition and mixture approach unlock more efficient and scalable LLM applications across diverse financial domains?


Beyond Reinforcement: LLMs and the Evolving Market Landscape

Current market-making algorithms, often reliant on Reinforcement Learning, struggle with complex, rapidly changing data: they generalize poorly across diverse conditions and falter during volatility. A key limitation is their inability to capture the intricate relationships between market features. Traditional approaches treat feature interactions implicitly, learning them through trial and error, whereas effective market making demands a more deliberate approach to feature extraction and prediction. This work instead frames market making as a sequence prediction problem, using large language models to forecast mid-price, spread, and volume and then constructing order books via arithmetic sequences, a potentially more robust and adaptable solution.

A market-making workflow leverages a large language model to directly predict future market conditions—mid-price, spread, and volume—and construct orders using arithmetic sequences, demonstrating performance exceeding traditional reinforcement learning algorithms and further improvement with a distilled model suitable for real-time application.
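
To make the order-construction step concrete, the sketch below shows one way the three predicted quantities could be turned into a ladder of quotes using arithmetic sequences. The level count, tick step, and volume split are illustrative assumptions; the paper does not specify these parameters.

```python
import numpy as np

def build_order_book(mid_price: float, spread: float, total_volume: float,
                     n_levels: int = 5) -> dict:
    """Construct bid/ask ladders from predicted mid-price, spread, and volume.

    Prices step away from the touch in an arithmetic sequence; volume is
    likewise split across levels with arithmetically increasing size.
    The level count and step sizes are illustrative, not from the paper.
    """
    half_spread = spread / 2.0
    tick = half_spread                      # assumed price step between levels
    levels = np.arange(1, n_levels + 1)
    bid_prices = mid_price - half_spread - tick * (levels - 1)
    ask_prices = mid_price + half_spread + tick * (levels - 1)
    # Volumes form an arithmetic sequence that sums to total_volume per side.
    weights = levels / levels.sum()
    side_volume = total_volume / 2.0
    volumes = side_volume * weights
    return {"bids": list(zip(bid_prices, volumes)),
            "asks": list(zip(ask_prices, volumes))}

# Example: predicted mid-price 100.0, spread 0.2, total volume 1000
book = build_order_book(100.0, 0.2, 1000.0)
print(book["bids"][0], book["asks"][0])
```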

Deconstructing Complexity: A Cooperative Market Making Framework

We propose a Cooperative Market Making (CMM) framework that leverages Large Language Models (LLMs) for feature extraction and prediction. This addresses the computational demands of LLMs in dynamic markets. Central to CMM is Orthogonal Feature Decomposition Distillation, which disentangles features by layer, task, and data type, creating specialized, smaller models. Knowledge distillation transfers the LLM’s understanding to these models, maintaining predictive power with improved efficiency. The framework aggregates outputs from these specialized models, weighting them with confidence scores derived from a kernel function, enabling a nuanced and informed prediction.

The proposed CMM framework decomposes the complex feature space of a large language model across layer, task, and data dimensions, enabling specialized small models to represent it, and then aggregates their outputs—weighted by confidence scores derived from a kernel function—to produce a final prediction.
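
As a rough illustration of what Orthogonal Feature Decomposition Distillation might look like in code, the sketch below trains small student networks to regress onto decomposed teacher feature slices while penalizing overlap between their outputs. The MSE objective, the form of the orthogonality penalty, and the network sizes are assumptions made for illustration, not the paper's exact losses.

```python
import torch
import torch.nn as nn

class SmallStudent(nn.Module):
    """A lightweight MLP meant to reproduce one decomposed slice of teacher features."""
    def __init__(self, in_dim: int, out_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

def distillation_step(students, teacher_components, market_inputs, optimizer,
                      ortho_weight: float = 0.1):
    """One training step: each student matches its teacher component (MSE),
    while a penalty discourages overlap between the students' outputs."""
    outputs = [s(market_inputs) for s in students]
    mse = sum(nn.functional.mse_loss(o, t)
              for o, t in zip(outputs, teacher_components))
    # Orthogonality penalty: squared off-diagonal correlations between students.
    stacked = torch.stack([o.flatten(1).mean(dim=1) for o in outputs], dim=1)
    gram = stacked.T @ stacked / stacked.shape[0]
    ortho = (gram - torch.diag(torch.diagonal(gram))).pow(2).sum()
    loss = mse + ortho_weight * ortho
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for decomposed LLM features.
torch.manual_seed(0)
students = [SmallStudent(16, 8) for _ in range(3)]       # e.g. shallow/middle/deep slices
params = [p for s in students for p in s.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
x = torch.randn(32, 16)                                    # market-state inputs
teacher_feats = [torch.randn(32, 8) for _ in range(3)]     # decomposed teacher features
print(distillation_step(students, teacher_feats, x, opt))
```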

Layered Intelligence: Uncovering Task Specialization within LLMs

A novel ‘Normalized Fluorescent Probe’ was developed to analyze the Layer Feature Hierarchy within large language models, revealing critical relationships between layers, tasks, and input data. This technique provides granular insight into feature extraction. Analysis reveals task specialization: shallow layers excel at Mid-Price Prediction, middle layers at Spread Prediction, and deep layers at Total Volume Prediction. This specialization demonstrably improves prediction accuracy and efficiency. Understanding the LLM’s response to different Data Market Regimes is crucial for robustness; the probe maps feature sensitivity to regime shifts, informing adaptation and stabilization strategies.

Analysis of the decomposed large language model features reveals increasing separation between clusters under stronger decoupling conditions, with shallow layers specializing in mid-price prediction, middle layers focusing on spread, and deep layers geared towards total volume.
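
The probe itself is described only at a high level, so the sketch below uses a generic layer-probing recipe rather than the authors' 'Normalized Fluorescent Probe': fit a simple ridge regressor on each layer's hidden states for each prediction target and compare cross-validated scores. The random arrays stand in for real LLM activations and market labels.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, feat_dim, n_layers = 500, 32, 12

# Stand-ins for per-layer hidden states and the three prediction targets.
layer_feats = [rng.normal(size=(n_samples, feat_dim)) for _ in range(n_layers)]
targets = {"mid_price": rng.normal(size=n_samples),
           "spread": rng.normal(size=n_samples),
           "total_volume": rng.normal(size=n_samples)}

# Fit a ridge probe per (layer, task) pair and record cross-validated R^2.
scores = {task: [] for task in targets}
for feats in layer_feats:
    for task, y in targets.items():
        r2 = cross_val_score(Ridge(alpha=1.0), feats, y, cv=3, scoring="r2").mean()
        scores[task].append(r2)

# The layer with the highest probe score is read as most specialized for that task.
for task, layer_scores in scores.items():
    print(task, "best layer:", int(np.argmax(layer_scores)))
```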

Harnessing Collective Expertise: A Mixture-of-Experts Approach

To refine market making, we integrate outputs from specialized ‘Small Model’ instances using a Hájek Projection-based Mixture-of-Experts. This enables parallel processing of diverse market signals and subsequent aggregation of predictions. Individual Small Models are trained on specific order book aspects, fostering specialization and enhancing predictive power. Kernel Functions project features into a shared space, facilitating weighted averaging of model outputs based on confidence and relevance, reducing variance and improving robustness. This collective expertise yields significant improvements in liquidity provision and reduced transaction costs.

Implementation of the CMM framework achieves a 31.39% improvement in Episodic Profit and Loss (EPnL) compared to the original large language model, while reducing latency by a factor of 6.3, down to 0.3 seconds.
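
The sketch below illustrates the aggregation idea with a generic kernel-weighted mixture: each expert's confidence comes from an RBF-kernel similarity between the current market state and that expert's feature representation, and predictions are combined as a convex weighted average. The kernel choice and weighting scheme are assumptions for illustration; the paper's Hájek-projection construction is more specific.

```python
import numpy as np

def rbf_kernel(x, centers, gamma=1.0):
    """Gaussian kernel similarity between a feature vector and reference centers."""
    d2 = ((x[None, :] - centers) ** 2).sum(axis=1)
    return np.exp(-gamma * d2)

def kernel_moe_predict(expert_preds, expert_feats, market_state, gamma=1.0):
    """Aggregate expert predictions with kernel-based confidence weights.

    Each expert's confidence is the kernel similarity between the current
    market state and that expert's feature representation; weights are
    normalized so the output is a convex combination of expert predictions.
    """
    sims = np.array([rbf_kernel(market_state, f[None, :], gamma)[0]
                     for f in expert_feats])
    weights = sims / sims.sum()
    return float(np.dot(weights, expert_preds)), weights

# Toy usage: three small models emit mid-price forecasts plus feature embeddings.
market_state = np.array([0.2, -0.1, 0.5])
expert_preds = np.array([100.1, 100.3, 99.9])
expert_feats = [np.array([0.25, -0.05, 0.45]),
                np.array([1.0, 1.0, 1.0]),
                np.array([0.0, 0.0, 0.0])]
pred, w = kernel_moe_predict(expert_preds, expert_feats, market_state)
print(pred, w)
```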

Beyond Finance: Future Directions for Intelligent Liquidity

This work demonstrates the potential of large language models to revolutionize financial modeling and market making. We developed a framework leveraging LLM predictive capabilities to simulate order books and optimize trading strategies. Evaluated on the RB dataset, the framework achieved a PnLMAP of 298, indicating substantial profitability and effective risk management, exceeding traditional statistical models. Future research will incorporate alternative data sources like news sentiment and explore complex strategies including algorithmic execution and portfolio optimization. These principles are applicable beyond finance, suggesting broader implications for artificial intelligence and data analysis.

The pursuit of efficient systems, as demonstrated by this paper’s CMM framework, echoes a fundamental tenet of robust design. It appears deceptively simple to distill complex LLM features into manageable components through orthogonal decomposition and a Hájek-MoE, but elegance often belies underlying intricacy. As John McCarthy observed, “If the system looks clever, it’s probably fragile.” This work navigates that risk, seeking not just performance gains in market making—a field demanding both speed and stability—but a structure where complexity is understood and contained. The decomposition process isn’t merely about reducing dimensionality; it’s about revealing the underlying architecture, recognizing that structure dictates behavior, and ultimately building a system less prone to unforeseen failures.

What’s Next?

The pursuit of distilling intelligence into smaller forms, as demonstrated by CMM, inevitably bumps against the inherent limits of decomposition. While orthogonal feature decomposition offers a compelling method for simplification, the question remains: at what point does elegance become fragility? Each division, each extracted component, introduces a potential point of failure, a loss of the holistic understanding residing within the original, larger model. The current work addresses market making, but the broader implication is a search for generalizable distillation techniques. To truly assess the utility of CMM, its performance must be evaluated across a diverse range of tasks, pushing the boundaries of what can be reasonably compressed without catastrophic loss of function.

Further investigation should consider the interplay between decomposition method and the architecture of the receiving model. Is Hájek-MoE the optimal integrator for orthogonally decomposed features, or might alternative approaches—perhaps those prioritizing redundancy over strict separation—yield more robust results? The current paradigm favors increasingly complex mixtures of experts. It may be, however, that true progress lies not in adding layers of sophistication, but in refining the fundamental principles of information representation.

Ultimately, the goal is not simply to shrink models, but to understand why certain features are essential and others are not. This requires moving beyond purely empirical observation and developing a more principled understanding of the underlying structure that dictates behavior. If a design feels clever, it’s probably fragile. The simplest explanation, consistently, remains the most likely.


Original article: https://arxiv.org/pdf/2511.07110.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
