Author: Denis Avetisyan
New research shows that simulating financial markets with agents who learn and have unique preferences can recreate realistic trading patterns.

Combining multi-agent reinforcement learning with heterogeneous agent preferences produces emergent market dynamics, offering a novel approach to financial market simulation and calibration.
While agent-based models increasingly explain financial markets as emergent phenomena, prior work typically isolates learning and heterogeneous preferences as separate modeling paradigms. This study, ‘Emergence from Emergence: Financial Market Simulation via Learning with Heterogeneous Preferences’, introduces a multi-agent reinforcement learning framework demonstrating that jointly modeling these factors drives both individual behavioral differentiation and realistic collective market dynamics. Specifically, the research reveals how agents’ learning, guided by varying risk aversion and time horizons, fosters niche specialization and ultimately generates emergent patterns like fat-tailed price fluctuations. Could this ‘emergence from emergence’ paradigm offer a more robust foundation for understanding and predicting complex financial system behavior?
The Illusion of Homogeneity: Embracing Agent Diversity
Traditional economic models often simplify reality by assuming homogeneous agents and perfect information, a limitation that obscures the complexities of real-world markets. These models fail to account for the diversity of preferences and risk tolerances, or for the bounded rationality, that characterizes individual decision-making. It is precisely this diversity that drives complex emergent phenomena, including price bubbles and cascading failures.

Accurately modeling agent-level heterogeneities is therefore paramount. Ignoring these factors yields inaccurate predictions and ineffective interventions. The market is not a simple equation, but a complex system governed by individual uncertainties.
Simulating Complexity: The Power of Agent-Based Modeling
Agent-Based Models (ABMs) provide a powerful framework for exploring complex systems by simulating interacting, autonomous agents. ABMs move beyond traditional methods by explicitly representing heterogeneity, allowing agents to differ in characteristics and strategies.
These models incorporate realistic behaviors, including bounded rationality and adaptive learning, enabling the emergence of novel patterns. The Limit Order Book provides a critical environment for deploying and analyzing these agents, offering a realistic setting for studying market dynamics.
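
To make the setting concrete, the sketch below implements a minimal continuous double auction limit order book populated by heterogeneous zero-intelligence traders. The matching rules, agent parameters, and price process are illustrative stand-ins, not the paper's actual simulation environment.

```python
import heapq
import random

class LimitOrderBook:
    """Minimal continuous double auction with price-time priority matching."""

    def __init__(self):
        self.bids = []        # max-heap via negated price: (-price, seq, qty)
        self.asks = []        # min-heap: (price, seq, qty)
        self._seq = 0         # tie-breaker giving time priority

    def submit(self, side, price, qty=1):
        """Add a limit order, then match any crossing orders."""
        self._seq += 1
        if side == "buy":
            heapq.heappush(self.bids, (-price, self._seq, qty))
        else:
            heapq.heappush(self.asks, (price, self._seq, qty))
        return self._match()

    def _match(self):
        trades = []
        while self.bids and self.asks and -self.bids[0][0] >= self.asks[0][0]:
            neg_bid, bseq, bqty = heapq.heappop(self.bids)
            ask, aseq, aqty = heapq.heappop(self.asks)
            qty = min(bqty, aqty)
            trades.append((ask, qty))            # simplification: trade at the ask
            if bqty > qty:
                heapq.heappush(self.bids, (neg_bid, bseq, bqty - qty))
            if aqty > qty:
                heapq.heappush(self.asks, (ask, aseq, aqty - qty))
        return trades


# Heterogeneous zero-intelligence traders: each quotes around the last price
# with its own dispersion, so agent-level diversity shapes the order flow.
random.seed(1)
book = LimitOrderBook()
last_price = 100.0
agents = [{"sigma": random.uniform(0.5, 3.0)} for _ in range(50)]
for _ in range(500):
    agent = random.choice(agents)
    side = random.choice(["buy", "sell"])
    price = round(random.gauss(last_price, agent["sigma"]), 2)
    for trade_price, _qty in book.submit(side, price):
        last_price = trade_price
```

Even this toy setup illustrates the core point: prices are not imposed from above but aggregated by the book from the dispersed quotes of differing agents.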

Defining Rationality: From Randomness to Adaptive Strategies
Agent-based modeling draws on a spectrum of behavioral rules, ranging from the baseline Zero Intelligence Agent, which acts randomly, to richer strategies such as the Fundamental-Chartist-Noise (FCN) Agent, which blends fundamental valuation, technical (chartist) analysis, and noise.
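
As a rough illustration of an FCN rule, the snippet below computes an agent's expected future price from weighted fundamentalist, chartist, and noise components. The functional form follows common FCN formulations in the artificial-market literature; the weights, horizons, and function names are assumptions for illustration, not values from the paper.

```python
import math
import random

def fcn_expected_price(prices, fundamental, w_f, w_c, w_n,
                       horizon=100, window=10, noise_sigma=0.01):
    """Expected future price for a Fundamental-Chartist-Noise (FCN) agent.

    prices        : history of traded prices (most recent last)
    fundamental   : the agent's estimate of fundamental value
    w_f, w_c, w_n : agent-specific weights on the three components
    """
    p_t = prices[-1]
    # Fundamentalist term: expect mean reversion toward the fundamental value.
    r_fund = math.log(fundamental / p_t) / horizon
    # Chartist term: extrapolate the average of recent log returns.
    recent = prices[-(window + 1):]
    n_ret = max(len(recent) - 1, 1)
    r_chart = sum(math.log(b / a) for a, b in zip(recent, recent[1:])) / n_ret
    # Noise term: idiosyncratic Gaussian shock.
    r_noise = random.gauss(0.0, noise_sigma)
    # Weighted expected log return, projected over the agent's horizon.
    r_hat = (w_f * r_fund + w_c * r_chart + w_n * r_noise) / (w_f + w_c + w_n)
    return p_t * math.exp(r_hat * horizon)


# Heterogeneous population: each agent draws its own weights and horizon.
random.seed(0)
history = [100.0, 100.5, 101.2, 100.8, 101.5]
agent = {"w_f": random.expovariate(1.0), "w_c": random.expovariate(1.0),
         "w_n": random.expovariate(1.0), "horizon": random.randint(50, 150)}
expected = fcn_expected_price(history, fundamental=100.0, w_f=agent["w_f"],
                              w_c=agent["w_c"], w_n=agent["w_n"],
                              horizon=agent["horizon"])
side = "buy" if expected > history[-1] else "sell"
print(f"expected price {expected:.2f} -> {side}")
```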
Advanced models incorporate adaptive learning. The Adaptive FCN Agent dynamically adjusts strategies based on observed conditions within a Partially Observable Markov Decision Process (POMDP) framework. To enhance learning, Shared-Policy Learning allows agents to benefit from each other’s experiences, accelerating convergence and improving overall performance.
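
One common way to realize shared-policy learning is parameter sharing: a single policy is updated from transitions pooled across all agents, while each agent's preference vector (for example, risk aversion and horizon) is appended to its observation so that one set of weights can still express differentiated behavior. The sketch below shows this structure with a simple REINFORCE-style update; it is an assumed, illustrative setup, not the paper's algorithm or network architecture.

```python
import numpy as np

ACTIONS = ["buy", "sell", "hold"]
FEATURE_DIM = 3 + 2                              # 3 market features + 2 preference features
theta = np.zeros((len(ACTIONS), FEATURE_DIM))    # one set of policy weights for ALL agents

def policy_probs(features):
    """Softmax policy over actions given (observation, preference) features."""
    logits = theta @ features
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def act(obs, prefs, rng):
    features = np.concatenate([obs, prefs])
    return rng.choice(len(ACTIONS), p=policy_probs(features)), features

def reinforce_update(pooled, lr=0.01):
    """One REINFORCE step on transitions pooled from every agent."""
    global theta
    grad = np.zeros_like(theta)
    for features, action, reward in pooled:
        probs = policy_probs(features)
        for a in range(len(ACTIONS)):
            indicator = 1.0 if a == action else 0.0
            grad[a] += (indicator - probs[a]) * features * reward
    theta += lr * grad / len(pooled)

rng = np.random.default_rng(0)
agents = [{"risk_aversion": rng.uniform(0.1, 2.0),
           "horizon": rng.integers(10, 200)} for _ in range(50)]

pooled = []
for agent in agents:
    obs = rng.normal(size=3)                     # stand-in market observation
    prefs = np.array([agent["risk_aversion"], agent["horizon"] / 200.0])
    action, features = act(obs, prefs, rng)
    reward = rng.normal()                        # stand-in preference-weighted P&L
    pooled.append((features, action, reward))
reinforce_update(pooled)                         # shared weights learn from everyone
```

Because the preference vector is part of the input, the shared policy can specialize per agent even though its parameters are trained on the pooled experience of the whole population.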

Calibrating the System: Validating Emergent Market Behavior
Calibration techniques, notably Optimal Transport, align agent-based model trait distributions with empirical data, minimizing discrepancies and enhancing realism. Successful calibration, measured by minimizing the Optimal Transport Distance, is a prerequisite for observing realistic emergent phenomena.
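
For one-dimensional summaries such as return distributions, the optimal transport (Wasserstein-1) distance can be computed directly, and calibration reduces to searching for parameters that minimize it. The sketch below uses `scipy.stats.wasserstein_distance` with a placeholder simulator standing in for the agent-based model; the parameter grid and distributions are illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(42)
# Heavy-tailed stand-in for an empirical return sample.
empirical_returns = rng.standard_t(df=3, size=5000) * 0.01

def simulate_returns(noise_scale, n=5000):
    """Placeholder simulator; in practice this would run the calibrated ABM."""
    return rng.standard_t(df=4, size=n) * noise_scale

# Grid search: keep the parameter whose simulated returns are closest
# to the empirical sample in Wasserstein-1 (optimal transport) distance.
candidates = np.linspace(0.005, 0.02, 16)
distances = [wasserstein_distance(empirical_returns, simulate_returns(s))
             for s in candidates]
best = candidates[int(np.argmin(distances))]
print(f"best noise_scale = {best:.4f}, OT distance = {min(distances):.5f}")
```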
Calibrated ABMs achieving the lowest Optimal Transport Distance reproduce complex financial phenomena such as Volatility Clustering and Fat-Tailed Return Distributions, evidenced by excess kurtosis well above zero and a Hill tail exponent approximating 3. These models also exhibit long memory in volatility and a positive volume-volatility correlation, suggesting that true systemic understanding requires mirroring the underlying algorithmic structure.
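
These stylized facts can be checked directly on a simulated return series. The sketch below computes excess kurtosis, a Hill estimate of the tail exponent, and the autocorrelation of absolute returns (a proxy for volatility clustering) on a heavy-tailed stand-in sample; the estimator choices and the tail fraction are illustrative.

```python
import numpy as np
from scipy.stats import kurtosis

def hill_estimator(returns, tail_fraction=0.05):
    """Hill estimate of the tail index from the largest absolute returns."""
    x = np.sort(np.abs(returns))[::-1]
    k = max(int(len(x) * tail_fraction), 2)
    tail = x[:k]
    return 1.0 / np.mean(np.log(tail[:-1] / tail[-1]))

def autocorr(x, lag):
    """Sample autocorrelation at a given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

# Stand-in for simulated returns; in practice these come from the calibrated ABM.
rng = np.random.default_rng(0)
returns = rng.standard_t(df=3, size=20_000) * 0.01

print("excess kurtosis:", kurtosis(returns))          # > 0 indicates fat tails
print("Hill tail index:", hill_estimator(returns))    # ~3 for cubic-law tails
print("|r| autocorr at lag 10:",
      autocorr(np.abs(returns), 10))                  # slow decay => volatility clustering
```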

The study meticulously constructs a simulated financial ecosystem, mirroring real-world complexities through the implementation of heterogeneous agent preferences. This approach acknowledges that collective market behavior isn’t simply the sum of individual actions, but an emergent property arising from their interactions—a concept echoing G.H. Hardy’s assertion: “A mathematician, like a painter or a poet, is a maker of patterns.” The patterns observed within the simulation, specifically the emergent order book dynamics and realistic price formation, are not pre-programmed but made through the algorithmic interactions of agents, each governed by uniquely calibrated reinforcement learning strategies. The inherent mathematical structure driving these simulated agents allows for a provable link between individual behavioral differences and the resultant macroscopic market phenomena, confirming the power of a rigorously constructed, mathematically-grounded model.
What’s Next?
The demonstration of ‘emergence from emergence’—that is, the confluence of individually learned behavioral rules generating recognizable market phenomena—should not be mistaken for a triumph of simulation fidelity. Rather, it highlights the profound gaps in current methodologies. The observed dynamics, while superficially resembling financial markets, remain largely descriptive. A formal proof of convergence—demonstrating that these learned agent behaviors necessarily lead to specific, predictable market states—is conspicuously absent. The current reliance on calibration against historical data, while pragmatic, lacks the elegance of a mathematically derived solution.
Future work must prioritize analytical rigor. The exploration of alternative reinforcement learning algorithms—those offering guarantees of convergence or bounds on error—is paramount. Equally important is a deeper investigation into the space of preference heterogeneity. The current parameterizations, while sufficient to generate interesting behavior, lack a grounding in economic theory. Are these preferences locally optimal, or merely a consequence of the learning process? A mathematically sound justification for these preferences is critical.
Ultimately, the goal should not be to replicate market behavior, but to explain it. A model is not validated by its resemblance to the observed world, but by its ability to predict future states with quantifiable certainty. Until such predictive power is demonstrated, this remains a fascinating, yet incomplete, exploration of complexity.
Original article: https://arxiv.org/pdf/2511.05207.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/