Mastering the Art of Deception: An AI Conquers Stratego

Author: Denis Avetisyan

A new artificial intelligence, trained through self-play and strategic search, has achieved superhuman performance in the classic game of imperfect information.

Researchers developed Ataraxos, an AI leveraging reinforcement learning, transformer networks, and belief networks to surpass top human players in the complex board game Stratego.

Despite decades of artificial intelligence research, strategic games with substantial hidden information – such as Stratego – have remained a persistent challenge for achieving truly superhuman performance without prohibitive computational cost. This work, ‘Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search’, demonstrates a significant leap forward, presenting an AI, Ataraxos, that not only matches but decisively surpasses top human Stratego players with a remarkably modest budget. By combining self-play reinforcement learning with innovative test-time search techniques and a novel approach to belief network management, Ataraxos achieves a new state-of-the-art in imperfect information games. Could these methods unlock similarly dramatic advances in other complex domains characterized by uncertainty and strategic depth?

The Veil of Uncertainty: Stratego’s AI Challenge

Stratego is a hard problem for artificial intelligence because it is a game of imperfect information. Unlike perfect-information games such as chess or Go, where both players see the full board, Stratego demands decisions based on limited observations and probabilistic assessments of the opponent’s hidden forces. This mirrors the ‘fog of war’ in real-world military scenarios and requires robust reasoning under uncertainty. Search methods designed for perfect-information games assume the full state is known, so they do not transfer directly, and brute force alone is not enough.

Ataraxos: Learning Through Self-Contest

Ataraxos is a Stratego AI agent built on self-play reinforcement learning. Trained through millions of simulated games against itself, the agent refines its strategy without human guidance or pre-programmed heuristics, learning directly from game outcomes. Its training targets are computed with Lambda Returns and Monte Carlo Returns, which estimate the long-term value of a position so that learning favors moves that raise the probability of winning.
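A lambda return blends one-step bootstrapped targets with the full Monte Carlo return of a game. The following is a minimal sketch of the standard recursion, assuming a single finished trajectory; the function name, its arguments, and the example numbers are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def lambda_returns(
    rewards: List[float],
    values: List[float],
    bootstrap_value: float = 0.0,
    gamma: float = 1.0,
    lam: float = 0.9,
) -> List[float]:
    """Compute lambda returns for one trajectory (illustrative sketch).

    rewards[t] is the reward received after acting in state s_t.
    values[t] is the critic's estimate V(s_t).
    bootstrap_value is V(s_T) for the state after the last step
    (0.0 when the game has ended).

    Recursion: G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
    """
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")

    returns = [0.0] * len(rewards)
    next_return = bootstrap_value  # G_{t+1}
    next_value = bootstrap_value   # V(s_{t+1})
    for t in reversed(range(len(rewards))):
        returns[t] = rewards[t] + gamma * (
            (1.0 - lam) * next_value + lam * next_return
        )
        next_return = returns[t]
        next_value = values[t]
    return returns


# Hypothetical three-move game that ends in a win (reward 1 on the last move).
example_rewards = [0.0, 0.0, 1.0]
example_values = [0.5, 0.6, 0.7]
monte_carlo = lambda_returns(example_rewards, example_values, lam=1.0)
one_step_td = lambda_returns(example_rewards, example_values, lam=0.0)
```

Setting `lam=1.0` recovers the Monte Carlo return (every step is credited with the final outcome), while `lam=0.0` gives the one-step target that trusts the critic's estimate of the next state. Intermediate values trade variance against bias.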

Navigating the Game Tree: Search and Policy

Ataraxos runs a search algorithm at test time to choose each move, exploring possible future states from the current position. Search depth is managed adaptively to balance computational cost against strategic foresight. The search is guided by a learned policy, refined through Magnetic Mirror Descent, an optimization technique that keeps policy updates stable and performance consistent. A Belief Network models the probabilities of the opponent’s hidden pieces, letting the agent reason about the likely state of the game despite incomplete information. Regularization prevents overfitting and helps the agent generalize to novel situations.
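To make the idea of reasoning over hidden pieces concrete, here is a toy Bayesian update for a single unknown piece. It is not the paper's Belief Network, which is a learned model; it is a hand-written stand-in that assumes a made-up prior over four ranks and relies only on the Stratego rule that bombs and flags cannot move. All names and numbers are hypothetical.

```python
from typing import Dict


def update_belief(
    prior: Dict[str, float], likelihood: Dict[str, float]
) -> Dict[str, float]:
    """Apply Bayes' rule to a belief over one hidden piece's rank.

    prior[rank] is the probability assigned to each rank before the
    observation. likelihood[rank] is P(observation | rank); ranks missing
    from the likelihood are treated as incompatible with the observation.
    """
    unnormalized = {
        rank: p * likelihood.get(rank, 0.0) for rank, p in prior.items()
    }
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation is impossible under the prior")
    return {rank: weight / total for rank, weight in unnormalized.items()}


# Hypothetical prior for an unrevealed piece (illustrative numbers only).
prior = {"bomb": 0.4, "flag": 0.1, "scout": 0.3, "miner": 0.2}

# Observation: the piece just moved one square. Bombs and flags never move,
# so their likelihood is zero; movable ranks are treated as equally likely
# to have made the move (a simplifying assumption).
moved_one_square = {"bomb": 0.0, "flag": 0.0, "scout": 1.0, "miner": 1.0}

posterior = update_belief(prior, moved_one_square)
```

After the update, all probability mass sits on the movable ranks, in proportion to their prior weights. A real belief model must also track every hidden piece jointly, since the count of each rank is fixed and learning one piece's identity changes the odds for all the others.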

Beyond Human Expertise: Victory and Validation

Ataraxos recently defeated a world champion Stratego player in a 20-game match. The agent secured 15 wins, 4 draws, and 1 loss, an 85% effective win rate when each draw is counted as half a win. Statistical analysis yielded a p-value of less than 0.00026, strongly supporting the hypothesis that Ataraxos outperforms highly skilled human opponents. Further validation showed a 95% win rate against other world championship attendees. On the training side, careful management of the Policy Update Size kept performance stable throughout learning. The result suggests that clarity can prevail even in the fog of war.
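The 85% figure can be reproduced from the match record under the assumption that a draw counts as half a win, which is the scoring that matches the reported numbers. The helper below is an illustrative sketch, not code from the paper.

```python
def effective_win_rate(wins: int, draws: int, losses: int) -> float:
    """Return the score fraction with each draw counted as half a win."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("at least one game is required")
    return (wins + 0.5 * draws) / games


# Match record reported in the article: 15 wins, 4 draws, 1 loss.
rate = effective_win_rate(wins=15, draws=4, losses=1)
# (15 + 0.5 * 4) / 20 = 17 / 20 = 0.85
```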

The pursuit of artificial intelligence in complex games, as demonstrated by Ataraxos in Stratego, often leads to intricate designs. However, the system’s success hinges not on the sheer number of features, but on the elegance of its core mechanisms. This echoes the sentiment of Brian Kernighan: “Complexity is our enemy. Simplicity is our friend.” The researchers distilled the essence of strategic gameplay – belief networks managing imperfect information, coupled with efficient test-time search – into a surprisingly lean architecture. The result is not merely a winning algorithm, but a testament to the power of reduction, a system where meaning emerges not from what is added, but from what is removed.

Where to Now?

The creation of Ataraxos, while a demonstrable success, merely clarifies the scale of what remains unknown. The game of Stratego, for all its tactical depth, is a bounded problem. The true challenge isn’t replicating expertise within those bounds, but generalizing the principles to domains lacking even that pretense of structure. The architecture, reliant as it is on self-play, demands an almost profligate expenditure of computational resources. A leaner approach—one prioritizing insight over brute force—remains a distant, though necessary, goal.

Further refinement will undoubtedly yield incremental gains in Stratego itself. However, the core limitation lies in the representation of uncertainty. Belief networks, while functional, are still approximations of a fundamentally unknowable state. The pursuit of more elegant, and crucially, more minimal representations of imperfect information—those that discard extraneous detail rather than attempting to model it—should be prioritized. The signal is often lost in the noise of comprehensive simulation.

Ultimately, the value of this work resides not in a machine that plays Stratego, but in the questions it forces one to confront. The path forward isn’t about building ever-more-complex systems, but about identifying—and then ruthlessly eliminating—unnecessary complexity. The simplest explanation is almost always the most robust, even—or perhaps especially—in the face of strategic deception.

Original article: https://arxiv.org/pdf/2511.07312.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
