Author: Denis Avetisyan
A novel framework combines tree search, graph neural networks, and weak supervision from large language models to achieve strong performance in the complex board game of Amazons, even with computational constraints.

The system integrates Monte Carlo Tree Search with Graph Attention Autoencoders and a Stochastic Graph Genetic Algorithm, learning from weakly supervised data.
Despite advances in AI game-playing, achieving strong performance under resource constraints remains a significant challenge. This paper introduces ‘Resource-constrained Amazons chess decision framework integrating large language models and graph attention’, a novel approach to decision-making in the complex game of Amazons. By integrating a Graph Attention Autoencoder with Monte Carlo Tree Search and leveraging synthetic data generated by large language models, we demonstrate that competitive performance can be achieved even with limited computational resources and weak supervision. Could this hybrid framework unlock new possibilities for evolving specialized AI agents from general-purpose foundation models in other resource-limited domains?
The Inevitable Complexity of Strategic Depth
The game of Amazons, and others exhibiting similarly vast branching factors, presents a formidable obstacle for conventional artificial intelligence search algorithms. These algorithms, often reliant on exhaustively exploring possible move sequences, quickly become computationally overwhelmed as the number of legal moves at each turn – and thus the potential game states – expands exponentially. Unlike simpler games such as Tic-Tac-Toe, or even Chess, where selective search techniques can drastically reduce the search space, Amazons’ complex board and strategic options render such optimizations less effective. This necessitates the development of novel AI approaches, moving beyond brute-force calculation towards methods that prioritize promising lines of play and effectively prune irrelevant branches to make the game computationally tractable.
Effective game analysis transcends a simple tally of pieces; accurately gauging a position’s strength requires sophisticated evaluation metrics. Traditional algorithms often stumble over subtleties – a seemingly equal material balance can conceal a decisive positional advantage, or a weakness easily exploited through coordinated maneuvers. Researchers are therefore developing metrics that assess factors beyond mere quantity: in chess, piece activity, pawn structure, king safety, and control of key squares; in Amazons, territory control and piece mobility. These advanced systems employ complex weighting schemes and, increasingly, machine learning techniques to approximate the ‘true’ value of a position – one that reflects not just immediate gains but also long-term strategic potential and inherent vulnerabilities. The goal is to move beyond superficial assessments and capture the nuanced interplay of forces that defines success in complex games.

Beyond Exhaustive Search: A Graph-Based Appraisal
Monte Carlo Tree Search (MCTS) is a heuristic search algorithm frequently employed in game AI, particularly for games with large branching factors like Go. While MCTS effectively explores the game tree by balancing exploration and exploitation, its performance is heavily reliant on the accuracy of the node evaluation function. The standard approach involves random simulations to estimate a node’s value; however, this can be computationally expensive and yield inaccurate results, especially in complex positions. Improving the node evaluation, for example by incorporating domain-specific knowledge or learned heuristics, directly translates to more informed decision-making during the tree traversal and ultimately enhances the algorithm’s playing strength. A more accurate evaluation allows MCTS to prioritize promising lines of play and prune less viable options earlier in the search process, reducing the computational burden and improving the quality of the chosen move.
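The selection step described above is most commonly implemented with the UCB1 rule, which trades off a node’s observed value against how rarely it has been visited. The sketch below is illustrative rather than taken from the paper; the exploration constant and the `(visits, total_value)` data layout are assumptions.

```python
import math

def ucb_score(parent_visits, child_visits, child_value, c=1.41):
    """UCB1: balances exploitation (mean value) against an
    exploration bonus that shrinks as a child is visited more."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploit = child_value / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_child(children):
    """children: list of (visits, total_value) tuples under one parent.
    Returns the index of the child MCTS should descend into."""
    parent_visits = sum(v for v, _ in children)
    scores = [ucb_score(parent_visits, v, q) for v, q in children]
    return scores.index(max(scores))
```

A learned evaluator, such as the graph-based one this paper proposes, would replace the random-simulation estimate that feeds `total_value`, leaving the selection rule itself unchanged.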
Adjacency-Territory quantifies control by counting the vacant squares directly adjacent to a player’s pieces, providing a localized measure of influence. Line-Territory extends this by assessing control along lines – rows, columns, and diagonals – indicating potential for future area consolidation. One-Mobility counts the empty squares a player’s piece can reach in a single move, representing immediate tactical options, while Line-Mobility counts reachable empty squares along lines, indicating broader movement potential. Taken together, these metrics offer a more nuanced evaluation of a board position than simple territory counts, factoring in both present control and future strategic flexibility for piece maneuvering and area expansion.
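Two of these metrics can be sketched directly on a grid board. The code below is a minimal illustration, assuming queen-style movement for One-Mobility and 8-neighbour adjacency for Adjacency-Territory; the `0 = empty` board encoding is hypothetical, not the paper’s.

```python
# Eight queen-move directions on the Amazons board.
DIRS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
        (0, 1), (1, -1), (1, 0), (1, 1)]

def one_mobility(board, r, c):
    """Count empty squares reachable in one queen-style move from (r, c).
    board: 2D list; 0 = empty, anything else blocks movement."""
    n, m = len(board), len(board[0])
    count = 0
    for dr, dc in DIRS:
        rr, cc = r + dr, c + dc
        while 0 <= rr < n and 0 <= cc < m and board[rr][cc] == 0:
            count += 1
            rr += dr
            cc += dc
    return count

def adjacency_territory(board, r, c):
    """Count vacant squares directly adjacent to the piece at (r, c)."""
    n, m = len(board), len(board[0])
    return sum(1 for dr, dc in DIRS
               if 0 <= r + dr < n and 0 <= c + dc < m
               and board[r + dr][c + dc] == 0)
```

Line-Territory and Line-Mobility would follow the same ray-casting pattern, scoring contested lines rather than reachable squares.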
Representing a game board as a graph allows for the formalization of relationships between pieces and territories beyond simple spatial coordinates. Nodes in the graph represent board positions or game elements, while edges define connections – for example, adjacency, lines of attack, or territorial influence. This structure facilitates the calculation of complex metrics like connectivity and control, enabling evaluation functions to assess board states based on relational properties rather than isolated positions. Utilizing a graph-based system allows algorithms to efficiently traverse these relationships and quantify the impact of each piece or territory on the overall board state, offering a more nuanced and accurate assessment compared to traditional, coordinate-based methods.
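One simple way to realize such a graph is an adjacency list whose nodes are free squares and whose edges connect 8-adjacent free squares. This is a sketch under those assumptions; the paper’s exact node and edge definitions (e.g. edges for lines of attack or territorial influence) may differ.

```python
def board_to_graph(n, blocked):
    """Represent an n x n board as an adjacency list: nodes are the
    free squares, edges join 8-neighbouring free squares."""
    free = {(r, c) for r in range(n) for c in range(n)} - set(blocked)
    graph = {sq: [] for sq in free}
    for r, c in free:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0) and (r + dr, c + dc) in free:
                    graph[(r, c)].append((r + dr, c + dc))
    return graph
```

With connectivity made explicit like this, metrics such as reachable territory reduce to standard graph traversals over `graph`.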

Distilling Structure: Graph Attention Networks as Evaluators
A Graph Attention Autoencoder (GAE) leverages the inherent graph structure of the Monte Carlo Tree Search (MCTS) tree to create a condensed, informative representation of board positions. The MCTS tree, by its nature, represents relationships between game states as a graph, where nodes are board positions and edges represent possible moves. The GAE encodes this graph by employing attention mechanisms to weigh the importance of different nodes and edges during the encoding process. This allows the model to focus on strategically relevant connections and features within the game tree, effectively capturing structural information. The resulting low-dimensional embedding serves as a robust representation of the board position, facilitating improved decision-making and generalization compared to traditional feature-based approaches.
The Graph Attention Autoencoder (GAE) leverages the Information Bottleneck principle to create a compressed, strategically relevant representation of game states. This involves intentionally reducing the dimensionality of the input data – the MCTS tree – forcing the network to retain only the most crucial features for accurate evaluation. By minimizing irrelevant information, the GAE focuses on identifying and encoding the board configurations and relationships that most significantly impact optimal decision-making. This filtering process improves generalization and allows the system to efficiently discern key strategic elements, even with incomplete or noisy data, ultimately enhancing the performance of the Monte Carlo Tree Search.
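To make the attention mechanism concrete, here is a minimal single-head graph-attention update in NumPy, in the spirit of the standard GAT formulation: each node aggregates its neighbours’ projected features, weighted by learned attention coefficients. The weight shapes, LeakyReLU slope, and absence of multi-head aggregation are simplifications for illustration, not the paper’s architecture.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gat_layer(H, A, W, a):
    """One single-head graph-attention layer.
    H: (N, F) node features; A: (N, N) adjacency with self-loops;
    W: (F, F') projection; a: (2*F',) attention vector."""
    Z = H @ W                      # project node features
    out = np.zeros_like(Z)
    for i in range(Z.shape[0]):
        nbrs = np.nonzero(A[i])[0]
        # attention logits over neighbours, LeakyReLU as in GAT
        logits = np.array([np.concatenate([Z[i], Z[j]]) @ a for j in nbrs])
        logits = np.where(logits > 0, logits, 0.2 * logits)
        alpha = softmax(logits)    # normalised attention weights
        out[i] = (alpha[:, None] * Z[nbrs]).sum(axis=0)
    return out
```

An autoencoder in this style would stack such layers into an encoder that compresses the MCTS tree into a low-dimensional embedding, with a decoder trained to reconstruct the graph from it.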
Integrating a Graph Attention Network (GAT) into the Monte Carlo Tree Search (MCTS) framework enhances both the quality of decision-making and the efficiency of the search process. Standard MCTS relies on random simulations to evaluate board positions; the GAT instead provides a learned representation of the MCTS tree’s structural information, allowing for more informed evaluations. This learned representation facilitates a focused search, prioritizing promising nodes and pruning less relevant branches, thereby reducing the computational cost associated with deep searches. The GAT’s ability to distill critical strategic features from the tree structure enables MCTS to converge on optimal or near-optimal moves with fewer iterations, yielding a demonstrable improvement in both search speed and decision accuracy.
Evaluations demonstrate the system achieves a 66.5% win rate when competing against GPT-4o-mini. This performance was obtained utilizing a learning paradigm based on weak supervision, meaning the system was trained with limited and potentially noisy data. Critically, this result was achieved with a restricted search depth, indicating the efficiency gains from integrating Graph Attention Networks within the Monte Carlo Tree Search framework. The win rate suggests the system effectively leverages structural information from the MCTS tree to enhance decision-making capabilities, even with limited computational resources.

The Fragile Equilibrium: Stochasticity and Depth Control
The integration of a Stochastic Graph Genetic Algorithm represents a significant advancement in Monte Carlo Tree Search (MCTS) methodology. Rather than relying on purely deterministic node selection, this algorithm introduces an element of randomness inspired by evolutionary principles. By maintaining a population of candidate search trees and applying genetic operators – such as crossover and mutation – to their structures, the algorithm cultivates diversity in the exploration process. This approach actively combats the tendency of MCTS to converge prematurely on suboptimal solutions, effectively mitigating the risk of becoming trapped in local optima. The result is a more robust and adaptable search process, capable of navigating complex decision spaces with greater efficiency and discovering solutions that might otherwise remain hidden.
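The article does not spell out the genetic operators, so the following is a generic sketch of how crossover and mutation over candidate move sequences might look; the sequence-based genome encoding and truncation selection scheme are assumptions introduced for illustration.

```python
import random

def mutate(seq, moves, rate=0.2, rng=random):
    """Randomly replace entries of a candidate move sequence."""
    return [rng.choice(moves) if rng.random() < rate else m for m in seq]

def crossover(a, b, rng=random):
    """One-point crossover between two candidate sequences."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]

def evolve(population, fitness, moves, rng=random):
    """One generation: select by fitness, recombine, mutate.
    Maintaining a population keeps the search diverse, countering
    premature convergence on a single line of play."""
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[: max(2, len(ranked) // 2)]
    children = []
    while len(children) < len(population):
        p1, p2 = rng.sample(parents, 2)
        children.append(mutate(crossover(p1, p2, rng), moves, rng=rng))
    return children
```

In the framework described here, the evolving candidates would be search-tree structures evaluated by the learned model rather than flat move lists, but the select-recombine-mutate loop is the same.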
The search process benefits significantly from techniques that address the inherent challenges of propagating values through increasingly deep search trees. Depth-Dependent Accumulation and Global Depth Normalization work in concert to refine this process; the accumulation method dynamically adjusts node values based on their depth, giving more weight to evaluations closer to the root and mitigating the impact of errors that accumulate at deeper levels. Complementing this, Global Depth Normalization ensures that accumulated values remain within a consistent range, preventing the dominance of any single branch and facilitating more balanced exploration. By actively controlling the flow of information and reducing the effects of error propagation, these techniques enable the algorithm to effectively assess positions at greater search depths, leading to improved decision-making in complex scenarios.
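A minimal sketch of what depth-dependent accumulation could look like during backpropagation, assuming a simple exponential depth discount and a per-node running mean as the normalised score; the paper’s exact weighting and normalization scheme are not specified here.

```python
def backpropagate(path, reward, decay=0.9):
    """Depth-dependent accumulation along a root-to-leaf path:
    evaluations nearer the root receive full weight, deeper ones are
    discounted, and each node keeps a visit-normalised running mean
    so values across branches stay in a comparable range."""
    for depth, node in enumerate(path):   # path[0] is the root
        w = decay ** depth                # shallow nodes weigh more
        node["visits"] += 1
        node["value"] += w * reward
        node["mean"] = node["value"] / node["visits"]
```

Discounting by depth limits how far an error made deep in the tree can distort the statistics near the root, which is where the final move is chosen.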
The integration of stochasticity and depth control within the search algorithm yields notably robust and effective decision-making capabilities when confronted with complex game scenarios. By introducing diversity into the candidate selection process and dynamically adjusting node values based on search depth, the algorithm mitigates common pitfalls such as premature convergence on suboptimal solutions and the amplification of errors as the search progresses. This refined approach enables the system to explore a wider range of possibilities and maintain accuracy even in deep searches, resulting in more consistent and strategically sound choices. Consequently, the algorithm demonstrates an enhanced ability to navigate intricate game states and formulate effective plans, proving particularly advantageous in situations demanding long-term foresight and adaptability.
Evaluations demonstrate the efficacy of this novel approach in complex game environments, achieving a 79.5% win rate when competing against UCTS-AE and a 62.0% win rate against GAT-AE, both at a search depth of 20. These results indicate a substantial improvement in decision-making capabilities relative to established algorithms. However, performance metrics reveal a more nuanced outcome when benchmarked against the Stochastic Graph Genetic Algorithm (SGGA); at an increased search depth of 30, the win rate drops to 57.5%. This suggests that while the hybrid approach excels in overcoming limitations of UCTS-AE and GAT-AE, further refinement may be necessary to consistently outperform algorithms explicitly designed to embrace stochasticity and diversity in the search process.

The pursuit of strong performance in complex games, as demonstrated by this work on Amazons, inevitably reveals the limitations of imposed structure. The framework, blending Monte Carlo Tree Search with Graph Attention Autoencoders, attempts to navigate a vast search space, yet relies on weak supervision from large language models – an acknowledgement that complete knowledge is an illusion. As Ken Thompson observed, “A guarantee is just a contract with probability.” The system doesn’t solve Amazons; it adapts, learns, and probabilistically optimizes within inherent uncertainty. Stability, in this context, is merely an illusion that caches well, a temporary respite from the game’s fundamental chaos. The architecture doesn’t dictate success, it propagates it.
What’s Next?
This work demonstrates a path toward intelligence built from constrained resources, grafting the predictive power of large language models onto the skeletal structure of graph-based search. However, the achievement merely reframes the inevitable. The elegance of combining Monte Carlo Tree Search with graph attention does not erase the fundamental truth: every system built reaches a point where its dependencies outweigh its resilience. The Amazons board, though finite, presents a combinatorial complexity that will always outpace any fixed architecture.
The reliance on weak supervision, while pragmatic, underscores a deeper limitation. The language model provides guidance, but its knowledge is, itself, an emergent property of a far larger, more chaotic system. To assume its directives are universally optimal is to invite brittle behavior when faced with novelty. Future work will likely focus on closed-loop learning, where the agent actively refines the supervisory signal, but even that pursuit only delays the ultimate descent into unforeseen error states.
The true challenge lies not in achieving strong performance on a single game, but in building systems that gracefully degrade. The framework presented offers a momentary respite from the chaos, a localized equilibrium. Yet, it is a prophecy written in code: every connection established is a potential point of failure, and every optimization enacted narrows the space of possible adaptation. The system will not simply lose; it will become exquisitely, predictably, fragile.
Original article: https://arxiv.org/pdf/2603.10512.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-12 13:09