Can AI Solve What We Can? Testing Reasoning with Complex Graphs

A new benchmark challenges large language models with graph algorithm problems, exposing their limitations in handling complex relational data and revealing a tendency to overthink.

