Author: Denis Avetisyan
A new approach to explaining graph neural network decisions focuses on intelligently altering the input graph to reveal key influencing factors.

This paper introduces XPlore, a counterfactual explainer that enhances the quality and faithfulness of explanations for graph neural networks through edge additions, node feature perturbations, and gradient-based optimization.
Despite the increasing deployment of Graph Neural Networks (GNNs) across critical domains, their lack of transparency hinders trust and reliable decision-making. This work, ‘Beyond Edge Deletion: A Comprehensive Approach to Counterfactual Explanation in Graph Neural Networks’, addresses this challenge by introducing XPlore, a novel technique that significantly expands the search space for counterfactual explanations beyond simple edge removals to include edge additions and node feature perturbations. By employing gradient-guided optimization, XPlore generates more coherent and minimal explanations, demonstrated through improvements of up to +56.3% in validity and +52.8% in fidelity across diverse benchmarks. Can this richer approach to counterfactual explanations ultimately unlock the full potential of GNNs in high-stakes applications requiring both accuracy and interpretability?
The Opaque Oracle: When Prediction Outstrips Understanding
Graph Neural Networks (GNNs) have rapidly become a leading approach for analyzing complex relational data, demonstrating remarkable performance across diverse fields like social network analysis, drug discovery, and recommendation systems. However, this power comes at a cost: their internal workings often remain a “black box”. Unlike traditional machine learning models where feature importance can be readily assessed, GNNs’ decision-making processes are distributed across the graph structure and numerous network layers, making it difficult to pinpoint why a particular prediction was made. This opacity poses significant challenges for deploying GNNs in critical applications where trust and accountability are paramount – for example, in medical diagnosis or financial risk assessment. Without understanding the reasoning behind a GNN’s output, verifying its reliability, identifying potential biases, or ensuring its robustness becomes exceedingly difficult, ultimately hindering wider adoption despite their predictive capabilities.
The utility of Graph Neural Networks extends far beyond predictive accuracy; discerning why a particular prediction was generated is paramount for responsible and effective implementation. Without understanding the reasoning behind a GNN’s output, identifying and correcting errors – a critical debugging step – becomes exceptionally difficult. More importantly, assessing potential biases embedded within the network’s decision-making process requires a transparent view of its internal logic, ensuring fairness and preventing discriminatory outcomes. Ultimately, the ability to interpret a GNN’s predictions unlocks deeper insights from the underlying data, moving beyond simple classification or regression to reveal the complex relationships and influential factors driving the results – transforming the network from a ‘black box’ into a powerful tool for knowledge discovery.
Counterfactuals: Rewriting the Narrative of Prediction
Graph Counterfactual Explanations (GCE) focus on determining the smallest modifications to a given graph structure that would result in a different prediction from a Graph Neural Network (GNN). This is achieved by systematically altering the input graph – typically through edge additions or removals, or node attribute changes – and observing the corresponding change in the GNN’s output. The core principle is to identify a minimal “counterfactual” graph where the model’s prediction differs from the original, providing insight into which aspects of the input most strongly influence the GNN’s decision-making process. The identified changes are not simply any alterations that shift the prediction, but rather those requiring the fewest modifications to achieve that shift, highlighting the most critical features for the model.
Graph Counterfactual Explanations (GCE) facilitate understanding of Graph Neural Network (GNN) decision-making by presenting minimal graph alterations that would result in a different prediction. These “what-if” scenarios move beyond simply identifying influential nodes or edges; instead, they demonstrate specific changes to the input graph structure or node features that directly impact the model’s output. This provides actionable insights by highlighting the precise conditions under which the GNN’s prediction would change, enabling users to assess the model’s sensitivity to specific features and potentially identify biases or vulnerabilities in its reasoning process. Consequently, GCE allows for a more targeted and effective approach to model debugging, refinement, and trust-building than traditional explanation methods.
Generating effective Graph Counterfactual Explanations (GCE) necessitates a systematic search within the expansive space of potential graph alterations; the number of possible edge additions, deletions, or attribute modifications grows combinatorially with the size of the graph. Consequently, naive approaches – such as exhaustively evaluating all possible changes – are computationally infeasible for even moderately sized graphs. Efficient search strategies, therefore, employ heuristics and optimization techniques – including greedy algorithms, beam search, and A* search – to prune the search space and identify minimal modifications that achieve the desired prediction change. Furthermore, comprehensive search requires defining appropriate constraints and regularization terms to prevent the generation of counterfactuals that are unrealistic or irrelevant to the underlying domain.
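The greedy strategy mentioned above can be sketched in a few lines. This is a toy illustration, not the paper's algorithm: `model_prob` is a hypothetical stand-in for a GNN (here, a classifier driven by edge density), and the search repeatedly toggles the single edge that most moves the model's score toward the opposite class until the label flips.

```python
import numpy as np

def model_prob(adj):
    # Hypothetical stand-in for a GNN: P(class=1) grows with edge density.
    n = adj.shape[0]
    return adj.sum() / (n * (n - 1))

def greedy_counterfactual(adj, prob_fn, threshold=0.5, max_edits=10):
    """Greedily toggle the single edge that most moves the model's score
    across the decision threshold; stop as soon as the label flips."""
    original_label = prob_fn(adj) > threshold
    cf = adj.copy()
    edits = []
    n = adj.shape[0]
    for _ in range(max_edits):
        best_gain, best_edge = -np.inf, None
        for i in range(n):
            for j in range(i + 1, n):
                cand = cf.copy()
                cand[i, j] = cand[j, i] = 1 - cand[i, j]  # toggle one edge
                # Gain = movement toward the opposite side of the threshold.
                gain = (prob_fn(cf) - prob_fn(cand)) if original_label \
                    else (prob_fn(cand) - prob_fn(cf))
                if gain > best_gain:
                    best_gain, best_edge = gain, (i, j)
        i, j = best_edge
        cf[i, j] = cf[j, i] = 1 - cf[i, j]
        edits.append(best_edge)
        if (prob_fn(cf) > threshold) != original_label:
            return cf, edits          # counterfactual found
    return None, edits                # no flip within the edit budget
```

The inner double loop is what makes exhaustive variants intractable on large graphs; beam search or gradient guidance (as in XPlore) replaces it with a cheaper candidate-ranking step.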

XPlore: Charting a Course Through the Space of Possibilities
XPlore generates counterfactual graphs by systematically modifying the input graph structure and node features. This is achieved through the addition of edges to the original graph and the perturbation of individual node feature values. Employing both edge additions and feature perturbations allows XPlore to explore a broader solution space compared to methods that focus on a single type of modification. The combination aims to identify minimal changes to the input graph that will alter the model’s prediction, effectively creating a more comprehensive set of potential counterfactual explanations for the model’s behavior.
XPlore utilizes Projected Gradient Descent (PGD) as an iterative optimization technique to determine the smallest alterations to the input graph that will successfully change the model’s predictive outcome. PGD efficiently searches for these adversarial examples by repeatedly taking gradient steps within a constrained perturbation set. A significant challenge in this process is the non-differentiability of the discrete graph modification operations; therefore, XPlore incorporates the Straight-Through Estimator (STE). The STE approximates the gradient of these discrete operations by simply passing the gradient through as if the operation were a smooth, differentiable function, allowing for effective backpropagation and optimization despite the discrete nature of graph changes.
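The PGD-plus-STE idea can be illustrated with a minimal numpy sketch, under strong simplifying assumptions: the "model" here is a hypothetical linear score over a binary edge mask (`sum(weights * hard_mask)`), not a real GNN, so its gradient with respect to the hard mask is just `weights`. The straight-through step copies that gradient onto the relaxed mask as if thresholding were the identity, and the projection clips the mask back into [0, 1].

```python
import numpy as np

def pgd_with_ste(weights, steps=50, lr=0.2):
    """Toy PGD search over a binary edge mask using a straight-through
    estimator. We seek a hard mask that drives the (toy) model score
    sum(weights * hard_mask) below zero, i.e. flips the prediction."""
    rng = np.random.default_rng(0)
    mask = rng.uniform(0.4, 0.6, size=weights.shape)  # relaxed mask in [0, 1]
    for _ in range(steps):
        hard = (mask > 0.5).astype(float)             # discrete forward pass
        score = float((weights * hard).sum())
        if score < 0:
            return hard, score                        # prediction flipped
        # Backward: d(score)/d(hard) = weights for this toy model; the STE
        # passes this gradient straight through the non-differentiable
        # thresholding onto the relaxed mask.
        grad = weights
        mask = np.clip(mask - lr * grad, 0.0, 1.0)    # projected gradient step
    hard = (mask > 0.5).astype(float)
    return hard, float((weights * hard).sum())
```

In XPlore itself the gradient comes from backpropagating through the GNN, but the structure is the same: discrete forward, relaxed backward, projection after each step.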
XPlore incorporates a distance loss, L_dist, to constrain the search for counterfactual graphs by penalizing significant deviations from the original graph structure. This loss function quantifies the difference between the adjacency matrices of the original and counterfactual graphs, effectively encouraging the identification of minimal changes necessary to alter the model’s prediction. Simultaneously, the prediction loss, L_pred, drives the counterfactual graph toward a different prediction than the original, preventing trivial solutions where the graph is altered without the prediction actually changing. The optimization minimizes a weighted combination of the two, achieving a prediction change (driving down L_pred) with as few structural edits as possible (keeping L_dist small), thereby yielding a targeted and efficient counterfactual explanation.
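A minimal sketch of this two-term objective, with assumed concrete forms: L_dist as the L1 distance between adjacency matrices, L_pred as the negative log-probability the model assigns to any label other than the original, and a hypothetical weight `lam` trading them off (the paper's exact formulations may differ).

```python
import numpy as np

def distance_loss(adj, adj_cf):
    """L_dist: number of changed adjacency entries (L1 distance)."""
    return float(np.abs(adj - adj_cf).sum())

def prediction_loss(prob_cf, original_label):
    """L_pred: negative log-probability of flipping away from the original
    label, where prob_cf is the model's P(class=1) on the counterfactual.
    Minimizing it pushes the counterfactual across the decision boundary."""
    p_flip = prob_cf if original_label == 0 else 1.0 - prob_cf
    return float(-np.log(max(p_flip, 1e-9)))

def total_loss(adj, adj_cf, prob_cf, original_label, lam=0.1):
    # lam balances prediction change against minimality of the edit.
    return prediction_loss(prob_cf, original_label) + lam * distance_loss(adj, adj_cf)
```

With this shape, a counterfactual that barely changes the graph but confidently flips the prediction scores best, which is exactly the trade-off the paragraph describes.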

Measuring the Echo of Change: Validation and Significance
The effectiveness of XPlore is rigorously evaluated through two primary metrics: Validity and Fidelity. Validity quantifies the system’s ability to successfully alter a model’s prediction by modifying the input graph – a higher score indicates more reliable counterfactual explanations. Complementing this, Fidelity measures the degree to which the generated counterfactual graph resembles the original input; a high Fidelity score suggests the explanation is plausible and doesn’t require drastic changes to understand the prediction shift. These metrics, considered in tandem, provide a comprehensive assessment of explanation quality, ensuring XPlore doesn’t just change a prediction, but does so with minimal and meaningful alterations to the input data, fostering trust and interpretability in the model’s reasoning.
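These two metrics are straightforward to compute once counterfactuals exist. The sketch below assumes the article's reading of the terms: validity as the fraction of examples whose prediction flipped, and fidelity as resemblance between counterfactual and original (here, the fraction of unchanged adjacency entries). Exact definitions vary across the counterfactual-explanation literature, so treat these as illustrative.

```python
import numpy as np

def validity(orig_preds, cf_preds):
    """Fraction of examples whose prediction flipped (higher is better)."""
    return float(np.mean(np.asarray(orig_preds) != np.asarray(cf_preds)))

def fidelity(adj, adj_cf):
    """Fraction of adjacency entries left unchanged (higher = counterfactual
    stays closer to the original graph)."""
    return float(np.mean(adj == adj_cf))
```

Reporting the two together guards against degenerate explainers: rewiring the whole graph trivially maximizes validity but collapses fidelity.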
Assessing the practicality of explanations requires evaluating not only what changes are made to a graph, but also how many. Sparsity, a key metric, quantifies the number of alterations – additions or deletions of nodes and edges – needed to generate a counterfactual explanation, with fewer changes generally indicating a more concise and readily understandable rationale. Complementing this is Graph Edit Distance (GED), which calculates the minimum number of edit operations required to transform one graph into another; a lower GED suggests the counterfactual remains structurally similar to the original, enhancing its plausibility. By prioritizing explanations with both high sparsity and low GED, researchers aim to identify interventions that are not only effective in altering a model’s prediction, but also intuitively reasonable and potentially implementable in real-world scenarios.
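Both conciseness measures reduce to simple counts when, as in a sketch, the counterfactual shares the original's node set and only edges change. Full GED also allows node insertions, deletions, and relabeling, so the edit-distance function below is a deliberate simplification.

```python
import numpy as np

def sparsity(adj, adj_cf):
    """Fraction of possible undirected edges altered; lower = terser explanation."""
    n = adj.shape[0]
    changed = np.abs(adj - adj_cf).sum() / 2      # undirected: count each edge once
    return float(changed / (n * (n - 1) / 2))

def edge_edit_distance(adj, adj_cf):
    """Graph edit distance restricted to edge insertions/deletions over a
    fixed node set -- a simplification of full GED."""
    return int(np.abs(adj - adj_cf).sum() // 2)
```

A counterfactual that deletes one edge of a six-edge graph has sparsity 1/6 and edit distance 1: small on both counts, hence a plausible, implementable intervention.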
The XPlore system demonstrably advances the field of explainable AI, achieving results that surpass current state-of-the-art methods in generating meaningful counterfactual explanations. Rigorous evaluation reveals a substantial increase in Validity – the ability of an explanation to successfully alter a model’s prediction – by +17.3%. Complementing this, XPlore also elevates Fidelity by +15.0%, indicating that the generated counterfactual examples remain closely aligned with the original input data. This dual improvement signifies that XPlore not only produces explanations that succeed in changing predictions, but does so with counterfactuals that are intuitively plausible and represent minimal changes to the initial conditions – a significant step toward trustworthy artificial intelligence.
To illuminate the changes driving XPlore’s counterfactual explanations, high-dimensional graph embeddings were generated utilizing Wavelet Characteristic Embeddings. These embeddings capture the complex relationships within the graph structure, but are initially difficult to interpret due to their dimensionality. Consequently, a dimensionality reduction technique, t-distributed Stochastic Neighbor Embedding (t-SNE), was applied to project these embeddings into a two-dimensional space, allowing for visualization. This process reveals discernible structural differences between the original graphs and their corresponding counterfactuals, highlighting precisely which connections were altered to shift the model’s prediction. The resulting visualizations provide a compelling visual confirmation of XPlore’s ability to identify meaningful and minimal changes within the graph that effectively influence the outcome.
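The projection step described above can be reproduced with scikit-learn's t-SNE. The embeddings below are random stand-ins for the Wavelet Characteristic Embeddings (which are not specified here), so this sketch only shows the mechanics: stack original and counterfactual embeddings, reduce to 2-D, then plot the two groups.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical 32-d embeddings standing in for Wavelet Characteristic
# Embeddings: 10 original graphs and their 10 counterfactuals, with the
# counterfactuals synthetically shifted to mimic structural change.
rng = np.random.default_rng(0)
orig = rng.normal(0.0, 1.0, size=(10, 32))
cf = orig + rng.normal(0.5, 0.1, size=(10, 32))

# Perplexity must be smaller than the number of samples (20 here).
emb = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(
    np.vstack([orig, cf]))
orig_2d, cf_2d = emb[:10], emb[10:]   # scatter-plot these two groups
```

Visual separation between the two point clouds is what signals that the counterfactual edits produce systematic, not random, structural shifts.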

The pursuit of explainability in Graph Neural Networks, as demonstrated by XPlore, isn’t about imposing a desired outcome, but rather understanding the delicate balance within the network’s ecosystem. The system doesn’t yield to forceful correction; it responds to gentle guidance. This echoes Andrey Kolmogorov’s sentiment: “The most important thing in science is not to be afraid of making mistakes.” XPlore embraces this through edge additions and feature perturbations – acknowledging that the path to a valid explanation isn’t always direct deletion, but a nuanced exploration of possibilities. A system isn’t a machine to be controlled; it’s a garden – and sometimes, growth requires adding, not subtracting.
What Lies Ahead?
The pursuit of counterfactual explanations in Graph Neural Networks, as exemplified by methods like XPlore, reveals a fundamental tension. Each refinement in explanation quality – fidelity, validity, the illusion of actionability – merely postpones the inevitable encounter with systemic unpredictability. The graph itself isn’t static; it’s a projection of relationships, constantly reshaped by forces the model can’t fully grasp. To focus solely on perturbation – addition or deletion of edges – is to treat the network as a puzzle, rather than an ecosystem.
Future work will undoubtedly explore more sophisticated perturbation strategies, perhaps incorporating dynamic graph evolution or multi-objective optimization. However, the real challenge isn’t achieving perfect explanation, but accepting the inherent limitations of such endeavors. A guarantee of explanation quality is simply a contract with probability. The model doesn’t ‘understand’ causality; it identifies correlation within a frozen snapshot of a complex system.
Ultimately, the field must shift its focus from explaining individual predictions to characterizing the system’s response to intervention. Stability is merely an illusion that caches well. True progress lies not in minimizing explanation error, but in quantifying, and perhaps even embracing, the inherent chaos. The goal isn’t to tame the graph, but to map its fault lines.
Original article: https://arxiv.org/pdf/2603.04209.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/
2026-03-05 12:55