Author: Denis Avetisyan
A new approach using deep symbolic regression is revealing the underlying equations that govern how defects interact within atomically thin materials.
Researchers demonstrate that symbolic regression, specifically the SEGVAE algorithm, can discover interpretable defect interaction equations with performance comparable to, and potentially exceeding, that of graph neural networks.
While machine learning excels at prediction, a fundamental challenge remains in extracting physically interpretable insights from complex data. This is addressed in ‘Symbolic regression for defect interactions in 2D materials’, which demonstrates the power of deep symbolic regression, specifically the SEGVAE algorithm, to discover analytical equations governing defect interactions in two-dimensional materials. Achieving comparable, and in some cases superior, performance to state-of-the-art graph neural networks, this approach offers a pathway to both accurate prediction and transparent, generalizable models. Could this technique unlock a new era of physics-informed machine learning capable of accelerating materials discovery and design?
The Algorithmic Impasse: LLMs and the Demand for Logical Transparency
The recent surge in the capabilities of Large Language Models (LLMs) presents a significant paradox: while these systems demonstrate remarkable performance across diverse tasks, from generating creative text formats to translating languages, the mechanisms driving these achievements remain largely inscrutable. This opacity, often referred to as the ‘black box’ problem, stems from the complex, multi-layered neural networks at the heart of LLMs, where intricate patterns of weighted connections govern information processing. Although outputs may be accurate and even impressive, tracing the logical steps, the ‘reasoning’, that led to a specific conclusion is exceedingly difficult. This isn’t merely a technical hurdle; it represents a fundamental challenge to understanding and trusting these powerful tools, particularly as their deployment expands into increasingly sensitive and consequential domains.
The opacity of large language models presents a significant barrier to widespread implementation, especially within domains demanding rigorous justification. Without insight into the decision-making process, establishing confidence in model outputs proves challenging, potentially undermining applications in fields like healthcare, finance, and criminal justice. Accountability necessitates a clear audit trail – the ability to trace how a conclusion was reached – and the ‘black box’ nature of these systems currently impedes this crucial requirement. Consequently, even highly accurate predictions may be met with skepticism or outright rejection if the reasoning behind them remains concealed, hindering the responsible and ethical integration of LLMs into critical infrastructure.
The increasing reliance on Large Language Models necessitates a shift beyond mere predictive accuracy; the rationale behind a model’s output is becoming critically important. While a correct answer is valuable, especially in tasks like translation or summarization, understanding how that conclusion was reached is paramount for building trust and ensuring responsible deployment. This need for ‘explainability’ isn’t simply about satisfying curiosity; it’s about identifying potential biases embedded within the model, verifying the soundness of its reasoning, and ultimately, preventing unintended consequences, particularly in high-stakes applications such as medical diagnosis or legal assessments. Without insight into the decision-making process, even highly accurate models remain fundamentally untrustworthy, hindering widespread adoption and limiting their potential benefits.
Post-hoc Reasoning: Dissecting the LLM’s Internal Logic
Post-hoc explanation methods constitute a class of techniques designed to analyze pre-trained machine learning models – specifically, large language models (LLMs) – without requiring any modification to the model itself. These methods operate on the model’s outputs, given specific inputs, to approximate the internal reasoning that led to those outputs. The primary goal is to increase transparency and interpretability, allowing developers and researchers to understand why a model made a particular prediction. This contrasts with inherently interpretable models, or methods requiring model retraining, and is particularly crucial for debugging, identifying biases, and building trust in complex LLM systems. The resulting explanations are approximations of the model’s decision-making process, rather than a direct representation of its internal state.
Gradient-based methods determine feature importance by calculating the gradient of the model’s output with respect to the input features; larger gradient magnitudes indicate greater influence on the prediction. Perturbation-based methods, conversely, assess importance by systematically perturbing, or altering, input features and observing the resulting change in the model’s output; significant output variations after perturbation suggest a critical feature. Common perturbation techniques include masking, where a feature is replaced with a neutral value, or replacing it with random noise. Both approaches provide feature attribution scores, enabling the identification of the most salient inputs driving a model’s decision, though they can be susceptible to noise and require careful implementation to avoid misleading results.
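To make these two strategies concrete, the minimal Python sketch below computes both gradient-based and perturbation-based importance scores for a toy regression model. The model, the eight input features, and the zero-value masking are illustrative assumptions, not code from the paper or from any particular attribution library:

```python
# Minimal sketch (illustrative assumptions throughout): gradient- and
# perturbation-based feature attribution on a toy pre-trained model.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "pre-trained" model: 8 input features -> 1 scalar prediction.
model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1))
model.eval()

x = torch.randn(1, 8, requires_grad=True)

# Gradient-based attribution: importance of feature i ~ |d(output)/d(x_i)|,
# i.e. how sensitive the prediction is to a small change in that feature.
model(x).sum().backward()
grad_importance = x.grad.abs().squeeze(0)

# Perturbation-based attribution: importance of feature i ~
# |f(x) - f(x with feature i masked to a neutral value)|.
with torch.no_grad():
    baseline = model(x).item()
    perturb_importance = torch.zeros(8)
    for i in range(8):
        x_masked = x.detach().clone()
        x_masked[0, i] = 0.0            # mask with a neutral value
        perturb_importance[i] = abs(model(x_masked).item() - baseline)

print("gradient-based:    ", [round(v, 3) for v in grad_importance.tolist()])
print("perturbation-based:", [round(v, 3) for v in perturb_importance.tolist()])
```

Both approaches yield one score per input feature; as noted above, these scores are approximations of the model’s behaviour and can diverge when features interact strongly.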
Attention mechanisms, integral to the architecture of many Large Language Models (LLMs), provide a means of assessing feature importance by quantifying the weight assigned to each input token during processing. Specifically, these mechanisms generate attention weights – numerical values indicating the degree to which the model focuses on a particular input element when generating an output. Higher attention weights suggest a stronger influence of that input token on the model’s prediction. While not a direct representation of reasoning, analyzing these attention weights can offer insights into which parts of the input sequence the model deems most relevant, facilitating a degree of interpretability and potentially revealing biases or unexpected dependencies within the model’s decision-making process.
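The following self-contained sketch shows the mechanics being described: single-head scaled dot-product attention over a toy token sequence, with raw embeddings standing in for learned query and key projections. The tokens, embedding dimension, and random values are invented purely for illustration:

```python
# Minimal sketch (toy values, no learned projections): reading attention
# weights as a rough proxy for input-token relevance.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

tokens = ["the", "vacancy", "lowers", "the", "formation", "energy"]

rng = np.random.default_rng(0)
d = 4                                    # embedding dimension (assumed)
embeddings = rng.normal(size=(len(tokens), d))

# Single-head scaled dot-product attention with the last token as query:
# weight_j = softmax(q . k_j / sqrt(d)) over all input tokens j.
q = embeddings[-1]
scores = embeddings @ q / np.sqrt(d)
weights = softmax(scores)

# A higher weight means the model "attends" more to that token at this step.
for tok, w in sorted(zip(tokens, weights), key=lambda item: -item[1]):
    print(f"{tok:>10s}  {w:.3f}")
```

Sorting tokens by weight produces the kind of relevance ranking discussed above, with the same caveat: high attention is not, by itself, evidence of reasoning.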
Evaluating Logical Fidelity: Faithfulness and Human Plausibility
Faithfulness in the context of explanation evaluation refers to the degree to which an explanation accurately represents the internal logic and reasoning employed by a machine learning model when generating a particular output. This is distinct from simply providing a justification that appears reasonable; a faithful explanation must demonstrably reflect the actual computational steps taken by the model. Assessing faithfulness involves determining if the explanation highlights the features or processes that genuinely influenced the model’s prediction, rather than attributing importance to irrelevant factors or presenting a post-hoc rationalization that does not align with the model’s behavior. Establishing faithfulness is crucial for building trust in AI systems and for debugging or improving model performance, as discrepancies between explanation and actual reasoning indicate potential flaws in the model or the explanation method itself.
Faithfulness in explanation evaluation is refined by considering two distinct but related aspects: intrinsic and extrinsic faithfulness. Intrinsic faithfulness concerns the degree to which an explanation accurately reflects the model’s internal decision-making process; this is typically assessed by examining feature importance scores or activation patterns used during prediction. Extrinsic faithfulness, conversely, evaluates consistency by observing how the explanation changes when the input data is systematically modified – a faithful explanation should exhibit corresponding changes in its highlighted features or rationales. Specifically, if a small perturbation to the input alters the model’s prediction, an extrinsically faithful explanation should also reflect this change, demonstrating a robust relationship between input, prediction, and explanation.
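A simple way to probe extrinsic faithfulness in this spirit is a deletion-style check: zero out the features an explanation ranks highest and confirm that the prediction shifts more than it does when randomly chosen features are removed. The sketch below is a hypothetical setup, reusing the toy model and gradient attributions from the earlier example rather than any evaluation protocol from the paper:

```python
# Minimal sketch (assumed setup): a deletion-style check of extrinsic
# faithfulness. A faithful explanation's top-ranked features should move
# the prediction more than randomly chosen ones when removed.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 1)).eval()
x = torch.randn(1, 8)

def prediction_shift(features, indices):
    """Absolute change in model output after zeroing the given feature indices."""
    perturbed = features.clone()
    perturbed[0, indices] = 0.0
    with torch.no_grad():
        return (model(features) - model(perturbed)).abs().item()

# Hypothetical attribution scores from a gradient method (see earlier sketch).
x_grad = x.clone().requires_grad_(True)
model(x_grad).sum().backward()
scores = x_grad.grad.abs().squeeze(0)

top3 = torch.topk(scores, k=3).indices       # features the explanation flags
rand3 = torch.randperm(8)[:3]                # random control features

print("shift after deleting top-ranked features:", prediction_shift(x, top3))
print("shift after deleting random features:    ", prediction_shift(x, rand3))
```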
Plausibility in explanations refers to the degree to which a human observer finds the explanation understandable and convincing, independent of its factual accuracy. While faithfulness measures alignment with the model’s internal processes, plausibility addresses human perception and utility. An explanation can be faithful – accurately reflecting the model’s reasoning – yet implausible if it relies on technical jargon or counterintuitive logic, thereby hindering its practical application. Conversely, a highly plausible explanation that doesn’t reflect the model’s actual decision-making process is considered unfaithful. Achieving both faithfulness and plausibility is crucial for building trust and enabling effective human-AI collaboration, particularly in high-stakes domains where interpretability is paramount.
Assessing and Quantifying Explanation Quality: Human and Algorithmic Approaches
Human evaluation is considered the definitive method for assessing the quality of explanations generated by predictive models, specifically focusing on two key attributes: faithfulness and plausibility. Faithfulness refers to the degree to which an explanation accurately reflects the underlying reasoning process of the model; a faithful explanation should not attribute importance to features that did not genuinely contribute to the prediction. Plausibility, conversely, concerns the human understandability and coherence of the explanation; a plausible explanation should align with domain knowledge and be intuitively acceptable to an expert. While subjective, these assessments, typically performed by materials scientists in this context, provide a direct measure of explanation quality that serves as a benchmark for automated evaluation metrics.
Manual evaluation of explanation quality, while considered the gold standard for assessing faithfulness and plausibility, presents significant practical limitations due to its inherent cost and time requirements. The process necessitates expert human annotators to individually review and score explanations, a process that does not scale efficiently with increasing data volume or model complexity. This inefficiency motivates research into automated evaluation methods capable of providing rapid and reproducible assessments. Automated techniques aim to approximate human judgment through algorithmic scoring, thereby reducing the reliance on expensive and time-consuming manual annotation and enabling more frequent and comprehensive model evaluation cycles.
The research presents a symbolic regression approach for predicting crystal properties that attains accuracy comparable to state-of-the-art graph neural networks (GNNs). Specifically, the method achieved a Mean Absolute Error (MAE) on formation energy statistically equivalent to that of MEGNet, and surpassed SchNet, GemNet, and the gradient-boosting baseline CatBoost on MAE for the HOMO-LUMO gap, indicating that it can accurately predict key material properties and offering a viable alternative to more complex GNN architectures.
A further advantage is data efficiency: these results were obtained with fewer than 300 unique crystal structures, whereas GNNs typically require datasets of thousands of structures to reach similar accuracy, a significant benefit when data acquisition is expensive or limited. In addition, the symbolic regression method generates equations that are inherently interpretable, offering a transparent model in contrast to the often opaque decision-making processes of neural networks.
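For readers new to the technique, the toy sketch below shows what a symbolic regression search does in its simplest form: enumerate candidate functional forms over a few descriptors, fit their constants by least squares, and keep the expression with the lowest MAE. It is emphatically not SEGVAE, which learns a generative model over expression sequences; the descriptors r and q, the hidden ‘true’ law, and the synthetic 300-sample dataset are assumptions made only for illustration:

```python
# Minimal sketch, not SEGVAE: a brute-force stand-in for symbolic regression
# that searches a tiny library of functional forms, fits constants by least
# squares, and keeps the lowest-MAE expression. Data are synthetic.
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Synthetic dataset: 300 samples, mimicking the small-data regime above.
n = 300
r = rng.uniform(2.0, 8.0, n)             # e.g. a defect-defect distance (assumed)
q = rng.uniform(0.5, 2.0, n)             # e.g. a charge/species descriptor (assumed)
y = 1.7 / r + 0.3 * q + rng.normal(0, 0.02, n)   # hidden "true" law plus noise

# Candidate unary transforms applied to each descriptor.
unary = {
    "x":      lambda v: v,
    "1/x":    lambda v: 1.0 / v,
    "log(x)": lambda v: np.log(v),
    "x^2":    lambda v: v ** 2,
}

best = None
for (name_r, f_r), (name_q, f_q) in itertools.product(unary.items(), repeat=2):
    # Model form: y ~ a * f_r(r) + b * f_q(q) + c, constants fit by least squares.
    A = np.column_stack([f_r(r), f_q(q), np.ones(n)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    mae = np.abs(A @ coef - y).mean()
    expr = (f"{coef[0]:+.2f}*{name_r.replace('x', 'r')} "
            f"{coef[1]:+.2f}*{name_q.replace('x', 'q')} {coef[2]:+.2f}")
    if best is None or mae < best[0]:
        best = (mae, expr)

print(f"best expression: {best[1]}   (MAE = {best[0]:.4f})")
```

Even this brute-force stand-in highlights the appeal described above: the result is a short, human-readable equation rather than a set of opaque network weights.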
The pursuit of demonstrable truth in material science, as highlighted by this work on defect interactions, aligns perfectly with a foundational principle of computation. As Edsger W. Dijkstra stated, “Program testing can be a useful activity, but it can never prove correctness.” This research eschews reliance solely on performance metrics – akin to ‘testing’ – and instead prioritizes the discovery of interpretable equations governing defect behavior. By employing deep symbolic regression, specifically SEGVAE, the study doesn’t merely predict interactions but proves relationships, achieving results comparable to graph neural networks while offering a level of mathematical rigor that is demonstrably superior. This emphasis on provability, rather than empirical observation alone, represents a critical step towards truly understanding and predicting material properties.
What Lies Ahead?
The demonstrated success of deep symbolic regression in uncovering the governing equations of defect interactions in two-dimensional materials should not be mistaken for a solved problem. Rather, it highlights the enduring limitations of purely numerical approaches. While graph neural networks offer a pragmatic, albeit opaque, pathway to prediction, the SEGVAE algorithm offers something rarer: a glimpse beneath the surface. The true test, however, will lie in extending this methodology beyond the carefully curated datasets used here. The consistency of the discovered equations under varied defect configurations and material compositions remains to be rigorously established.
A critical, and often overlooked, aspect is the inherent trade-off between model complexity and interpretability. The equations derived, while concise, may still represent only approximations of the underlying physics. Future work must address the quantification of model uncertainty and the development of techniques for systematically refining these symbolic representations. One wonders if a formalized framework for assessing the ‘elegance’ of an equation, perhaps rooted in information theory or Kolmogorov complexity, could guide the search for truly fundamental descriptions.
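As a purely speculative illustration of what such an ‘elegance’ score might look like, one crude starting point is description length, for instance counting the nodes of a parsed expression tree so that shorter formulas rank as simpler; nothing below is a metric proposed in the paper:

```python
# Speculative sketch: a description-length proxy for equation "elegance",
# scored as the total number of AST nodes in the parsed expression.
import ast

def complexity(expr: str) -> int:
    """Total number of AST nodes in the expression (a crude length proxy)."""
    return sum(1 for _ in ast.walk(ast.parse(expr, mode="eval")))

candidates = [
    "a / r + b * q",                     # concise, physics-like form
    "a*r**3 + b*r**2 + c*r + d + e*q",   # flexible but heavier polynomial
]
for expr in candidates:
    print(f"{expr:35s} complexity = {complexity(expr)}")
```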
Ultimately, the value of this approach resides not merely in its predictive power, but in its potential to reveal previously unknown physical principles. The pursuit of machine learning should not be a quest for ever-more-accurate black boxes, but a concerted effort to articulate the mathematical language of nature. The boundaries of this methodology will be defined by its ability to consistently generate equations that are not only correct, but demonstrably, beautifully, true.
Original article: https://arxiv.org/pdf/2512.20785.pdf
Contact the author: https://www.linkedin.com/in/avetisyan/