Beyond ‘Why’: Crafting AI Explanations People Actually Understand

Author: Denis Avetisyan


New research reveals that simply explaining how an AI made a decision isn’t enough – effective explanations need to provide supporting details and hidden reasoning.

The study reveals that users’ preference leans toward plainly stated semi-factual information, whether positive (loan accepted) or negative (loan rejected), over elaborately constructed semi-factuals, suggesting a bias for directness even when presented with potentially misleading scenarios.

This paper introduces a novel method, ISF, for generating informative semi-factual explanations by identifying hidden feature contributions, enhancing AI interpretability and trust.

While explainable AI (XAI) has increasingly adopted semi-factual explanations-highlighting how outcomes remain consistent despite altered inputs-these explanations often lack crucial contextual information. The work ‘Informative Semi-Factuals for XAI: The Elaborated Explanations that People Prefer’ addresses this gap by introducing a novel method, ISF, which enriches semi-factuals with contributions from previously hidden features influencing automated decisions. Experimental results demonstrate that ISF generates explanations that are both informative and preferred by users over simpler alternatives. Could this approach to elaborated semi-factuals represent a key step towards building more transparent and trustworthy AI systems?


The Illusion of Explanation: Beyond Simple Counterfactuals

Many explanation methods rely on counterfactuals – identifying the smallest change to an input that would flip a model’s prediction. However, these explanations frequently suggest implausible alterations to input features, undermining their practical value and eroding user trust. For example, an algorithm denying a loan might suggest a candidate increase their income by an unrealistic amount, or claim a different ethnicity would have secured approval. Such recommendations aren’t actionable; they don’t address why the decision occurred, only what fantastical change would alter the outcome. This disconnect hinders effective debugging of models, limits the ability to address systemic biases, and ultimately prevents users from confidently acting on AI-driven insights, highlighting a critical need for explanation techniques grounded in real-world feasibility.
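The counterfactual recipe described above can be made concrete with a minimal sketch. The loan model, its weights, and its threshold below are illustrative assumptions, not taken from the paper; the point is only the search for the smallest input change that flips the decision.

```python
# Toy loan model used only for illustration; the weights, threshold,
# and feature scales are assumptions, not taken from the paper.
def approve(income_k: int, credit_score: int) -> bool:
    return 10 * income_k + credit_score >= 1100

def smallest_income_counterfactual(income_k, credit_score, step=1, limit=1000):
    """Find the minimal income increase (in $k) that flips a
    rejection into an approval -- the classic counterfactual."""
    if approve(income_k, credit_score):
        return 0
    for delta in range(step, limit + 1, step):
        if approve(income_k + delta, credit_score):
            return delta
    return None  # no change within the search limit flips the outcome

# An applicant earning $50k with a 550 credit score is rejected...
print(approve(50, 550))                         # False
# ...and the counterfactual says: earn $5k more.
print(smallest_income_counterfactual(50, 550))  # 5
```

Even in this toy setting the weakness is visible: the method answers only what change flips the outcome, not whether that change is plausible or actionable for the applicant.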

Rather than suggesting wholesale changes to an input, semi-factual explanations refine the approach to AI interpretability by strategically altering only the most influential features – those demonstrably linked to the model’s decision. This method acknowledges the interconnectedness of data; a complete overhaul of inputs often yields unrealistic scenarios and diminishes trust in the explanation itself. By preserving the plausibility of the modified input – keeping the majority of features intact – semi-factual explanations provide insights that are both understandable and actionable. This nuanced approach allows users to grasp why a particular outcome occurred without being presented with a hypothetical situation that could never realistically arise, fostering a greater sense of confidence in the AI system’s reasoning and promoting effective human-AI collaboration.
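A semi-factual inverts the counterfactual question: how far can an influential feature move while the decision stays the same? The sketch below uses an illustrative integer rule (an assumption, calibrated so a 550 credit score supports loans up to $65k, echoing the article's later example), not the paper's model.

```python
# Illustrative acceptance rule; the coefficients are assumptions,
# calibrated so a 550 credit score supports loans up to $65k.
def accepted(loan_k: int, credit_score: int) -> bool:
    return credit_score - 4 * loan_k >= 290

def semi_factual_max_loan(loan_k, credit_score, step=1, limit=1000):
    """Largest loan amount that leaves an approval unchanged:
    'even if you had asked for $X, you would still be accepted'."""
    if not accepted(loan_k, credit_score):
        return None
    while loan_k + step <= limit and accepted(loan_k + step, credit_score):
        loan_k += step
    return loan_k

print(accepted(20, 550))               # True
print(semi_factual_max_loan(20, 550))  # 65
```

Because only one feature moves and the outcome is preserved, the resulting statement stays within plausible territory rather than proposing a wholesale rewrite of the input.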

The development of truly reliable artificial intelligence necessitates a paradigm shift – moving beyond simply what a system decides, to a deep comprehension of how those decisions are reached. A system’s output, regardless of its accuracy, remains insufficient without accompanying insight into its reasoning process; this isn’t merely about transparency, but about fostering trust and enabling effective intervention. When AI systems can articulate the factors driving their conclusions, users are empowered to validate those conclusions, identify potential biases, and ultimately, collaborate more effectively with the technology. This emphasis on explainability isn’t a luxury, but a fundamental requirement for deploying AI responsibly in critical domains, ensuring accountability and paving the way for genuinely human-centered artificial intelligence.

Analysis of 38,233 generated samples across five datasets revealed that both the ISF and Ensemble methods exhibit a seesaw pattern in the relationship between key and hidden feature contributions, indicating a potential reliance on spurious correlations.

Hidden Influences: Unveiling the True Drivers of Prediction

Hidden features represent input attributes that influence a model’s prediction but are not explicitly modified during the generation of a semi-factual explanation. These features retain their original values while still contributing to the outcome, indicating that model decision-making is often a complex interplay of multiple inputs, not solely a direct response to altered features. Their presence highlights the limitations of attributing causality solely to the changed features and necessitates consideration of the broader feature space to accurately understand the model’s reasoning. Identifying and acknowledging these hidden features is crucial for developing more comprehensive and truthful explanations, particularly in scenarios where a complete understanding of the model’s behavior is required.

Informative semi-factuals extend beyond simply highlighting the features directly modified to explain a prediction; they integrate consideration of hidden features – those variables influencing the model output without being explicitly altered in the explanation. This integration provides a more comprehensive understanding of the decision-making process by acknowledging the complex interplay of factors contributing to the outcome. Consequently, these explanations move beyond isolating the immediate causes and offer a richer, more nuanced representation of the model’s reasoning, improving interpretability and potentially revealing previously unknown relationships within the data.

Model predictions are rarely determined by only the features explicitly manipulated in a semi-factual explanation; numerous underlying factors contribute to the final outcome. Recognizing this complexity is crucial for building trustworthy AI systems, as it avoids presenting a simplified and potentially misleading account of the decision-making process. By acknowledging the influence of these unobserved or unchanged features, explanations become more honest representations of the model’s behavior, fostering greater user confidence and enabling more accurate debugging and refinement of the system.

Perturbing the loan amount from $20k to $65k while holding the credit score constant reveals a seesaw effect, where the loan amount’s marginal contribution to loan acceptance decreases as the credit score’s contribution increases, demonstrating a trade-off between these features.

ISF: A Method for Unearthing Informative Semi-factuals

The ISF method utilizes ‘Marginal Contribution’ as a feature selection technique by quantifying the change in model prediction resulting from altering individual feature values. This allows identification of features that, while not directly modified to generate a semi-factual explanation, exert significant influence on the model’s output. Specifically, the method calculates the contribution of each feature to the prediction difference between the original instance and its semi-factual counterpart; a larger absolute marginal contribution indicates a greater influence, even if the feature’s value remained constant. This is crucial for generating explanations that are not only plausible but also highlight the underlying causal factors driving the model’s decision, improving the overall informativeness of the semi-factual.

The ISF method utilizes the Non-dominated Sorting Genetic Algorithm II (NSGA-II) for multi-objective optimization during semi-factual explanation generation. NSGA-II concurrently optimizes for both plausibility and informativeness by treating these characteristics as conflicting objectives. The algorithm iteratively refines a population of candidate explanations, selecting changes to input features and identifying relevant hidden features. This process prioritizes solutions that minimize the disruption to the original prediction (plausibility) while maximizing the change in prediction attributable to the modified features (informativeness). The resulting optimized feature sets contribute to the generation of semi-factual explanations that are both realistic and insightful.
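The core ranking step of NSGA-II is non-dominated sorting over the competing objectives. A minimal sketch of that step, with candidate explanations scored on two costs (the scores are illustrative placeholders, not the paper's objective functions):

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly
    better on at least one (all objectives are minimized here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """First non-dominated front -- the top rank NSGA-II assigns."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Candidate explanations scored as (implausibility, uninformativeness);
# lower is better on both. Values are illustrative.
candidates = [(0.1, 0.9), (0.4, 0.4), (0.9, 0.1), (0.5, 0.5), (0.8, 0.8)]
print(pareto_front(candidates))  # [(0.1, 0.9), (0.4, 0.4), (0.9, 0.1)]
```

The dominated candidates are discarded, and the surviving front spans the trade-off between a highly plausible but uninformative explanation and a highly informative but implausible one; the full algorithm additionally uses crowding distance and genetic operators to refine the population across generations.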

Evaluations demonstrate the ISF method achieves an 89% success rate in generating semi-factual explanations characterized by a ‘seesaw pattern’ of feature contributions; this pattern indicates a clear and alternating influence of features on the model’s prediction. This performance metric signifies a substantial improvement in both the informativeness and overall quality of generated explanations when compared to currently established state-of-the-art techniques for semi-factual explanation generation. The success rate is determined by the frequency with which generated explanations accurately reflect this alternating contribution pattern, validating the method’s ability to highlight key influencing factors.
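One simple way to operationalize the seesaw check is to require that, across perturbation steps, the key feature's contribution and the hidden feature's contribution always move in opposite directions. This definition is an illustrative simplification, not the paper's exact metric.

```python
def has_seesaw(key_contribs, hidden_contribs):
    """A 'seesaw' pattern: at every perturbation step, the key feature's
    contribution and the hidden feature's contribution move in opposite
    directions. An illustrative simplification of the paper's criterion."""
    key_deltas = [b - a for a, b in zip(key_contribs, key_contribs[1:])]
    hid_deltas = [b - a for a, b in zip(hidden_contribs, hidden_contribs[1:])]
    return all(dk * dh < 0 for dk, dh in zip(key_deltas, hid_deltas))

# Loan amount's contribution falls while the credit score's rises: seesaw.
print(has_seesaw([0.6, 0.4, 0.2], [0.2, 0.4, 0.7]))  # True
# Both contributions dip and rise together: no seesaw.
print(has_seesaw([0.6, 0.5, 0.6], [0.2, 0.1, 0.3]))  # False
```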

Across five datasets and four evaluation metrics, the ISF and Ensemble methods produced semi-factuals with comparable goodness scores, as demonstrated by analysis of <span class="katex-eq" data-katex-display="false">N=38,233</span> instances in Experiment 2.

The Human Factor: Demonstrating the Value of Informed Explanation

User Study 1 demonstrated a clear preference for informative semi-factual explanations generated by the ISF model when compared to standard semi-factuals. Participants consistently indicated that explanations detailing the contributions of hidden features were more helpful in understanding decision-making processes. This preference was notably strong in both loan acceptance and rejection scenarios, suggesting a broad applicability of the ISF approach to enhancing explanation quality. The study highlights the value of moving beyond simple counterfactual reasoning to incorporate feature contributions, ultimately providing users with a more nuanced and actionable understanding of complex model outputs.

A user study examining the preference for different explanation types revealed a strong inclination towards informative semi-factuals. In scenarios simulating loan acceptance, 69% of participants consistently selected the informative explanations – those detailing feature contributions – over standard, bare semi-factuals. This preference was even more pronounced in loan rejection scenarios, where 80% of users favored the informative explanations. These results demonstrate a clear user benefit derived from understanding why a decision was made, beyond simply being told that a decision occurred, highlighting the importance of transparency in algorithmic explanations.

The enhanced utility of informative semi-factual explanations stems from their ability to reveal previously obscured factors influencing model decisions. Research indicates that simply presenting a ‘what if’ scenario – a bare semi-factual – is less valuable than detailing how specific feature contributions led to a particular outcome. By incorporating these hidden influences, explanations move beyond merely identifying a change and instead illuminate the underlying reasoning, increasing user preference – demonstrated by a significant margin in loan acceptance and rejection scenarios. This suggests that transparency regarding feature importance is not merely desirable, but fundamentally improves the quality and actionability of explanations, offering users a more comprehensive understanding of complex model behavior and fostering greater trust in automated systems.

Despite being approved for a loan, Mark is informed that his credit score of 550 allows him to borrow up to $65k, revealing the loan-approval boundary; he had previously believed that requesting a higher amount would lead to rejection, like the denied applicant Mary.

Towards a Synthesis: The Future of Semi-factual Explanation

A growing toolkit of techniques addresses the challenge of semi-factual explanation, offering varied approaches to understanding how input changes might alter a model’s prediction. Methods such as KLEOR, which focuses on identifying minimal changes to key features, coexist with approaches like the Local Region Model, which explores perturbations within a defined neighborhood of the original input. Further diversifying the landscape are generative models like C2C-VAE and techniques emphasizing disentanglement, such as DSER and PIECE. Even probabilistic approaches, exemplified by MDN, and counterfactual diversification, as seen in DiCE, contribute to the breadth of available strategies, each with unique strengths in navigating the complex input space and revealing plausible alternative scenarios that lead to different outcomes.

Semi-factual explanations, while aiming to reveal ‘what if’ scenarios, are generated through a variety of computational strategies, each with inherent trade-offs. Some methods, like those focused on local region modeling, excel at providing minimally disruptive changes to the input, offering explanations easily understood by humans but potentially missing more substantial, yet plausible, alterations. Conversely, approaches employing generative models can explore a wider range of possibilities, potentially identifying more insightful counterfactuals, but risk generating unrealistic or irrelevant changes. Techniques such as those based on variational autoencoders prioritize fluency and coherence in the generated explanations, while others prioritize fidelity to the original prediction. This diversity in approach means no single method universally outperforms others; the optimal technique depends heavily on the specific application, the nature of the data, and the desired characteristics of the explanation itself.

The field of semi-factual explanation stands to benefit significantly from concerted efforts to synthesize existing methodologies. Currently, techniques like KLEOR, C2C-VAE, and DiCE operate largely in isolation, each with its own strengths and limitations in navigating the input space to generate plausible counterfactuals. Future investigations should prioritize the development of hybrid approaches that leverage the complementary advantages of these diverse methods, potentially through ensemble techniques or modular frameworks. Crucially, progress hinges on establishing standardized evaluation metrics; a common benchmark for assessing the quality, plausibility, and actionability of semi-factual explanations is needed to facilitate meaningful comparisons and drive innovation beyond isolated performance gains on specific datasets. Such standardization will not only accelerate research but also foster greater trust and adoption of these powerful tools in real-world applications.

The pursuit of AI interpretability, as demonstrated by this research into informative semi-factual explanations, isn’t about constructing a perfectly transparent machine, but nurturing a resilient ecosystem of understanding. The study reveals that simply highlighting feature contributions isn’t enough; true explanation lies in identifying supporting, often hidden, elements. This echoes Kolmogorov’s sentiment: “The most interesting mathematical problems are those which seem at first impossible, but are in fact only difficult.” Similarly, crafting explanations that truly resonate with human understanding requires navigating complexity, acknowledging that clarity isn’t a pre-defined state, but emerges from carefully considered connections and a willingness to explore the subtle dependencies within a system.

What Lies Ahead?

The pursuit of “informative” explanations, as demonstrated by this work, merely postpones the inevitable. The system identifies supporting features – hidden levers to justify a decision – but does not address the fundamental truth that any model is a simplification, a deliberate blindness to complexity. Long-term stability, born of such justifications, is not a sign of robustness, but the quiet accumulation of undetected failure modes. The architecture itself propagates a prophecy of eventual divergence between model and reality.

Future efforts will undoubtedly focus on refining the optimization process, seeking explanations that feel better, that align with human cognitive biases. This is a distraction. The real challenge lies not in making AI decisions more palatable, but in acknowledging their inherent fragility. A truly robust system wouldn’t need justification; it would operate within well-defined limits, gracefully degrading rather than offering post-hoc rationalizations.

The exploration of semi-factuals offers a glimpse into the necessary direction: away from monolithic explanations and toward localized, conditional reasoning. However, this approach risks building ever-more-complex webs of “what-ifs”, each a potential point of systemic failure. The ecosystem, once seeded, will evolve in unexpected ways. The task is not to control it, but to observe its trajectory with a clear-eyed understanding of its inevitable imperfections.


Original article: https://arxiv.org/pdf/2603.17534.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/

2026-03-19 20:15