Seeing What the Machine Sees: Explainable Edge Detection with Fuzzy Logic

Author: Denis Avetisyan

A novel deep learning architecture combines the power of U-Nets with the interpretability of fuzzy logic to deliver both accurate and understandable edge detection.

Spatially-adaptive mixture-of-experts leverage Sobel edge detection $\nabla I$ to dynamically refine feature maps, enabling a nuanced understanding of image structure and localized processing within a neural network.

This research introduces the sMoE U-Net, a hybrid approach leveraging spatial mixture-of-experts and Takagi-Sugeno-Kang fuzzy systems for enhanced edge detection and explainability.

While deep learning excels at edge detection, its “black box” nature hinders trust and verification in critical applications. To address this, we present the ‘Rule-Based Spatial Mixture-of-Experts U-Net for Explainable Edge Detection’, a novel architecture that fuses the power of U-Nets with the interpretability of fuzzy logic. Our approach achieves competitive performance on the BSDS500 benchmark, matching state-of-the-art results while providing pixel-level explanations through visual “Rule Firing Maps” and “Strategy Maps.” Can this hybrid approach unlock a new era of transparent and verifiable AI for image understanding and beyond?

Deconstructing the Illusion: The Limits of Conventional Edge Detection

Conventional edge detection algorithms, while foundational in computer vision, often falter when presented with the intricacies of real-world imagery. These methods, frequently reliant on gradient-based approaches, are highly susceptible to noise and struggle to differentiate genuine edges from irrelevant image details. Consequently, the resulting edge maps are frequently plagued by spurious detections – often termed ‘noise’ – or, conversely, exhibit incomplete boundaries where edges are either missed entirely or fragmented. This is particularly pronounced in images containing complex textures, subtle illumination changes, or occlusions, where the algorithms struggle to maintain consistent and accurate edge delineation. The limitations of these traditional techniques highlight the ongoing need for more robust and adaptable edge detection strategies capable of handling the challenges posed by complex visual scenes.
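To make the noise sensitivity of gradient-based detectors concrete, the following is a minimal sketch of a Sobel gradient magnitude in NumPy. The toy image, the noise level, and the helper names (`filter3x3`, `sobel_magnitude`) are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Standard 3x3 Sobel kernels for the horizontal and vertical derivatives.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(image, kernel):
    """Cross-correlate a 2-D image with a 3x3 kernel, using edge padding."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def sobel_magnitude(image):
    """Gradient magnitude |grad I| from the two Sobel responses."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.hypot(gx, gy)

# Toy image (assumption): dark left half, bright right half, one vertical edge.
clean = np.zeros((8, 8))
clean[:, 4:] = 1.0

# The same image with mild Gaussian noise added.
rng = np.random.default_rng(0)
noisy = clean + rng.normal(0.0, 0.2, clean.shape)

mag_clean = sobel_magnitude(clean)
mag_noisy = sobel_magnitude(noisy)
```

On the clean image the response is confined to the two columns that straddle the step. On the noisy image the flat regions also produce non-zero gradient magnitudes, which is exactly the kind of spurious response a fixed threshold has trouble separating from true edges.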

The fidelity of edge detection profoundly impacts the performance of numerous computer vision applications. Precise edge maps serve as foundational blueprints for image segmentation, enabling the partitioning of an image into meaningful regions, and are equally vital for object recognition, where defining object boundaries is paramount. Consequently, algorithms that falter in accurately delineating these boundaries introduce significant errors in subsequent analyses; a blurred or incomplete edge map can lead to misclassified objects or improperly segmented regions. This demands the development of robust edge detection solutions capable of handling image complexity and noise, ensuring reliable performance across a diverse range of visual data and ultimately unlocking the full potential of advanced image processing systems.

Current methods for evaluating edge detection performance, such as OIS (Optimal Image Scale, the F-score obtained when the best threshold is chosen separately for each image) and Average Precision, often fall short in capturing the perceptual quality of detected edges. While these metrics quantify pixel-level agreement with ground truth, they struggle to differentiate between edges that are genuinely perceptually important and those resulting from noise or minor image details. This limitation hinders progress in edge detection research, as improvements in these standard metrics don’t always correlate with human perception of edge map quality. Consequently, there is a growing need for novel evaluation techniques that prioritize the structural correctness and perceptual relevance of detected edges, potentially incorporating psychovisual principles or learned perceptual metrics to better reflect how humans interpret edge information within an image.

Different edge detection methods demonstrate varying abilities to accurately identify boundaries within an image.

Architecting Perception: The sMoE U-Net and Adaptive Expertise

The sMoE U-Net architecture builds upon the established U-Net convolutional neural network by incorporating a Spatially-Adaptive Mixture-of-Experts (sMoE) layer. This integration enhances adaptability by dynamically routing input data to different expert networks within the sMoE layer based on spatial location. The sMoE layer consists of multiple expert networks, each specialized in a particular feature extraction or processing task, and a gating network that determines the contribution of each expert for each input region. This spatially-adaptive routing allows the network to focus computational resources on relevant features and improves performance on complex image analysis tasks by leveraging specialized processing pathways.

The Mixture of Experts (MoE) methodology within the sMoE U-Net architecture divides edge detection into discrete sub-tasks, each handled by a dedicated expert network. This decomposition allows for specialized processing of varying edge characteristics – such as orientation, contrast, and thickness – which would be more difficult for a single, generalized network to achieve. Each sub-task receives the input feature map and produces a localized response, and a gating network dynamically weights the contributions of each expert based on the input data. This targeted approach reduces computational redundancy and improves the network’s capacity to accurately identify and localize edges across diverse image conditions, enhancing overall performance compared to monolithic edge detection systems.
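The per-pixel weighting described above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the paper's implementation: each expert is reduced to a toy 1x1 linear projection, the gate is a 1x1 projection followed by a softmax over experts, and all shapes and weights are made up for the example.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_moe(features, expert_weights, gate_weights):
    """
    Soft spatial mixture-of-experts over a feature map.

    features:       (C, H, W) input feature map
    expert_weights: (E, C_out, C) one toy 1x1 projection per expert
    gate_weights:   (E, C) 1x1 gating projection
    Returns the mixed output (C_out, H, W) and the gate map (E, H, W).
    """
    # One gating logit per expert and per pixel, normalized over experts.
    logits = np.einsum("ec,chw->ehw", gate_weights, features)
    gate = softmax(logits, axis=0)
    # Every expert processes the full feature map.
    expert_out = np.einsum("eoc,chw->eohw", expert_weights, features)
    # Per-pixel convex combination of the expert outputs.
    output = np.einsum("ehw,eohw->ohw", gate, expert_out)
    return output, gate

# Hypothetical sizes: 4 input channels, 3 experts, 2 output channels.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 6, 6))
expert_weights = rng.normal(size=(3, 2, 4))
gate_weights = rng.normal(size=(3, 4))

output, gate = spatial_moe(features, expert_weights, gate_weights)
```

Because the gate has one value per expert at every pixel and those values sum to one, it can be rendered as an image showing which expert dominates where. That is the property that makes a spatial mixture easier to inspect than a single monolithic decoder.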

The Adaptive Neuro-Fuzzy Inference System (ANFIS) utilized in the sMoE U-Net represents an evolution of traditional fuzzy logic systems by integrating neural network learning capabilities. Unlike conventional fuzzy systems relying on predefined membership functions and rule bases, ANFIS in its classic form employs a hybrid learning approach: backpropagation (gradient descent) tunes the premise parameters, that is, the membership functions, while least squares estimates the linear consequent parameters of each rule. This allows the system to learn fuzzy rules and membership functions directly from training data, improving adaptability and performance. The resulting framework is modular, which helps it scale, and it can handle the complex, non-linear relationships inherent in image processing tasks, offering improved robustness compared to static fuzzy systems or purely neural network-based approaches.
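In the classic ANFIS formulation, once the membership functions are held fixed, the rule consequents enter the output linearly and can therefore be estimated by ordinary least squares. The sketch below shows only that least-squares half, for a one-input, two-rule, first-order system. Whether the sMoE U-Net is trained this way is not stated here, and the centers, widths, and target function are illustrative assumptions.

```python
import numpy as np

def gaussian(x, center, sigma):
    """Gaussian membership function."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2)

def normalized_firing(x, centers, sigmas):
    """x: (N,) inputs. Returns (N, R) rule firing strengths that sum to 1 per row."""
    w = gaussian(x[:, None], centers[None, :], sigmas[None, :])
    return w / w.sum(axis=1, keepdims=True)

def fit_consequents(x, y, centers, sigmas):
    """Least-squares fit of first-order consequents y_r = a_r * x + b_r."""
    wn = normalized_firing(x, centers, sigmas)
    # Columns [wn_r * x] followed by [wn_r]: the model is linear in (a, b).
    design = np.concatenate([wn * x[:, None], wn], axis=1)
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    n_rules = len(centers)
    return theta[:n_rules], theta[n_rules:]

def predict(x, centers, sigmas, a, b):
    """Weighted average of the rule outputs."""
    wn = normalized_firing(x, centers, sigmas)
    return (wn * (a[None, :] * x[:, None] + b[None, :])).sum(axis=1)

# Hypothetical premise parameters, held fixed during the least-squares step.
centers = np.array([0.25, 0.75])
sigmas = np.array([0.2, 0.2])

# Toy target (assumption): a straight line that the rules can represent exactly.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

a, b = fit_consequents(x, y, centers, sigmas)
y_hat = predict(x, centers, sigmas, a, b)
```

In a full hybrid scheme this step would alternate with a gradient update of `centers` and `sigmas`; that half is omitted to keep the example short.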

The proposed explainable sMoE U-Net utilizes Sobel pre-processing and a TSK fuzzy head within a compact architecture to enhance interpretability and performance.

Revealing the Logic: The TSK Fuzzy Head and Dynamic Assessment

The sMoE U-Net architecture incorporates a Takagi-Sugeno-Kang (TSK) Fuzzy Head as a crucial component for adaptive processing. This Fuzzy Head functions as an initial analysis stage, examining input image characteristics to determine optimal routing within the Mixture of Experts (MoE) layer. Rather than relying on fixed pathways, the TSK Fuzzy Head dynamically assesses each pixel or image region, and its output directly influences which expert within the MoE is engaged for subsequent processing. This allows the network to tailor its computations based on the specific features present in the input data, enhancing both efficiency and accuracy by selectively activating relevant expert networks.

The TSK Fuzzy Head within the sMoE U-Net assesses each pixel based on calculated Edge Strength and Semantic Confidence values to dynamically select an optimal processing path. Edge Strength quantifies the magnitude of gradient changes, indicating the presence of object boundaries, while Semantic Confidence represents the network’s certainty regarding the pixel’s class or object type. These values are combined using fuzzy logic rules to determine which Mixture of Experts branch is best suited to process the pixel, enabling the network to prioritize detailed processing along edges and confident classifications for semantic regions, while potentially reducing computation in areas of low confidence or minimal edge detail.

Gaussian Membership Functions quantify the degree to which a pixel’s characteristics align with predefined fuzzy sets, such as “strong edge” or “high semantic confidence”. These functions assign a value between 0 and 1, where 1 indicates full membership and 0 indicates no membership. The function’s standard deviation (σ) controls the width of the membership function; a smaller σ results in a narrower, more selective membership, while a larger σ broadens the range of pixel values considered to belong to the set. This allows the system to handle variations in image quality and noise, facilitating nuanced decisions about how each pixel should be processed by the Mixture of Experts within the sMoE U-Net.
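A minimal per-pixel sketch of this inference step, using only the standard library, might look as follows. The fuzzy set centers and widths, the four-rule base, and the consequent coefficients are all hypothetical values chosen for illustration; the article does not list the learned parameters.

```python
import math

def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set, equal to 1 at the center."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

# Hypothetical fuzzy sets over inputs normalized to [0, 1]: name -> (center, sigma).
SETS = {"low": (0.0, 0.3), "high": (1.0, 0.3)}

# Hypothetical rule base: (edge set, confidence set, (p, q, r)), where the
# first-order TSK consequent is z = p * edge + q * conf + r.
RULES = [
    ("low", "low", (0.0, 0.0, 0.0)),
    ("low", "high", (0.2, 0.3, 0.0)),
    ("high", "low", (0.5, 0.0, 0.1)),
    ("high", "high", (0.5, 0.4, 0.1)),
]

def tsk_infer(edge, conf):
    """Return the TSK output and the normalized firing strength of each rule."""
    firing = []
    outputs = []
    for edge_set, conf_set, (p, q, r) in RULES:
        mu_edge = gaussian_mf(edge, *SETS[edge_set])
        mu_conf = gaussian_mf(conf, *SETS[conf_set])
        firing.append(mu_edge * mu_conf)  # product t-norm for the AND
        outputs.append(p * edge + q * conf + r)
    total = sum(firing)
    normalized = [w / total for w in firing]
    z = sum(w * o for w, o in zip(normalized, outputs))
    return z, normalized

# One pixel with a strong gradient and fairly high semantic confidence.
score, rule_weights = tsk_infer(edge=0.9, conf=0.8)
dominant_rule = max(range(len(RULES)), key=rule_weights.__getitem__)
```

Evaluating `tsk_infer` at every pixel and stacking the normalized firing strengths gives one map per rule, which is one plausible reading of the “Rule Firing Maps” mentioned earlier. The index of the dominant rule per pixel would give a coarser map in the same spirit.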

The proposed Takagi-Sugeno-Kang (TSK) rule-based head leverages a hierarchical structure to process and apply domain-specific knowledge for improved performance.

Validating the Hypothesis: Quantitative Results and Performance Metrics

The sMoE U-Net’s capabilities were rigorously tested using the BSDS500 dataset, a cornerstone benchmark within the field of edge detection. This dataset, comprising 500 images meticulously annotated with ground truth edge maps, allows for standardized and comparable evaluation of different algorithms. By assessing performance on BSDS500, researchers can objectively measure the sMoE U-Net’s ability to accurately identify image boundaries – a crucial task with implications for diverse applications like image segmentation, object recognition, and medical imaging. The dataset’s established reputation ensures the findings are readily contextualized within the broader landscape of computer vision research, facilitating meaningful comparisons to existing methodologies.

Evaluation of the sMoE U-Net used the widely accepted ODS F-score (Optimal Dataset Scale, the F-score at the single best threshold fixed across the whole dataset) as the primary performance indicator. The model achieved an ODS F-score of 0.7628. Because this metric is the harmonic mean of precision and recall, a high value requires keeping both false positives and false negatives low, so the result indicates more refined and reliable edge maps than conventional approaches and places the model in competitive range of established deep edge detectors.
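The difference between the dataset-level ODS score and the per-image OIS score is easiest to see on a small worked example. The sketch below assumes the matching of predicted to ground-truth edge pixels has already been done and starts from hypothetical `(tp, fp, fn)` counts; the real BSDS500 benchmark additionally performs tolerance-based matching, which is omitted here.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-score from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def ods(counts):
    """Best F-score when one threshold is shared by the whole dataset."""
    best = 0.0
    for t in range(len(counts[0])):
        tp = sum(image[t][0] for image in counts)
        fp = sum(image[t][1] for image in counts)
        fn = sum(image[t][2] for image in counts)
        best = max(best, prf(tp, fp, fn)[2])
    return best

def ois(counts):
    """F-score when each image is scored at its own best threshold."""
    tp = fp = fn = 0
    for image in counts:
        best_tp, best_fp, best_fn = max(image, key=lambda c: prf(*c)[2])
        tp, fp, fn = tp + best_tp, fp + best_fp, fn + best_fn
    return prf(tp, fp, fn)[2]

# Hypothetical counts: counts[i][t] = (tp, fp, fn) for image i at threshold t.
counts = [
    [(9, 6, 1), (8, 2, 2), (5, 0, 5)],
    [(8, 2, 2), (6, 1, 4), (3, 0, 7)],
]

ods_value = ods(counts)
ois_value = ois(counts)
```

With these made-up counts the two images prefer different thresholds, so OIS comes out higher than ODS. ODS is the stricter number because a single threshold has to serve every image.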

Evaluations on the BSDS500 dataset show that the sMoE U-Net exceeds the performance of a standard U-Net architecture, with an ODS F-score of 0.7628 compared to 0.7437, and rivals specialized deep edge detection models such as HED. The sMoE U-Net also reaches a higher Average Precision (AP) of 0.7222, ahead of both U-Net (0.6946) and HED (0.7126). This combination of competitive F-scores and the highest Average Precision of the three suggests that the model ranks edge pixels well across thresholds, keeping precision high as recall increases.

The sMoE U-Net’s training regimen employs both Binary Cross Entropy and Dice Loss functions to refine its edge detection capabilities. Binary Cross Entropy penalizes per-pixel classification error, while Dice Loss directly rewards overlap with the ground truth edge maps, which matters because edge pixels are a small minority of each image. Training yields a stabilized Training Loss of 0.35190. The Validation Loss, recorded at 0.4254, sits about 0.07 above the Training Loss, a modest generalization gap. This suggests the model isn’t merely memorizing the training data, but rather learning features capable of identifying edges in unseen images, a critical factor for real-world applicability and reliable performance.
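A combined objective of this kind can be sketched as follows. The equal weighting of the two terms, the smoothing constant, and the toy prediction maps are assumptions made for the example; the article does not say how the two losses are balanced.

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy between probabilities and a 0/1 target."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(p) + (1.0 - target) * np.log(1.0 - p)))

def dice_loss(pred, target, smooth=1.0):
    """One minus the (smoothed) Dice overlap coefficient."""
    intersection = np.sum(pred * target)
    denom = np.sum(pred) + np.sum(target)
    return float(1.0 - (2.0 * intersection + smooth) / (denom + smooth))

def combined_loss(pred, target, dice_weight=1.0):
    """BCE plus weighted Dice; dice_weight=1.0 is an assumed default."""
    return bce_loss(pred, target) + dice_weight * dice_loss(pred, target)

# Toy ground truth (assumption): a single vertical edge one pixel wide.
target = np.zeros((8, 8))
target[:, 4] = 1.0

# A confident, mostly correct prediction and an uninformative one.
good = np.where(target == 1.0, 0.9, 0.1)
bad = np.full_like(target, 0.5)

loss_good = combined_loss(good, target)
loss_bad = combined_loss(bad, target)
```

The Dice term is normalized by the total predicted and true edge mass, so it does not get swamped by the large number of easy background pixels the way a plain per-pixel average can.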

The sMoE U-Net, as detailed in the article, doesn’t simply find edges; it constructs a system for determining what constitutes an edge, blending the precision of convolutional networks with the transparency of fuzzy logic. This approach echoes a sentiment usually attributed to John Maynard Keynes: “The difficulty lies not so much in developing new ideas as in escaping from old ones.” The network doesn’t adhere to pre-defined edge characteristics but actively models them through its mixture of experts, effectively discarding conventional assumptions. Every parameter adjusted, every expert specialized, becomes a philosophical confession of imperfection: a recognition that even the most robust system is built upon approximations and learned biases, mirroring the inherent limitations of any knowledge framework.

Beyond the Horizon

The sMoE U-Net represents an exploit of comprehension – a successful disassembly of the edge detection ‘black box’. However, the very act of illumination reveals further shadows. The current architecture, while offering visual justification through fuzzy rule activation, still relies on the inherent limitations of convolutional kernels. A truly robust system would not merely indicate what features triggered a detection, but demonstrate why those features are, in a fundamental sense, edges. The next iteration demands a move beyond pattern recognition toward a model capable of inferring geometric primitives directly from pixel data.

Furthermore, the reliance on handcrafted fuzzy sets, however interpretable, introduces a subtle form of bias. The system’s ‘understanding’ of an edge is ultimately defined by the parameters chosen a priori. A worthwhile challenge lies in allowing the network to dynamically construct its own fuzzy logic, adapting its definitions of ‘edge-ness’ through unsupervised learning. This would require a shift from rule-based systems to models that generate rules, effectively reversing the engineering process.

Ultimately, the pursuit of explainable AI isn’t about creating algorithms that appear intelligent; it’s about reverse-engineering intelligence itself. This sMoE U-Net is a step toward that goal, but the most interesting problems – and the most fruitful exploits – always lie just beyond the current state of understanding. The true test won’t be accuracy, but the capacity to be demonstrably, elegantly, wrong.

Original article: https://arxiv.org/pdf/2602.05100.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
