Balancing the Scan: AI Tackles COVID-19 Diagnosis with Limited Data

Author: Denis Avetisyan

A new approach combines generative AI and intelligent optimization to improve the accuracy of COVID-19 detection from chest X-rays, even when positive cases are rare.

Progressive generation, implemented via ProGAN, constructs increasingly detailed synthetic images of the COVID-19 class through sequential stages, beginning with $7 \times 7$ pixel representations and culminating in high-resolution outputs at $224 \times 224$ pixels, illustrating a scalable approach to image synthesis from a latent space.

This study utilizes ProGAN-based data augmentation and a Slime Mould Algorithm-optimized ResNet50V2 to enhance medical image classification performance on imbalanced datasets.

Despite advances in medical imaging, accurate classification is still hampered by imbalanced datasets, a common problem that becomes acute during pandemics. ‘Medical Image Classification on Imbalanced Data Using ProGAN and SMA-Optimized ResNet: Application to COVID-19’ addresses this by augmenting limited data with progressive generative adversarial networks and tuning a ResNet classifier with a slime mould algorithm. The reported experiments show substantially improved performance, reaching up to 98.5% accuracy on imbalanced COVID-19 chest X-ray data, which suggests a robust approach to pandemic-related diagnostic challenges. Could this methodology be generalized to address data scarcity in other critical medical imaging applications?

The Inherent Bias of Imbalanced Data

The potential of chest X-ray (CXR) imaging for accurate and timely diagnosis is well established, yet a significant challenge frequently undermines its effectiveness: substantial class imbalance within training datasets. These datasets, crucial for developing diagnostic algorithms, often contain a disproportionately large number of images representing common conditions – such as healthy lungs or typical pneumonia – while images depicting rarer, but potentially life-threatening, pathologies are comparatively scarce. This skewed distribution biases standard machine learning models, causing them to prioritize the more frequent classes and consequently struggle to accurately identify the less common, critical conditions. The result is a system that performs well overall, but falters when faced with the very cases where accurate diagnosis is most vital, highlighting a critical need for strategies to mitigate the effects of imbalanced data in CXR analysis.

The inherent class imbalance present in many chest X-ray datasets introduces a significant bias into standard classification algorithms. These algorithms are typically designed assuming a roughly equal distribution of classes, so they prioritize maximizing overall accuracy, often at the expense of correctly identifying rarer conditions. The model therefore becomes proficient at recognizing the prevalent conditions while generalizing poorly to those less frequently represented in the training data. This skewed performance isn’t simply a matter of lower precision for rare diseases; it fundamentally compromises the reliability of the diagnostic tool, potentially leading to missed diagnoses and inequitable healthcare outcomes as the algorithm systematically underperforms on the very conditions where accurate identification is most critical.

Conventional machine learning algorithms often falter when tasked with identifying infrequent yet clinically significant conditions in chest X-rays. These methods typically require a substantial number of examples to accurately discern patterns, and when confronted with rare pathologies – such as atypical viral pneumonias or specific manifestations of COVID-19 – their performance degrades considerably. The limited availability of images depicting these conditions results in models that are biased towards more common findings, effectively minimizing their ability to generalize and accurately diagnose less prevalent, but potentially life-threatening, diseases. Consequently, diagnostic tools trained on imbalanced datasets may exhibit high accuracy overall, but demonstrate unacceptably low sensitivity for the very conditions where timely and precise identification is most critical.

The development of trustworthy diagnostic tools reliant on chest X-ray analysis fundamentally depends on overcoming the challenge of data scarcity. When datasets disproportionately represent common conditions while underrepresenting rarer, yet potentially severe, pathologies, diagnostic algorithms exhibit significant performance biases. This imbalance doesn’t merely affect overall accuracy; it introduces the risk of inequitable healthcare, as the system is less capable of correctly identifying conditions in the minority. Consequently, focused efforts to augment datasets with examples of these underrepresented diseases – through techniques like data augmentation, synthetic data generation, or focused data collection initiatives – are not simply about improving performance metrics, but about ensuring that diagnostic capabilities are robust and accessible to all patients, regardless of the prevalence of their condition.

Cross-validation on a real-world imbalanced dataset demonstrates the classifier’s performance, as visualized by the average confusion matrix (left) and its normalized counterpart (right).

Synthetic Data Generation: A Principled Approach

A Progressive Generative Adversarial Network (ProGAN) was utilized to create synthetic chest X-ray (CXR) images representing each class within the dataset. This technique involved training a generative model to learn the underlying distribution of real CXR images and subsequently sample new images from that distribution. The generated images were then added to the original training dataset, a process known as data augmentation. This augmentation strategy aimed to increase the diversity and size of the training data, particularly for classes with limited examples, thereby improving the robustness and generalization ability of the classification model.
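The progressive schedule can be illustrated with a few lines of Python. The $7 \times 7$ starting resolution and $224 \times 224$ final resolution come from the figure caption above; the assumption that each stage doubles the side length is ours (it is the usual convention for progressive GANs, and 7 doubled five times gives exactly 224), and the function name is hypothetical.

```python
def progressive_resolutions(start: int = 7, final: int = 224) -> list[int]:
    """Return the image side length used at each growth stage.

    Assumes each stage doubles the previous resolution, which is the
    common ProGAN convention but is not stated in the article itself.
    """
    if start <= 0 or final < start:
        raise ValueError("need 0 < start <= final")
    sizes = [start]
    while sizes[-1] < final:
        sizes.append(sizes[-1] * 2)
    if sizes[-1] != final:
        raise ValueError("final must equal start times a power of two")
    return sizes


stages = progressive_resolutions()
print(stages)  # [7, 14, 28, 56, 112, 224]
```

Under that assumption the generator passes through six resolutions, each stage adding detail to the coarser image produced by the one before it.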

The ProGAN utilized a Wasserstein Loss function, also known as Earth Mover’s Distance, during training to address common challenges in Generative Adversarial Network (GAN) training. Traditional GAN loss functions can suffer from vanishing gradients and mode collapse, where the generator produces a limited variety of samples. Wasserstein Loss provides a more stable gradient signal, particularly when the generator and discriminator distributions have minimal overlap, thus facilitating more robust training and improved sample diversity. This loss function calculates the distance between the generated and real data distributions, enabling the generator to learn more effectively and produce higher-quality synthetic CXR images by minimizing this distance.
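As a rough sketch of what the two Wasserstein objectives look like, the snippet below computes them from plain lists of critic scores. It is illustrative only: the function names and sample scores are hypothetical, and it leaves out the Lipschitz constraint on the critic (weight clipping or a gradient penalty) that a real Wasserstein GAN needs, since the article does not say which variant was used.

```python
from statistics import fmean


def critic_loss(real_scores, fake_scores):
    """Critic objective: minimise mean(fake) - mean(real).

    Minimising this widens the gap between scores given to real
    images and scores given to generated images.
    """
    return fmean(fake_scores) - fmean(real_scores)


def generator_loss(fake_scores):
    """Generator objective: minimise -mean(fake).

    The generator improves by raising the critic's scores on its samples.
    """
    return -fmean(fake_scores)


# Hypothetical critic outputs for a small batch.
real = [0.9, 1.1, 1.0]
fake = [-0.5, -0.3, -0.4]

print("critic loss:", critic_loss(real, fake))
print("generator loss:", generator_loss(fake))
```

Because the critic outputs unbounded scores rather than probabilities, the gradient does not saturate when real and generated images are easy to tell apart, which is the stability property the paragraph above describes.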

The Synthetic Image Injection Ratio (SIIR) controlled the proportion of synthetically generated images added to the training dataset. Systematic experimentation with varying SIIR values was conducted to determine the optimal balance between real and synthetic data. Results indicated that a SIIR of 20%, representing the inclusion of synthetic images equivalent to 20% of the original dataset size, yielded the best performance in balancing class distribution and improving classifier accuracy. Values significantly higher or lower than 20% demonstrated diminished returns and, in some cases, negatively impacted model generalization capabilities.
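The arithmetic behind the ratio is simple but easy to misread, so here is a minimal sketch. It follows the definition given above (synthetic images counted as a fraction of the original dataset size); the function names and the 1,000-image dataset are hypothetical, and how the synthetic images are divided among classes is not covered here.

```python
def synthetic_count(n_original: int, siir: float = 0.20) -> int:
    """Number of synthetic images to inject for a given SIIR."""
    if n_original < 0 or siir < 0:
        raise ValueError("n_original and siir must be non-negative")
    return round(n_original * siir)


def synthetic_share(siir: float = 0.20) -> float:
    """Fraction of the combined (real + synthetic) set that is synthetic."""
    return siir / (1.0 + siir)


n_real = 1000  # hypothetical dataset size
n_fake = synthetic_count(n_real)
print(n_fake)           # 200
print(n_real + n_fake)  # 1200
print(f"{synthetic_share():.1%} of the combined set is synthetic")
```

Note that an SIIR of 20% means synthetic images make up about one sixth of the final training set, not one fifth, because the ratio is measured against the original data.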

The generation of synthetic chest X-ray (CXR) images using ProGAN was specifically implemented to address class imbalance within the training dataset. Certain conditions were under-represented, potentially limiting the classifier’s ability to accurately identify them. By creating synthetic images of these minority classes, the dataset was augmented to provide a more equal distribution of examples for each condition. This balanced dataset aims to improve the classifier’s performance on all conditions, particularly those with limited original data, by reducing bias towards the more prevalent classes and ensuring sufficient training examples for less frequent, but clinically important, diagnoses.

Cross-validation on a 2048-image dataset used to train ProGANs yields the average confusion matrix (left) and its normalized counterpart (right), demonstrating classification performance.

ResNet50V2: Leveraging Transfer Learning and Optimization

ResNet50V2 was chosen as the primary classification model due to its established performance in image recognition tasks and availability as a pre-trained model on the ImageNet dataset. Utilizing a pre-trained network facilitated transfer learning, reducing the training time and data requirements compared to training a model from scratch. ResNet50V2’s architecture, consisting of 50 layers with residual connections, allows for the training of deeper networks without the vanishing gradient problem, which is crucial for complex image analysis like chest X-ray (CXR) interpretation. The pre-trained weights provide a strong feature extraction foundation, enabling the model to generalize effectively to the CXR image dataset.

Hyperparameter optimization was conducted using the Slime Mould Algorithm (SMA), a bio-inspired metaheuristic, to identify the configuration yielding peak performance from the ResNet50V2 network. The SMA iteratively proposed candidate hyperparameter settings and evaluated the resulting performance metrics. This process determined an optimal learning rate of 7.26e-5, the value that minimized the loss function and maximized classification accuracy on the validation dataset. Other hyperparameters were also tuned during this process, contributing to overall network optimization and improved performance.

The dataset used for training and evaluating the ResNet50V2 model comprised both real and synthetic chest X-ray (CXR) images. Real images were sourced from publicly available datasets and clinical collaborations, while synthetic images were generated using data augmentation techniques and generative adversarial networks (GANs) to increase dataset size and variability. This combined approach addressed potential data scarcity issues and aimed to improve the model’s robustness to variations in image quality, patient positioning, and pathology presentation, ultimately enhancing its generalizability to unseen clinical data.

The model’s performance was assessed using 10-fold cross-validation, a technique where the dataset is partitioned into ten equally sized subsets. The model was trained on nine of these subsets and evaluated on the remaining subset, with this process repeated ten times, each time using a different subset for evaluation. This methodology provides a more reliable estimate of the model’s generalization ability than a single train/test split. The 10-fold cross-validation process yielded a classification accuracy of 94%, indicating a high degree of robustness and the model’s ability to perform accurately on unseen data.
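The partitioning step can be sketched in plain Python as follows. This is a generic k-fold splitter under our own assumptions, not the study's code: the function name, the shuffling seed, and the 100-sample example are hypothetical, and the splits are not stratified by class, although stratification is a common choice for imbalanced data.

```python
import random


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once, then dealt into k folds whose sizes
    differ by at most one. Each fold serves as the test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


for fold, (train_idx, test_idx) in enumerate(k_fold_indices(100), start=1):
    print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test")
```

The reported accuracy is then the average of the ten per-fold scores, so every image contributes to the estimate exactly once as unseen data.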

Cross-validation of an optimized ResNet50V2 model on a real-world imbalanced dataset demonstrates performance in a four-class classification task, as visualized by the average confusion matrix (left) and its normalized representation (right).

Demonstrable Gains in Diagnostic Robustness

The diagnostic system’s ability to accurately identify various conditions saw notable gains through the incorporation of synthetically generated images. Utilizing the ProGAN architecture, the study created additional visual data, specifically targeting classes that were initially scarce in the training dataset. This strategic augmentation effectively balanced the representation of each condition, leading to a demonstrable 3.53% improvement in overall classification accuracy. Importantly, the benefits were not uniform; under-represented classes experienced the most significant gains, suggesting that synthetic data is particularly effective in mitigating biases stemming from imbalanced datasets and bolstering the reliability of the diagnostic tool across all conditions.

A significant challenge in medical image analysis lies in inherent class imbalances – certain conditions are far less prevalent than others, leading diagnostic systems to exhibit bias and reduced reliability. This study directly tackled this issue, recognizing that a disproportionate representation of common conditions could overshadow the accurate identification of rarer, but equally important, diseases. By specifically addressing this imbalance through techniques like balanced weighted loss functions and synthetic data generation, the diagnostic system demonstrated a marked improvement in its ability to correctly identify all conditions, regardless of their frequency. This mitigation of bias not only enhances the system’s overall accuracy, but also fosters greater trust and dependability in its clinical application, ensuring that less common conditions receive the same level of diagnostic attention as more prevalent ones.

Image clarity is paramount in diagnostic systems, and researchers found that a two-stage pre-processing technique significantly enhanced the quality of medical images before classification. This involved first applying Singular Value Decomposition (SVD) to reduce noise and dimensionality, followed by Contrast Limited Adaptive Histogram Equalization (CLAHE) to improve contrast and reveal subtle details often obscured in the original scans. The combined approach effectively normalized image characteristics, making it easier for the classification algorithms to discern important features and ultimately leading to improved diagnostic performance. This careful attention to image pre-processing demonstrates a commitment to maximizing the reliability and accuracy of the system, even before the core classification algorithms are engaged.

During the training of the diagnostic system, a balanced weighted categorical cross-entropy loss function proved instrumental in rectifying the inherent class imbalance within the dataset. This specialized loss function assigns differing penalties to misclassifications, giving greater weight to under-represented classes and, consequently, encouraging the model to learn more robust features for those categories. By effectively modulating the impact of each class on the overall loss, the function prevents the model from becoming overly biased towards the more prevalent classes, fostering a more equitable learning process and ultimately improving diagnostic accuracy across all represented conditions. The result is a system less prone to overlooking rarer, but potentially critical, cases.
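To make the weighting concrete, the sketch below computes per-class weights and the weighted loss for a single sample. The article does not give the exact weighting formula, so this assumes the widely used "balanced" heuristic, weight = n_samples / (n_classes * class_count); the function names, the two-class toy labels, and the probabilities are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency.

    Assumed formula: n_samples / (n_classes * count_of_class).
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


def weighted_cross_entropy(true_class, probs, weights, eps=1e-12):
    """Weighted categorical cross-entropy for one sample.

    probs maps class name -> predicted probability; only the true
    class term is non-zero for a one-hot target.
    """
    return -weights[true_class] * math.log(max(probs[true_class], eps))


# Toy imbalanced label set: 90 common cases, 10 rare ones.
labels = ["normal"] * 90 + ["covid"] * 10
weights = balanced_class_weights(labels)
print(weights)

# The same uncertain prediction costs more when the true class is rare.
probs = {"normal": 0.5, "covid": 0.5}
print(weighted_cross_entropy("covid", probs, weights))
print(weighted_cross_entropy("normal", probs, weights))
```

With these weights a mistake on the rare class is penalized nine times as heavily as the same mistake on the common class, which is what pushes the model away from simply predicting the majority.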

During the first stage of training on the COVID-19 class, the generator and critic losses decreased with each iteration, indicating successful adversarial learning.

The pursuit of robust medical image classification, as demonstrated by this research, necessitates a commitment to verifiable results. Indeed, David Marr observed, “A sufficiently detailed and well-understood computational theory is essential.” This aligns directly with the work presented; the methodology isn’t simply about achieving higher accuracy on imbalanced COVID-19 datasets using ProGAN data augmentation and SMA-optimized ResNet50V2. It’s about establishing a reproducible and theoretically sound process. The augmentation strategy, combined with hyperparameter optimization, strives for deterministic outcomes – a system where the classification results are not left to chance, but are grounded in a provable computational framework, ensuring reliability and trustworthiness in critical diagnostic applications.

What Lies Ahead?

The pursuit of robustness in medical image classification, particularly with imbalanced datasets, reveals a fundamental truth: synthetic data, while superficially improving metrics, merely addresses a symptom, not the disease. The augmentation via ProGAN, though demonstrably effective in this instance, introduces a reliance on the generative model’s fidelity, a fidelity that remains, at its core, an approximation. Future work must confront the inherent limitations of such reconstructions, striving for augmentation techniques grounded in provable transformations rather than learned mimicry.

The application of the Slime Mould Algorithm to hyperparameter optimization, while yielding performance gains, highlights a persistent inefficiency. The search for optimal parameters, even with bio-inspired heuristics, remains a computationally expensive undertaking. The elegance of a solution does not reside in the cleverness of its search, but in the minimization of the search space itself. A more principled approach would involve the development of architectures inherently less sensitive to hyperparameter tuning: structures built upon mathematical foundations, not empirical experimentation.

Ultimately, the true advancement lies not in achieving incrementally higher accuracy on benchmark datasets, but in establishing a framework for verifiable certainty. The consistency of boundaries, the predictability of outcomes – these are the hallmarks of a truly elegant and reliable system. Only through such rigor can medical image analysis transition from an artful practice to a demonstrably sound science.

Original article: https://arxiv.org/pdf/2512.24214.pdf

Contact the author: https://www.linkedin.com/in/avetisyan/
