Background: Diabetic retinopathy (DR) is a serious complication of diabetes; early diagnosis can prevent vision loss and blindness. Deep learning has progressed rapidly over the past few years, leading to effective image synthesis methods for many medical imaging applications. In this study, we propose an Efficient Deep Convolutional Generative Adversarial Network (EDC-GAN) framework for predicting and generating high-resolution diabetic retinopathy fundus images. Through an attention-based feature extraction method and an improved conditional GAN structure, the proposed model directly synthesizes high-fidelity, diverse, and realistic retinal images. We also employ a hybrid loss function composed of perceptual loss and adversarial loss to further improve the quality of the synthetic images. Large-scale contrastive learning with an instance noise module is proposed to retain fine-grained details in the synthesized images, together with a new component, Feature-Preserving Adaptive Normalization (FPAN). We train the model on publicly available DR datasets and validate it using several evaluation metrics, including Fréchet Inception Distance (FID), Structural Similarity Index (SSIM), and Peak Signal-to-Noise Ratio (PSNR). Experimental results show that EDC-GAN outperforms existing image synthesis methods, generating higher-quality and more realistic synthetic fundus images and improving the predictive performance of downstream models. The proposed model therefore offers an effective approach to augmenting scarce DR datasets and supporting robust computer-aided diagnosis (CAD) for DR detection.
Diabetic retinopathy (DR) is one of the most common microvascular complications of diabetes and remains one of the leading causes of blindness worldwide [1]. Early detection and accurate grading of DR are therefore crucial for successful treatment and for preventing permanent vision loss. Fundus photographs are typically inspected manually by trained ophthalmologists, a cumbersome process that is subject to variability [2]. Artificial intelligence and deep learning have shown great promise for building accurate and efficient automated diagnostic systems. Among deep learning techniques, Generative Adversarial Networks (GANs) have shown promise in synthesizing and augmenting medical images and in improving the detection and grading of DR [3]. The need for high-quality synthetic images in medical applications continues to drive the adoption of generative models, particularly GANs, whose outputs can be used to train high-quality machine-learning models [4]. In the context of medical image generation, the work in [5] proposed a conditional GAN structure (see also the DR-GAN model [7]) aimed primarily at fine-grained lesion synthesis on diabetic retinopathy images, which improved both the quality and the diversity of synthetic datasets. Similarly, [6] proposed a multichannel GAN with semi-supervised learning for automated DR diagnosis that reached high sensitivity and specificity on DR stage detection. Recent studies have evaluated the quality and diversity of images synthesized using Deep Convolutional GANs (DCGANs) for diabetic retinopathy applications. The authors of [7] analysed the generative capabilities of DCGANs and highlighted their potential in addressing data scarcity in medical imaging, while [8] explored the role of GANs in DR grading, demonstrating how synthetic images can be used to enhance classification performance. Despite this significant progress, challenges remain in generating high-fidelity medical images that accurately capture disease-specific features.
[9] developed an open-source repository showcasing GAN-driven DR image synthesis, contributing to the reproducibility of research in this domain. [10] further validated the effectiveness of GAN-based synthetic imagery in medical applications, emphasizing the need for diverse datasets to improve model generalization. The introduction of DR-GAN by Zhou et al. [11] marked a breakthrough in conditional image synthesis for diabetic retinopathy. By leveraging lesion-aware generative mechanisms, DR-GAN enables fine-grained lesion synthesis, aiding in data augmentation for deep learning-based diagnostic models. Additionally, advancements in GAN architectures, such as Wasserstein GANs and CycleGANs, continue to push the boundaries of synthetic data generation, improving the reliability of AI-driven medical diagnostics [12].
In this paper, we present a comprehensive analysis of state-of-the-art GAN-based methodologies for diabetic retinopathy image synthesis and classification. By examining recent literature and experimental results, we aim to highlight the advantages, limitations, and future directions of generative models in medical image processing. Our study contributes to the growing field of AI-powered healthcare by providing insights into how GANs can enhance the accuracy and robustness of DR diagnosis.
The rest of the paper is structured as follows. Section 2 briefly reviews the related work. Section 3 describes the materials and methods used in the paper. Section 4 presents the comparisons and the results obtained. Section 5 concludes the work.
In 2014, Ian Goodfellow and his colleagues introduced GANs, and Goodfellow is regarded as their inventor. A GAN uses two deep neural networks: a Generator (G) and a Discriminator (D). The generator takes arbitrary noise as input and produces data samples that resemble the real samples in the original dataset, while the discriminator differentiates between the original data and the generated data, as depicted in the figure.
GANs are models that create realistic objects that are hard to distinguish from real ones. After capturing the distribution of the training data, a GAN produces new samples from the same distribution. These multipurpose models have two elements, namely two different neural networks: a generator and a discriminator. The objective of the generator is to create fake samples, whereas the discriminator is responsible for distinguishing real examples from the fake ones produced by the generator. The generator can be thought of as a forger who tries to make fake samples look as real as possible, and the discriminator as an investigator who attempts to tell the real samples from the fake ones. During training, the generator improves and produces better artificial examples, while the discriminator, like an investigator, learns to differentiate real and fake samples correctly. The discriminator thus learns the probability that a sample is real or fake, and this probability is fed back to the generator so that it produces better samples over time.
2.1. Efficient Deep Convolutional-Generative Adversarial Network (EDC-GAN)
Radford et al. [10] proposed DC-GAN, an extension of the original GAN [9] in which the generator and discriminator networks explicitly use convolutional and convolutional-transpose layers. Figure 2 shows the detailed DC-GAN architecture, and Sections 3.1 and 3.2 describe the components of the generator and the discriminator. During training, the generator takes a random input, which is followed by batch normalization layers; these layers assist with healthy gradient flow [10]. The discriminator takes an input image and outputs a scalar probability that the input image is real or fake. It uses strided convolutional layers instead of pooling to downsample, which lets the network learn its own pooling function, and the batch normalization layers and Leaky ReLU activation functions promote healthy gradient flow during training [10].
Although much work has been done on retinal image analysis, two serious issues remain underexplored: the scarcity of publicly available annotated retinal image data and the enhancement of retinal image quality. The contribution of this work is, first, to investigate whether generative modelling with DC-GAN can create an artificial retinal image dataset that resembles a clinical dataset and whether this artificial dataset can help train a classification model, and second, to develop such an artificial retinal image dataset.
Deep learning has been applied successfully in different fields, including the detection and identification of eye diseases such as glaucoma, cataract, and ARMD. In the proposed methodology, shown in Figure 3, we use the DCGAN architecture to generate quality images, which significantly improves the accuracy of the ResNet50 classification method. The method has four steps: data augmentation using DCGAN, data augmentation using traditional techniques, pre-processing, and ResNet50 classification.
We used retinal image data obtained from Kaggle [23] in the experiment. The data were taken from the Kaggle APTOS Blindness dataset and organized into two folders, train and test. Each folder contains one subfolder per image class (Normal-DR, Mild-DR, Moderate-DR, Severe-DR, Proliferative DR). The train folder contains 3662 images across the five classes, and the test folder contains 1992 images. Figure 4 shows the class distribution of the training data.
The training data in the APTOS Blindness dataset is visibly imbalanced, with most images labelled as no-DR (class 0); a deep learning model trained on this data to categorize images among the five classes is prone to overfitting. To balance the dataset, we used a resampling technique that brings every class to 300 images, as shown in Table 1. Because training a five-class deep learning model requires more balanced data, we then augmented all classes with DCGAN, as shown in Table 2.
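The resampling step described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the resampling was done, so the sketch undersamples large classes without replacement and oversamples small classes with replacement. The function name `balance_to_fixed_size`, the seed, and the toy label counts are hypothetical and are not the APTOS class distribution.

```python
import random
from collections import defaultdict

def balance_to_fixed_size(labels, target=300, seed=0):
    """Return sample indices so that every class has exactly `target` entries."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    balanced = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) >= target:
            # Undersample: draw `target` indices without replacement.
            chosen = rng.sample(indices, target)
        else:
            # Oversample: keep all originals, then draw extras with replacement.
            chosen = indices + rng.choices(indices, k=target - len(indices))
        balanced.extend(chosen)
    return balanced

# Toy, made-up class counts (assumption, NOT the real dataset distribution).
toy_labels = [0] * 900 + [1] * 150 + [2] * 400 + [3] * 80 + [4] * 120
balanced_indices = balance_to_fixed_size(toy_labels, target=300)

counts = defaultdict(int)
for i in balanced_indices:
    counts[toy_labels[i]] += 1
print(dict(counts))
```

Oversampling with replacement duplicates images, which is one reason a generative method may be preferred for the minority classes.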
3.1. Generator Function
Figure 5 shows the architecture of the generator used in DC-GAN. First, the random input, of size 4 × 4 × 512, is passed to a dense layer that transforms it into a representation of size 8 × 8 × 256. Second, to produce an image of size 64 × 64 × 3, the dense layer output is followed by a sequence of transposed convolution layers that upsample the representation. Third, the LeakyReLU function [24] is applied to all layers except the output layer, where the Tanh function is used. This bounded activation allows the model to learn more quickly to saturate and cover the color space of the training distribution [25]. Fourth, batch normalization [26] is used on all layers except the output layer; it stabilizes learning by normalizing the input to zero mean and unit variance. In the DC-GAN generator, we used 3 transposed convolution layers to upsample the representation of size 4 × 4 × 512 to an image of size 64 × 64 × 3.
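The way transposed convolutions grow the spatial size can be traced with the standard output-size formula, out = (in - 1) * stride - 2 * padding + kernel. The sketch below is a shape calculation only, not the authors' network: the kernel size 4, stride 2, and padding 1 are assumed values commonly used with DCGAN, and the intermediate channel counts (128, 64) are hypothetical, since only 4 × 4 × 512, 8 × 8 × 256, and 64 × 64 × 3 appear in the text.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

def trace_upsampling(start_size, channels):
    """List (height, width, channels) per stage; channels[0] is the input depth."""
    shapes = [(start_size, start_size, channels[0])]
    size = start_size
    for ch in channels[1:]:
        size = conv_transpose_out(size)
        shapes.append((size, size, ch))
    return shapes

# Assumed channel schedule; 128 and 64 are illustrative placeholders.
generator_shapes = trace_upsampling(4, [512, 256, 128, 64, 3])
for shape in generator_shapes:
    print(" x ".join(str(v) for v in shape))
```

Under these assumed settings each layer doubles the spatial size, so three layers take an 8 × 8 map to 64 × 64, while four are needed when starting from 4 × 4. Other kernel or stride choices would change the count.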
3.2. Discriminator Function
The goal of the discriminator is to classify whether the images it receives are real or fake. The discriminator is shown in Figure 6. It takes retina images of size 64 × 64 × 3 as input, drawn from both the generated images produced by the generator network and the original images in the dataset. The input image passes through a sequence of convolutional layers followed by a sigmoid activation function that classifies the image as real or fake. Following Radford et al. [25], each layer combines a convolution, Leaky ReLU [27], and batch normalization [26], with batch normalization applied to all layers except the input layer.
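The downsampling path of such a discriminator can be illustrated with a short shape trace together with the two activations named above. This is a sketch under assumptions: kernel size 4, stride 2, padding 1, and a Leaky ReLU slope of 0.2 are common DCGAN settings and are not stated in the text, and the helper names are hypothetical.

```python
import math

def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a strided convolution (floor division, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def leaky_relu(x, negative_slope=0.2):
    """Pass positive values through; scale negative values by a small slope."""
    return x if x >= 0 else negative_slope * x

def sigmoid(x):
    """Squash a real-valued score into (0, 1), read as the probability of 'real'."""
    return 1.0 / (1.0 + math.exp(-x))

# Trace how a 64 x 64 input shrinks under repeated stride-2 convolutions.
size = 64
sizes = [size]
while size > 4:
    size = conv_out(size)
    sizes.append(size)
print(sizes)  # [64, 32, 16, 8, 4]

print(leaky_relu(-1.0), sigmoid(0.0))
```

Because the stride does the downsampling, no pooling layer is needed; a final layer would reduce the 4 × 4 map to the single score that the sigmoid converts into a probability.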
3.3. GAN objective function and Loss function
The main goal of a GAN is to reduce the distance between the probability distributions of the generated data and the real data. In the proposed experiment, the minimax loss function introduced by Goodfellow et al. [9] is used, as given in equation (1). The generator aims to minimize this loss, while the discriminator aims to maximize the same loss. Consequently, a GAN is trained by updating the generator and discriminator networks simultaneously, as a minimax game between the two.
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \qquad (1)$$
where $\mathbb{E}_{x}[\log D(x)]$ is the expected log-prediction over all original instances, $\mathbb{E}_{z}[\log(1 - D(G(z)))]$ is the corresponding expectation over all fake instances, $z$ is the random noise variable sampled from a standard normal distribution, $G(z)$ is the generator function that maps $z$ to the data space, $x$ is the real data, and $D(x)$ is the probability that $x$ came from the real data distribution rather than the generated data distribution [9]. The aim of $G$ is to estimate the distribution that the training data came from, so that it can create fake samples from that estimated distribution [9]. The discriminator network $D$ tries to maximize the probability that original images and fake images are classified correctly, whereas the generator network $G$ tries to minimize $\log(1 - D(G(z)))$. In theory, the solution of this minimax game is reached when $p_g = p_{data}$, at which point the discriminator can only guess randomly whether its inputs are real or fake images.
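A small numerical check can make the minimax objective of Goodfellow et al. [9] concrete: the value is the mean of log D(x) over real samples plus the mean of log(1 - D(G(z))) over generated samples. The sketch below estimates it from lists of discriminator outputs; the function name `gan_value` and the example probabilities are made up for illustration.

```python
import math

def gan_value(d_real, d_fake):
    """Sample estimate of the minimax value from discriminator outputs in (0, 1).

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct discriminator: the value approaches its maximum of 0.
v_confident = gan_value([0.99, 0.98], [0.01, 0.02])

# A discriminator that cannot tell real from fake: D = 0.5 everywhere.
v_equilibrium = gan_value([0.5, 0.5], [0.5, 0.5])

print(v_confident, v_equilibrium, -math.log(4))
```

When the discriminator outputs 0.5 for every input, the value equals -log 4 (about -1.386), which is the value associated with the theoretical equilibrium in which the generated and real distributions coincide.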
DC-GAN training has two steps: (1) discriminator network training and (2) generator network training. The main objective of the discriminator is to classify correctly whether an input image is fake or real, and in doing so it trains the generator to create better fake images. First, the discriminator is trained on a batch of original images to compute $\log D(x)$. Second, the generator produces a batch of fake images, and the discriminator is trained on this batch to compute $\log(1 - D(G(z)))$. The DCGAN was trained for 500 epochs; it began to produce images resembling retina images after 60 epochs, and the quality of the generated images continued to improve up to epoch 500. A sample grid of real and fake images is shown in the figure.
We then examined the effect of traditional data augmentation and of advanced data augmentation techniques such as DCGAN on the performance of classification models. Specifically, model performance is compared when the training data contain synthetic images generated by traditional techniques and when they contain synthetic images generated by advanced data augmentation techniques. A ResNet50 model was used for classification [24], composed of four convolution layers, each followed by a max-pooling operation, and two fully connected layers. The ReLU (Rectified Linear Unit) function was used in all layers. The dataset was split into train and test sets using three-fold cross-validation. The experiments covered two scenarios. In the first, the model was trained with synthetic images generated by DCGAN included in the training set. In the second, the model was trained with images generated by online augmentation techniques included in the training set. The classifier was trained on the Kaggle APTOS-Blindness dataset for 50 epochs.
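The three-fold cross-validation split mentioned above can be sketched in a few lines. This is an illustration, not the authors' pipeline: the helper `k_fold_indices` is hypothetical, the sample count of 1500 assumes five classes balanced to 300 images each, and the text does not state whether synthetic images were added before or after splitting.

```python
import random

def k_fold_indices(n_samples, k=3, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 1500 = 5 classes x 300 images (assumed balanced set size).
splits = list(k_fold_indices(1500, k=3))
for train, test in splits:
    print(len(train), len(test))
```

Each image appears in the test fold exactly once. If synthetic images are used, adding them only to the training portion of each split keeps generated data out of the evaluation.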
For the initial epochs the learning rate was set to $10^{-3}$; when accuracy did not improve, it was changed automatically to $10^{-2}$, and the two models reached their best accuracies of 98.66% and 91.8% at a learning rate of $10^{-1}$. The initial layers of the ResNet50 model were frozen (layer.trainable set to False), and the subsequent layers were left trainable (layer.trainable set to True). The Adam optimizer [25] was used for this model. The accuracies of the two classification models were analyzed using the Receiver Operating Characteristic (ROC) curve, which plots the true positive rate on the y-axis against the false positive rate on the x-axis across the full range of possible thresholds. The classification accuracy is 91.8% with the original dataset plus images generated by traditional techniques, and 98.66% with the synthetic dataset generated by DCGAN. Tables 3, 4, 5, and 6 present the classification reports of both models, and Figures 7(a) and 8(a) show the confusion matrices accompanying the ROC curve analysis. Figure 7(b) shows the ROC curve for the baseline case (i.e., without synthetic data), while Figure 8(b) shows the ROC curve when the traditional data augmentation methods are applied.
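How an ROC curve is built from classifier scores can be shown with a minimal sketch. It assumes a binary, one-class-versus-rest view of the five-class problem; the labels and scores are made up and do not come from the experiments, and the function names are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) points obtained by sweeping the threshold over all scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # Everything scoring at or above the threshold is predicted positive.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up labels (1 = target class) and predicted scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
roc_auc = auc(points)
print(points)
print(roc_auc)
```

For this toy input the area under the curve is 8/9, matching the fraction of positive-negative pairs in which the positive sample receives the higher score.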
4.1. Comparative evaluation of the model
The evaluation is based on the comparative performance of the DC-GAN data augmentation technique. Advanced data augmentation results in better test accuracy and reduces overfitting. Accordingly, we trained a ResNet50 classifier using DCGAN synthetic image data as well as traditional synthetic image data. The model was trained for 50 epochs with a batch size of 32 and a learning rate of 0.001, using the Adam optimizer. It was trained in two scenarios: one with synthetic images produced by traditional augmentation techniques and one with synthetic images produced by the DCGAN method. The experiments were run on an NVIDIA workstation with a 16 GB GPU.
After data augmentation and pre-processing are applied to the original dataset, the resulting images are fed into a ResNet50 model, which identifies distinguishing features to classify the severity of diabetic retinopathy. The classifier, built on the pre-trained ResNet50 model, is designed to identify the DR classes and thereby detect the severity of the disease. The architecture contains 2 convolutional layers with the ReLU activation function, 1 max-pooling layer, no dropout layers, 2 dense layers, and a softmax function. The ReLU activation function mitigates the vanishing gradient problem and allows networks to learn and run faster than other activation functions [25].
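The two functions named in this architecture, ReLU and softmax, can be written out directly. The sketch below is illustrative only: the logits are made-up scores rather than outputs of the trained model, and the class names follow the five folders described for the dataset.

```python
import math

def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

CLASSES = ["Normal-DR", "Mild-DR", "Moderate-DR", "Severe-DR", "Proliferative DR"]

# Made-up final-layer scores for one image (assumption, not model output).
logits = [2.0, 0.5, 0.1, -1.0, -0.5]
probs = softmax(logits)
predicted = CLASSES[probs.index(max(probs))]

print([round(p, 3) for p in probs])
print(predicted)
```

The softmax output sums to 1 across the five severity classes, so the largest entry can be read as the predicted grade. ReLU has a gradient of 1 for positive inputs, which is why it helps against vanishing gradients.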
Data augmentation has been used to improve the prediction accuracy of image categorization tasks. One major task in computer vision is generating synthetic image data from original (real) image data. Previous studies show that several techniques can accomplish this, but class imbalance creates a further problem in clinical/medical imaging. Producing higher-resolution, higher-quality images significantly improves classification accuracy. This paper demonstrates the use of the DCGAN method to enlarge a retina image dataset for this purpose. Researchers working in clinical/medical imaging can use this method to enlarge their datasets and obtain better results than with existing data augmentation techniques. This experiment achieved quality synthetic images using the DC-GAN method with a cross-entropy loss function. Future work can concentrate on other loss functions for synthesizing quality images.
1. Y. Zhou, B. Wang, X. He, S. Cui, and L. Shao, "DR-GAN: Conditional Generative Adversarial Network for Fine-Grained Lesion Synthesis on Diabetic Retinopathy Images," IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 1, pp. 56–66, Jan. 2022, doi: 10.1109/JBHI.2020.3046257.
2. Y. Xue, T. Gong, J. Fan, and W. Cai, "Diabetic Retinopathy Diagnosis Using Multichannel Generative Adversarial Network With Semi supervision," IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 7, pp. 4213–4222, Jul. 2020, doi: 10.1109/TIM.2020.2966760.
3. C.-M. Drăgan, M. M. Saad, M. H. Rehmani, and R. O'Reilly, "Evaluating the Quality and Diversity of DCGAN-based Generatively Synthesized Diabetic Retinopathy Imagery," arXiv preprint arXiv:2208.05593, Aug. 2022. [Online]. Available: https://arxiv.org/abs/2208.05593.
4. Poles, A. S. Chowdhury, and M. M. Rahman, "Repurposing the Image Generative Potential: Exploiting GANs to Grade Diabetic Retinopathy," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Jun. 2024, pp. 1–10, doi: 10.1109/CVPRW.2024.00123.
5. Bhatra, "GAN-Diabetes-Retinopathy-Image-Synthesis," GitHub Repository, 2023. [Online]. Available: https://github.com/abhi-bhatra/GAN-Diabetes-Retinopathy-Image-Synthesis.
6. M. M. Saad, C.-M. Drăgan, M. H. Rehmani, and R. O'Reilly, "Evaluating Generatively Synthesized Diabetic Retinopathy Imagery," arXiv preprint arXiv:2208.05593v1, Aug. 2022. [Online]. Available: http://arxiv.org/abs/2208.05593v1.
7. Y. Zhou et al., "DR-GAN: Conditional Generative Adversarial Network for Fine-Grained Lesion Synthesis on Diabetic Retinopathy Images," arXiv preprint arXiv:1912.04670, Dec. 2019. [Online]. Available: https://arxiv.org/abs/1912.04670.
8. M. Drăgan et al., "Evaluating the Quality and Diversity of DCGAN-based Generatively Synthesized Diabetic Retinopathy Imagery," arXiv preprint arXiv:2208.05593, Aug. 2022. [Online]. Available: https://arxiv.org/abs/2208.05593.
9. Y. Zhou et al., "DR-GAN: Conditional Generative Adversarial Network for Fine-Grained Lesion Synthesis on Diabetic Retinopathy Images," IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 1, pp. 56–66, Jan. 2022, doi: 10.1109/JBHI.2020.3046257.
10. Y. Xue et al., "Diabetic Retinopathy Diagnosis Using Multichannel Generative Adversarial Network with Semi supervision," IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 7, pp. 4213–4222, Jul. 2020, doi: 10.1109/TIM.2020.2966760.
11. Y. Zhou et al., "DR-GAN: Conditional Generative Adversarial Network for Fine-Grained Lesion Synthesis on Diabetic Retinopathy Images," arXiv preprint arXiv:1912.04670, Dec. 2019. [Online]. Available: https://arxiv.org/abs/1912.04670.
12. M. M. Saad et al., "Evaluating Generatively Synthesized Diabetic Retinopathy Imagery," arXiv preprint arXiv:2208.05593v1, Aug. 2022. [Online]. Available: https://arxiv.org/abs/2208.05593v1.
13. X. Yi, E. Walia, and P. Babyn, “Generative adversarial network in medical imaging: A review,” Medical Image Analysis, vol. 58, p. 101552, 2019.
14. Nie, R. Trullo, C. Petitjean, S. Ruan, and D. Shen, “Medical image synthesis with context-aware generative adversarial networks,” MICCAI, vol. 10435, pp. 417–425, 2016.
15. Z. Lin, R. Guo, Y. Wang, B. Wu, T. Chen, W. Wang, D. Z. Chen, and J. Wu, “A framework for identifying diabetic retinopathy based on anti-noise detection and attention-based fusion,” in MICCAI. Springer, 2018, pp. 74–82. “International clinical diabetic retinopathy disease severity scale,” American Academy of Ophthalmology, 2012.
16. L. Seoud, J. Chelbi, and F. Cheriet, “Automatic grading of diabetic retinopathy on a public database,” in MICCAI. Springer, 2015.
17. V. Gulshan, L. Peng, M. Coram, M. C. Stumpe, D. Wu, A. Narayanaswamy, S. Venugopalan, K. Widner, T. Madams, J. Cuadros et al., “Development and validation of a deep learning algorithm
18. J. Jiang, Y.-C. Hu, N. Tyagi, P. Zhang, A. Rimner, G. S. Mageras, J. O. Deasy, and H. Veeraragavan, “Tumor-aware, adversarial domain adaptation from CT to MRI for lung cancer segmentation,” MICCAI, vol. 11071, pp. 777–785, 2018.
19. T. Zhou, H. Fu, G. Chen, J. Shen, and L. Shao, “Hi-net: hybrid-fusion network for multi-modal mr image synthesis,” IEEE Transactions on Medical Imaging, 2020.
20. J. Zhao, D. Li, Z. Kassam, J. Howey, J. Chong, B. Chen, and S. Li, “Tripartite-gan: Synthesizing liver contrast-enhanced mri to improve tumor detection,” Medical Image Analysis, p. 101667, 2020.
21. Mahapatra, B. Bozorgtabar, J.-P. Thiran, and M. Reyes, “Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network,” MICCAI, pp. 580–588, 2018.
22. W. Wei, E. Poirion, B. Bodini, S. Durrleman, N. Ayache, B. Stankoff, and O. Colliot, “Learning myelin content in multiple sclerosis from multimodal mri through adversarial training,” MICCAI, pp. 514–522, 2018.
23. M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, “Synthetic data augmentation using GAN for improved liver lesion classification,” in 2018 IEEE 15th International Symposium on Biomedical Imaging, 2018, pp. 289–293.
24. P. Costa, A. Galdran, M. I. Meyer, M. Niemeijer, M. Abràmoff, A. M. Mendonça, and A. Campilho, “End-to-end adversarial retinal image synthesis,” IEEE Transactions on Medical Imaging, vol. 37, no. 3, pp. 781–791, 2018.
25. H. Zhao, H. Li, S. Maurer-Stroh, and L. Cheng, “Synthesizing retinal and neuronal images with generative adversarial nets,” Medical Image Analysis, vol. 49, pp. 14–26, 2018.