Risk of Diabetes Disease Prediction Using Machine Learning Approach

doi:10.5083/ejcm


Research Article | Volume 14 Issue 6 (Nov - Dec, 2024) | Pages 391 - 399


Ashutosh Pandey

Priyanka Gautam

Department Of Statistics, B. N. College, Patna University, Patna, Bihar. India

Department Of Psychology, D.D.U. Gorakhpur University Gorakhpur, UP, India

Under a Creative Commons license

Open Access


Received: Oct. 9, 2024

Revised: Oct. 26, 2024

Accepted: Nov. 18, 2024

Published: Dec. 2, 2024

Abstract

Machine learning is an established and evolving approach that offers efficient algorithms for classification and recognition through recursive learning. It makes it possible to build and verify classification systems that, at a human level, could be called 'intelligent'. In disease forecasting, machine learning has delivered strong results, provided that suitable training and test cases are available. This study introduces a novel approach to predicting diabetes using machine learning classification, based on factors that contribute to an individual's diabetes risk. The dataset contains 768 instances and 9 attributes, including the usual risk factors such as age, glucose, and BMI. We used six methods: Logistic Regression, Random Forest, KNN, Support Vector, Decision Tree, and Naïve Bayes. The accuracy of these algorithms on the training data set was approximately 77%, 100%, 81%, 82%, 100%, and 74%, respectively.

Keywords

Machine Learning, Training, Test, Regression, Random Forest, Diabetes

INTRODUCTION

Diabetes is a medical condition characterized by an excessive buildup of glucose (sugar) in the blood. Over time, high blood glucose levels can damage various body tissues. For example, microangiopathy can progress to macroangiopathy, with a high risk of coronary heart disease, stroke, and damage to the kidneys, eyes, gums, feet, and nerves. People with diabetes tend to develop symptoms of heart disease earlier than those without it, and they have nearly twice the risk of heart disease or stroke compared to people without diabetes.

Diabetes is an illness that patients can live with once treatment has been administered and the condition is kept under control. The WHO (2019) defines diabetes as 'a disease characterized by hyperglycemia due to insulin deficiency or its insufficient effect'. This deficiency may be either genetic or acquired, and the resulting hyperglycemia damages many systems within the body, most commonly the blood vessels and nerves. Chronic diseases such as diabetes are rarely cured; therefore, an integrated approach is needed to reduce the risk or delay the onset of complications by managing and controlling the disease over the long term. Behavioral changes, pharmacotherapy, and continuous follow-up are necessary to keep blood glucose at target levels and to reduce other related health risks.

The American Diabetes Association (ADA; Cahn et al., 2014) further explains that diabetes is a group of diseases of metabolic dysfunction, characterized by elevated blood sugar levels and by abnormalities in the production of carbohydrates, their utilization, or both. Diabetes has many long-term complications, the most significant of which stem from chronically high blood sugar, which causes long-term damage to the eyes, heart, blood vessels, nerves, and kidneys. In view of this, it is clear that the need to manage type 2 diabetes cannot currently be eliminated. The ADA has addressed this issue, stressing that management and thorough care are essential to avoid or postpone the onset of these complications, which makes diabetes a complex ailment that should be handled with the utmost caution. Chichwa (2009) states that diabetes is a long-lasting ailment in which the health of those afflicted is worsened by elevated blood sugar, and which can present a wide range of clinical manifestations if it is neglected or poorly managed. It arises either from insufficient production of insulin, the key hormone controlling blood glucose, by the pancreas, or from ineffective action of the insulin that the body produces. Once the cause is identified, further management may entail lifestyle changes that can be a great inconvenience in patients' ordinary lives.

Diabetic patients depend on treatments and other strategies to control their disease, protect their health, and reduce the risk of further complications, and this poses an enormous challenge in their everyday lives. Living with diabetes imposes a considerable burden, since patients need to keep checking their sugar levels, eat properly, exercise, and take medication whenever necessary. This includes controlling the amount of carbohydrates eaten, checking glucose levels 3 to 7 times a day, and adjusting medication doses. Diabetic patients also need to see their physician for examinations to revise treatment schedules, screen for risk factors for potential organ or tissue damage (kidneys, eyes, heart), and manage common related conditions (high blood pressure, raised cholesterol levels, nerve damage). At advanced stages of the disease, foot ulcers, inflammatory diseases, or impaired brain function may appear, making therapy more complicated. Addressing all of a patient's conditions together is therefore of the utmost importance. On the other hand, many patients with diabetes lead active lives despite their condition, thanks to diet, medication, and devices such as insulin pumps and continuous glucose monitors.

MATERIALS AND METHODS

In the present study, several machine learning classification algorithms are used. We applied Logistic Regression, the k-nearest neighbor (KNN) algorithm, Support Vector, Decision Tree, Random Forest, and Naïve Bayes to the data set using Python, in order to obtain predictions, accuracy, recall, and precision. Each algorithm is fitted to the training set: the training data is the largest subset of the original data set and is used to train (fit) the machine learning model. In our model, 80% of the data was used for the training set and 20% for the test set.
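The 80/20 split described above can be sketched in plain Python. This is a minimal illustration and not the authors' code: the function name `split_train_test`, the fixed seed, the choice to round the test-set size up, and the use of row indices in place of the actual records are all assumptions made for the example.

```python
import math
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle the rows and return (train, test) lists.

    Hypothetical helper: the test-set size is rounded up (an assumption).
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = math.ceil(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Row indices stand in for the 768 instances mentioned in the abstract.
train_rows, test_rows = split_train_test(range(768))
print(len(train_rows), len(test_rows))
```

Under these assumptions, 768 instances yield 614 training rows and 154 test rows.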

2.1 Confusion Matrix in Machine Learning

A confusion matrix (or error matrix) is a visualization method for the results of a classifier algorithm. More specifically, it is a table that breaks down the number of ground-truth instances of a specific class against the number of predicted instances of that class. Confusion matrices are one of several evaluation tools for measuring the performance of a classification model. They can be used to calculate a number of other model performance metrics, including precision and recall.

Confusion matrices can be used with any classifier algorithm, including Naïve Bayes, logistic regression models, decision trees, and so on. Because of their wide applicability in data science and machine learning, many packages and libraries come preloaded with functions for creating confusion matrices, such as scikit-learn (sklearn) for Python.
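As a rough sketch of how a binary confusion matrix is tallied and how precision and recall follow from it, the plain-Python example below counts the four cells directly. The helper names `confusion_counts` and `precision_recall` and the toy labels are made up for illustration; they are not taken from the study's data or code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for binary labels."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1  # diabetic, predicted diabetic
        elif truth == positive:
            fn += 1  # diabetic, predicted non-diabetic
        elif pred == positive:
            fp += 1  # non-diabetic, predicted diabetic
        else:
            tn += 1  # non-diabetic, predicted non-diabetic
    return tn, fp, fn, tp


def precision_recall(tn, fp, fn, tp):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels (hypothetical): 1 = diabetic, 0 = non-diabetic.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
precision, recall = precision_recall(*counts)
print(counts, precision, recall)
```

For these toy labels the counts are 3 true negatives, 1 false positive, 1 false negative, and 3 true positives, so precision and recall are both 0.75.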

2.2 Histogram of the Data Set

2.6 Correlations among the Data Set

CLASSIFICATION REPORTS

3.1 Confusion matrix for the different classification algorithms

3.2 ROC Curve for the Classification Algorithms

RESULTS

Algorithm	Training Accuracy (%)	Test Accuracy (%)
Logistic Regression	77.36	77.27
KNN	81.11	74.68
Support Vector	81.92	83.12
Decision tree	100.00	79.87
Random Forest	100.00	81.17
Naïve Bayes	74.27	74.03

3.4 Accuracy comparison of the models

Algorithm	Precision (%)	Recall (%)	ROC AUC
Logistic Regression	75.00	57.89	0.7327
KNN	68.75	57.89	0.7121
Support Vector	86.05	64.91	0.7936
Decision tree	69.49	71.93	0.7668
Random Forest	72.55	64.91	0.7523
Naïve Bayes	65.45	63.16	0.7178
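The metrics in the table above are all functions of the four cells of a confusion matrix, which the following sketch illustrates. The counts used here are hypothetical: they were chosen for illustration because they are consistent with the reported logistic regression figures under a 154-instance test set, but the source does not list them. The `auc_hard_labels` value uses the fact that a ROC curve built from hard 0/1 predictions has a single interior point, so its area is (TPR + TNR) / 2; whether the authors computed ROC AUC this way is not stated.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive summary metrics from the four confusion-matrix cells."""
    total = tn + fp + fn + tp
    recall = tp / (tp + fn)       # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": recall,
        # Area under a ROC curve with one threshold (hard labels only).
        "auc_hard_labels": (recall + specificity) / 2,
    }


# Hypothetical counts inferred for illustration; not reported in the source.
lr = metrics_from_counts(tn=86, fp=11, fn=24, tp=33)
for name, value in lr.items():
    print(f"{name}: {value:.4f}")
```

With these assumed counts, the derived accuracy, precision, and recall agree with the logistic regression row to the precision reported, and the hard-label AUC comes out within 0.001 of the tabulated value.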

CONCLUSION

Detecting diabetes at an early stage is an important real-world medical problem. In this study, a systematic effort was made to design a system for predicting diabetes. Six machine learning classification algorithms were studied and evaluated on several measures: Logistic Regression, Random Forest, KNN, Support Vector, Decision Tree, and Naïve Bayes. We find that Support Vector, Random Forest, and Decision Tree give better test accuracy than the others (83.12%, 81.17%, and 79.87%, respectively).


European Journal of Cardiovascular Medicine
