Machine learning is an established and still evolving approach that offers efficient algorithms for classification and recognition through recursive learning. We argue that machine learning makes it possible to build and verify a classification system that, at a human level, could be called ‘intelligent’. In disease forecasting, machine learning has delivered some of its most notable results, provided that suitable training and testing cases are available. This study introduces a novel approach to predicting diabetes using machine learning classification, based on factors that contribute to an individual’s diabetes risk. The dataset contains 768 instances and 9 attributes, including the usual risk factors such as age, glucose, and BMI. We used six methods: Logistic Regression, Random Forest, KNN, Support Vector, Decision Tree, and Naïve Bayes. Their accuracy on the training set was 77%, 100%, 81%, 81%, 100%, and 74%, respectively.
Diabetes is a medical condition characterized by an excessive buildup of glucose (sugar) in the blood. Over time, excessive blood glucose levels can harm various body tissues. For example, microangiopathy can progress to macroangiopathy, with a high risk of coronary heart disease and stroke and of damage to the kidneys, eyes, gums, feet, and nerves. People with diabetes develop symptoms of heart disease earlier than those without it, and they have nearly twice the risk of heart disease or stroke compared with people without diabetes.
Diabetes is an illness that people can live with once treatment has been administered and the condition is under control. According to WHO (2019), diabetes is ‘a disease characterized by hyperglycemia due to insulin deficiency or its insufficient effect’. This deficiency may be either genetic or acquired, and the resulting hyperglycemia damages many systems within the body, most commonly the blood vessels and nerves. Chronic diseases such as diabetes can rarely be cured; therefore, an integrated approach is necessary to reduce the risk of complications by managing and controlling the disease over a long period. Behavioral changes, pharmacotherapy, and continuous follow-up are necessary to keep blood glucose at optimal targets and to reduce other related health risks.
The American Diabetes Association (ADA; Cahn et al., 2014) further explains that diabetes is a group of diseases of metabolic dysfunction, characterized by an increase in blood sugar level and by an abnormality in the production of carbohydrates, their utilization, or both. Diabetes has many long-term complications, but the most significant stem from chronically high blood sugar, whose progressive effects cause long-term damage to the eyes, heart, blood vessels, nerves, and kidneys. It follows that type 2 diabetes cannot at present be eradicated and must instead be managed. The ADA has addressed this issue, stating that management and thorough care are essential to avoid or postpone the onset of these complications, which makes diabetes a complex ailment that must be handled with the utmost caution. Chichwa (2009) states that diabetes is a long-lasting ailment in which the health of those afflicted worsens as blood sugar levels rise, accompanied by a wide range of clinical manifestations if the condition is neglected or poorly managed. It arises from insufficient production by the pancreas of insulin, the key hormone controlling blood glucose, or from ineffective action of the insulin the body does produce. Once the cause is identified, further management may entail lifestyle changes that can be a considerable inconvenience in patients’ everyday lives.
Diabetic patients depend on treatments and other strategies to control their disease, protect their health, and limit the risk of further complications, and this poses an enormous challenge in their everyday lives. Living with diabetes imposes a considerable burden: patients need to keep checking their sugar levels, eat properly, exercise, and take medication whenever necessary. This includes controlling the amount of carbohydrates eaten, checking glucose levels 3 to 7 times a day, and adjusting insulin doses. Diabetic patients also need to see their physician for periodic examinations to revise therapeutic schedules, screen for risk factors for potential organ or tissue damage (kidneys, eyes, heart), and manage common comorbidities (high blood pressure, raised cholesterol levels, nerve damage). At this stage of diabetes management, foot ulcers, inflammatory diseases, or impaired brain function may appear, making therapy more complicated. Addressing all the diseases a patient has, and giving every health condition equal attention, is therefore of the utmost importance. On the other hand, many patients with diabetes lead active lives despite their condition, thanks to diet, medication, and devices such as insulin pumps and continuous glucose monitors.
In the present study, several machine learning classification algorithms are used. We applied the Logistic Regression, k-nearest neighbor (KNN), Support Vector, Decision Tree, Random Forest, and Naïve Bayes algorithms to the dataset in Python to obtain predictions, accuracy, recall, and precision. Each algorithm is fitted to the training set: the training data is the largest subset of the original dataset and is used to train, or fit, the machine learning model. In our model, 80% of the data was used for the training set and 20% for the test set.
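The 80/20 split described above can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: the source does not say how the split was drawn, so the function name `train_test_split_indices`, the seed, and the choice to round the test size up are all assumptions.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train and test index lists.

    Hypothetical helper: the seed and the rounding rule are assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Round the test size up, so 20% of 768 rows becomes 154 rows
    n_test = math.ceil(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# 768 is the number of instances reported for the dataset
train_idx, test_idx = train_test_split_indices(768, test_fraction=0.2)
print(len(train_idx), len(test_idx))  # 614 154
```

With 768 instances this gives 614 training rows and 154 test rows. Those sizes are consistent with the reported accuracies, for example 77.27% corresponds to 119 correct predictions out of 154.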
2.1 Confusion Matrix in Machine Learning
A confusion matrix (or error matrix) is a way of visualizing the results of a classification algorithm. More specifically, it is a table that breaks down the number of ground-truth instances of a specific class against the number of predicted class instances. Confusion matrices are one of several evaluation metrics for measuring the performance of a classification model. They can be used to calculate a number of other model performance metrics, including precision and recall, among others.
Confusion matrices can be used with any classification algorithm, including Naïve Bayes, logistic regression models, decision trees, and so on. Because of their wide applicability in data science and machine learning, many packages and libraries come preloaded with functions for creating confusion matrices, such as scikit-learn (sklearn) in Python.
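To make the definition concrete, the following sketch counts the four cells of a binary confusion matrix and derives precision and recall from them. The labels are toy values invented for illustration, and the encoding (1 = diabetic, 0 = non-diabetic) is an assumption rather than something stated in the source.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # positive case correctly flagged
        elif actual == 1:
            fn += 1  # positive case missed
        elif predicted == 1:
            fp += 1  # negative case wrongly flagged
        else:
            tn += 1  # negative case correctly cleared
    return tn, fp, fn, tp


# Toy labels, invented for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found
print(tn, fp, fn, tp)       # 3 1 1 3
print(precision, recall)    # 0.75 0.75
```

In scikit-learn the same counts are available from `sklearn.metrics.confusion_matrix(y_true, y_pred).ravel()`, which returns them in the order tn, fp, fn, tp for binary labels.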
2.2 Histogram of the Data Set
2.6 Correlations within the Data Set
3.1 Confusion Matrices for the Different Classification Algorithms
3.2 ROC Curves for the Classification Algorithms
Algorithm | Training accuracy (%) | Test accuracy (%)
Logistic Regression | 77.36156351791531 | 77.27272727272727
KNN | 81.10749185667753 | 74.67532467532467
Support Vector | 81.92182410423453 | 83.1168831168831
Decision tree | 100.0 | 79.87012987012987
Random Forest | 100.0 | 81.16883116883116
Naïve Bayes | 74.2671009771987 | 74.02597402597402
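One way to read the accuracy figures listed above is to compare the two values reported for each algorithm. The sketch below assumes the first value is the training accuracy, which matches the percentages quoted in the abstract, and the second is the test accuracy; the second assumption is an inference, not something the source states.

```python
# (first value, second value) per algorithm, copied from the figures above.
# Assumption: first = training accuracy (%), second = test accuracy (%).
accuracy = {
    "Logistic Regression": (77.36156351791531, 77.27272727272727),
    "KNN": (81.10749185667753, 74.67532467532467),
    "Support Vector": (81.92182410423453, 83.1168831168831),
    "Decision tree": (100.0, 79.87012987012987),
    "Random Forest": (100.0, 81.16883116883116),
    "Naïve Bayes": (74.2671009771987, 74.02597402597402),
}

# Gap in percentage points between training and test accuracy
gaps = {name: train - test for name, (train, test) in accuracy.items()}

# Print the algorithms from the largest gap to the smallest
for name in sorted(gaps, key=gaps.get, reverse=True):
    print(f"{name:<20} {gaps[name]:6.2f}")
```

Under that reading, Decision Tree and Random Forest fit the training data perfectly but lose roughly 19 to 20 points on the test set. A gap of this kind is commonly associated with overfitting, although the source does not discuss it.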
3.4 Accuracy Comparison of the Models
Algorithm | Precision Score (%) | Recall Score (%) | ROC AUC Score
Logistic Regression | 75.0000 | 57.8947 | 0.7327
KNN | 68.75 | 57.89473 | 0.7121
Support Vector | 86.04651 | 64.9122 | 0.79363
Decision tree | 69.4915 | 71.9298 | 0.7668
Random Forest | 72.5490 | 64.9122 | 0.7523
Naïve Bayes | 65.4544 | 63.1578 | 0.7178
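The precision, recall, and ROC AUC scores above can be related back to confusion-matrix counts. The sketch below uses counts for Logistic Regression and Support Vector that are back-calculated from the reported scores and a 154-row test set. These counts are an inference and do not appear in the source, and the function name `summarize` is hypothetical.

```python
def summarize(tn, fp, fn, tp):
    """Compute evaluation metrics from binary confusion-matrix counts."""
    recall = tp / (tp + fn)       # sensitivity, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": tp / (tp + fp),
        "recall": recall,
        # Area under a ROC curve drawn from hard 0/1 predictions, i.e. the
        # line through (0, 0), (FPR, TPR), (1, 1)
        "auc_hard": (recall + specificity) / 2,
    }


# Counts inferred from the reported scores; NOT given in the source
inferred = {
    "Logistic Regression": dict(tn=86, fp=11, fn=24, tp=33),
    "Support Vector": dict(tn=91, fp=6, fn=20, tp=37),
}

results = {name: summarize(**counts) for name, counts in inferred.items()}
for name, metrics in results.items():
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

For both models the inferred counts reproduce the reported precision, recall, and test accuracy, and the mean of sensitivity and specificity lands on the reported ROC AUC score (about 0.7328 and 0.7936). One plausible reading is that the AUC values were computed from hard class predictions rather than predicted probabilities, though the source does not say so.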
Detecting diabetes at an early stage is an important real-world medical problem. In this study, we made a systematic effort to design a system that predicts diabetes. Six machine learning classification algorithms were studied and evaluated on various measures: Logistic Regression, Random Forest, KNN, Support Vector, Decision Tree, and Naïve Bayes. We find that Decision Tree, Random Forest, and Support Vector give better accuracy than the others.