  • J Family Med Prim Care
  • v.11(11); 2022 Nov
  • PMC10041290

A survey on diabetes risk prediction using machine learning approaches

Shimoo Firdous

1 Department of Computer Science, Bhagwant University, Ajmer, Rajasthan, India

Gowher A. Wagai

2 Department of Medicine-Associated Hospital GMC, Anantnag, Jammu and Kashmir, India

Kalpana Sharma

Background:

Diabetes mellitus (DM) is a chronic condition that can lead to a variety of complications. Its risk factors include age, lack of exercise, a sedentary lifestyle, family history of diabetes, high blood pressure, depression and stress, and poor diet. Diabetics are at a higher risk of developing heart disease, nerve damage (diabetic neuropathy), eye problems (diabetic retinopathy), kidney disease (diabetic nephropathy), and stroke. According to the International Diabetes Federation, 382 million people worldwide have diabetes, and this number is projected to rise to 592 million by 2035. Every day, many people develop the disease, and many are unaware that they have it. It primarily affects individuals between the ages of 25 and 74 years. Left undiagnosed and untreated, diabetes can lead to serious complications. Machine learning approaches offer a way to address this problem through earlier detection.

Aims and Objectives:

The aim was to study DM and analyze how machine learning algorithms are used to identify diabetes mellitus, one of the most serious metabolic disorders in the world today, at an early stage.

Methods and Materials:

Data were obtained from databases such as PubMed, IEEE Xplore, and INSPEC, and from other primary and secondary sources reporting machine learning methods used in healthcare to predict diabetes at an early stage.

Results:

After surveying various research papers, it was found that machine learning classification algorithms such as Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Random Forest (RF) show the best accuracy for predicting diabetes at an early stage.

Conclusion:

Early detection of diabetes is critical for effective therapy. Many people have no idea whether or not they have it. This paper gives a full assessment of machine learning approaches for early diabetes prediction and addresses how to apply a variety of supervised and unsupervised machine learning algorithms to a data set to achieve the best accuracy. Furthermore, the work will be expanded and refined to create a more precise and general predictive model for diabetes risk prediction at an early stage. Different metrics can be used to assess performance and for accurate diabetic diagnosis.

Introduction

Diabetes mellitus is a metabolic disorder defined by abnormally high blood sugar levels due to a lack of insulin secretion or a combination of insulin resistance and insufficient insulin synthesis to compensate.[ 1 ] It is a progressive metabolic ailment that affects all parts of the patient’s life, including physical and mental well-being, and no therapy technique can produce spectacular improvements or stop the disease from progressing.[ 2 ] In the year 2000, India had the greatest number of diabetics in the world (31.7 million), which increased to 62.4 million in 2011 and is anticipated to reach 69.9 million by 2025.[ 3 , 4 ] Rapid urbanization and economic development are blamed for India’s high prevalence. Indians are more likely to develop diabetes because a low BMI is often combined with high upper-body adiposity, a high body fat percentage, and high insulin resistance.[ 5 ] Signs and symptoms of diabetes include blurred vision, weight loss, fatigue, increased hunger and thirst, confusion, frequent urination, poor healing, frequent infections, and difficulty concentrating. “Diabetes means you have too much sugar in your blood. High blood sugar problems start when your body no longer makes enough of a chemical, or hormone, called insulin.”[ 6 ] The name diabetes mellitus translates directly as “sweet urine.” Normal urine does not contain sugar. In diabetes, sugar (more precisely, glucose) appears in the urine because the amount of glucose in the blood has increased to the point where it spills over into the urine. Because the body is unable to metabolize glucose properly, it accumulates in the bloodstream. Diabetes is therefore a disease in which the body is unable to use glucose properly.

This paper reviews machine learning algorithms used for the early prediction of diabetes. The remainder of the paper is organized as follows: section (1) is the introduction; section (2) covers diabetes and its types; section (3) covers machine learning algorithms; section (4) is the literature survey for prediction of diabetes; and section (5) is the conclusion.

Diabetes and its Types

Diabetes mellitus (DM) is a chronic metabolic disease with a variety of causes. It is characterized by persistent hyperglycemia and alterations in carbohydrate, lipid, and protein metabolism caused by defects in insulin secretion, insulin action, or both. If not effectively treated, diabetes can injure nerves and blood vessels in the eyes, kidneys, heart, and lower legs. Problems may emerge if blood glucose levels remain high for an extended period. Mouth problems include gum disease and tooth decay. Diabetic retinopathy causes vision loss and, in severe cases, blindness. Heart and blood vessel illnesses (cardiovascular diseases, or CVD) include heart attacks, strokes, and peripheral artery disease (insufficient blood supply to the feet and legs). Kidney disease (diabetic nephropathy) is a condition in which the kidneys do not function properly or at all.[ 7 ] The three kinds of diabetes are type 1 diabetes, type 2 diabetes, and gestational diabetes.

Type 1 diabetes (T1D)

The body does not produce enough insulin in type 1 diabetes. Body cells can’t absorb glucose from the bloodstream without insulin, so they have to rely on other sources of energy. An excess of glucose in the blood causes diabetes and its complications. This type of diabetes is also known as insulin-dependent diabetes mellitus (IDDM). Although it affects adolescents and teenagers more frequently, it can affect anyone at any age. It requires a delicate balancing act of insulin injections (and, in some circumstances, oral medicines), exercise, nutrition planning, and lifestyle changes. Frequent urination, unusual thirst, unusual hunger, rapid weight loss, weariness and weakness, nausea, and irritability are some of the symptoms of type 1 diabetes.

Type 2 diabetes (T2D)

Type 2 diabetes can range from primarily insulin resistance to mostly secretory dysfunction with or without insulin resistance. The pancreas produces insulin, but it may not be enough to keep blood glucose levels normal, or the cells may be resistant to the insulin produced. The illness is most prevalent in those over the age of 40, but it is also becoming more prevalent in teenagers and young children. Type 2 diabetes is characterized by drowsiness; dry, itchy skin; unintended weight gain or loss; blurred vision; tingling, numbness, or pain in the lower legs; easy weariness; sluggish healing of cuts or scratches; and frequent infections (e.g., vaginal infections). Managing type 2 diabetes requires attention to food, activity, and lifestyle and, in some situations, oral medicines or insulin.[ 6 ]

Gestational diabetes

Pregnant women who have never had diabetes before but have high blood glucose (sugar) levels during pregnancy are diagnosed with gestational diabetes. It is a temporary condition that affects 2%–4% of all pregnant women and usually disappears after the baby is born. Women who have had gestational diabetes in the past are more likely to develop type 2 diabetes later in life. There is no known etiology for this type of diabetes. The placenta supports the infant’s development; placental hormones help the baby develop, but they also prevent the mother’s insulin from working properly in her body, resulting in insulin resistance. When a mother’s body is unable to create and use all of the insulin required during pregnancy, gestational diabetes develops. The majority of women are unaware of any signs or symptoms of gestational diabetes. Increased thirst and more frequent urination are two symptoms.

Diabetes mellitus can be deadly if not treated early; however, early detection can reduce the risk significantly. A range of medical diagnostic procedures is already in use for early diagnosis, and machine learning techniques can additionally provide early risk forecasts. Recent research has given promising results in forecasting the risk of diabetes mellitus. Machine learning is a field of study in which algorithms learn from data rather than being explicitly programmed: a model is trained to do a given job and then uses that training to handle similar tasks. Accuracy is always a major concern in medical science, and different algorithms may yield varying degrees of accuracy on the same data set. To design a better classifier, it is vital to determine which algorithm delivers the best results. Machine learning can now be found in almost every industry, and its application in medical science has the potential to improve healthcare dramatically.

Decision trees, random forests, support vector machines, naive Bayes classifiers, and artificial neural networks are examples of machine learning classification algorithms that work well in risk prediction, owing to their computing and data management capabilities. Classification accuracy can be used to compare algorithms, but this measure alone is insufficient to determine the best method properly and efficiently. Other measures, such as the receiver operating characteristic (ROC) value, F-score, and computation time, should also be taken into account. The findings of this study will aid future researchers in constructing a baseline strategy for DM classification.
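To illustrate why classification accuracy alone can mislead, the following minimal sketch computes accuracy, precision, recall, and F-score from scratch. The function name `classification_metrics` and the ten labels are hypothetical and are not taken from any surveyed study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = diabetic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: nine non-diabetic patients, one diabetic.
y_true = [0] * 9 + [1]
always_negative = [0] * 10  # a "classifier" that never predicts diabetes

metrics = classification_metrics(y_true, always_negative)
print(metrics)
```

On this made-up sample the trivial classifier reaches 90% accuracy while its recall and F-score are zero, because it misses the only diabetic patient.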

Machine Learning Algorithms

Machine learning (ML) is a rapidly developing field that is being applied in a variety of medical applications. ML models learn from past data and make predictions based on a data set. Recent advances in ML are expected to make diabetes detection easier and less expensive, and numerous diabetes data sets are available, which makes ML well suited to medical diagnostics. The goal of this study is to forecast a patient’s probability of developing diabetes using machine learning algorithms. Two types of learning are considered:

  • 1) Supervised Learning
  • 2) Unsupervised Learning.

The goal of a supervised learning algorithm is to predict the value of a target variable from labeled data, much as a student learns from an instructor. The data are represented by a set of features, and the set of possible outcomes is known in advance. Unsupervised learning, by contrast, works on unlabeled data and is closer to self-learning. Decision trees (DT), random forests, linear regression, logistic regression, naive Bayes classifiers, k-nearest neighbors (k-NN), support vector machine (SVM), and artificial neural networks (ANN) are some of the most commonly used supervised techniques.
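As a concrete instance of supervised learning, here is a minimal k-NN sketch in plain Python. The six labeled records (glucose in mg/dL, BMI) and the helper name `knn_predict` are invented for illustration; a real study would scale the features and use a full data set such as PID.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest labeled neighbors."""
    # Pair every training point with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled records: (glucose, BMI); 1 = diabetic, 0 = non-diabetic.
train_X = [(85, 22.0), (90, 24.5), (95, 23.0),
           (160, 33.0), (170, 35.5), (180, 31.0)]
train_y = [0, 0, 0, 1, 1, 1]

prediction = knn_predict(train_X, train_y, (165, 32.0), k=3)
print(prediction)
```

The labels in `train_y` play the role of the instructor: the model never sees the query's true class and infers it only from labeled examples.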

The data in unsupervised learning is made up of values without labels, and the outcome is not predetermined. Based on self-learning, the model makes predictions. Forecasting, classifying, detecting, segmenting, and categorizing data are the key goals of these models. Machine learning applications include analysis, recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics.
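For contrast with the labeled case, the sketch below groups unlabeled values with k-means clustering. The source does not name k-means; it is used here only as a familiar unsupervised technique, and the glucose readings and starting centroids are made up.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Lloyd's algorithm on scalar values; returns (centroids, assignments)."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical unlabeled fasting glucose readings (mg/dL).
glucose = [82, 88, 91, 95, 158, 165, 172, 180]
centroids, groups = kmeans_1d(glucose, centroids=[80, 100])
print(centroids, groups)
```

No labels are supplied; the two groups emerge from the values alone, and deciding whether a group corresponds to elevated risk is left to the analyst.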

Literature Survey for Prediction of Diabetes using Machine Learning Approaches

Birjais et al .[ 8 ] experimented on PIMA Indian Diabetes (PID) data set. It has 768 instances and 8 attributes and is available in the UCI machine learning repository. They aimed to focus more on diabetes diagnosis, which, according to the World Health Organization (WHO) in 2014, is one of the world’s fastest-growing chronic diseases. Gradient boosting, logistic regression, and naive Bayes classifiers were used to predict whether a person is diabetic or not, with gradient boosting having an accuracy of 86%, logistic regression having a 79% accuracy, and naive Bayes having a 77% accuracy.

Sadhu, A. and Jadli A.[ 9 ] experimented on a diabetes data set taken from the UCI repository, with 520 instances and 16 attributes. They concentrated on predicting diabetes at an early stage. Seven classification techniques were implemented and evaluated on the validation set: k-NN, logistic regression, SVM, naive Bayes, decision tree, random forests, and multilayer perceptron. The random forests classifier proved to be the best model for the data set, with an accuracy score of 98%; the multilayer perceptron also reached 98%, SVM and decision tree 94%, logistic regression 93%, and naive Bayes 91%.

Xue et al .[ 10 ] experimented on the diabetes data set taken from the UCI repository, which contained 520 patients and 17 attributes. They concentrated on early detection of diabetes. They trained on the actual data of 520 diabetic and potentially diabetic patients aged 16–90 using supervised ML techniques such as SVM, naive Bayes classifiers, and LightGBM. SVM performed best, with the highest accuracy of 96.54%. The naive Bayes classifier, described as the most widely used classification algorithm, reached an accuracy of 93.27%, while LightGBM reached only 88.46%. The authors concluded that SVM is the best classification algorithm for diabetes prediction on this data set.

Le et al .[ 11 ] worked on early-stage diabetes risk prediction; the data set used in this research was taken from the UCI repository and consisted of 520 patients and 16 variables. They proposed an ML approach for predicting the early onset of diabetes in patients: a new wrapper-based feature selection method that employed the grey wolf optimizer (GWO) and adaptive particle swarm optimization (APSO) to optimize the multilayer perceptron (MLP) and reduce the number of required input attributes. They also compared the results obtained with this method to those obtained with a variety of traditional machine learning algorithms, including SVM, DT, k-NN, naive Bayes classifier (NBC), random forest classifier (RFC), and logistic regression (LR). k-NN and RFC each achieved a 96% accuracy rate, LR, SVM, and DT each achieved 95%, and NBC achieved 93%. The computational findings show that the proposed methods not only require fewer features but also attain higher prediction accuracy (96% for GWO–MLP and 97% for APSO–MLP). This research has the potential to be applied in clinical practice as a tool to assist doctors and physicians.

Julius et al .[ 12 ] used the Waikato Environment for Knowledge Analysis (Weka) application platform to test a data set collected from the UCI repository. There were 520 samples in the data set, each with 17 attributes. The goal of this study was to use machine learning classification approaches based on observable sample attributes to predict diabetes at an early stage. The k-NN, SVM, functional tree (FT), and RFC methods were employed as classifiers. k-NN had the highest accuracy at 98%, followed by RFC at 97%, SVM at 94%, and FT at 93%.

Shafi et al .[ 13 ] noted that because diabetes is a serious illness, early detection is always a struggle. Their study used machine learning classification methods to develop a model for identifying diabetes early on, within a framework intended to predict the likelihood of diabetes in patients accurately. Three ML classification algorithms (DT, SVM, and NBC) were studied and assessed on various measures, using the PID data set acquired from the UCI repository. The experimental results suggested that the NBC approach was adequate, with 74% accuracy, followed by DT with 72% and SVM with 63%. In the future, the framework and the ML classifiers used could be applied to identify or diagnose other diseases. The study could also be extended and improved with other ML methodologies for diabetes research, and the authors intended to evaluate other algorithms on data with missing values.

Khanam et al .[ 14 ] experimented with diabetes illness prediction. Diabetes is a condition with no known cure; therefore early detection is essential. In this study, data mining, ML techniques, and neural network (NN) methodologies were used to predict diabetes. They used data from the UCI repository’s PID data set, which included information on 768 patients and 9 attributes, and preprocessed the data with the Weka tool. On the data set, they applied seven ML methods to predict diabetes: DT, k-NN, RFC, NBC, AB, LR, and SVM. They found that a model combining LR and SVM is effective at predicting diabetes. They also created an NN model with two hidden layers and varied the number of epochs; this model gave the best accuracy, 88.6% (reported as 88.57% for the ANN), compared with 78.85% for LR, 78.28% for NBC, and 77.34% for RFC.

Sisodia et al .[ 15 ] used the PID data set available on the UCI repository. This data set contained 768 patients and 8 attributes. They employed three ML classifications to identify diabetic patients: DT, SVM, and NBC. NBC had the highest accuracy (76.30%) when compared to the other models.

Agarwal et al .[ 16 ] used the PID data set of 738 patients as well in their study. To analyze the effectiveness of this data set for identifying diabetic patients, the authors applied models such as SVM, k-NN, NBC, ID3, C4.5, and CART. The SVM and LDA algorithms were the most accurate, with an accuracy of 88%.

Rathore et al .[ 17 ] employed classification techniques such as SVM and DTs to predict diabetes mellitus. The PID data set, which contains records of female patients only, provided the data for this investigation. The SVM achieved an accuracy of 82%.

To predict diabetes mellitus, Hassan et al .[ 18 ] employed classification approaches such as the DT, k-NN, and SVM. The SVM outperformed the DT and k-NN methods with a maximum accuracy of 90.23%.

Kandhasamy and Balamurali[ 19 ] investigated the prediction accuracy of J48, k-NN, RFC, and SVM on the diabetes data set. Before preprocessing the data, the authors found that the J48 method had a higher accuracy than the others, at 73.82%. After preprocessing, k-NN and RFC demonstrated improved accuracy.

Meng et al .[ 20 ] examined J48, LR, and k-NN algorithms on the diabetes data set. J48 was found to be the most accurate, with a classification accuracy of 78.27%.

Nai-Arun and Moungmai[ 21 ] created a web application for diabetes prediction based on prediction accuracy. They compared prediction methods such as DTs, NNs, LR, NBC, and RFC, as well as bagging and boosting. They found that RFC performed best in terms of accuracy and ROC score, with an accuracy of 85.558% and an ROC value of 0.912.

Saravananathan and Velmurugan[ 22 ] looked at J48, CART, SVM, and k-NN on a medical data set in their research. They compared them based on accuracy, specificity, sensitivity, precision, and error rate. They found that J48 was the most accurate, with a score of 67.15%, followed by SVM (65.04%), CART (62.28%), and k-NN (53.39%).
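An ROC value such as the one reported above is usually the area under the ROC curve (AUC). As a minimal sketch, AUC can be computed as the fraction of positive/negative pairs in which the positive case receives the higher score; the function name `roc_auc`, the four labels, and the scores below are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = diabetic) and predicted risk scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This pairwise form is quadratic in the number of samples, so it suits small illustrations rather than large data sets.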

Kumari and Chitra[ 23 ] used SVM, RFC, DT, MLP, and LR, as well as four k-fold cross-validations (k = 2,4,5,10) in their research. According to the researchers, MLP with four-fold cross-validation achieves the best accuracy, at 78.7%. They discovered that MLP outscored all other algorithms.
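The k-fold procedure mentioned above can be sketched as follows: the samples are divided into k folds, and each fold serves once as the test set while the remaining folds are used for training. The helper `k_fold_indices` and the sample count of 10 are assumptions for illustration; in practice the indices are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # Spread the remainder over the first `extra` folds.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# The four settings of k named in the text, applied to 10 hypothetical samples.
folds_by_k = {k: list(k_fold_indices(10, k)) for k in (2, 4, 5, 10)}

for k, folds in folds_by_k.items():
    print(k, [len(test) for _, test in folds])
```

A model would be trained and scored once per fold and the k scores averaged, so every sample contributes to testing exactly once.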

To predict diabetes, Kavakiotis et al .[ 24 ] employed NBC, RFC, k-NN, SVM, DT, and LR methods. The algorithms were applied using a ten-fold cross-validation technique. SVM had the best accuracy of all the approaches, measuring 84%, according to the study.

The work on the classification of “Diabetes Prediction” based on eight attributes was done by Rawat et al .[ 25 ] In this study, five ML algorithms for the analysis and prediction of diabetic patients were described: AdaBoost, LogicBoost, RobustBoost, naive Bayes, and bagging. A group of diabetic PIMA Indians was used to test the proposed strategies. The computed results were found to be quite accurate, with a classification accuracy of 81.77% and 79.69% for the bagging and AdaBoost techniques, respectively. As a result, the proposed DM prediction algorithms were particularly appealing, effective, and efficient.

Using disease classifiers and an actual data set, Nai-Arun and Moungmai[ 21 ] suggested a web application. The data were collected from 30,122 people at Sawanpracharak Regional Hospital’s twenty-six primary care units between 2012 and 2013. To identify a predictive model, thirteen classification models were investigated before the web application was created. These comprised the DT, NN, LR, and NBC algorithms, each also combined with bagging and boosting techniques, together with the RFC algorithm. Each model’s accuracy and ROC curve were calculated and compared to assess robustness. According to the findings, RFC performed best in both accuracy and ROC curve. This may be because the RFC approach not only chooses data and input variables at random but also takes important variables into account, which raised its precision. This algorithm was therefore chosen for diabetes risk prediction and used in the development of the application.

Perveen et al .[ 26 ] used a data set from the Canadian Primary Care Sentinel Surveillance Network (CPCSSN) database to do their research. The study employed the AdaBoost and bagging ensemble techniques using the J48 (C4.5) DT as a base learner and standalone data mining methodology J48 to categorize patients with diabetes mellitus based on diabetes risk indicators. This categorization was done across three separate ordinal adult groups in the CPCSSN. In terms of overall performance, the AdaBoost ensemble method surpassed both bagging and a single J48 DT, according to the findings.
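To make the bagging idea concrete, the sketch below trains several one-feature decision stumps on bootstrap resamples and combines them by majority vote. Stumps stand in for the J48 base learner used in the study, and the glucose samples, seed, and function names are all hypothetical.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Pick the threshold on one feature that minimizes training errors."""
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        # The stump predicts 1 (diabetic) when x >= threshold.
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(samples, n_estimators=15, seed=0):
    """Train one stump per bootstrap resample of the training data."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(samples) for _ in samples]  # with replacement
        stumps.append(fit_stump(bootstrap))
    return stumps


def bagging_predict(stumps, x):
    """Majority vote over all stumps."""
    votes = Counter(int(x >= t) for t in stumps)
    return votes.most_common(1)[0][0]


# Hypothetical (glucose, label) pairs; 1 = diabetic.
samples = [(85, 0), (90, 0), (95, 0), (100, 0),
           (150, 1), (160, 1), (170, 1), (180, 1)]
stumps = bagging_fit(samples)
print(bagging_predict(stumps, 200), bagging_predict(stumps, 50))
```

Boosting methods such as AdaBoost differ in that the base learners are trained sequentially, with misclassified samples given more weight in later rounds, rather than independently on resamples.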

Mujumdar and Vaidehi[ 27 ] presented a diabetes prediction model for better diabetes classification that included a few extrinsic factors that cause diabetes, as well as regular components such as glucose, BMI, age, and insulin. They used two separate data sets to compare the accuracy of ML techniques, and the new data set enhanced classification accuracy compared with the existing one. Multiple ML approaches were applied, with LR yielding an accuracy of 96%; the highest accuracy overall, 98.8%, was reported for the AdaBoost classifier. The comparison indicated that the model improved diabetes prediction accuracy and precision.

Mercaldo et al .[ 28 ] offered a strategy for classifying diabetic patients based on a set of features chosen according to the WHO criteria, evaluating real-world data with state-of-the-art machine learning algorithms. The model was trained using six alternative classification approaches; the Hoeffding Tree method scored 0.770 in precision and 0.775 in recall. They used data from the PIMA Indian community in Phoenix, Arizona, to evaluate the method.

Conclusion

Early detection of diabetes is critical for effective therapy. Many people have no idea whether or not they have it. This paper gives a full assessment of machine learning approaches for early diabetes prediction and addresses how to apply a variety of supervised and unsupervised machine learning algorithms to a data set to achieve the best accuracy. Furthermore, the work will be expanded and refined to create a more precise and general predictive model for diabetes risk prediction at an early stage. Different metrics can be used to assess performance and for accurate diabetic diagnosis.

Financial support and sponsorship

Conflicts of interest

There are no conflicts of interest.

Early Prediction of Diabetes Using an Ensemble of Machine Learning Models

Affiliations

  • 1 Department of Biomedical Engineering (BME), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh.
  • 2 Department of Biomedical Engineering (BME), Military Institute of Science and Technology (MIST), Mirpur Cantonment, Dhaka 1216, Bangladesh.
  • 3 Department of Electrical and Electronic Engineering (EEE), Khulna University of Engineering & Technology (KUET), Khulna 9203, Bangladesh.
  • 4 School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD 4072, Australia.
  • 5 Electronics and Communication Engineering (ECE) Discipline, Khulna University (KU), Khulna 9208, Bangladesh.
  • 6 Statistics Discipline, Khulna University (KU), Khulna 9208, Bangladesh.
  • 7 Department of Computer Science, College of Computers and Information Technology, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia.
  • PMID: 36231678
  • PMCID: PMC9566114
  • DOI: 10.3390/ijerph191912378

Diabetes is one of the most rapidly spreading diseases in the world, resulting in an array of significant complications, including cardiovascular disease, kidney failure, diabetic retinopathy, and neuropathy, among others, which contribute to an increase in morbidity and mortality rate. If diabetes is diagnosed at an early stage, its severity and underlying risk factors can be significantly reduced. However, labeled clinical datasets that are reliable and effective for diabetes prediction are scarce, and those available often contain outliers or missing values, which makes prediction a challenging endeavor. Therefore, we introduce a newly labeled diabetes dataset from a South Asian nation (Bangladesh). In addition, we suggest an automated classification pipeline that includes a weighted ensemble of machine learning (ML) classifiers: Naive Bayes (NB), Random Forest (RF), Decision Tree (DT), XGBoost (XGB), and LightGBM (LGB). Grid search hyperparameter optimization is employed to tune the critical hyperparameters of these ML models. Furthermore, missing value imputation, feature selection, and K-fold cross-validation are included in the framework design. A statistical analysis of variance (ANOVA) test reveals that the performance of diabetes prediction significantly improves when the proposed weighted ensemble (DT + RF + XGB + LGB) is executed with the introduced preprocessing, with the highest accuracy of 0.735 and an area under the ROC curve (AUC) of 0.832. In conjunction with the suggested ensemble model, our statistical imputation and RF-based feature selection techniques produced the best results for early diabetes prediction. Moreover, the presented new dataset will contribute to developing and implementing robust ML models for diabetes prediction utilizing population-level data.

Keywords: South Asian diabetes dataset; artificial intelligence; diabetes prediction; ensemble ML classifier; filling missing value; outlier rejection.
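The weighted ensemble described in the abstract can be pictured as a weighted average of each classifier's predicted probability. The sketch below is an assumption-laden illustration, not the authors' implementation: the abstract does not say how the weights are derived, and the probabilities, weights, and 0.5 decision threshold here are invented.

```python
def weighted_ensemble(probabilities, weights):
    """Combine per-model positive-class probabilities with a weighted average."""
    if len(probabilities) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = len(probabilities[0])
    return [
        sum(w * model[i] for w, model in zip(weights, probabilities)) / total
        for i in range(n_samples)
    ]


# Hypothetical outputs of four classifiers (e.g. DT, RF, XGB, LGB)
# for three patients: probability of the diabetic class.
model_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.4],
    [0.7, 0.3, 0.5],
    [0.6, 0.4, 0.3],
]
weights = [1.0, 2.0, 2.0, 1.0]  # assumed values, e.g. tied to validation AUC

ensemble = weighted_ensemble(model_probs, weights)
labels = [int(p >= 0.5) for p in ensemble]  # assumed decision threshold
print(ensemble, labels)
```

Giving larger weights to models that validate better lets stronger classifiers dominate the vote while weaker ones still contribute.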

Conflict of interest statement

The authors announce that they have no known competing financial interest or personal relationships that could have appeared to affect the outcome documented in this article.

Figure: Block diagram of the proposed workflow incorporating various ML-based classifiers, a pre-processing step,…

Figure: AUC versus feature numbers (2–13) in the submitted DDC dataset, considering four distinct…

Figure: Box and whisker plots of AUC results acquired from 5-fold cross-validation on various…

Figure: Box and whisker plots of AUC results acquired from 5-fold cross-validation on different…

MINI REVIEW article

Prediction of type-2 diabetes mellitus disease using machine learning classifiers and techniques.

B. Shamreen Ahamed

  • Department of Computer Science and Engineering, College of Engineering and Technology, SRM Institute of Science and Technology, Chennai, India

The technological advancements in today's healthcare sector have given rise to many innovations for disease prediction. Diabetes mellitus is one of the diseases that has been growing rapidly among people of different age groups, and various causes are involved. These causes are treated as the attributes for this study. To predict type-2 diabetes mellitus, various machine learning algorithms can be used. The objective is to construct a predictive model that determines whether a person is affected by diabetes. The classifiers considered are logistic regression, XGBoost, gradient boosting, decision trees, ExtraTrees, random forest, and light gradient boosting machine (LGBM). The dataset used is the PIMA Indian Dataset sourced from the UC Irvine Repository. The performance of these algorithms is compared in terms of the accuracy obtained. The results show that the LGBM classifier has the highest accuracy, 95.20%, in comparison with the other algorithms.

Introduction

Diabetes mellitus (DM) is a chronic disease that affects people of all age groups. The exact cause of the disease is still unknown. However, contributing factors include age, family history, other related diseases, pregnancy, fluctuating glucose levels, blood pressure, etc. (Dash et al., 2019). Diabetes can be controlled with medication; however, a complete cure through medicines is not possible as of today. Diabetes falls into one of four broad categories: type-1, type-2, gestational diabetes, or prediabetes (Nibareke and Laassiri, 2020). There are also some sub-types classified under these four categories. “Type-1 diabetes” is also known as “insulin-dependent diabetes,” which occurs when the insulin-releasing cells are damaged and unable to produce insulin (Martinsson et al., 2020). In “type-2” diabetes, the body does not produce an adequate amount of insulin (Wang et al., 2015). This commonly occurs in people above the age of 40 years. “Gestational diabetes (GDM)” occurs mostly during pregnancy. The last of the four main categories, “prediabetes,” occurs when the blood sugar level is higher than normal but not as high as in type-2 diabetes (Mujumdar and Vaidehi, 2019).

In recent years, many researchers have used machine learning to predict DM. Some of the commonly used algorithms include logistic regression (LR), XGBoost (XGB), gradient boosting (GB), decision trees (DTs), ExtraTrees, random forest (RF), and light gradient boosting machine (LGBM). Each classifier has its own advantages over the other classifiers (Prabha et al., 2021). However, the classifier that gives the highest accuracy can only be determined through implementation.

This study is organized as follows: Section Related Works reviews related work on DM prediction. Section Theoretical Concepts of the Classifiers describes the theoretical concepts of the algorithms used. Section Results and Discussion presents the architecture and implementation of the classifiers. Section Conclusion and Future Work gives the conclusions and future work of the study.

Related Works

The following researchers have used the concept of machine learning for predicting DM disease.

Khaleel and Al-Bakry (2021) created a model to detect whether a person is affected by DM. Machine learning (ML) is used for detection, and the PIMA dataset is used for the study. The algorithms used are LR, Naive Bayes (NB), and K-nearest neighbour (KNN), which obtained accuracies of 94, 79, and 69%, respectively. Precision, recall, and F-measure are also taken into consideration, and LR produced the highest accuracy.

Ahmed et al. (2021) used the ML algorithms DT, KNN, NB, RF, GB, LR, and support vector machine (SVM) for predicting DM. Preprocessing techniques, such as label encoding and normalization, are used to increase the accuracy. Two different datasets are used. On the first dataset, SVM gives the highest accuracy with 80.26%; on the second dataset, DT and RF give the highest accuracy with 96.81%.

Maniruzzaman et al. (2018) developed, optimized, and evaluated an ML technique based on risk stratification. Features are optimized using six feature selection techniques, the PIMA Indian diabetes dataset (PIDD) is used, and 10 different classifiers are compared. Combining RF feature selection with RF classification yields an accuracy of 92.26%.

Kumari et al. (2021) used two datasets, PIDD and a breast cancer dataset, both taken from the UC Irvine (UCI) Repository. Three ML classifiers are used for prediction: RF, LR, and Naive Bayes. Using a soft voting classifier, the highest accuracy is obtained for both datasets: 79.08% for the PIMA data and 97.27% for the breast cancer data.

Tigga and Garg (2020) have developed a prediction model for DM disease. A dataset was collected for the study consisting of 952 instances and 18 attributes. The PIMA dataset was also used. The machine learning classifiers used are RF, LR, KNN, SVM, NB, and DT. The accuracy obtained was the highest for RF with a percentage of 94.10% for collected data and 75% for PIMA dataset.

Diwani and Sam (2014) developed a prediction model using 10-fold cross-validation on the training and testing data. The Waikato Environment for Knowledge Analysis (WEKA) tool was used along with the Naive Bayes and DT algorithms. The accuracy obtained is the highest for Naive Bayes with 76.30%.

Butt et al. (2021) have proposed a machine learning based approach for early-stage identification, classification, and prediction of diabetes disease. The PIMA Indian dataset has been used. The classifiers used are RF, multilayer perceptron (MLP) and LR. The accuracy obtained is highest for MLP with 87.26%.

Theoretical Concepts of the Classifiers

The various classifiers that are used are explained in the following sub-sections.

Logistic Regression

It is a statistics-based model that uses the logistic function to model a binary dependent variable. The relationship between the dependent and independent variables is estimated based on probabilities (Diwani and Sam, 2014). The dependent variable is categorical in this method. Mathematically, it is expressed as follows (Kaur and Chhabra, 2014):

The probability that Y = 1 given X, parameterized by θ.
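A minimal sketch of the standard logistic (sigmoid) model that matches this description, assuming a feature vector $x$ and a parameter vector $\theta$ (this notation is an assumption, not taken from the original):

```latex
P(Y = 1 \mid X = x;\ \theta) = \frac{1}{1 + e^{-\theta^{\top} x}}
```

The output lies strictly between 0 and 1, so it can be read as a probability and thresholded (commonly at 0.5) to obtain the binary class.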

XGBoost

It is an implementation of gradient-boosted DTs that are created sequentially. An important feature is its use of weights: each independent variable is assigned a weight, and the weighted variables are given to the DTs to obtain the results (Butt et al., 2021). The prediction score obtained from the individual DTs is given by

where the number of trees is denoted by k , the functional space is given as f , and the possible set available is given as F ( Patil et al., 2019 ).
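The additive form described above can be sketched as follows, using the usual XGBoost notation (Chen and Guestrin, 2016). It is assumed here that $x_i$ is the feature vector of sample $i$, $\hat{y}_i$ its predicted score, and $K$ the total number of trees, with $k$ indexing the individual trees:

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in F
```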

Gradient Boosting

Many weak learners, typically in the form of DTs, are combined into a single predictive model (Sehly and Mezher, 2020). It is mainly used when we want to decrease the bias error. A gradient-descent technique is used to obtain the values of the coefficients (Posonia et al., 2020).

The loss function used is $(y_1 - y_1')^2$, where $y_1$ is the actual value and $y_1'$ is the final value predicted by the model. Here $y_1'$ is replaced with $G_n(X)$, the output of the boosted model (Ke et al., 2017). It is mathematically expressed as follows:
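Under the definitions above, and assuming the squared-error form stated in the text, the loss can be sketched as in the first line below. The second line is an added observation rather than part of the original: the negative gradient of this loss with respect to $G_n(X)$ is proportional to the residual, which is why each new weak learner is commonly fit to the residuals of the current model.

```latex
\begin{align}
L\bigl(y_1, G_n(X)\bigr) &= \bigl(y_1 - G_n(X)\bigr)^{2} \\
-\frac{\partial L}{\partial G_n(X)} &= 2\,\bigl(y_1 - G_n(X)\bigr)
\end{align}
```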

Decision Trees

It is a supervised-learning algorithm (Islam et al., 2020). It works with both categorical and continuous input and output variables, and it can be used for either classification or regression (Chen and Guestrin, 2016). The types of DTs are as follows: ID3, C4.5, CART, and CHAID. The measures used on DTs are as follows: entropy, Gini index, and standard deviation (Khanam and Foo, 2021). It is mathematically calculated as follows (Ambigavathi and Sridharan, 2018):
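As a hedged sketch, two of the measures named above can be written in their standard textbook form. The symbols are assumptions introduced here: $S$ is the set of samples at a node, $c$ the number of classes, $p_i$ the proportion of samples in class $i$, $A$ a candidate split attribute, $S_v$ the subset of $S$ taking value $v$ of $A$, and $\sigma$ the standard deviation of the target. Entropy is treated in the Extra Trees section.

```latex
\mathrm{Gini}(S) = 1 - \sum_{i=1}^{c} p_i^{2}
\qquad
\mathrm{SDR}(S, A) = \sigma(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, \sigma(S_v)
```

The Gini index is typically used for classification splits, while the standard deviation reduction (SDR) plays the analogous role when the target is continuous.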

Extra Trees

Extra trees (ETs) are also called the “extremely randomized trees classifier.” ET is an ensemble learning technique that combines the results of many decorrelated DTs into a single classification result (Chen et al., 2017). It differs from RF in the way the DTs are built. The entropy of a set of rows $S$ is calculated as follows:

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where $c$ is the number of unique class labels and $p_i$ is the proportion of rows with output label $i$ (Sisodia and Sisodia, 2018).

Then the “information gain” is calculated using the following formula ( Ke et al., 2017 ):
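A minimal sketch of this calculation, assuming the standard definition in which the gain of a split is the parent's entropy minus the size-weighted average entropy of the child groups. The function names and the toy labels are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(parent_labels, child_groups):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent_labels)
    weighted = sum(len(group) / total * entropy(group) for group in child_groups)
    return entropy(parent_labels) - weighted


# Hypothetical toy labels: 1 = diabetic, 0 = non-diabetic.
parent = [1, 1, 0, 0]
perfect_split = [[1, 1], [0, 0]]   # children are pure
useless_split = [[1, 0], [1, 0]]   # children are as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

A split that separates the classes completely recovers the full parent entropy as gain, while a split that leaves each child as mixed as the parent gains nothing.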

Random Forest

The RF combines the outputs of multiple DTs to reach a single result. The DT is taken as the base learner and is trained with row sampling as well as column sampling. As the number of base learners increases, the variance decreases, and vice versa. K-fold cross-validation can be used for evaluation. It is considered an important bagging method (Mamuda and Sathasivam, 2017).

Random Forest = DT (base learner) + bagging (row sampling with replacement) + feature bagging (column sampling) + aggregation (mean/median, majority vote)
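The four ingredients in this expression can be sketched with the standard library alone. This is an illustration under stated assumptions, not the implementation used in the study: the base learner is a one-split "stump" standing in for a full DT, one column is sampled per learner, and all function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def bootstrap_rows(n_rows, rng):
    """Bagging: draw n_rows row indices with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def sample_columns(n_cols, k, rng):
    """Feature bagging: choose k distinct column indices."""
    return sorted(rng.sample(range(n_cols), k))


def majority_vote(predictions):
    """Aggregation: most common class among the base learners' predictions."""
    return Counter(predictions).most_common(1)[0][0]


def fit_stump(X, y, rows, col):
    """Toy base learner: split one column at its mean over the sampled rows."""
    values = [X[r][col] for r in rows]
    threshold = sum(values) / len(values)
    left = [y[r] for r in rows if X[r][col] <= threshold]
    right = [y[r] for r in rows if X[r][col] > threshold]
    fallback = majority_vote([y[r] for r in rows])
    left_label = majority_vote(left) if left else fallback
    right_label = majority_vote(right) if right else fallback
    return col, threshold, left_label, right_label


def predict_stump(stump, x):
    col, threshold, left_label, right_label = stump
    return left_label if x[col] <= threshold else right_label


def fit_forest(X, y, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and column."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        rows = bootstrap_rows(len(X), rng)
        col = sample_columns(len(X[0]), 1, rng)[0]
        forest.append(fit_stump(X, y, rows, col))
    return forest


def predict_forest(forest, x):
    return majority_vote([predict_stump(stump, x) for stump in forest])


# Hypothetical toy data: two features, low values -> class 0, high -> class 1.
X = [[1, 1], [2, 2], [8, 8], [9, 9]]
y = [0, 0, 1, 1]
forest = fit_forest(X, y)
```

Individual stumps trained on an unlucky bootstrap sample may predict a single class everywhere; the majority vote over many learners is what smooths out such high-variance members.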

Light Gradient Boosting Machine

LGBM is a high-performance GB framework based on the DT algorithm (Ahamed and Arya, 2021). It is mainly used for classification and ranking. It grows each tree leaf-wise, splitting the leaf with the best fit. The quality of a split can be measured by the information gain, which is given by calculating the variance after splitting (Zhu et al., 2020). It can be represented as follows:
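As a hedged sketch, the variance-based gain can be written in the general form used in the LightGBM paper (Ke et al., 2017). All symbols are assumptions not present in the text above: $O$ is the training subset at a node, $n_O$ its size, $g_i$ the gradient of the loss for sample $x_i$, $d$ a candidate split point on feature $j$, and $n_l$, $n_r$ the numbers of samples falling left and right of the split.

```latex
V_{j \mid O}(d) = \frac{1}{n_O} \left(
  \frac{\bigl( \sum_{\{x_i \in O \,:\, x_{ij} \le d\}} g_i \bigr)^{2}}{n_l}
  + \frac{\bigl( \sum_{\{x_i \in O \,:\, x_{ij} > d\}} g_i \bigr)^{2}}{n_r}
\right)
```

For each feature, the split point that maximizes this quantity is selected.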

System Architecture

The data needed for the study are initially collected and stored in the database. The PIMA dataset is taken from the UCI Repository. The dataset is then pre-processed using different exploratory data analysis techniques and divided into “training data” and “testing data.” The algorithms mentioned are then compared, and the algorithm producing the highest accuracy is taken as the best predictive model for predicting DM. The architectural structure is depicted in Figure 1.
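The split, compare, and select steps described above can be sketched as follows. This is a minimal illustration using only the standard library, not the study's pipeline: the toy glucose-like values, the threshold "model", the baseline, and all function names are hypothetical stand-ins for the PIMA data and the seven classifiers.

```python
import random


def train_test_split(rows, labels, test_fraction=0.2, seed=42):
    """Shuffle the row indices and hold out a share of them as testing data."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    train = [rows[i] for i in train_idx], [labels[i] for i in train_idx]
    test = [rows[i] for i in test_idx], [labels[i] for i in test_idx]
    return train, test


def fit_threshold_rule(rows, labels):
    """Toy 'training': threshold halfway between the class means of feature 0."""
    pos = [x[0] for x, y in zip(rows, labels) if y == 1]
    neg = [x[0] for x, y in zip(rows, labels) if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x[0] > threshold)


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for x, y in zip(rows, labels) if model(x) == y)
    return correct / len(rows)


def best_model(models, rows, labels):
    """Return (name, accuracy) of the highest-scoring model on the given data."""
    scores = {name: accuracy(model, rows, labels) for name, model in models.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical single-feature data: 1 = diabetic, 0 = non-diabetic.
rows = [[80], [150], [90], [160], [100], [170], [110], [180], [120], [190]]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

(train_rows, train_labels), (test_rows, test_labels) = train_test_split(rows, labels)
models = {
    "threshold_rule": fit_threshold_rule(train_rows, train_labels),
    "always_negative": lambda x: 0,  # trivial baseline for comparison
}
best_name, best_accuracy = best_model(models, test_rows, test_labels)
```

Accuracy is measured on the held-out testing data only, so the comparison reflects how each model behaves on rows it was not fitted to.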

Figure 1. Architectural design.

Results and Discussion

The results and accuracy percentage calculated are given in the form of a table ( Table 1 ).

Table 1. Accuracy percentage.

The algorithms considered are LR, XGB, GB, DT, ET, RF, and LGBM. The accuracy obtained is the highest for LGBM with 95.2%.

Conclusion and Future Work

Based on the results discussed here, we identified that the LGBM algorithm worked best for the dataset used, producing a higher accuracy than the other algorithms. In future, different datasets can be used with the different classifiers to determine which algorithm produces the best result. In addition, the parameters used in LGBM can be further fine-tuned, or a more advanced LGBM variant can be used, to increase the prediction accuracy.

Author Contributions

BA and MA: material preparation, data collection, analysis, resources, and writing—review and editing. BA: first draft of the manuscript and investigation. MA: conceptualization, supervision, and visualization. AN: coding and idea of research. All authors contributed to the study conception and design, involved in the idea for the article, performed the literature search, data analysis, drafted, and critically revised the work.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Ahamed, B. S., and Arya, M. S. (2021). Prediction of Type-2 diabetes using the LGBM classifier methods and techniques. Turk. J. Comput. Math. Educ . 12, 223–231. Available online at: https://www.proquest.com/docview/2622815314

Ahmed, N., Ahammed, R., Islam, M. M., Uddin, M. A., Akhter, A., Talukder, M. A., et al. (2021). Machine learning based diabetes prediction and development of smart web application. Int. J. Cogn. Comp. Eng . 2, 229–241. doi: 10.1016/j.ijcce.2021.12.001

Ambigavathi, M., and Sridharan, D. (2018). “Big data analytics in healthcare,” in IEEE Tenth International Conference on Advanced Computing (ICoAC) , 269–276. doi: 10.1109/ICoAC44903.2018.8939061

Butt, U. M., Letchmunan, S., Ali, M., Hassan, F. H., Baqir, A., and Sherazi, H. H. (2021). Machine learning based diabetes classification and prediction for healthcare applications. J. Healthc. Eng . 2021, 9930985. doi: 10.1155/2021/9930985

Chen, T., and Guestrin, C. (2016). “XGBoost: a scalable tree boosting system,” in KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , 785–794. doi: 10.1145/2939672.2939785

Chen, W., Chen, S., Zhang, J. H., and Wu, T. (2017). “A hybrid prediction model for type 2 diabetes using K-means and decision tree,” in 2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS) , 386–390. doi: 10.1109/ICSESS.2017.8342938

Dash, S., Shakyawar, S. K., Sharma, M., and Kaushik, S. (2019). Big data in healthcare: management, analysis and future prospects. J. Big Data 6, 54. doi: 10.1186/s40537-019-0217-0

Diwani, S. A., and Sam, A. (2014). Diabetes forecasting using supervised learning techniques. Adv. Comp. Sci. Int. J . 3, 10–18. Available online at: http://www.acsij.org/acsij/article/view/156

Islam, M. S., Qaraqe, M. K., Abbas, H. T., Erraguntla, M., and Abdul-Ghani, M. (2020). “The prediction of diabetes development: a machine learning framework,” in 2020 IEEE 5th Middle East and Africa Conference on Biomedical Engineering, MECBME 2020 (IEEE Computer Society). doi: 10.1109/MECBME47393.2020.9292043

Kaur, G., and Chhabra, A. (2014). Improved J48 classification algorithm for the prediction of diabetes. Int. J. Comp. Appli . 98, 13–17. doi: 10.5120/17314-7433

Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., et al. (2017). “LightGBM: a highly efficient gradient boosting decision tree,” in NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, 3149–3157.

Khaleel, F. A., and Al-Bakry, A. M. (2021). Diagnosis of diabetes using machine learning algorithms. Mater. Today Proc . doi: 10.1016/j.matpr.2021.07.196

Khanam, J. J., and Foo, S. Y. (2021). A comparison of machine learning algorithms for diabetes prediction. ICT Exp. 7, 432–439. doi: 10.1016/j.icte.2021.02.004

Kumari, S., Kumar, D., and Mittal, M. (2021). An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier. Int. J. Cogn. Comp. Eng . 2, 40–46. doi: 10.1016/j.ijcce.2021.01.001

Mamuda, M., and Sathasivam, S. (2017). “Predicting the survival of diabetes using neural network,” in Proceedings of the AIP Conference Proceedings (Bydgoszcz), 40–46. doi: 10.1063/1.4995878

Maniruzzaman, M., Rahman, M., Al-MehediHasan, M., Suri, H. S., Abedin, M., El-Baz, A., et al. (2018). Accurate diabetes risk stratification using machine learning: role of missing value and outliers. J. Med. Syst . 42, 92. doi: 10.1007/s10916-018-0940-7

Martinsson, J., Schliep, A., Eliasson, B., and Mogren, O. (2020). Blood glucose prediction with variance estimation using recurrent neural networks. J. Healthc. Inform. Res. 4, 1–18. doi: 10.1007/s41666-019-00059-y

Mujumdar, A., and Vaidehi, V. (2019). Diabetes prediction using machine learning algorithms. Proc. Comp. Sci. 165, 292–299. doi: 10.1016/j.procs.2020.01.047

Nibareke, T., and Laassiri, J. (2020). Using big data-machine learning models for diabetes prediction and flight delays analytics. J. Big Data 7, 78. doi: 10.1186/s40537-020-00355-0

Patil, M. K., Sawarkar, S. D., and Narwane, M. S. (2019). Designing a model to detect diabetes using machine learning. Int. J. Eng. Res. Technol . 8, 333–340. Available online at: https://www.ijert.org/designing-a-model-to-detect-diabetes-using-machine-learning

Posonia, A. M., Vigneshwari, S., and Rani, D. J. (2020). “Machine learning based diabetes prediction using decision tree J48,” in 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS) , 498–502. doi: 10.1109/ICISS49785.2020.9316001

Prabha, A., Yadav, J., Rani, A., and Singh, V. (2021). Design of intelligent diabetes mellitus detection system using hybrid feature selection based XGBoost classifier. Comp. Biol. Med . 136, 104664. doi: 10.1016/j.compbiomed.2021.104664

Sehly, R., and Mezher, M. (2020). “Comparative analysis of classification models for pima dataset,” in International Conference on Computing and Information Technology (ICCIT-1441) , 1–5. doi: 10.1109/ICCIT-144147971.2020.9213821

Sisodia, D., and Sisodia, D. S. (2018). Prediction of diabetes using classification algorithms. Proc. Comp. Sci . 132, 1578–1585. doi: 10.1016/j.procs.2018.05.122

Tigga, N. P., and Garg, S. (2020). Prediction of type 2 diabetes using machine learning classification methods. Proc. Comp. Sci . 167, 706–716. doi: 10.1016/j.procs.2020.03.336

Wang, F., Stiglic, G., Obradovic, Z., and Davidson, I. (2015). Guest editorial: special issue on data mining for medicine and healthcare. Data Min. Knowl. Disc . 29, 867–870. doi: 10.1007/s10618-015-0414-1

Zhu, T., Li, K., Chen, J., Herrero, P., and Georgiou, P. (2020). Dilated recurrent neural networks for glucose forecasting in type 1 diabetes. J. Healthc. Inform. Res . 4, 308–324. doi: 10.1007/s41666-020-00068-2

Keywords: prediction, machine learning, classifiers, accuracy, comparison

Citation: Ahamed BS, Arya MS and Nancy V AO (2022) Prediction of Type-2 Diabetes Mellitus Disease Using Machine Learning Classifiers and Techniques. Front. Comput. Sci. 4:835242. doi: 10.3389/fcomp.2022.835242

Received: 14 December 2021; Accepted: 08 April 2022; Published: 10 May 2022.

Reviewed by:

Copyright © 2022 Ahamed, Arya and Nancy V. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: B. Shamreen Ahamed, shamu1502@gmail.com


Research on Diabetes Prediction Method Based on Machine Learning

Jingyu Xue 1 , Fanchao Min 1 and Fengying Ma 1

Published under licence by IOP Publishing Ltd. Journal of Physics: Conference Series, Volume 1684, The 2020 International Seminar on Artificial Intelligence, Networking and Information Technology, 18-20 September 2020, Shanghai, China. Citation: Jingyu Xue et al 2020 J. Phys.: Conf. Ser. 1684 012062. DOI: 10.1088/1742-6596/1684/1/012062

Author affiliations

1 Qilu University of Technology, Jinan, Shandong, China

Diabetes mellitus (DM) is a metabolic disease characterized by high blood sugar. The main clinical types are type 1 diabetes and type 2 diabetes. Now, the proportion of young people suffering from type 1 diabetes has increased significantly. Type 1 diabetes is chronic when it occurs in childhood and adolescence, and has a long incubation period. The early symptoms of the onset are not obvious, which may lead to failure to detect in time and delay treatment. Long-term high blood sugar can cause chronic damage and dysfunction of various tissues, especially eyes, kidneys, heart, blood vessels and nerves. Therefore, the early prediction of diabetes is particularly important. In this paper, we use supervised machine-learning algorithms like Support Vector Machine (SVM), Naive Bayes classifier and LightGBM to train on the actual data of 520 diabetic patients and potential diabetic patients aged 16 to 90. Through comparative analysis of classification and recognition accuracy, the performance of support vector machine is the best.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence . Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Analysis and Prediction of Diabetes Using Machine Learning

International Journal of Emerging Technology and Innovative Engineering, Volume 5, Issue 4, April 2019

9 Pages Posted: 23 Apr 2019

Sri Krishna College of Technology, Students

S. Subashree

Date Written: April 2, 2019

The healthcare industry holds very large and sensitive data that need to be handled very carefully. Diabetes mellitus is one of the growing, extremely fatal diseases all over the world. Medical professionals want a reliable prediction system to diagnose diabetes. Different machine learning techniques are useful for examining the data from diverse perspectives and summarizing it into valuable information. The accessibility and availability of huge amounts of data can provide useful knowledge if suitable data mining techniques are applied to it. The main goal is to determine new patterns and then to interpret these patterns to deliver significant and useful information for the users. Diabetes contributes to heart disease, kidney disease, nerve damage, and blindness. Mining the diabetes data in an efficient way is a crucial concern. Data mining techniques and methods are explored to find appropriate approaches for efficient classification of the diabetes dataset and for extracting valuable patterns. In this study, medical bioinformatics analyses have been carried out to predict diabetes. The WEKA software was employed as a mining tool for diagnosing diabetes. The Pima Indian diabetes database, acquired from the UCI repository, was used for analysis. The dataset was studied and analyzed to build an effective model that predicts and diagnoses diabetes. In this study, we aim to apply the bootstrapping resampling technique to enhance accuracy, then apply Naïve Bayes, Decision Trees, and K-nearest neighbors (KNN), and compare their performance.

Keywords: Healthcare, Diabetes, Classification, K-nearest neighbors, Decision Trees, Naive Bayes

S. Saru (Contact Author)

Sri krishna college of technology, students ( email ).

Coimbatore, Tamil Nadu 641042 India

  • DOI: 10.1038/s41598-024-64048-x
  • Corpus ID: 270391147

Development and validation of a machine learning-based readmission risk prediction model for non-ST elevation myocardial infarction patients after percutaneous coronary intervention

  • Yanxu Liu , Linqin Du , +14 authors Rongchuan Yue
  • Published in Scientific Reports 11 June 2024

Comparative Analysis of Supervised Machine Learning Algorithms for COVID-19 Prediction

  • Rubina Shaheen Department of computer engineering, University of Engineering and Technology, Lahore
  • Beenish Akram University of Engineering and Technology Lahore
  • Amna Zafar Department of computer science, University of Engineering and Technology, Lahore
  • Talha Waheed Department of computer science, University of Engineering and Technology, Lahore

With the emergence of COVID-19 as an unprecedented pandemic, the health systems of both the developed and the underdeveloped world were left overwhelmed. Health workers faced the dilemma of falling prey to the infection themselves while identifying the presence of the disease among patients. Given the nature of the disease, its spread needs to be mitigated by resorting to technological advancements, such as machine learning algorithms, for diagnosis. In this paper, three supervised machine learning algorithms, Decision Tree, Naïve Bayes, and Logistic Regression, have been utilized to predict the disease from nine attributes covering various combinations of symptoms. A comparative analysis of the algorithms revealed that Decision Tree, with 99% accuracy and 98% precision, is the most viable and accurate technique for the diagnosis of COVID-19.

Copyright (c) 2024 Rubina Shaheen, Beenish Akram, Amna Zafar, Talha Waheed (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.

Sir Syed University Research Journal of Engineering & Technology, Karachi, 75300, Pakistan. 

IDBNWP: Improved deep belief network for workload prediction: Hybrid optimization for load balancing in cloud system

  • Published: 24 June 2024

  • A. Ajil 1,2 &
  • E. Saravana Kumar 1

The success of a cloud environment depends on how efficiently it balances load and allocates resources. Proactive forecasting of future workload, accompanied by resource allocation, has emerged as a primary method for addressing related problems such as under- or over-utilization of physical machines, resource wastage, VM migration, Quality-of-Services (QoS) violations, and load imbalance. In this paper, we introduce a novel workload prediction and load balancing approach with two major phases: workload prediction with deep learning and optimal load balancing. In the workload prediction phase, we propose an Improved Deep Belief Network (IDBN), which efficiently classifies the load as underloaded, overloaded, or equally balanced. The load is then balanced using a hybrid optimization algorithm, the Bald Eagle Assisted Butterfly Optimization Algorithm (BEABOA), which considers constraints such as makespan (70), communication cost (4000), response time (1.1), turnaround time (2), and migration cost (0.1) during optimal load balancing. The outcomes demonstrate that the proposed workload prediction and load balancing approach can offer superior results.

Data availability

Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.

Prassanna J, Venkataraman N (2021) Adaptive regressive holt–winters workload prediction and firefly optimized lottery scheduling for load balancing in cloud. Wireless Netw 27(8):5597–5615

Article   Google Scholar  

Jena UK, Das PK, Kabat MR (2020) Hybridization of meta-heuristic algorithm for load balancing in cloud computing environment. J King Saud Univ-Comput Inf Sci

Gao X, Liu R, Kaushik A (2020) Hierarchical multi-agent optimization for resource allocation in cloud computing. IEEE Trans Parallel Distrib Syst 32(3):692–707

Kaur H, Anand A (2022) Review and analysis of secure energy efficient resource optimization approaches for virtual machine migration in cloud computing. Measurement: Sens 100504

Katal A, Dahiya S, Choudhury T (2022) Energy efficiency in cloud computing data centers: a survey on software technologies. Clust Comput 1–31

Venkata Subramanian N, Shankar Sriram VS (2022) An effective secured dynamic network-aware multi-objective cuckoo search optimization for live VM migration in sustainable data centers. Sustainability 14(20):13670

Shafiq DA, Jhanjhi NZ, Abdullah A, Alzain MA (2021) A load balancing algorithm for the data centres to optimize cloud computing applications. IEEE Access 9:41731–41744

Yadav AK, Bharti RK, Raw RS (2021) SA 2 -MCD: secured architecture for allocation of virtual machine in multitenant cloud databases. Big Data Res 24:100187

Kazeem Moses A, Joseph Bamidele A, Roseline Oluwaseun O, Misra S, Abidemi Emmanuel A (2021) Applicability of MMRR load balancing algorithm in cloud computing. Int J Comput Math: Comput Syst Theor 6(1):7–20

MathSciNet   Google Scholar  

Shafiq DA, Jhanjhi NZ, Abdullah A (2021) Load balancing techniques in cloud computing environment: A review. J King Saud Univ-Comput Inf Sci

Mishra SK, Sahoo B, Parida PP (2020) Load balancing in cloud computing: a big picture. J King Saud Univ-Comput Inf Sci 32(2):149–158

Google Scholar  

Chiang ML, Cheng HS, Liu HY, Chiang CY (2021) SDN-based server clusters with dynamic load balancing and performance improvement. Clust Comput 24(1):537–558

Sefati S, Mousavinasab M, Zareh Farkhady R (2022) Load balancing in cloud computing environment using the Grey wolf optimization algorithm based on the reliability: performance evaluation. J Supercomput 78(1):18–42

Shirvani MH (2020) A hybrid meta-heuristic algorithm for scientific workflow scheduling in heterogeneous distributed computing systems. Eng Appl Artif Intell 90:103501

Fasihi M, Tavakkoli-Moghaddam R, Najafi SE, Hajiaghaei M (2021) Optimizing a bi-objective multi-period fish closed-loop supply chain network design by three multi-objective meta-heuristic algorithms. Sci Iran

Dehghan-Sanej K, Eghbali-Zarch M, Tavakkoli-Moghaddam R, Sajadi SM, Sadjadi SJ (2021) Solving a new robust reverse job shop scheduling problem by meta-heuristic algorithms. Eng Appl Artif Intell 101:104207

Parvizi E, Rezvani MH (2020) Utilization-aware energy-efficient virtual machine placement in cloud networks using NSGA-III meta-heuristic approach. Clust Comput 23(4):2945–2967

Barthwal V, Rauthan MMS (2021) AntPu: a meta-heuristic approach for energy-efficient and SLA aware management of virtual machines in cloud computing. Memetic Comput 13(1):91–110

Haris M, Zubair S (2021) Mantaray modified multi-objective Harris hawk optimization algorithm expedites optimal load balancing in cloud computing. J King Saud Univ-Comput Inf Sci

Annie Poornima Princess G, Radhamani AS (2021) A hybrid meta-heuristic for optimal load balancing in cloud computing. J Grid Comput 19(2):1–22

Kumar J, Singh AK, Buyya R (2021) Self directed learning based workload forecasting model for cloud resource management. Inf Sci 543:345–366

Amekraz Z, Hadi MY (2022) CANFIS: A chaos adaptive neural fuzzy inference system for workload prediction in the cloud. IEEE Access 10:49808–49828

Singh AK, Saxena D, Kumar J, Gupta V (2021) A quantum approach towards the adaptive prediction of cloud workloads. IEEE Trans Parallel Distrib Syst 32(12):2893–2905

Kumar J, Saxena D, Singh AK, Mohan A (2020) Biphase adaptive learning-based neural network model for cloud datacenter workload forecasting. Soft Comput 24(19):14593–14610

Jeddi S, Sharifian S (2020) A hybrid wavelet decomposer and GMDH-ELM ensemble model for Network function virtualization workload forecasting in cloud computing. Appl Soft Comput 88:105940

Banerjee S, Roy S, Khatua S (2021) Efficient resource utilization using multi-step-ahead workload prediction technique in cloud. J Supercomput 77(9):10636–10663

Singh LK, PoojaGarg H, Khanna M (2022) Deep learning system applicability for rapid glaucoma prediction from fundus images across various data sets. Evolving Syst 13(6):807–836

Singh LK, Khanna M, Thawkar S, Singh R (2024) Deep-learning based system for effective and automatic blood vessel segmentation from Retinal fundus images. Multimed Tools Appl 83(2):6005–6049

Khanna M, Singh LK, Thawkar S, Goyal M (2023) Deep learning based computer-aided automatic prediction and grading system for diabetic retinopathy. Multimed Tools Appl 82(25):39255–39302

Khanna M, Singh LK, Thawkar S, Goyal M (2024) PlaNet: a robust deep convolutional neural network model for plant leaves disease recognition. Multimed Tools Appl 83(2):4465–4517

Pradeep J, Raja Ratna S, Dhal PK, DayaSagar KV, Ranjit PS, Rastogi RVK, Rajaram A (2024) DeepFore: A deep reinforcement learning approach for power forecasting in renewable energy systems. Electr Power Components Syst 21:1–17

Xu, Minxian, Chenghao Song, Huaming Wu, Sukhpal Singh Gill, Kejiang Ye, and Chengzhong Xu. "esDNN: deep neural network based multivariate workload prediction in cloud computing environments."  ACM Transactions on Internet Technology (TOIT)  22, no. 3 (2022): 1-24.

Chiranjeevi, Phaneendra, and A. Rajaram. "A lightweight deep learning model based recommender system by sentiment analysis."  Journal of Intelligent & Fuzzy Systems  Preprint (2023): 1-14.

Ruan, Li, Yu Bai, Shaoning Li, Shuibing He, and Limin Xiao. "Workload time series prediction in storage systems: a deep learning based approach."  Cluster Computing  (2023): 1-11.

Bi J, Li S, Yuan H, Zhou MengChu (2021) Integrated deep learning method for workload and resource prediction in cloud systems. Neurocomputing 424:35–48

Babu PA, Rai AK, Ramesh JVN, Nithyasri A, Sangeetha S, Kshirsagar PR, Rajendran A, Rajaram A, Dilipkumar S (2024) An explainable deep learning approach for oral cancer detection. J Electr Eng Technol 19(3):1837–1848

Zekrifa DMS, Lamani D, Chaitanya GK, Kanimozhi KV, Saraswat A, Sugumar D, Vetrithangam D, Koshariya AK, Manjunath MS, Rajaram A (2024) Advanced deep learning approach for enhancing crop disease detection in agriculture using hyperspectral imaging. J Intell Fuzzy Syst Prepr 1–14

Maguluri LP, Chouhan K, Balamurali R, Rani R, Hashmi A, Kiran A, Rajaram A (2024) Adversarial deep learning for improved abdominal organ segmentation in CT scans. Multimed Tools Appl 12:1–23

Saxena D, Singh AK (2022) Auto-adaptive learning-based workload forecasting in dynamic cloud environment. Int J Comput Appl 44(6):541–551

https://www.analyticsvidhya.com/blog/2022/03/an-overview-of-deep-belief-network-dbn-in-deep-learning/

Qiu F, Zhang B, Guo J (2016) A deep learning approach for VM workload prediction in the cloud. In: 2016 17th IEEE/ACIS international conference on software engineering, Artificial intelligence, networking and parallel/distributed computing (SNPD). IEEE, pp 319–324

Arora S, Singh S (2019) Butterfly optimization algorithm: a novel approach for global optimization. Soft Comput 23(3):715–734

Alsattar HA, Zaidan AA, Zaidan BB (2020) Novel meta-heuristic bald eagle search optimisation algorithm. Artif Intell Rev 53(3):2237–2264

https://research.google/tools/datasets/google-cluster-workload-traces-2019/

Balaji K, Kiran PS, Kumar MS (2021) An energy efficient load balancing on cloud computing using adaptive cat swarm optimization

Download references

Acknowledgement

There is no acknowledgement involved in this work.

No funding is involved in this work.

Author information

Authors and affiliations.

Department of Computer Science and Engineering, The Oxford College of Engineering, Visvesvaraya Technological University, Belagavi, Karnataka, India

A. Ajil & E. Saravana Kumar

School of Computer Science and Engineering, REVA University, Bengaluru, Karnataka, India

Contributions

All authors contributed equally to this work.

Corresponding author

Correspondence to A. Ajil.

Ethics declarations

Ethics approval and consent to participate

No human participants were involved in this implementation process.

Human and animal rights

No violation of Human and Animal Rights is involved.

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Ajil, A., Kumar, E.S. IDBNWP: Improved deep belief network for workload prediction: Hybrid optimization for load balancing in cloud system. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19495-z

Received: 11 October 2023

Revised: 17 April 2024

Accepted: 26 May 2024

Published: 24 June 2024

DOI: https://doi.org/10.1007/s11042-024-19495-z

  • Workload prediction
  • Quality-of-Services (QoS)
  • Optimal load balancing
  • Improved Deep Belief Network (IDBN)
  • Bald Eagle Assisted Butterfly Optimization Algorithm (BEABOA)
Testing machine learning models to predict postoperative ileus after colorectal surgery.

  • 1. Introduction
  • 2. Materials and Methods
  • 2.1. Study Design and Participants
  • 2.2. Data Acquisition and Variable Selection
  • 2.3. Statistical Analysis
  • 2.4. Model Training and Validation
  • Model Descriptions
  • 2.5. Model Performance
  • 3.1. Baseline Characteristics
  • 3.2. Comorbidities of Importance
  • 3.3. ML Model Performance
  • 4. Discussion
  • 5. Conclusions
  • Author Contributions
  • Institutional Review Board Statement
  • Informed Consent Statement
  • Data Availability Statement
  • Conflicts of Interest
  • Abbreviations

| Abbreviation | Definition |
| --- | --- |
| Ada boosting classifier | adaptive boosting classifier |
| ASA physical status | American Society of Anesthesiologists Physical Status |
| AUC ROC curve | area under the curve receiver operating characteristic curve |
| BMI | body mass index |
| CCI | Charlson Comorbidity Index |
| CVA | cerebral vascular accident |
| ECI | Elixhauser Comorbidity Index |
| EHR | electronic health record |
| EDA | exploratory data analysis |
| XG boosting classifier | extreme gradient boosting classifier |
| IQR | interquartile range |
| kNN | k-nearest neighbors imputation |
| LOS | length of stay |
| ML | machine learning |
| NSQIP | National Surgical Quality Improvement Program index |
| POEM | perioperative evaluation and management |
| POI | postoperative ileus |
| PPV | positive predictive values |
| SMOTE | synthetic minority oversampling technique |
| UHC | University Health System Consortium |

| Data Science Stage | Sub-Stage | Description | Tools/Metrics |
| --- | --- | --- | --- |
| Data Acquisition | Source Import | Load data from CSV files, databases, etc. | File paths, data size |
| | Cleaning and Preprocessing | Check for missing values, inconsistencies, duplicates. Format data types. | Imputation methods, error checking tools |
| Exploratory Data Analysis (EDA) | Feature Distribution | Analyze data distribution for each feature using histograms, boxplots. | Visualizations, skewness measures |
| | Feature Relationships | Identify relationships between features and target variable using scatter plots, correlation matrices. | Correlation coefficients, feature importance scores |
| | Outlier and Bias Detection | Check for outliers and potential biases using boxplots, statistical tests. | Outlier detection algorithms, bias analysis tools |
| Imbalanced Data Handling | Class Imbalance Assessment | Calculate class imbalance ratio, visualize class distribution using pie charts. | Class imbalance ratio, visualization tools |
| | Mitigation Strategy Decision | Choose appropriate strategy: SMOTE, undersampling, oversampling, none. | Imbalance severity, data type, problem type |
| Data Oversampling (Optional) | SMOTE Application | Apply SMOTE or other oversampling techniques to increase minority class. | SMOTE algorithms, minority class size increase |
| | Oversampling Control | Ensure oversampling does not introduce overfitting or class overlap. | Cross-validation, visualization |
| Data Undersampling (Optional) | Undersampling Techniques | Apply undersampling techniques to reduce majority class. | Random undersampling, stratified undersampling |
| | Undersampling Control | Ensure undersampling does not introduce bias or loss of information. | Class balance metrics, cross-validation |
| Model Selection and Training | Feature Engineering (Optional) | Create new features based on existing ones (ratios, transformations). | Feature engineering algorithms, interpretability measures |
| | Model Selection | Choose suitable ML algorithms based on data type, problem type, and interpretability needs. | Logistic Regression, Random Forest, Decision Trees, Gradient Boosting, Extreme Gradient Boosting |
| | Model Training and Regularization | Split data into training, validation, and test sets. Train models with cross-validation and regularization (L1, L2). | Train/validation/test ratios, regularization parameters |
| Model Evaluation and Testing | Model Validation | Evaluate model performance on validation set using accuracy, precision, recall, F1-score, AUC-ROC (for imbalanced data). | Validation set metrics, model comparison tools |
| | Best Model Selection | Compare performance across models and select the best one. | Validation metrics comparison, statistical tests |
| | Model Testing | Evaluate final model on unseen test set to assess real-world performance. | Test set metrics, model generalization analysis |
| | Error Analysis | Analyze model errors and identify potential limitations. | Error analysis tools, visualization |
| Production | Interpretation and Deployment | Interpret model results and explain predictions. Deploy model and monitor performance. | Explainable AI tools, model monitoring systems |
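
The class imbalance assessment and data splitting steps listed in the pipeline table can be illustrated with a minimal sketch. It uses only the Python standard library. The helper names (`imbalance_ratio`, `stratified_split`), the 80/20 split, and the seed are assumptions made for illustration, not the authors' implementation. The class counts (296 patients without ileus, 20 with ileus) are the ones reported in the study's baseline table.

```python
import random
from collections import Counter, defaultdict


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test so each class keeps its proportion.

    Hypothetical helper: returns two sorted lists of indices into `labels`.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one sample of every class in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# 0 = no ileus, 1 = ileus; counts taken from the baseline table.
labels = [0] * 296 + [1] * 20
ratio = imbalance_ratio(labels)
train_idx, test_idx = stratified_split(labels)

print(f"imbalance ratio: {ratio:.1f}")
print(f"test positives: {sum(labels[i] for i in test_idx)} of {len(test_idx)}")
```

With these counts the ratio is 14.8 to 1, and a 20% stratified test set holds only 4 positive cases. That small number is one reason the table lists oversampling, undersampling, and cross-validation as options to weigh before training.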
Variable of Importance | No Ileus (n = 296) | SD/Range | Ileus (n = 20) | SD/Range | Chi-Square | p-Value
Gender |  |  |  |  | 2.603 | 0.107
  Male | 153 |  | 14 (70%) |  |  |
  Female | 143 |  | 6 (30%) |  |  |
Age (mean/SD) | 58 | ±12.33 | 62 | ±10.05 |  | 0.055
BMI (median/range) | 21.8 | 17.31–56.10 | 30.5 | 20.94–41.53 |  | 0.00
NISQP (median/range) | 33.6 | 13.01–46.12 | 56.2 | 45.1–78.4 |  | 0.00
Length of Stay (Days) (median/range) | 3.74 | 1–20 | 11.64 | 6–25 |  | 0.00
Cost of Care (Ratio) | 1.0 | ±0.36 | 1.77 | ±0.34 |  |
Co-Morbidity |  |  |  |  |  |
  Kidney Disease | 50 |  | 5 (25%) |  | 0.629 | 0.428
  Anemia | 77 |  | 5 (25%) |  | 0.033 | 0.855
  Arrhythmia | 41 |  | 4 (20%) |  | 0.458 | 0.498
  Rheumatoid Arthritis | 32 |  | 4 (20%) |  | 1.613 | 0.204
Surgical Approach |  |  |  |  |  |
  Coloanal Anastomosis | 9 |  | 4 (20%) |  | 0.703 | 0.402
  Extended Right Hemicolectomy | 14 |  | 2 (10%) |  | 0.259 | 0.611
  Left Hemicolectomy | 21 |  | 1 (5%) |  | 0.091 | 0.763
  Low Anterior Resection | 161 |  | 3 (15%) |  | 0.085 | 0.771
  Right Hemicolectomy | 64 |  | 3 (15%) |  | 0.091 | 0.763
  Sigmoid Colectomy | 13 |  | 1 (5%) |  | 0.154 | 0.695
  Subtotal Colectomy (Ileosigmoid) | 1 |  | 0 |  | 0.000 | 0.996
  Total Colectomy, Ileorectal | 1 |  | 1 (5%) |  | 1.047 | 0.306
  Transverse Colectomy | 1 |  | 0 |  | 0.000 | 0.996
  Ultra Low Anterior Resection | 11 |  | 1 (5%) |  | 0.167 | 0.683
Surgery Type |  |  |  |  | 3.848 | 0.050
  Minimally Invasive Surgery (MIS) | 248 (95%) |  | 13 (5%) |  |  |
  Open Approach | 48 (87.3%) |  | 7 (12.7%) |  |  |
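
The chi-square and p-value columns above can be related to each other with a short calculation. The sketch below uses only the Python standard library; the function names are illustrative, and the assumption that each test has 1 degree of freedom (a 2×2 table) is ours, not stated in the table. With the gender counts (153/14 male, 143/6 female), the Pearson statistic comes to about 2.52 and the likelihood-ratio statistic to about 2.60; the source does not say which variant produced the reported 2.603.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def likelihood_ratio_chi_square(table):
    """Likelihood-ratio (G) statistic: 2 * sum of observed * ln(observed / expected)."""
    expected = expected_counts(table)
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0  # empty cells contribute zero
    )


def p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom (assumed)."""
    return math.erfc(math.sqrt(statistic / 2))


# Gender rows from the table: [no ileus, ileus] for male and female.
gender = [[153, 14], [143, 6]]

print(round(pearson_chi_square(gender), 3))
print(round(likelihood_ratio_chi_square(gender), 3))
print(round(p_value_df1(2.603), 3))  # reported gender statistic
print(round(p_value_df1(3.848), 3))  # reported surgery-type statistic
```
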
Co-Morbidity Sample Size Frequency % of Sample
HTN 316 178 56.3%
CAD 316 62 19.6%
Past MI 316 17 5.4%
CHF 316 25 7.9%
CABG Stent 316 20 6.3%
Arrhythmia 316 41 13.0%
AICD 316 1 0.3%
Pacemaker 316 51 16.1%
Valvular 316 18 5.7%
PVD 316 7 2.2%
Anemia 316 77 24.4%
Diabetes 316 63 19.9%
Hypothyroidism 316 56 17.7%
Electrolyte Disturbance 316 308 97.5%
Asthma 316 60 19.0%
COPD 316 30 9.5%
OSA 316 48 15.2%
CVA 316 65 20.6%
TIA 316 6 1.9%
Seizures 316 7 2.2%
Neuromuscular Disease 316 0 0.0%
Hepatitis 316 26 8.2%
Cirrhosis 316 13 4.1%
AIDS_HIV 316 3 0.9%
Dyslipidemia 316 118 37.3%
Kidney Disease 316 50 15.8%
RA 316 32 10.1%
Depression 316 39 12.3%
Dementia 316 39 12.3%
Metric | AdaBoost Tuned with Grid Search | AdaBoost Tuned with Random Search | XGBoost Tuned with Grid Search | XGBoost Tuned with Random Search
Accuracy | 0.942 | 0.942 | 0.852 | 0.852
Recall | 0.083 | 0.083 | 0.833 | 0.833
Precision | 1.000 | 1.000 | 0.278 | 0.278
F1 | 0.154 | 0.154 | 0.417 | 0.417
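
The F1 row follows from the precision and recall rows through the harmonic mean, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded values in the table, so small differences in the third decimal are expected; the dictionary layout and function name are assumptions made for the example.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the table (grid search and random search are identical).
reported = {
    "AdaBoost": {"precision": 1.000, "recall": 0.083, "f1": 0.154},
    "XGBoost": {"precision": 0.278, "recall": 0.833, "f1": 0.417},
}

for name, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    print(f"{name}: reported F1 = {m['f1']:.3f}, recomputed F1 = {recomputed:.3f}")
```

The comparison also shows why accuracy alone is misleading here: AdaBoost has the higher accuracy (0.942) but recovers only about 8% of the positive cases, while XGBoost trades precision for a recall of 0.833.
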

Brydges, G.; Chang, G.J.; Gan, T.J.; Konishi, T.; Gottumukkala, V.; Uppal, A. Testing Machine Learning Models to Predict Postoperative Ileus after Colorectal Surgery. Curr. Oncol. 2024, 31, 3563-3578. https://doi.org/10.3390/curroncol31060262

COMMENTS

  1. (PDF) Diabetes Prediction Using Machine Learning

    as 592 million. Diabetes is a disease caused by an increased level of blood glucose. This high blood glucose produces the symptoms of frequent urination, increased thirst, and increased ...

  2. Diabetes prediction using Machine Learning algorithms and

    cal domain especially in diagnosing the diseases such as diabetes. Therefore, the commonly used machine learning classification [1] namely SVM, KNN, ANN, Naive Bayes, Logistic regression, and Decision Tree are applied to identify diabetes patients at an early period. On the other side, Ontology is one of the most adopted approache.

  3. A Novel Proposal for Deep Learning-Based Diabetes Prediction

    For more comprehensive information on the applications of artificial intelligence in the medical field, research studies by ... performed diabetes prediction using many machine learning methods such as naïve Bayes (NB), SVM and logistic regression. The best accuracy was obtained with SVM with 77.37%. ... 8 June 2022). The National Institute of ...

  4. [Retracted] A Novel Diabetes Healthcare Disease Prediction Framework

    The authors perform a review of the literature on machine models and suggest an intelligent framework for diabetes prediction based on their findings. Machine learning models are critically examined, and an intelligent machine learning-based architecture for diabetes prediction is proposed and evaluated by the authors.

  5. Machine Learning Models for Data-Driven Prediction of Diabetes by

    Table 5 and Table 6 compare training machine-learning models for diabetes prediction using unbalanced and balanced data. It can be seen that the result parameters (such as accuracy and sensitivity) of all machine-learning models have met the standard, and the difference is relatively small but very low in specificity and very different; this is ...

  6. Diabetes Prediction using Machine Learning Algorithms

    Various prediction models have been developed and implemented by various researchers using variants of data mining techniques, machine learning algorithms or also combination of these techniques. Dr Saravana Kumar N M, Eswari, Sampath P and Lavanya S (2015) implemented a system using Hadoop and Map Reduce technique for analysis of Diabetic data.

  7. Diabetes prediction using machine learning and explainable AI

    Diabetes can be a reason for reducing life expectancy and quality. Predicting this chronic disorder earlier can reduce the risk and complications of many diseases in the long run. In this paper, an automatic diabetes prediction system using various machine learning approaches has been proposed.

  8. A Novel Diabetes Healthcare Disease Prediction Framework Using Machine

    The study's primary goal was to see how big data analytics and machine learning-based techniques may be used in diabetes. The examination of the results shows that the suggested ML-based framework may achieve a score of 86. Health experts and other stakeholders are working to develop categorization models that will aid in the prediction of ...

  9. Early Prediction of Diabetes Using an Ensemble of Machine Learning Models

    Feature papers represent the most advanced research with significant potential for high impact in the field. ... Paul, D.; Ghosh, P. Analysing feature importances for diabetes prediction using machine learning. In Proceedings of the 2018 IEEE 9th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver ...

  10. Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers

    Research Article. Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation. B. Shamreen Ahamed (1), Meenakshi S. Arya (2), and Auxilia Osvin V. Nancy (1). (1) College of Engineering and Technology, SRM Institute of Science and Technology, Vadapalani Campus, No. 1, Jawaharlal Nehru Road, Vadapalani, Chennai, Tamil Nadu, India

  11. Diabetes Prediction Using Machine Learning Algorithms

    Diabetes patients with poor control is a kind of diabetes that affects the body's metabolism characterized by an abnormal rise in blood sugar levels due to insulin deficiency, tissue insulin sensitivity, or both. In the sphere of medicine, the diabetes system is quite beneficial. Diabetes prevention is critical, and utmost caution should be exercised to avoid it. Diabetes has now become a ...

  12. Prediction of Diabetes using Machine Learning

    Download Citation | On Oct 13, 2022, C.S. Manikandababu and others published Prediction of Diabetes using Machine Learning | Find, read and cite all the research you need on ResearchGate

  13. Diabetes Prediction Using Machine Learning

    Prediction of Diabetes and Symptoms of Covid-19 Using Machine Learning Classifiers. Conference Paper. May 2022. Steffy T. Baby. Basil Xavier Simon. G. Jaspher W. Kathrine. Request PDF | On Dec 27 ...

  14. Diabetes prediction using supervised machine learning

    Supervised Learning is a machine learning technique that is used for machine learning with labeled datasets in order to identify input labels in order to make predictions and classifications [1]. 1.1. Research Problem In this study, the research problem is because there were 463 million diabetics in 2019.

  15. A survey on diabetes risk prediction using machine learning approaches

    The goal of this study was to use machine learning classification approaches based on observable sample attributes to predict diabetes at an early stage. The k-NN, SVM, functional tree (FT), and random forest (RF) classifiers were employed. k-NN had the highest accuracy at 98%, followed by RF at 97%, SVM at 94%, and FT at 93%.

  16. Early Prediction of Diabetes Using an Ensemble of Machine Learning

    In conjunction with the suggested ensemble model, our statistical imputation and RF-based feature selection techniques produced the best results for early diabetes prediction. Moreover, the presented new dataset will contribute to developing and implementing robust ML models for diabetes prediction utilizing population-level data. Keywords ...

  17. Prediction of Type-2 Diabetes Mellitus Disease Using Machine Learning

    The technological advancements in today's healthcare sector have given rise to many innovations for disease prediction. Diabetes mellitus is one of the diseases that has been growing rapidly among people of different age groups; there are various reasons and causes involved. All these reasons are considered as different attributes for this study. To predict type-2 diabetes mellitus disease ...

  18. Diabetes prediction using machine learning and explainable AI techniques

    In this paper, an automatic diabetes prediction system has been developed using a private dataset of female patients in Bangladesh and various machine learning techniques. The authors used the Pima Indian diabetes dataset and collected additional samples from 203 individuals from a local textile factory in Bangladesh.

  19. (PDF) Diabetes Prediction Using Machine Learning Algorithms

    Salliah Shafia and Prof. Gufran Ahmad Ansari designed a model for Early Prediction of Diabetes Disease & Classification of Algorithms Using Machine Learning Approach. This research uses the WEKA tools to predict diabetes in patients from the Pima Indian Diabetes Data Set, which consists of 7 attributes and 767 entries, and in this paper,

  20. Research on Diabetes Prediction Method Based on Machine Learning

    Long-term high blood sugar can cause chronic damage and dysfunction of various tissues, especially eyes, kidneys, heart, blood vessels and nerves. Therefore, the early prediction of diabetes is particularly important. In this paper, we use supervised machine-learning algorithms like Support Vector Machine (SVM), Naive Bayes classifier and ...

  21. Analysis and Prediction of Diabetes Using Machine Learning

    Diabetes contributes to heart disease, kidney disease, nerve damage, and blindness. Mining the diabetes data in an efficient way is a crucial concern. The data mining techniques and methods will be discovered to find the appropriate approaches and techniques for efficient classification of Diabetes dataset and in extracting valuable patterns.

  22. Proteomic prediction of diverse incident diseases: a machine learning

    We show the value of broad-capture proteomic biomarker discovery studies across multiple diseases of diverse causes, pointing to those that might benefit the most from proteomic approaches, and the potential to derive common sparse biomarker panels for prediction of multiple diseases at once. This framework could enable follow-up studies to explore the generalisability of proteomic models and ...

  23. Diabetes Prediction Using Machine Learning Algorithm

    Diabetes mellitus (DM), commonly known as diabetes, is a group of diseases that are defined by chronic high blood glucose levels due to abnormalities in insulin secretion, insulin action, or both ...

  24. A Novel Study on Machine Learning Algorithm‐Based Cardiovascular

    Hybrid machine learning models have been applied to predict heart diseases as well as perform optimum classification methods for prediction. Hybrid models give a better optimum output depending on the machine learning method implemented for the execution. Similarly, random forest, decision trees, and hybrid algorithms have been used to predict ...

  25. Development and validation of a machine learning-based readmission risk

    The findings indicated that the LR model exhibited the most optimal performance in terms of AUC, accuracy, sensitivity, and specificity for the occurrence of readmissions after direct PCI in NSTEMI patients. To investigate the factors that influence readmissions in patients with acute non-ST elevation myocardial infarction (NSTEMI) after percutaneous coronary intervention (PCI) by using ...

  26. Comparative Analysis of Supervised Machine Learning Algorithms for

    Given the nature of the disease, it is needed to mitigate the effects of spread by resorting to technological advancements for diagnosis of the disorder using machine learning algorithms. In this paper, three supervised machine learning algorithms; Decision Tree, Naïve Bayes, and Logistic Regression have been utilized for the prediction of the ...

  27. Diabetes Prediction Using Machine Learning Algorithms

    A Comparative Analysis of Diabetes Prediction Models using Machine Learning Algorithms. Conference Paper. Mar 2022. S. Sarathambekai. Vairam Th. Sathyaseelan Krishnaraj. Jawhar MG.

  28. IDBNWP: Improved deep belief network for workload prediction ...

    In 2021, Sounak Banerjee et al suggested a multi-step-ahead task forecasting approach using machine learning techniques and allocating the resources in accordance with this prediction in a way that allows the resources to be used more effectively and, as a result, lowers the data center's overall energy consumption. Relying on Bitbrains' actual ...

  29. Testing Machine Learning Models to Predict Postoperative Ileus after

    Background: Postoperative ileus (POI) is a common complication after colorectal surgery, leading to increased hospital stay and costs. This study aimed to explore patient comorbidities that contribute to the development of POI in the colorectal surgical population and compare machine learning (ML) model accuracy to existing risk instruments. Study Design: In a retrospective study, data were ...

  30. Machine Learning Based Assistance to Healthcare Professionals in

    Abstract We research into the clinical, biochemical and neuroimaging factors associated with the outcome of stroke patients to generate a predictive model using machine learning techniques for ...