A Prescriptive Analytics Model for Early Detection of Diabetes Propensity
Abstract
Many prescriptive analytics applications in healthcare use patient data to enable early detection of a patient's propensity to develop serious diseases such as diabetes, stroke, cancer, and heart disease. This study builds an analytics model on readily available data to explore whether a person's propensity to develop diabetes can be detected early. Changes in parameters across a patient's blood and urine test reports were used as predictors in the model. An experienced doctor would see such a trend immediately if these records were stored and made available at the time of consultation. A model, however, can serve as an early warning system while the data is being entered or shared with the customer.
We used data from a diagnostic laboratory on patients enrolled in the laboratory's annual health plan. The data set consisted of 1,200 records collected over a period of two years and tracked the following variables: Age, Hemoglobin, Glucose (Fasting, FBS), Glucose (PP-2 hours), Urea, Creatinine, Cholesterol Total, Triglycerides (TGL), HDL Cholesterol, LDL Cholesterol, VLDL Cholesterol, Total Cholesterol/HDL Ratio, and LDL/HDL Ratio. Some of these patients tested positive for diabetes at various points during their visits to the clinic. The patients were 70% men and 30% women, aged 25 to 55. The reported ranges of values of the selected predictor variables did not differ between men and women.
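The change-based predictors described in this abstract can be derived by differencing each patient's consecutive reports. The Python sketch below is a minimal illustration, not the authors' pipeline: the record layout (keys `patient_id` and `visit`), the hypothetical helper `visit_deltas`, and the short feature list standing in for the full variable set are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical short list of predictor keys standing in for the full
# variable set (FBS, PP-2 hours glucose, HDL, ...); names are illustrative.
FEATURES = ("fbs", "glucose_pp2", "hdl")


def visit_deltas(records):
    """Return, per patient, the change in each feature between consecutive visits.

    `records` is an iterable of dicts with the hypothetical keys 'patient_id',
    'visit' (any sortable value, e.g. an ISO date string) and every key in FEATURES.
    """
    # Group the reports by patient.
    by_patient = defaultdict(list)
    for rec in records:
        by_patient[rec["patient_id"]].append(rec)

    deltas = {}
    for pid, visits in by_patient.items():
        # Order each patient's reports chronologically before differencing.
        visits.sort(key=lambda r: r["visit"])
        # One delta dict per pair of consecutive visits (later minus earlier).
        deltas[pid] = [
            {f: later[f] - earlier[f] for f in FEATURES}
            for earlier, later in zip(visits, visits[1:])
        ]
    return deltas
```

For example, a patient whose fasting glucose rose from 98 to 112 between two annual visits would contribute a delta of +14 for `fbs`; these numbers are illustrative and are not drawn from the study data. A patient with only one report yields an empty list, since no change can be computed yet.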
We created multiple models using the standard analytics techniques K-NN and Naïve Bayes, similar to (Sing, 2014). The Naïve Bayes model was more useful because it generates the probability of a person developing diabetes, which connects directly to end-user needs. The system we are building is essentially a warning system that shortlists records with a higher probability of developing diabetes for further assessment. Our initial models achieved high precision on out-of-sample data, with a false-positive ratio of 12%. Such a system is useful when patient volumes are high, because it identifies the patients who require further investigation immediately. These models are currently being tested on live data within the organization.
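The probability-based shortlisting described above can be sketched with a Gaussian Naïve Bayes classifier, which treats each predictor as normally distributed within each class and independent of the others given the class. This is a minimal, dependency-free illustration under stated assumptions, not the authors' model: the Gaussian likelihood, the 0.5 default threshold, the hypothetical helper names (`fit_gaussian_nb`, `predict_proba`, `shortlist`, `false_positive_rate`) and the definition of the false-positive rate as FP / (FP + TN) are all choices made for this example. The K-NN model is not shown.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means and feature variances.

    X is a list of numeric rows (e.g. visit-to-visit changes; hypothetical layout),
    y the matching class labels (here assumed 1 = developed diabetes, 0 = not).
    """
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # Small epsilon keeps variances positive for constant features.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_proba(model, row):
    """Return P(class | row) for each class, computed in log space for stability."""
    log_scores = {}
    for label, (prior, means, variances) in model.items():
        s = math.log(prior)
        for x, m, v in zip(row, means, variances):
            # Log of the Gaussian density of x under this class.
            s += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        log_scores[label] = s
    # Normalise with the log-sum-exp trick so the probabilities sum to 1.
    top = max(log_scores.values())
    exp = {k: math.exp(s - top) for k, s in log_scores.items()}
    total = sum(exp.values())
    return {k: e / total for k, e in exp.items()}


def shortlist(model, rows, threshold=0.5, positive=1):
    """Indices of rows whose probability of the positive class meets the threshold."""
    return [i for i, r in enumerate(rows)
            if predict_proba(model, r).get(positive, 0.0) >= threshold]


def false_positive_rate(y_true, y_pred, positive=1):
    """FP / (FP + TN): share of true negatives wrongly flagged (assumed definition)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t != positive)
    return fp / (fp + tn) if fp + tn else 0.0
```

Lowering `threshold` widens the shortlist at the cost of more false positives. When patient volumes are high, the threshold can be tuned so that the shortlist stays within the capacity available for follow-up review.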
Copyright (c) 2022 Rajendra Desai, Sirichandana V V
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.