Acta Diabetol. 2025 Apr 1. doi: 10.1007/s00592-025-02496-1. Online ahead of print.
ABSTRACT
BACKGROUND: Type 2 diabetes and coronary heart disease are highly prevalent in the Chinese population and are leading causes of mortality. Their co-occurrence is difficult to diagnose and carries a poor prognosis, imposing a significant disease burden. In recent years, machine learning has frequently been used for diagnostic applications in medicine; however, predictive models for type 2 diabetes complicated by coronary heart disease have suffered from limited predictive performance and from interference by other comorbidities during prediction.
METHODS: This study aimed to improve the accuracy, sensitivity, specificity, F1 score, and AUC of models predicting the coexistence of diabetes and coronary heart disease. We developed a prediction model using XGBoost combined with SHAP for feature analysis. Through comparative feature selection, hyperparameter optimization, and computational efficiency analysis, we identified optimal conditions for model performance. External validation with independent datasets confirmed the model's robustness and generalizability, supporting its potential implementation in clinical practice.
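For readers unfamiliar with the evaluation metrics named above, the following is a minimal Python sketch of how accuracy, sensitivity, specificity, F1 score, and AUC can be computed from binary labels and predicted scores. It is illustrative only and is not the authors' code: the function names, the 0.5 decision threshold, and the pairwise-ranking formulation of AUC are assumptions made for this example.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Compute the five metrics; the threshold is a hypothetical default."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * sensitivity / (precision + sensitivity)
        if (precision + sensitivity)
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "auc": roc_auc(y_true, scores),
    }
```

Note that accuracy, sensitivity, specificity, and F1 depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds.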
RESULTS: This study compared three models (Random Forest, LightGBM, and XGBoost) and found that XGBoost performed best in both predictive performance and computational efficiency. The accuracy (Acc) of the XGBoost model was 0.8910, which improved to 0.8942 after hyperparameter tuning. External validation using datasets from Pingyang Hospital and Heji Hospital in Shanxi Province, China, yielded an AUC of 0.7897, demonstrating robust generalizability. By integrating SHAP (SHapley Additive exPlanations) for interpretability, our study identified bilirubin levels, basophil count, cholesterol levels, and age as key features for predicting the coexistence of type 2 diabetes mellitus (T2DM) and coronary heart disease (CHD). These findings are consistent with the feature importance rankings determined by the XGBoost algorithm. The model demonstrates moderate predictive performance (AUC = 0.7879 in external validation) with practical interpretability, offering potential utility in improving diagnostic efficiency for T2DM-CHD comorbidity in resource-limited settings. However, its clinical implementation requires further validation in diverse populations.
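The idea behind SHAP-style feature attribution can be illustrated with a minimal Python sketch that computes exact Shapley values for a small model. This is a simplification for illustration and not the authors' pipeline or the SHAP library: it replaces "absent" features with a single baseline value rather than averaging over background data, and the toy scoring function and all names are hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(subset: set) -> float:
        # Features in the subset take their real value; the rest use the baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                coalition = set(s)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi


def toy_risk_score(z: Sequence[float]) -> float:
    """Hypothetical score: one additive feature plus a two-feature interaction."""
    return 2.0 * z[0] + z[1] * z[2]
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the additivity property that SHAP relies on. Exact enumeration scales exponentially with the number of features, so it is practical only for very small examples like this one.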
PMID:40167635 | DOI:10.1007/s00592-025-02496-1