Exploring the relationship between heavy metals and diabetic retinopathy: a machine learning modeling approach … – Nature.com

Characteristics of the study population

Table 1 summarizes the general characteristics of participants with and without diabetic retinopathy (DR). A total of 1,042 American adults were included, of whom 212 were diagnosed with DR and 830 were not. The mean age of the total population was about 62 years. Women made up 53% of participants, slightly more than men (47%). The largest groups in the total population were non-Hispanic whites (40%), married participants (61%), those with a high school education or above (60%), never smokers (52%), former alcohol drinkers (34%), those with a self-reported history of hypertension (75%) or hyperlipidemia (88%), and those with PIR1.00 (78%). Smoking status, urinary creatinine level (-87.5 vs 103), and mean concentrations of heavy metals such as Sb (-7.30 vs -7.42), Tl (-6.71 vs -6.56), and Pt (-9.41 vs -9.58) differed significantly between DR and non-DR patients.

Figure 1 shows the correlations among the 13 heavy metals and the baseline characteristics of the population. Most of the metals were correlated with each other to varying degrees, with relatively strong correlations between Tl and Cs (r=0.54) and between Co and Ba (r=0.44). Curves based on correlations of heavy metals with baseline population characteristics and other details are shown in Fig. S1. We also assessed multicollinearity among all selected metals and covariates using variance inflation factors (VIFs), which indicated no multicollinearity.
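
To make the two checks above concrete, here is a minimal sketch of Pearson's r and of VIFs read off the diagonal of the inverse correlation matrix. The three columns are made-up toy values, not study data, and the helper names (pearson, invert, vif) are illustrative assumptions rather than the authors' code.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]

# Toy predictor columns (hypothetical values, not from the study).
a = [1, 2, 3, 4, 5, 6]
b = [2, 1, 4, 3, 6, 5]
c = [5, 3, 6, 1, 2, 4]
vifs = vif([a, b, c])
print([round(v, 2) for v in vifs])
```

A VIF of 1 means a predictor is uncorrelated with the others. Cutoffs of about 5 or 10 are common rules of thumb for flagging multicollinearity; the text does not state which threshold was applied.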

The results of Pearson's correlation analysis among the metal factors and baseline variables.

Figure 2 shows the performance of the 11 machine learning models in predicting DR risk on the testing set, presented as ROC curves; the results for the training set are shown in Fig. S2. The AUC values were 1.000 for the KNN model, 0.991 for GBM, 0.988 for RF, 0.987 for C5.0, 0.966 for NN, 0.961 for XGBoost, 0.939 for SVM, 0.911 for MLP, 0.831 for NB, 0.800 for GP, and 0.622 for LR. Table S1 and Table 2 provide the performance indicators of the 11 models in the training set and validation set, respectively, together with the confusion matrices of the 11 machine learning algorithms. Among these models, the KNN model exhibited the best prediction performance and was therefore selected for subsequent analyses.
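
For readers unfamiliar with the metric, the AUC reported above equals the probability that a randomly chosen DR case receives a higher predicted score than a randomly chosen non-DR case. The sketch below computes it directly from that definition; the labels and scores are made-up toy values, and roc_auc is an illustrative helper, not the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical): 1 = DR, 0 = non-DR.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.75: three of the four case/control pairs are ordered correctly
```

An AUC of 1.000 means every case outranks every non-case, and 0.5 corresponds to random ordering.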

ROC curves of the 11 machine learning models in the testing set.

PFI analysis provided insights into the relative importance of all variables in the KNN model. We used the IML method to assess the contribution weights of heavy metal exposure (Ba, Cd, Co, Cs, Pb, Sb, etc.) and baseline characteristics (age, sex, BMI, education level, ethnicity, smoking and drinking status, etc.) in the prediction model; the results are presented in Fig. 3A. The top five variables (Sb, Ba, Pt, Ur, As) were the most important in the prediction model. Among them, Sb level had a contribution weight of 1.730632–1.791722 in predicting DR risk, which was significantly higher than that of all other included variables. The variables second only to Sb were Ba, Pt, Ur, and As, which were also relatively sensitive metals in predicting the development of DR, with contribution weights of 1.560474–1.602271, 1.566063–1.633790, 1.511366–1.540538, and 1.456352–1.496473, respectively. Notably, demographic and lifestyle-related variables contributed less to the prediction of DR risk than heavy metal exposure, and all baseline characteristics except age had lower weights than the heavy metals.
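
The idea behind permutation feature importance can be sketched as follows: shuffle one feature at a time and see how much the model's loss grows. The sketch assumes the importance is reported as the ratio of permuted loss to baseline loss (consistent with weights above 1 in the text, but an assumption), uses a tiny hand-rolled k-nearest-neighbour scorer rather than the study's model, and runs on made-up data in which only the first column carries signal.

```python
import random

def knn_predict_proba(train_X, train_y, row, k=3):
    """Smoothed share of positive labels among the k nearest training rows."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, t)), label)
        for t, label in zip(train_X, train_y)
    )
    positives = sum(label for _, label in dists[:k])
    return (positives + 1) / (k + 2)  # Laplace smoothing keeps the loss above zero

def mean_abs_error(y_true, y_prob):
    return sum(abs(y - p) for y, p in zip(y_true, y_prob)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Per-feature ratio: mean loss after shuffling that column / baseline loss."""
    rng = random.Random(seed)
    baseline = mean_abs_error(y, [predict(r) for r in X])
    ratios = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            col = [r[j] for r in X]
            rng.shuffle(col)  # break the link between feature j and the outcome
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(X, col)]
            total += mean_abs_error(y, [predict(r) for r in permuted])
        ratios.append(total / n_repeats / baseline)
    return ratios

# Hypothetical toy data: column 0 determines the label, column 1 is small noise.
train_X = [[0, 0.02], [0, 0.05], [0, 0.08], [0, 0.01],
           [1, 0.03], [1, 0.06], [1, 0.09], [1, 0.04]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_X = [[0, 0.04], [0, 0.07], [0, 0.00], [1, 0.02], [1, 0.05],
          [1, 0.10], [0, 0.06], [1, 0.08], [0, 0.03], [1, 0.01]]
test_y = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

importances = permutation_importance(
    lambda row: knn_predict_proba(train_X, train_y, row), test_X, test_y
)
print(importances)  # informative column well above 1, noise column at about 1
```

Under this ratio convention, a value near 1 means shuffling the feature leaves the loss unchanged, while larger values mean the model relies on that feature.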

The contribution of metal factors and baseline variables in the predictive model. (A) The forest plot based on PFI analysis displays the contribution weights of heavy metals and baseline variables and their corresponding standard deviations; (B) The SHAP summary plot of all variables and DR risk. The width of each horizontal bar can be interpreted as the size of the effect on the model predictions: the wider the range, the greater the effect. The direction on the x-axis represents the likelihood of developing DR (right) or not developing DR (left); (C) The SHAP feature importance plot of heavy metals and DR risk. The magnitude of the effect of each feature on the model output was measured by the mean absolute SHAP value across all samples, with features ranked from top to bottom by magnitude of effect; (D) The SHAP summary plot of heavy metals and DR risk.

After selecting the KNN model, we further validated the relationship of each variable with the predicted DR risk using the SHAP method. The SHAP summary plot (Fig. 3B) shows the overall effect of heavy metals and baseline variables on DR risk, ranked in descending order of feature importance. Here, a positive SHAP value indicates that the value of the feature is positively associated with DR risk, and the larger the value, the greater the contribution. The top five potentially critical factors associated with higher predicted DR risk were, in descending order, Sb, Pt, As, Tl, and Ba. Moreover, Sb had a higher contribution weight in the prediction model than any other heavy metal or baseline variable under both analysis methods, which is in line with the SHAP plots for heavy metals and predicted DR risk (Fig. 3C, D).
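
SHAP values are based on Shapley values from cooperative game theory. As a small illustration of what a signed per-feature contribution means, the sketch below computes exact Shapley values for one instance by enumerating all feature subsets, replacing "absent" features with a single reference row. The risk function, the feature ordering (Sb, age, Ba), and all numbers are hypothetical; practical SHAP implementations typically approximate this and average over a background sample.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x relative to a single baseline row."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance's value; the rest use the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

def toy_risk(row):
    """Hypothetical risk score with an Sb-by-Ba interaction (not the study's model)."""
    sb, age, ba = row
    return 0.5 * sb + 0.02 * age + 0.1 * sb * ba

instance = [2.0, 70.0, 1.0]   # made-up values for Sb, age, Ba
reference = [1.0, 60.0, 0.0]  # made-up reference row
phi = shapley_values(toy_risk, instance, reference)
print([round(p, 3) for p in phi])  # contributions of Sb, age, Ba
```

The contributions add up to the difference between the prediction for the instance and the prediction for the reference row. The interaction term is split between Sb and Ba according to the Shapley weighting.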

The predictions of the selected KNN model were further explained by PDP analysis. The relationships between six key heavy metals (Sb, Ba, Pt, As, Tl, Cd) and the predicted DR risk are shown in Fig. 4, and the results for the remaining heavy metals are shown in Fig. S3. The predicted risk of DR increased significantly as the log-transformed levels of some heavy metals, including As, Co, Sb, and Tu, rose within the relatively high concentration range, whereas Pt showed no significant correlation with predicted DR risk at high concentrations. Likewise, there was no significant correlation between increasing or decreasing levels of Cs, Hg, and Pb and DR risk. These findings suggest that timely detection of key metal levels in vivo may play an essential role in predicting the development of DR.
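
A partial dependence curve is obtained by fixing one feature at each value on a grid, leaving all other features as observed, and averaging the model's predictions. The sketch below shows that computation on a hypothetical risk function that only rises above a threshold, loosely mimicking the "high concentration" pattern described above; the function, data, and grid are made up for illustration.

```python
def partial_dependence(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        preds = [predict(row[:feature] + [value] + row[feature + 1:]) for row in X]
        curve.append(sum(preds) / len(preds))
    return curve

def toy_risk(row):
    """Hypothetical model: risk increases with feature 0 only above 1.0."""
    return 0.2 + 0.5 * max(0.0, row[0] - 1.0) + 0.1 * row[1]

# Made-up rows: [log-transformed metal level, another covariate].
toy_X = [[0.5, 1.0], [1.5, 2.0], [2.0, 3.0]]
toy_grid = [0.0, 1.0, 2.0, 3.0]
pd_curve = partial_dependence(toy_risk, toy_X, feature=0, grid=toy_grid)
print([round(v, 3) for v in pd_curve])  # flat at low levels, rising above the threshold
```

Plotting the grid against the averaged predictions gives one panel of the kind shown in Fig. 4.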

Relationships between key metals, including (A) Sb, (B) Ba, (C) Pt, (D) As, (E) Tl, and (F) Cd, and predicted DR risk. The x-axis of each plot represents the log-transformed values of the metal.

We analyzed the interaction properties of heavy metal exposure using PDPs. The results in Fig. 5A show that the variables with overall interaction strength greater than 0.2 were Sb, age, Tu, Pt, As, Cd, and Ur, with Sb having the most significant interaction effect. The interaction strength of the baseline variables in predicting DR risk remained weaker than that of the heavy metals. We therefore further analyzed the interaction of Sb with the other variables. Figure 5B shows that the interaction between Sb and age ranked the highest among all pairings with Sb, with overall interaction strength greater than 0.4. In addition to the strong synergistic effect of Sb with As, Tl, and Cs, ethnicity had an effect on the prediction of DR risk by Sb, with overall interaction strength greater than 0.2. The results suggest that monitoring Sb levels, especially in older populations, may be more critical in controlling the development of DR.
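
One common PDP-based measure of interaction strength is Friedman's H-statistic, which compares the joint partial dependence of two features with the sum of their individual partial dependences. Whether the study used exactly this statistic is an assumption; the sketch below only illustrates the idea for a feature pair, using made-up data and two hypothetical models (one purely additive, one purely multiplicative).

```python
def _pd(predict, X, fixed):
    """Mean prediction with the features in `fixed` (index -> value) held constant."""
    total = 0.0
    for row in X:
        mixed = [fixed.get(i, v) for i, v in enumerate(row)]
        total += predict(mixed)
    return total / len(X)

def _centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def h_squared(predict, X, j, k):
    """Two-way H^2: share of the joint PD variance not explained additively."""
    pd_j = _centered([_pd(predict, X, {j: r[j]}) for r in X])
    pd_k = _centered([_pd(predict, X, {k: r[k]}) for r in X])
    pd_jk = _centered([_pd(predict, X, {j: r[j], k: r[k]}) for r in X])
    num = sum((ab - a - b) ** 2 for ab, a, b in zip(pd_jk, pd_j, pd_k))
    den = sum(ab ** 2 for ab in pd_jk)
    return num / den if den else 0.0

# Made-up, centered toy data with two features (think: Sb level and age).
toy_X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [0.0, 0.0]]

def additive(row):   # hypothetical model without interaction
    return row[0] + 2.0 * row[1]

def product(row):    # hypothetical model that is pure interaction
    return row[0] * row[1]

h_additive = h_squared(additive, toy_X, 0, 1)
h_product = h_squared(product, toy_X, 0, 1)
print(h_additive, h_product)  # 0.0 for the additive model, 1.0 for the product model
```

Values near 0 indicate that the two features act additively. Values near 1 indicate that their joint effect is almost entirely interaction.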

Interaction effects of variables on DR. (A) Interactions between heavy metals and baseline variables on DR; (B) Interactions between Sb levels and other variables on DR. The range of each line represents the overall interaction strength: the wider the range, the greater the effect.
