Development and Validation of Predictive Models for Gestational Diabetes Treatment Modality Using Supervised Machine Learning: A Population-Based Cohort Study | BMC Medicine


Study population and design

The study population was drawn from members of Kaiser Permanente Northern California (KPNC), an integrated health care delivery system serving 4.5 million members. KPNC members represent approximately 30% of the underlying population and are socio-demographically representative of the population residing in the geographic areas served [11, 12]. The integrated information system allows predictors and outcomes to be quantified throughout pregnancy. Individuals with gestational diabetes mellitus (GDM) were identified by searching the KPNC Pregnancy Glucose Tolerance and GDM Registry, an active surveillance registry that uses laboratory data to determine GDM screening and diagnosis and automatically excludes pre-existing type 1 or type 2 diabetes. Specifically, pregnant KPNC members are screened almost universally (98%) for GDM with the 50 g, 1-h glucose challenge test (GCT) at 24-28 weeks of gestation [1]. If the screening test is abnormal, a 100 g, 3-h oral glucose tolerance test (OGTT) is performed after an 8-12 h fast. GDM is determined by meeting one of the following criteria: (1) ≥ 2 OGTT plasma glucose values meeting or exceeding the Carpenter-Coustan thresholds: 1-h 180 mg/dL, 2-h 155 mg/dL, and 3-h 140 mg/dL; or (2) 1-h GCT ≥ 180 mg/dL and fasting glucose ≥ 95 mg/dL performed alone or during the OGTT [13, 14]. Plasma glucose measurements were performed using the hexokinase method at the KPNC Regional Laboratory, which participates in the College of American Pathologists accreditation and monitoring program [15]. This data-only project was approved by KPNC's institutional review board, which waived the requirement for informed consent from participants.
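
The two diagnostic criteria above can be expressed as a small rule. The following is a minimal sketch, not the registry's actual logic: the function name and input layout are hypothetical, and the fasting threshold in criterion (1) is an assumption borrowed from criterion (2), since the text lists only the post-load thresholds.

```python
# Carpenter-Coustan thresholds in mg/dL for the 100 g, 3-h OGTT.
# The 1-h, 2-h, and 3-h values come from the text; the fasting value
# (95 mg/dL) is an assumption borrowed from criterion (2).
CC_THRESHOLDS = {"fasting": 95, "1h": 180, "2h": 155, "3h": 140}


def meets_gdm_criteria(ogtt=None, gct_1h=None, fasting=None):
    """Return True if either diagnostic criterion described above is met.

    ogtt    -- dict of OGTT plasma glucose values (mg/dL) keyed like CC_THRESHOLDS
    gct_1h  -- 1-h glucose challenge test value (mg/dL)
    fasting -- fasting glucose (mg/dL), measured alone or during the OGTT
    """
    ogtt = ogtt or {}
    # Criterion 1: at least two OGTT values at or above their thresholds.
    n_abnormal = sum(
        1
        for key, cutoff in CC_THRESHOLDS.items()
        if ogtt.get(key) is not None and ogtt[key] >= cutoff
    )
    if n_abnormal >= 2:
        return True
    # Criterion 2: 1-h GCT >= 180 mg/dL together with fasting glucose >= 95 mg/dL.
    if fasting is None:
        fasting = ogtt.get("fasting")
    return (
        gct_1h is not None
        and fasting is not None
        and gct_1h >= 180
        and fasting >= 95
    )
```

For example, `meets_gdm_criteria(ogtt={"1h": 185, "2h": 160, "3h": 120})` satisfies criterion (1), while `meets_gdm_criteria(gct_1h=190, fasting=96)` satisfies criterion (2).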

Among 405,557 pregnancies with gestational age at delivery [...] (n = 42), deriving an analytical sample of 30,474 pregnancies complicated by GDM. We further derived a discovery set containing 27,240 GDM-complicated pregnancies from 2007 to 2016 and a temporal/future validation set of 3,234 GDM-complicated pregnancies in 2017 (Fig. 1).

Fig. 1

Flow chart for deriving the cohort of pregnancies with gestational diabetes, 2007-2017. GDM: gestational diabetes mellitus

Outcome ascertainment

Individuals diagnosed with GDM were universally referred to the KPNC Regional Perinatal Service Center for a supplemental care program beyond standard prenatal care. Medical nutrition therapy (MNT) was the first-line treatment. If glycemic control goals were not achieved with MNT alone, pharmacological treatment was initiated. After counseling on the risks and benefits of oral antidiabetic agents versus insulin, pharmacological treatment was chosen through a shared patient-physician decision-making model: (1) oral antidiabetic agents such as glyburide and metformin were added to MNT and, if optimal glycemic control was still not achieved, the oral medication was switched to insulin therapy; or (2) insulin therapy was initiated directly in addition to MNT (a supplementary table shows this in more detail [see Additional file 1]). We searched the pharmacy information management database for prescriptions of oral agents (glyburide 97.9%, metformin, or other) and insulin after a diagnosis of GDM. Treatment modality was grouped into MNT only and pharmacological treatment (oral agents and/or insulin) beyond MNT. Notably, despite the large overall sample size, we grouped oral agents (32.6% of the overall population) and insulin (6.2%) into pharmacological treatment because of insufficient power to predict insulin separately as an outcome.

Candidate predictors

Based on risk factors associated with GDM treatment modality and clinician feedback, we selected 176 (64 continuous and 112 categorical) candidate sociodemographic, behavioral, and clinical predictors obtained from electronic health records for model development. The candidate predictors were divided into four levels based on availability at different stages of pregnancy (an additional table shows this in more detail [see Additional file 2]): level 1 predictors (n = 68) were available at the start of pregnancy and dated back 1 year before the index pregnancy; level 2 predictors (n = 26) were measured from the last menstrual period until before the diagnosis of GDM; level 3 predictors (n = 12) were available at the time of GDM diagnosis; and level 4 predictors (n = 70) comprised self-monitored blood glucose (SMBG) measurements, the primary measure of glycemic control during pregnancy recommended by the American Diabetes Association [5], taken during the first week after GDM diagnosis. All predictors, levels 1 to 4, were measured before the outcome of interest (i.e., the last line of GDM treatment). Pregnant individuals with GDM in our study population had an average of 11.8 weeks (SD: 6.6 weeks) of SMBG measurements between GDM diagnosis and delivery. We included data from the first week after GDM diagnosis to allow earlier prediction, as it takes an average of 5.6 weeks from GDM diagnosis to the optimal treatment offered. Of note, individuals with GDM were universally offered enrollment in a supplemental GDM care program run by nurses and dietitians via telemedicine from KPNC's Regional Perinatal Service Center [16]. All individuals with GDM were instructed to self-monitor and record glucose readings four times a day: fasting before breakfast and 1 hour after starting each meal. SMBG measurements were then reported to nurses or dietitians in weekly phone calls from enrollment until delivery, and the data were recorded in the Patient-Reported Capillary Glucose Clinical Database.

Statistical analysis


We imputed missing values with the random forest algorithm because the algorithm does not require parametric model assumptions, which can reduce the efficiency of the predictor (an additional table shows this in more detail [see Additional file 2]). We assessed the estimate of true imputation error using the normalized root mean square error and the proportion of misclassified entries for continuous and categorical variables, respectively. Both values were close to 0, indicating good imputation performance (an additional table shows this in more detail [see Additional file 3]). After preprocessing, we used the t-test and Pearson's chi-square test to compare participant characteristics between the discovery and temporal/future validation sets. We performed the Mann-Kendall test to examine secular trends in GDM treatment modalities over calendar years. The discovery set (2007-2016) was stratified by calendar year and treatment modality for cross-validation. The temporal/future validation set (2017) was stratified by treatment modality for calculating cross-validated predictive performance.
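
The two imputation error measures mentioned above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and normalizing the squared error by the variance of the true values is an assumption (a common convention for random forest imputation), not something stated in the text.

```python
import math


def nrmse(true_values, imputed_values):
    """Normalized RMSE over entries that were masked and then imputed.

    Assumption: the mean squared error is divided by the variance of the
    true values before taking the square root.
    """
    n = len(true_values)
    mean_true = sum(true_values) / n
    mse = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / n
    variance = sum((t - mean_true) ** 2 for t in true_values) / n
    return math.sqrt(mse / variance)


def proportion_misclassified(true_labels, imputed_labels):
    """Share of categorical entries whose imputed value differs from the truth."""
    mismatches = sum(t != i for t, i in zip(true_labels, imputed_labels))
    return mismatches / len(true_labels)
```

Both measures equal 0 when every imputed entry matches the true value, which is why values close to 0 indicate good imputation.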

Selection of variables and development and comparison of complete models

We performed classification and regression tree (CART) prediction, least absolute shrinkage and selection operator (LASSO) regression, and super learner (SL) prediction with predictors at levels 1, 1–2, 1–3, and 1–4, respectively. CART and LASSO regression were chosen as simpler prediction methods than SL. The SL defines a set of candidate machine learning algorithms, namely the library, and combines their predictions through meta-learning via cross-validation [17]. SL has the asymptotic property that it is at least as good (in risk, defined by negative log-likelihood) as the best-fitting algorithm in the library [17]. Although the variables included in the final SL set cannot be easily interpreted for their individual contributions, SL can be used to obtain optimal prediction performance and as a benchmark for simpler, less adaptive approaches [17].

We tuned the prediction methods as follows. In CART, the Gini index measured the heterogeneity of each subset with respect to the outcome, and a maximum depth of 6 was set as the stopping criterion. To account for potential errors in estimating the risk curve, the regularization parameter in the LASSO regression was selected as the value whose cross-validated error was within one standard error of the minimum [18]. For the SL, we considered a simple library and a complex library for comparison. The simple library included the response mean, LASSO regression, and CART; the complex library was extended by additionally including random forest and extreme gradient boosting (XGBoost). Several XGBoost configurations were considered, with tuning parameters set to 10, 20, or 50 trees, maximum depths of 1 to 6, and shrinkage of 0.001, 0.01, or 0.1 for regularization.
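
The one-standard-error rule used for the LASSO penalty can be illustrated with a short sketch. The function name and the numbers below are hypothetical and serve only to show the selection logic; they are not results from the study.

```python
def lambda_one_se(lambdas, cv_mean_error, cv_std_error):
    """Pick the largest penalty whose CV error is within one SE of the minimum."""
    # Locate the minimum cross-validated error and its standard error.
    best = min(range(len(lambdas)), key=lambda k: cv_mean_error[k])
    cutoff = cv_mean_error[best] + cv_std_error[best]
    # Among penalties whose error stays under the cutoff, take the largest,
    # which corresponds to the sparsest model.
    eligible = [lam for lam, err in zip(lambdas, cv_mean_error) if err <= cutoff]
    return max(eligible)


# Hypothetical cross-validation summary for four candidate penalties.
lambdas = [0.001, 0.01, 0.1, 1.0]
cv_mean_error = [0.55, 0.50, 0.51, 0.60]
cv_std_error = [0.02, 0.02, 0.02, 0.03]
chosen = lambda_one_se(lambdas, cv_mean_error, cv_std_error)
```

Here the minimum error occurs at a penalty of 0.01, but the rule selects 0.1 because its error lies within one standard error of that minimum, trading a negligible loss in fit for a simpler model.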

For models using predictors at each level, prediction results were evaluated using cross-validated receiver operating characteristic curves and the area under the receiver operating characteristic curve (AUC) statistic in the discovery and temporal/future validation sets. We used DeLong's test to compare the AUCs between different prediction algorithms at the same predictor level and within the same prediction algorithm across levels [19]. We used permutation-based variable importance, calculated from AUCs over 5 simulations, to obtain the top 10 important features. By permuting one variable at a time, the method calculated the difference in AUC before and after the permutation to assign a measure of importance [20]. The model with the highest AUC in the validation set was selected as the final complete model.
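
Permutation-based importance as described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the helper names, the toy data, and the stand-in scoring function are all hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than any particular library.

```python
import random


def auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, labels, n_features, n_repeats=5, seed=0):
    """Mean drop in AUC after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = auc(labels, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - auc(labels, [predict(r) for r in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: feature 0 separates the classes, feature 1 is unused noise.
rows = [[0.1, 5.0], [0.4, 1.0], [0.35, 3.0], [0.8, 2.0], [0.9, 4.0], [0.7, 0.5]]
labels = [0, 0, 0, 1, 1, 1]
toy_model = lambda r: r[0]  # hypothetical stand-in for a fitted model
importance = permutation_importance(toy_model, rows, labels, n_features=2)
```

Because the stand-in model ignores feature 1, shuffling it leaves the AUC unchanged and its importance is exactly zero, whereas shuffling feature 0 can only lower the AUC from its baseline.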

Development of simpler models

To improve interpretability and potential clinical adoption, we used tenfold cross-validated logistic regression to develop simpler models in the discovery set based on a minimal set of the most important features at each level, as opposed to the full set of features used in the complex SL. We further selected interaction term(s), considering all cross products, through stepwise forward and backward selection by Akaike's information criterion. We assessed the predictive performance (i.e., simplicity and cross-validated AUCs) of these simpler models in the validation set. Additionally, calibration was examined by assessing the quality of the uncalibrated model via the integrated calibration index, which captures the distribution of predicted probabilities, coupled with a calibration plot. A calibration method (i.e., isotonic regression) was implemented for recalibration in case of observed over- or underestimation.
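
Isotonic regression for recalibration can be illustrated with the pool-adjacent-violators algorithm. The sketch below is an assumption-laden illustration, not the study's implementation: the function name and the toy inputs are hypothetical.

```python
def isotonic_fit(predicted, observed):
    """Pool-adjacent-violators fit of observed outcomes on predicted probabilities.

    Returns the sorted predicted values and the non-decreasing calibrated values.
    """
    pairs = sorted(zip(predicted, observed))
    blocks = []  # each block is [sum of outcomes, number of observations]
    for _, y in pairs:
        blocks.append([float(y), 1])
        # Merge backwards while neighbouring block means violate monotonicity.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [p for p, _ in pairs], fitted


# Toy example: uncalibrated probabilities and the binary outcomes observed.
predicted = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
observed = [0, 1, 0, 0, 1, 1]
sorted_pred, calibrated = isotonic_fit(predicted, observed)
```

The fitted values form a non-decreasing step function of the original predictions, and their total equals the number of observed events, so the recalibrated probabilities preserve the overall event rate.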

