Synopsis
Exposure to environmental pollutants has been associated with various health outcomes. Most epidemiologic studies have examined pollutants one at a time, but in real life we are exposed to mixtures of pollutants, not single chemicals. Although multi-pollutant approaches have gained recognition in recent years, they pose analytical challenges: the number of candidate exposures is large, some exposures are highly correlated with one another, and the resulting models are high-dimensional. New methods for the epidemiologic analysis of mixtures are being developed, but how well they perform, and how reliably they identify the key causal risk factors, is not yet well understood.
Dataset
exercise_data.csv: This is the dataset used in Park et al. (2017) [1]. It contains 9,664 subjects and 40 variables from NHANES 2003-2014. See the Variable Dictionary at the end.
Outcome: gamma-glutamyl transferase (GGT), listed as ggt in the Variable Dictionary (NHANES variable LBXSGTSI)
Exposures: 20 metals measured in blood (3) and urine (17)
Potential confounders: age, sex, race/ethnicity, education, smoking status, body mass index (BMI), urinary creatinine
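As a starting point, the three variable roles above can be mapped to column names from the Variable Dictionary. The sketch below is one reading of that dictionary, not part of the original exercise: it assumes the 20 exposures are dictionary entries 12 to 31, and the helper names `read_header` and `missing_columns` are hypothetical. It uses only the standard library and checks a CSV header for the columns the analysis needs.

```python
import csv
import io

# Column roles, taken from the Variable Dictionary (assumed mapping).
OUTCOME = "ggt"
BLOOD_METALS = ["bpb", "bcd", "bhg"]
URINE_METALS = [
    "utas", "uas3", "uas5", "uab", "uac", "udma", "umma", "uba", "ucd",
    "uco", "ucs", "umo", "upb", "usb", "utl", "utu", "uur",
]
EXPOSURES = BLOOD_METALS + URINE_METALS  # 3 + 17 = 20 columns
CONFOUNDERS = ["age", "sex", "raceeth", "educ", "smkstat", "bmi", "ucr"]


def read_header(csv_file):
    """Return the first row (column names) of an open CSV file."""
    return next(csv.reader(csv_file))


def missing_columns(header):
    """List the required columns that are absent from the header."""
    needed = [OUTCOME] + EXPOSURES + CONFOUNDERS
    return [name for name in needed if name not in header]


# Demonstration on an in-memory, deliberately truncated header.
demo = io.StringIO("seqn,cycle,sex,age,ggt,bpb\n1,3,1,45,20,1.2\n")
print(missing_columns(read_header(demo))[:3])  # ['bcd', 'bhg', 'utas']
```

With the real file, the same check would be `missing_columns(read_header(open("exercise_data.csv", newline="")))`, which should return an empty list if the file matches the dictionary.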
Objectives
To identify important predictors of GGT, using three different methods discussed above:
ONE from conventional methods (e.g., forward selection, backward elimination, stepwise)
ONE from regularized (shrinkage) regression methods (e.g., LASSO, ENET, adaptive LASSO, and adaptive ENET)
ONE from CART or random forests.
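One possible way to set up a method from each of the three families is sketched below in Python with scikit-learn. This is an illustration, not the course's prescribed workflow: the data are synthetic stand-ins (not exercise_data.csv), the predictor names `x0` to `x7` are hypothetical, and scikit-learn's forward selection adds predictors by cross-validated fit instead of the p-value or AIC rules used in classical stepwise procedures. The synthetic predictors are already on a common scale; real metal concentrations would normally be transformed and standardized before fitting LASSO.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LassoCV, LinearRegression

# Synthetic stand-in data: 8 hypothetical predictors, of which only
# x0 and x2 truly affect the outcome.
rng = np.random.default_rng(0)
names = [f"x{j}" for j in range(8)]
X = rng.normal(size=(300, 8))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.5, size=300)

# 1) Conventional: forward selection, adding one predictor at a time
#    (scored by 5-fold cross-validation) until two predictors are kept.
forward = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, direction="forward", cv=5
).fit(X, y)
forward_picks = [n for n, keep in zip(names, forward.get_support()) if keep]

# 2) Regularized: LASSO with the penalty chosen by 5-fold cross-validation;
#    predictors with a nonzero coefficient are the selected ones.
lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_picks = [n for n, b in zip(names, lasso.coef_) if b != 0.0]

# 3) Tree-based: random forest, ranking predictors by impurity importance.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
order = np.argsort(forest.feature_importances_)[::-1]
forest_top2 = [names[j] for j in order[:2]]

print("forward selection:", forward_picks)
print("LASSO (nonzero):  ", lasso_picks)
print("random forest top:", forest_top2)
```

The three methods answer slightly different questions: forward selection returns a fixed-size subset, LASSO returns every predictor that survives shrinkage, and the random forest returns a ranking, so a cutoff has to be chosen separately.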
For the conventional and regularized regression methods, include quadratic terms and pairwise interactions among the 20 predictors.
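To see how large the expanded predictor set becomes, the terms can be enumerated directly. The sketch below is illustrative only: `second_order_terms` is a hypothetical helper, and the labels `^2` and `:` are just a naming convention for quadratic and interaction terms. With 20 predictors there are 20 main effects, 20 quadratic terms, and 190 pairwise interactions.

```python
from itertools import combinations
from math import comb


def second_order_terms(names):
    """Main effects, quadratic terms, and all pairwise interactions."""
    main = list(names)
    quadratic = [f"{a}^2" for a in names]
    interactions = [f"{a}:{b}" for a, b in combinations(names, 2)]
    return main + quadratic + interactions


# Small illustration with the three blood metals from the dictionary.
print(second_order_terms(["bpb", "bcd", "bhg"]))
# ['bpb', 'bcd', 'bhg', 'bpb^2', 'bcd^2', 'bhg^2', 'bpb:bcd', 'bpb:bhg', 'bcd:bhg']

# Number of candidate terms for p = 20 predictors.
p = 20
print(p + p + comb(p, 2))  # 230
```

At 230 candidate terms, many of them strongly correlated with their parent main effects, the expanded model is where the differences between stepwise selection and shrinkage methods tend to show.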
Variable Dictionary
No | Variable | Label |
---|---|---|
1 | seqn | Participant ID |
2 | cycle | NHANES cycle (3=2003-2004, 4=2005-2006, 5=2007-2008, 6=2009-2010, 7=2011-2012, 8=2013-2014) |
3 | sex | Sex (1 male, 2 female) |
4 | age | Age (years) |
5 | raceeth | Race/Ethnicity (1 Mexican American, 2 Other Hispanic, 3 Non-Hispanic white, 4 Non-Hispanic black, 5 Other) |
6 | psu | Primary Sampling Unit (PSU) |
7 | strata | Sampling Strata |
8 | bmi | Body Mass Index (\(kg/m^2\)) |
9 | wt2yr | Two-year weights of subsample |
10 | ucr | Creatinine, urine (mg/dL) |
11 | ggt | Gamma-Glutamyl Transferase (GGT) (U/L) |
12 | bpb | Lead, blood (\(\mu g/dL\)) |
13 | bcd | Cadmium, blood (\(\mu g/L\)) |
14 | bhg | Total Mercury, blood (\(\mu g/L\)) |
15 | utas | Urinary total Arsenic (\(\mu g/L\)) |
16 | uas3 | Urinary Arsenous acid (\(\mu g/L\)) |
17 | uas5 | Urinary Arsenic acid (\(\mu g/L\)) |
18 | uab | Urinary Arsenobetaine (\(\mu g/L\)) |
19 | uac | Urinary Arsenocholine (\(\mu g/L\)) |
20 | udma | Urinary Dimethylarsinic acid (\(\mu g/L\)) |
21 | umma | Urinary Monomethylarsonic acid (\(\mu g/L\)) |
22 | uba | Barium, urine (\(\mu g/L\)) |
23 | ucd | Cadmium, urine (\(\mu g/L\)) |
24 | uco | Cobalt, urine (\(\mu g/L\)) |
25 | ucs | Cesium, urine (\(\mu g/L\)) |
26 | umo | Molybdenum, urine (\(\mu g/L\)) |
27 | upb | Lead, urine (\(\mu g/L\)) |
28 | usb | Antimony, urine (\(\mu g/L\)) |
29 | utl | Thallium, urine (\(\mu g/L\)) |
30 | utu | Tungsten, urine (\(\mu g/L\)) |
31 | uur | Uranium, urine (\(\mu g/L\)) |
32 | sbp | Systolic BP (mmHg) |
33 | dbp | Diastolic BP (mmHg) |
34 | htn | Hypertension status (1=yes, 0=no) |
35 | pmon | Person Months of Follow-up from Exam Date |
36 | d_total | Total mortality (0=alive, 1=death) |
37 | d_cvd | CVD mortality (0=alive, 1=death) |
38 | d_cancer | Cancer mortality (0=alive, 1=death) |
39 | smkstat | Smoking status (1 never, 2 former, 3 current) |
40 | educ | Education (1 <High School, 2 High School, 3 College+) |
[1] Park SK, Zhao Z, Mukherjee B. Construction of environmental risk score beyond standard linear models using machine learning methods: application to metal mixtures, oxidative stress and cardiovascular disease in NHANES. Environ Health. 2017;16(1):102.