Real Data Exercise

Synopsis

Exposure to environmental pollutants has been associated with various health outcomes. Most epidemiologic studies have examined pollutants one at a time, but in real life we are exposed to pollutant mixtures, not single chemicals. Although multi-pollutant approaches have gained recognition in recent years, they pose analytical challenges: many candidate exposures, high correlation among some of those exposures, and high dimensionality. New methods for the epidemiologic analysis of mixtures are being developed, but how well they perform, and how reliably they identify the key causal risk factors, is not well understood.

Dataset

exercise_data.csv: This is the dataset used in Park et al. (2017) [1]. It contains 9664 subjects and 40 variables from NHANES 2003-2014. See the Variable Dictionary at the end.
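Before modeling, it can help to confirm that the file matches the description above. The following is a minimal sketch using only the Python standard library; the function names `read_exercise_data` and `summarize` are hypothetical, and treating empty cells or the string "NA" as missing is an assumption about how the file encodes missing values. The demo uses a tiny in-memory stand-in with made-up placeholder values so that it runs without the real file.

```python
import csv
import io


def read_exercise_data(handle):
    """Read a CSV from an open text handle; return (column names, rows as dicts)."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    return reader.fieldnames, rows


def summarize(columns, rows):
    """Count subjects, variables, and missing cells per column.

    Assumption: a cell is missing if it is empty or the literal string "NA".
    """
    missing = {
        c: sum(1 for r in rows if (r[c] or "").strip() in ("", "NA"))
        for c in columns
    }
    return {"n_subjects": len(rows), "n_variables": len(columns), "missing": missing}


# Placeholder rows (not real NHANES values) so the sketch is runnable as-is.
demo = io.StringIO("seqn,ggt,bpb\n1,20,1.5\n2,,0.8\n3,35,NA\n")
columns, rows = read_exercise_data(demo)
summary = summarize(columns, rows)
print(summary)

# With the real file, one would instead write something like:
#   with open("exercise_data.csv", newline="") as f:
#       columns, rows = read_exercise_data(f)
# and compare the summary against the 9664 subjects and 40 variables stated above.
```

If the counts differ from those stated in the dataset description, the file path or the delimiter is the first thing to check.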

Objectives

To identify important predictors of GGT, using three different methods discussed above:

For the conventional methods and regularized regression methods, include quadratic terms and pairwise interactions among 20 predictors.
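The expansion described above can be sketched as follows. This is an illustration, not the authors' code: it assumes the 20 predictors are the metal biomarkers listed as variables 12 through 31 in the Variable Dictionary, and the helper `expand_terms` and the term-naming scheme are hypothetical. With 20 predictors the expansion yields 20 linear terms, 20 quadratic terms, and 190 pairwise interactions, for 230 candidate terms in total.

```python
from itertools import combinations

# Assumption: the 20 predictors are variables 12-31 of the Variable Dictionary.
PREDICTORS = [
    "bpb", "bcd", "bhg", "utas", "uas3", "uas5", "uab", "uac", "udma", "umma",
    "uba", "ucd", "uco", "ucs", "umo", "upb", "usb", "utl", "utu", "uur",
]


def expand_terms(row, predictors):
    """Build linear, quadratic, and pairwise interaction terms for one subject.

    `row` maps each predictor name to a numeric value.
    """
    terms = {p: row[p] for p in predictors}          # linear terms
    for p in predictors:
        terms[f"{p}^2"] = row[p] ** 2                 # quadratic terms
    for a, b in combinations(predictors, 2):
        terms[f"{a}*{b}"] = row[a] * row[b]           # pairwise interactions
    return terms


# Placeholder values 1.0, 2.0, ..., 20.0 (not real measurements).
example_row = {p: float(i + 1) for i, p in enumerate(PREDICTORS)}
terms = expand_terms(example_row, PREDICTORS)
print(len(terms))  # 20 + 20 + 190 = 230
```

Applying `expand_terms` to every subject gives the design matrix for the conventional and regularized regression models. Whether to transform or standardize the predictors before expansion is a separate modeling choice that this sketch leaves open.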

Variable Dictionary

No Variable Label
1 seqn Participant ID
2 cycle NHANES cycle (3=2003-2004, 4=2005-2006, 5=2007-2008, 6=2009-2010, 7=2011-2012, 8=2013-2014)
3 sex Sex (1 male, 2 female)
4 age Age (years)
5 raceeth Race/Ethnicity (1 Mexican American, 2 Other Hispanic, 3 Non-Hispanic white, 4 Non-Hispanic black, 5 Other)
6 psu Primary Sampling Unit (PSU)
7 strata Sampling Strata
8 bmi Body Mass Index (kg/m^2)
9 wt2yr Two-year weights of subsample
10 ucr Creatinine, urine (mg/dL)
11 ggt Gamma-Glutamyl Transferase (GGT) (U/L)
12 bpb Lead, blood (\(\mu g/dL\))
13 bcd Cadmium, blood (\(\mu g/L\))
14 bhg Total Mercury, blood (\(\mu g/L\))
15 utas Urinary total Arsenic (\(\mu g/L\))
16 uas3 Urinary Arsenous acid (\(\mu g/L\))
17 uas5 Urinary Arsenic acid (\(\mu g/L\))
18 uab Urinary Arsenobetaine (\(\mu g/L\))
19 uac Urinary Arsenocholine (\(\mu g/L\))
20 udma Urinary Dimethylarsinic acid (\(\mu g/L\))
21 umma Urinary Monomethylarsonic acid (\(\mu g/L\))
22 uba Barium, urine (\(\mu g/L\))
23 ucd Cadmium, urine (\(\mu g/L\))
24 uco Cobalt, urine (\(\mu g/L\))
25 ucs Cesium, urine (\(\mu g/L\))
26 umo Molybdenum, urine (\(\mu g/L\))
27 upb Lead, urine (\(\mu g/L\))
28 usb Antimony, urine (\(\mu g/L\))
29 utl Thallium, urine (\(\mu g/L\))
30 utu Tungsten, urine (\(\mu g/L\))
31 uur Uranium, urine (\(\mu g/L\))
32 sbp Systolic BP (mmHg)
33 dbp Diastolic BP (mmHg)
34 htn Hypertension status (1=yes, 0=no)
35 pmon Person Months of Follow-up from Exam Date
36 d_total Total mortality (0=alive, 1=death)
37 d_cvd CVD mortality (0=alive, 1=death)
38 d_cancer Cancer mortality (0=alive, 1=death)
39 smkstat Smoking status (1 never, 2 former, 3 current)
40 educ Education (1 <High School, 2 High School, 3 College+)


  1. Park SK, Zhao Z, Mukherjee B. Construction of environmental risk score beyond standard linear models using machine learning methods: Application to metal mixtures, oxidative stress and cardiovascular disease in NHANES. Environ Health 2017;16(1):102.