JENNIFER SAYLOR, PHD, RN, ANCS-BC

UNIVERSITY OF DELAWARE

SEPTEMBER 14, 2012

[email protected]

Essentials of Complex Data Analysis Utilizing National Survey Data

National Surveys: Advantages

Guides actions & policies to improve health

Allows researchers to ask & answer questions on a population level from previously collected data

Available data without any replication of effort

Prohibitive cost of obtaining primary data

Probability-based complex sample designs

National Surveys: Challenges

Locating a dataset that includes the variables to address the research question

Additions or deletions of variables and differences in methods of assessment of variables in different waves (years) in each survey

Merging multiple data files

Analyzing data using complex sample design

Complex Sample Designs

Used to acquire representation of an entire population using a sample of the population

More efficient than simple random samples:

Do not require complete enumeration of the population

Allow researchers to visit compact areas to obtain in-person data (interview or laboratory)

Able to oversample small or sensitive subgroups to adequately represent their variability

Analyzing the data as if they came from a simple random sample gives variance estimates that are too low, because it does not account for the correlation among individuals within clusters, and so yields biased results
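The effect of within-cluster correlation on variance can be illustrated with Kish's approximate design effect, deff = 1 + (m - 1) × ICC, for clusters of equal size m. This formula is a standard approximation that is not part of the slides, and every number below is a hypothetical value chosen for illustration, not an NHANES figure.

```python
import math


def design_effect(cluster_size: float, icc: float) -> float:
    """Kish's approximate design effect for equal-sized clusters."""
    return 1.0 + (cluster_size - 1.0) * icc


def srs_standard_error(sd: float, n: int) -> float:
    """Standard error of a mean under simple random sampling."""
    return sd / math.sqrt(n)


# Hypothetical inputs (assumptions for illustration only).
n = 2000           # participants in the sample
sd = 10.0          # standard deviation of the measure
cluster_size = 40  # participants per cluster
icc = 0.05         # intraclass correlation within clusters

deff = design_effect(cluster_size, icc)
se_srs = srs_standard_error(sd, n)
se_design = se_srs * math.sqrt(deff)  # SE after accounting for clustering
effective_n = n / deff                # sample size the design is "worth"

print(f"design effect:      {deff:.2f}")
print(f"SE assuming SRS:    {se_srs:.3f}")
print(f"SE with clustering: {se_design:.3f}")
print(f"effective n:        {effective_n:.0f}")
```

Even a small intraclass correlation inflates the standard error noticeably when clusters are large, which is why assuming simple random sampling understates the variance.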

Complex Sample Analysis

Accurately estimates population means and frequencies from the sample after taking into account over- or undersampling of specific groups

Statistical programs address sampling design elements such as stratification, clusters, and weights: SUDAAN®, Complex Samples analysis in SPSS®, & survey procedures in SAS®

Weighting without complex samples analysis leads to grossly reduced estimates of population variability: estimates are computed as if the measures were obtained from the number of cases in the entire population rather than the number of cases in the sample in the data set

Clusters include participants who are more similar to one another than to those in another cluster

National Health and Nutrition Examination Survey

Combination of health and nutrition questionnaires and physical examination to assess the health and nutritional status of adults and children in the United States [1].

National Health and Nutrition Examination Survey (NHANES) data can be used to:

Produce estimates of personal health conditions

Vital statistics

Establish national standards for measurements (e.g., blood pressure)

Analyze risk factors for diseases

Examine disparities in health status

60-minute Interview: demographic, socioeconomic, dietary, & health-related questions

Physical Examination: most performed in mobile examination centers; medical, dental, & physiological measurements & laboratory tests, depending on the participant's age & gender

NHANES Sampling Plan

Probability-based complex sample design represents the civilian, non-institutionalized U.S. population.

Excludes individuals:

Residing in nursing homes

Armed forces

Institutionalized

U.S. nationals living outside the U.S.

Stage 1: Fifteen Primary Sampling Units (PSUs) are selected per year: counties or small groups of contiguous counties, depending on the population of the counties; the total sample across the PSUs is approximately 5,000 examined participants per year.

Stage 2: Segments within the PSUs are selected: a cluster of households in a block or a group of blocks, depending on population density

Stage 3: Households within the segments are selected

Stage 4: One or more participants within the households are randomly selected
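The four stages above can be sketched as nested random draws. This is a simplified illustration, not the NHANES procedure: it assumes equal selection probabilities at every stage and exactly one participant per household, whereas the actual survey uses probability-based selection with oversampling. The toy population and the function name `draw_multistage_sample` are hypothetical.

```python
import random


def draw_multistage_sample(population, n_psus, n_segments, n_households, rng):
    """Select PSUs, then segments, then households, then one person each.

    population maps PSU -> segment -> household -> list of persons.
    """
    sample = []
    for psu in rng.sample(sorted(population), n_psus):            # Stage 1
        segments = population[psu]
        for seg in rng.sample(sorted(segments), n_segments):      # Stage 2
            households = segments[seg]
            for hh in rng.sample(sorted(households), n_households):  # Stage 3
                person = rng.choice(households[hh])               # Stage 4
                sample.append((psu, seg, hh, person))
    return sample


# Hypothetical toy population: 30 PSUs x 8 segments x 20 households,
# each household holding 1 to 4 persons.
rng = random.Random(2012)
population = {
    f"psu{p}": {
        f"seg{s}": {
            f"hh{h}": [f"person{p}-{s}-{h}-{i}" for i in range(rng.randint(1, 4))]
            for h in range(20)
        }
        for s in range(8)
    }
    for p in range(30)
}

sample = draw_multistage_sample(
    population, n_psus=15, n_segments=2, n_households=5, rng=rng
)
print(len(sample))  # 15 PSUs x 2 segments x 5 households = 150 participants
```

Because participants are reached only through their PSU, segment, and household, people in the same cluster enter the sample together, which is the source of the within-cluster correlation that the analysis must account for.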

Diagram: NHANES Sampling Plan [2]

NHANES 2007-2008 Sampling

Oversamples small and sensitive subgroups:

Persons over 60

African Americans

Low income population

Entire Hispanic population (not only Mexican Americans)

A representative sample of these groups by age, sex, and income

Reliable and precise health status indicator estimates

Each NHANES 2007-2008 participant represents approximately 50,000 other U.S. residents.

NHANES Survey Weights

Purpose: account for oversampling, survey non-response, & post-stratification

Sample weights are assigned to each person based on the number of people they represent within the U.S. Census non-institutionalized civilian population

NHANES provides three weights (in 2- & 4-year increments):

Interview weights: all people interviewed

Medical examination weights: interviewed & medical examination

Fasting laboratory weights: interviewed, medical examination, & fasting laboratory tests
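The three purposes of the weights (oversampling, non-response, post-stratification) can be sketched as three multiplicative factors. This is a generic textbook construction under stated assumptions, not the NCHS weighting procedure; the group names, selection probabilities, and census totals below are hypothetical.

```python
def base_weight(selection_probability):
    """Inverse of the probability of selection (handles oversampling)."""
    return 1.0 / selection_probability


def nonresponse_factor(n_selected, n_responded):
    """Respondents also stand in for the selected people who did not respond."""
    return n_selected / n_responded


def poststratification_factor(census_total, weighted_total):
    """Scale the weights so that they sum to the known census total."""
    return census_total / weighted_total


# Hypothetical groups (assumed numbers, not NHANES values).
groups = {
    "oversampled": {"prob": 1 / 10_000, "selected": 500, "responded": 400,
                    "census_total": 5_500_000},
    "other": {"prob": 1 / 40_000, "selected": 500, "responded": 450,
              "census_total": 21_000_000},
}

final_weights = {}
for name, g in groups.items():
    weight = base_weight(g["prob"])
    weight *= nonresponse_factor(g["selected"], g["responded"])
    weighted_total = weight * g["responded"]
    weight *= poststratification_factor(g["census_total"], weighted_total)
    final_weights[name] = weight
    print(f"{name}: each respondent represents about {weight:,.0f} people")
```

The oversampled group ends up with a smaller weight per person, which is how the analysis undoes the oversampling when estimating values for the whole population.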

Creating Analytical File from NHANES

Download data files & codebooks

Combination of 11 individual data files in NHANES 07-08

Transfer text data files to statistical package (SPSS, SAS)

Screen each data file for unused variables & delete

If names of variables or responses changed between multiple survey years, rename variables and recode responses before merging

Merge different data files by sequence number assigned to each participant

Data collected varied by age; files do not have the same number of records

Recode 07-08 variables to create the study variables
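The rename, merge, and recode steps above can be sketched with plain dictionaries. The slides describe doing this in SPSS or SAS; Python is used here only for illustration. `SEQN` is the NHANES participant sequence number; every other variable name, value, and the education cut-point below is a hypothetical stand-in.

```python
def rename_variables(records, name_map):
    """Return copies of the records with variables renamed per name_map."""
    return [{name_map.get(k, k): v for k, v in rec.items()} for rec in records]


def merge_by_seqn(*files):
    """Merge files on SEQN, keeping every participant found in any file.

    Variables a participant lacks (files differ in record counts) become None.
    """
    merged = {}
    variables = ["SEQN"]
    for records in files:
        for rec in records:
            merged.setdefault(rec["SEQN"], {}).update(rec)
            for name in rec:
                if name not in variables:
                    variables.append(name)
    return [
        {name: merged[seqn].get(name) for name in variables}
        for seqn in sorted(merged)
    ]


# Hypothetical file contents; the laboratory file has fewer records.
demographics = [
    {"SEQN": 101, "age": 34, "educ": 2},
    {"SEQN": 102, "age": 9, "educ": None},
    {"SEQN": 103, "age": 61, "educ": 4},
]
laboratory = [
    {"SEQN": 101, "glucose_mg_dl": 92},
    {"SEQN": 103, "glucose_mg_dl": 118},
]

# Rename before merging, as when a name changed between survey years.
laboratory = rename_variables(laboratory, {"glucose_mg_dl": "fasting_glucose"})
analytic = merge_by_seqn(demographics, laboratory)

# Recode into a study variable (assumed cut-point: codes 3 and above).
for row in analytic:
    educ = row["educ"]
    row["hs_or_more"] = None if educ is None else int(educ >= 3)

for row in analytic:
    print(row)
```

Keeping participants who are missing from a file, rather than dropping them, preserves the full design structure for the later complex samples analysis.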

2007-2008 Files Merged to Create Analytical File

Data Analysis Prep

In SPSS complex samples analysis, a complex sample plan file is created with the NHANES 2007-2008 2-year fasting laboratory weight (WTSAF2YR) and the design variables: strata (SDMVSTR) and cluster (SDMVPSU)

From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.

Comparison of Descriptive Statistics of Categorical Data

Frequencies: weighting & complex sample results are the same because the sample size is the same.

Race/Ethnicity variable:

Un-weighted: racial minorities account for 52% of the sample (oversampled)

Complex samples: racial minorities account for only 30% when estimated for the U.S. population (more representative of the U.S.)
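A minimal sketch of why the un-weighted and population-estimated percentages differ for an oversampled group. The six records and their weights are hypothetical and are not meant to reproduce the 52% and 30% figures above.

```python
def unweighted_share(records, group):
    """Share of sample records that belong to the group."""
    return sum(1 for g, _ in records if g == group) / len(records)


def weighted_share(records, group):
    """Share of the estimated population that belongs to the group."""
    total = sum(w for _, w in records)
    return sum(w for g, w in records if g == group) / total


# Hypothetical (group, sample weight) records. The oversampled group has
# smaller weights because each participant represents fewer people.
records = [
    ("minority", 20_000), ("minority", 20_000), ("minority", 20_000),
    ("non-minority", 60_000), ("non-minority", 60_000), ("non-minority", 60_000),
]

print(f"un-weighted: {unweighted_share(records, 'minority'):.0%}")
print(f"weighted:    {weighted_share(records, 'minority'):.0%}")
```

The group makes up half of the sample but a quarter of the estimated population, the same direction of change as in the race/ethnicity comparison.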

Comparison of Descriptive Statistics of Continuous Data

Mean for each continuous variable changes when weighted

Mean remains the same with weighting and complex samples analysis: the proportion of cases with each value remains constant

Standard error of the mean for weighting: almost non-existent when weighting, since the sample size appears to be the entire population

Complex samples analysis mean and standard error: accurate because the mean is estimated for the entire population based on calculations of the number of cases from which data are obtained
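The contrast between the two standard errors can be sketched as follows. The naive version treats the sum of the weights as the sample size; the design-based version uses Taylor linearization over PSU totals within strata, a standard estimator for stratified cluster designs. It is offered as an assumption about the kind of calculation such software performs, not a description of SPSS internals. The records are hypothetical, and each stratum needs at least two PSUs.

```python
import math
from collections import defaultdict

# Hypothetical records: (stratum, psu, weight, value). Not NHANES data.
records = [
    (1, 1, 40_000, 118), (1, 1, 40_000, 122), (1, 1, 40_000, 120),
    (1, 2, 60_000, 130), (1, 2, 60_000, 128), (1, 2, 60_000, 132),
    (2, 1, 50_000, 110), (2, 1, 50_000, 112), (2, 1, 50_000, 114),
    (2, 2, 50_000, 125), (2, 2, 50_000, 127), (2, 2, 50_000, 123),
]


def weighted_mean(records):
    total_w = sum(w for _, _, w, _ in records)
    return sum(w * y for _, _, w, y in records) / total_w


def naive_weighted_se(records):
    """SE when weights are treated as counts, so n = sum of the weights."""
    total_w = sum(w for _, _, w, _ in records)
    mean = weighted_mean(records)
    variance = sum(w * (y - mean) ** 2 for _, _, w, y in records) / (total_w - 1)
    return math.sqrt(variance / total_w)


def design_based_se(records):
    """Linearized SE based on variation among PSU totals within each stratum."""
    total_w = sum(w for _, _, w, _ in records)
    mean = weighted_mean(records)
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w
    variance = 0.0
    for totals in psu_totals.values():
        n_psu = len(totals)
        center = sum(totals.values()) / n_psu
        variance += n_psu / (n_psu - 1) * sum(
            (t - center) ** 2 for t in totals.values()
        )
    return math.sqrt(variance)


mean = weighted_mean(records)
se_naive = naive_weighted_se(records)
se_design = design_based_se(records)
print(f"weighted mean:   {mean:.2f}")
print(f"naive SE:        {se_naive:.4f}")
print(f"design-based SE: {se_design:.4f}")
```

Both approaches give the same mean, but the naive standard error is tiny because it assumes hundreds of thousands of cases, while the design-based one reflects the 12 records in 4 PSUs that were actually observed.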

Note. Not all the variables from the metabolic syndrome study are presented in the table; * measured using the Patient Health Questionnaire.

From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.

Comparison of Logistic Regression: Education Regressed on Metabolic Syndrome

Logistic regression with two dichotomous variables:

Education: < a high school education = 0 and > high school education = 1

Metabolic syndrome: absent = 0 and present = 1

Chi-square is the statistical methodology; it analyzes frequencies, which are not affected by dispersion (only small differences)

Un-weighted analysis: those who have less than a high school education are 62% more likely to have metabolic syndrome

Weighted and complex samples analyses: likelihood increases to 74%

Odds ratio is the same for the weighted and complex samples analyses, but the 95% CIs are unrealistically narrow for the weighted analysis
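The narrowing of the confidence interval can be sketched with a 2×2 table and a Wald interval for the odds ratio. The counts are hypothetical, and scaling every cell by 50,000 is only a stand-in for treating weights as frequencies. This shows the mechanism, not the study's results, and it does not compute a design-based interval.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical sample counts (not from the study).
sample_counts = (60, 140, 100, 400)

# Treating weights as frequencies acts like inflating every cell;
# 50,000 is an assumed weight applied equally to all cells.
inflated_counts = tuple(n * 50_000 for n in sample_counts)

or_sample, lo_sample, hi_sample = odds_ratio_ci(*sample_counts)
or_inflated, lo_inflated, hi_inflated = odds_ratio_ci(*inflated_counts)

print(f"sample counts:   OR {or_sample:.2f}, CI {lo_sample:.2f} to {hi_sample:.2f}")
print(f"inflated counts: OR {or_inflated:.2f}, CI {lo_inflated:.3f} to {hi_inflated:.3f}")
```

The odds ratio is unchanged by the inflation, but the interval collapses around it, which is the pattern described for the weighted analysis.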

Notes. See below for coding. From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.

Comparison of Linear Regression: Depressive Symptoms Regressed on Diet

Linear regression with two continuous variables:

Depressive symptoms: measured via the Patient Health Questionnaire (PHQ-9)

Diet: measured as the number of calories consumed per day

Un-weighted and weighted data: biased results; depressive symptoms predict diet (p = .006, p < .001, respectively)

Complex samples analysis: depressive symptoms do not predict diet (p = .151)

Notes. See below for coding. From: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.

Limitations: NHANES & Other Secondary Data

Cross-sectional data: unable to determine causality

Unable to control definitions of variables, measurement, & data collection

Unable to exclude subjects with a history of psychosis since NHANES does not collect these data

Definition of smoking in the metabolic syndrome study was limited by the data collected in NHANES 07-08

Unable to choose how depressive symptoms were measured

Conclusion

Use of national data sets allows use of extensive, expensive, well documented survey data for exploratory questions but limits analysis to those variables included in the data set

Large sample: examine multiple predictors & interactive relationships

Challenges of national databases:

Merging data files

Differentiating the availability of data in different waves of surveys

Using complex sampling techniques to provide a representative sample

Complex samples data analysis programs allow inclusion of sampling design elements (stratification, clusters, & weights):

Provide unbiased population estimates of frequencies, means, & variability

Provide results representative of the US population

FOR FURTHER INFORMATION, PLEASE CONTACT ME AT [email protected].

THANK YOU

Questions or Comments?


References

1 NCHS. (2009). NHANES 2007-2008 public data general release file documentation, from http://www.cdc.gov/nchs/nhanes/nhanes2007-2008/generaldoc_e.htm

2 NCHS. (2010). Continuous NHANES Web Tutorial: Sample Design, from http://www.cdc.gov/nchs/tutorials/Nhanes/SurveyDesign/SampleDesign/intro.htm

Labeled Tables: Saylor, J., Friedmann, E. & Lee, H. J. (2012). Navigating complex sample analysis using national survey data. Nursing Research, 61 (3), 231–237.
