

Planning how to create the variables you need from the variables you have

Jane E. Miller, PhD

The Chicago Guide to Writing about Numbers, 2nd edition.


Overview

• Why researchers sometimes need to create new variables to conduct their analysis

• Why it is important to plan ahead for how to create those new variables

• What information is required to identify the new variables needed for the research question

• How to write clear instructions on how to get from the variables you have to the variables you need


Why create new variables?

• For many statistical analyses, variables available on the original data set are not yet in the form needed to address the research question of interest.

• Examples:
– You want to study total family income, but the data set has separate variables measuring income components such as earned income, government benefits, and alimony.
– You want to compare outcomes for age groups (children, working age adults, and the elderly), but the data set reports respondent’s age in single years.


Conceptualizing the new variable should precede programming it

• Important to separate
– Researching and planning how those variables should be defined
– Programming the new variable in an electronic database

• Each of those tasks
– Has its own challenging aspects
– Uses different
  • Skills
  • Resources


Some common patterns of creating new from existing variables

• A categorical version of a continuous variable

• A simplified (collapsed) categorical variable

• A binary indicator from a continuous variable

• A new continuous variable that combines 2+ continuous variables

• A mathematical transformation of a continuous variable


A categorical version of a continuous variable

• Original variable
– Age in years (continuous)

• Needed variable
– Age group (categorical)
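
A minimal sketch of this recoding in Python. The cutoffs (under 18, 18 to 64, 65 and older), the numeric codes, and the names age_group and AGE_GROUP_LABELS are illustrative assumptions; the slides do not specify them.

```python
# Hypothetical cutoffs and codes, chosen only for illustration.
AGE_GROUP_LABELS = {
    1: "Child (<18)",
    2: "Working-age adult (18-64)",
    3: "Elderly (65+)",
}


def age_group(age_years: float) -> int:
    """Map age in single years to a three-category age-group code."""
    if age_years < 18:
        return 1
    if age_years < 65:
        return 2
    return 3


ages = [4, 17, 18, 64, 65, 90]
for age in ages:
    code = age_group(age)
    print(age, code, AGE_GROUP_LABELS[code])
```

Testing the values on either side of each cutoff (17 and 18, 64 and 65) is a quick way to confirm that the boundaries fall where intended.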


A simplified (collapsed) categorical variable

• Original variable
– Ten-category ethnicity variable

• Needed variable
– Three-category ethnicity variable
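
One way to plan a collapse like this is as an explicit lookup table from original codes to new codes. The sketch below uses placeholder codes 1 to 10 and an arbitrary grouping; both are hypothetical, since the slides do not list the categories.

```python
# Hypothetical mapping: original codes 1-10 -> collapsed codes 1-3.
COLLAPSE_MAP = {
    1: 1, 2: 1, 3: 1,
    4: 2, 5: 2, 6: 2,
    7: 3, 8: 3, 9: 3, 10: 3,
}


def collapse(original_code: int) -> int:
    """Return the collapsed code; fail loudly if an original code is unmapped."""
    if original_code not in COLLAPSE_MAP:
        raise ValueError(f"No mapping defined for original code {original_code}")
    return COLLAPSE_MAP[original_code]


# Check that every original category has been assigned to a new category.
unmapped = set(range(1, 11)) - set(COLLAPSE_MAP)
print("Unmapped original codes:", sorted(unmapped))
```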


A binary indicator from a continuous variable

• Original variable
– Birth weight in grams (continuous)

• Needed variable
– Indicator of low birth weight status (yes or no)
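
A short Python sketch of such an indicator. The slides do not state a threshold; the code assumes the widely used definition of low birth weight as less than 2,500 grams, and the names below are illustrative.

```python
# Assumed cutoff: low birth weight conventionally means under 2,500 grams.
LBW_CUTOFF_GRAMS = 2500


def low_birth_weight(birth_weight_grams: float) -> int:
    """Return 1 (yes) if birth weight is below the cutoff, else 0 (no)."""
    return 1 if birth_weight_grams < LBW_CUTOFF_GRAMS else 0


for grams in [1800, 2499, 2500, 3400]:
    print(grams, low_birth_weight(grams))
```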


A new continuous variable that aggregates 2+ continuous variables

Original variable(s) | New variable
Separate measures of income for each family member | Total family income
Multiple attitudinal items | A composite attitudinal scale
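
As an illustration of the first row, the sketch below sums income across family members, where families can differ in size. The family identifiers and income values are made up for the example.

```python
from typing import Dict, List


def total_family_income(member_incomes: List[float]) -> float:
    """Sum the separate income measures for all members of one family."""
    return sum(member_incomes)


# Hypothetical data: one list of member incomes (in dollars) per family.
families: Dict[str, List[float]] = {
    "F001": [32000, 18500, 0],
    "F002": [41000],
}

totals = {fid: total_family_income(incomes) for fid, incomes in families.items()}
print(totals)
```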


A new continuous variable calculated from 2+ continuous variables

Original variable(s) | New variable
Separate measures of county-level population and poverty rate | Number of poor persons in the county = population × (% poor / 100)
Separate measures of weight (kg) and height (meters) | Body Mass Index = weight / height²
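
Both calculations can be written as one-line functions. This sketch assumes the poverty rate is stored as a percentage (for example, 12.5 for 12.5%), which is why it is divided by 100; the function and argument names are illustrative.

```python
def number_poor(population: float, pct_poor: float) -> float:
    """Number of poor persons = population x (percentage poor / 100)."""
    return population * pct_poor / 100


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = weight in kilograms divided by height in meters, squared."""
    return weight_kg / height_m ** 2


print(number_poor(20000, 12.5))
print(round(body_mass_index(70, 1.75), 1))
```

Writing the units into the argument names (weight_kg, height_m) makes it harder to pass in pounds or centimeters by mistake.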


A mathematical transformation of a continuous variable

Original variable(s) | New variable
Income in dollars | Logged income
Income in dollars | Income in thousands of dollars
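
A sketch of both transformations in Python. The slides do not say which logarithm base is meant; the natural log is assumed here. Treating non-positive incomes as missing (None) is also an assumption, made because the log of zero or a negative number is undefined.

```python
import math
from typing import Optional


def income_thousands(income_dollars: float) -> float:
    """Rescale income from dollars to thousands of dollars."""
    return income_dollars / 1000


def log_income(income_dollars: float) -> Optional[float]:
    """Natural log of income; None (missing) where the log is undefined."""
    if income_dollars <= 0:
        return None
    return math.log(income_dollars)


for dollars in [0, 1000, 52500]:
    print(dollars, income_thousands(dollars), log_income(dollars))
```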


Planning steps for creating new variables

• Finding relevant variables in the original data set

• Becoming acquainted with the units and categories for available variables

• Consulting the published literature on the topic to see how those concepts have been measured or classified by other researchers

• Identifying pertinent formulas and thresholds

• Writing out the logic or math needed to create the new variables from existing variables


Steps toward creating a new variable

1. Identify the name(s) of the original variable(s) in the data set that contain the data needed to create the new variable.

2. For the new variable, devise
– A name (acronym) to convey
  • Content (meaning) of the new variable
  • The dates or survey rounds when the data were collected, if pertinent
– A label (short descriptive phrase) for the new variable
  • Mention units, if pertinent


For new continuous variables

• Write the formula to calculate the value of the new variable from the original variables.

• Specify the units of the original variable(s) and the new variable.


Example: Calculating course grades from component test scores

• For a hypothetical college course, the overall course grade is based on three exam scores
– Two mid-term exams (EXAM1 and EXAM2)
  • Each scored from 0 to 25 points
– A final exam (FINAL)
  • Scored from 0 to 50 points

• For each student, the instructor wants to calculate
– The percentage of questions s/he got correct on exam 1
– Total numeric course grade
– Course letter grade, based on standard grade cutoffs


Calculating percentage of exam questions correct from number of questions correct

• Logic: From the information in the data set, how does one calculate the percentage of questions correct?

• Concepts: Percentage of questions correct is number of questions correct divided by the total number of questions on the exam, multiplied by 100.

• Formula: Replace concepts with names of variables:

PCCOREX1 = (EXAM1/25) * 100

STEP 1: Identify existing variables, already in the data set, from which the new variable will be calculated (here, EXAM1).

STEP 2: Devise a name for the new variable, not yet in the data set (here, PCCOREX1).

STEP 3: Write the mathematical formula.
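
The PCCOREX1 formula translates directly into code. The sketch below assumes one point per question, so that the 25-point maximum equals the number of questions; the lowercase function name and the constant are illustrative.

```python
EXAM1_MAX_POINTS = 25  # EXAM1 is scored from 0 to 25 points


def pccorex1(exam1: float) -> float:
    """PCCOREX1 = (EXAM1 / 25) * 100, the percentage correct on exam 1."""
    return (exam1 / EXAM1_MAX_POINTS) * 100


for score in [0, 20, 25]:
    print(score, pccorex1(score))
```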


Creating a variable for total numeric course grade from exam scores

• Logic: From the information in the data set, how does one calculate total numeric course grade?

• Concepts: Overall numeric course grade is the sum of the three exam scores.

• Formula: Replace concepts with names of variables:

TOTGRADE = EXAM1 + EXAM2 + FINAL

STEP 1: Identify existing variables, already in the data set, from which the new variable will be calculated (here, EXAM1, EXAM2, and FINAL).

STEP 2: Devise a name for the new variable, not yet in the data set (here, TOTGRADE).

STEP 3: Write the mathematical formula.
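
Applied to a data set, the TOTGRADE formula is computed once per student. The sketch below stores each student as a dictionary keyed by variable name; the ID field and the scores are made up for the example.

```python
# Hypothetical student records using the variable names from the slides.
students = [
    {"ID": 1, "EXAM1": 20, "EXAM2": 22, "FINAL": 41},
    {"ID": 2, "EXAM1": 15, "EXAM2": 18, "FINAL": 30},
]

for student in students:
    # TOTGRADE = EXAM1 + EXAM2 + FINAL (maximum 25 + 25 + 50 = 100 points)
    student["TOTGRADE"] = student["EXAM1"] + student["EXAM2"] + student["FINAL"]

for student in students:
    print(student["ID"], student["TOTGRADE"])
```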


For new categorical variables

• Write the logical steps to classify the values of the original variable into the values of the new variable.

• Show how every possible value of the original variable maps into a value of the new variable.

• List the
– Value label (descriptive phrase) for each value (category) of the new variable;
– Code (numeric value) that the new variable will take on for each value or set of values of the original variable.


Classifying numeric course grades into letter grade ranges

Original variable: TOTGRADE. Variable label: Numeric course grade

New variable: LETTRGRD. Variable label: Final letter grade

Values of original variable | Values (codes) of new variable | Value labels
<60 | 1 | F
60 to 69 | 2 | D
70 to 79 | 3 | C
80 to 89 | 4 | B
90 or higher | 5 | A

STEP 1: Identify existing variables from which the new variable will be created (here, TOTGRADE).

STEP 2: Devise a name for the new variable, not yet in the data set (here, LETTRGRD).

STEP 3: Write the logic for classifying the numeric scores into letter grade ranges, based on the university’s standard grade cutoffs. E.g., scores below 60 are classified as an “F.”
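
The classification grid can be sketched as a chain of comparisons. The code below uses "less than" cutoffs (below 60, below 70, and so on) so that a fractional total such as 69.5 still falls into exactly one category; that handling of fractional scores is an assumption, since the grid lists whole-number ranges.

```python
# Value labels for the codes of the new variable LETTRGRD.
LETTRGRD_LABELS = {1: "F", 2: "D", 3: "C", 4: "B", 5: "A"}


def lettrgrd(totgrade: float) -> int:
    """Classify a numeric course grade into a letter-grade code (1-5)."""
    if totgrade < 60:
        return 1  # F
    if totgrade < 70:
        return 2  # D
    if totgrade < 80:
        return 3  # C
    if totgrade < 90:
        return 4  # B
    return 5      # A


for total in [59, 60, 69, 70, 89, 90, 100]:
    code = lettrgrd(total)
    print(total, code, LETTRGRD_LABELS[code])
```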


Missing values for the new variable

• Provide instructions to ensure that cases that have missing values on the original variables will also have missing values for new variables that are based on them.

• Needed whether the new variable was created using
– A formula
– Classification instructions
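
One possible way to enforce this rule in code is a small wrapper that returns a missing value whenever any input is missing. The sketch below represents missing data as Python's None and reuses the course-grade example; the propagate_missing helper is hypothetical, and real data sets may use other missing-value codes that need to be converted first.

```python
from bisect import bisect_right
from functools import wraps


def propagate_missing(func):
    """Wrap func so that any missing (None) input yields a missing result."""
    @wraps(func)
    def wrapper(*args):
        if any(arg is None for arg in args):
            return None
        return func(*args)
    return wrapper


@propagate_missing
def totgrade(exam1, exam2, final):
    """Formula: TOTGRADE = EXAM1 + EXAM2 + FINAL."""
    return exam1 + exam2 + final


@propagate_missing
def lettrgrd(total):
    """Classification: codes 1 (F) to 5 (A) using cutoffs 60, 70, 80, 90."""
    return bisect_right([60, 70, 80, 90], total) + 1


complete = totgrade(20, 22, 41)
incomplete = totgrade(20, None, 41)  # EXAM2 is missing
print(complete, lettrgrd(complete))
print(incomplete, lettrgrd(incomplete))
```

Because the classification is built on the formula's result, a missing exam score carries through to both new variables.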


Summary

• It is often necessary to create new variables to answer one’s research question.

• Planning steps for creating new variables include
– Identifying source variables available in a data set
– Finding references about how such variables are conventionally analyzed
– Becoming familiar with units or categories of the variables
– Writing formulas or classification instructions to create the new variables from the original variables
– Providing instructions about missing values for the original and new variables


Summary, cont.

• With the formulas and classification instructions for creating the new variables, one can then use a spreadsheet or statistical software to create those variables within an electronic data set.

• Separate
– The researching and planning steps
– The programming steps


Suggested resources

• Miller, J. E. 2015. The Chicago Guide to Writing about Numbers, 2nd Edition. University of Chicago Press, chapter 10.


Suggested practice exercises


NAME of original variable ______________________

LABEL for original variable ______________________

NAME of new variable _______________________

LABEL for new variable _______________________

Values of original variable | Values (codes) of new variable | Value labels of new variable

Instructions and a planning template can be downloaded from the supplemental online materials at http://press.uchicago.edu/books/miller/numbers/index.htm


Suggested online appendixes

• How to Create the Variables You Need from the Variables You Have
– Exercise includes
  • Step-by-step instructions
  • A template planning grid for a new categorical variable
– Paper for instructors on how to teach the concepts and skills

• Getting to Know Your Variables
– Exercise to familiarize researchers with the concepts, units, categories of variables in their data set
– Paper for instructors on how to teach the concepts and skills


Contact information

Jane E. Miller, [email protected]

Online materials available at http://press.uchicago.edu/books/miller/numbers/index.html
