Post on 16-Jan-2016

Planning how to create the variables you need from the variables you have

Jane E. Miller, PhD

The Chicago Guide to Writing about Numbers, 2nd edition.

Overview

• Why researchers sometimes need to create new variables to conduct their analysis

• Why it is important to plan ahead for how to create those new variables

• What information is required to identify the new variables needed for the research question

• How to write clear instructions on how to get from the variables you have to the variables you need

Why create new variables?

• For many statistical analyses, variables available on the original data set are not yet in the form needed to address the research question of interest.

• Examples:
– You want to study total family income, but the data set has separate variables measuring income components such as earned income, government benefits, and alimony.
– You want to compare outcomes for age groups (children, working age adults, and the elderly), but the data set reports respondent’s age in single years.

Conceptualizing the new variable should precede programming it

• Important to separate
– Researching and planning how those variables should be defined
– Programming the new variable in an electronic database

• Each of those tasks
– Has its own challenging aspects
– Uses different
  • Skills
  • Resources

Some common patterns of creating new from existing variables

• A categorical version of a continuous variable

• A simplified (collapsed) categorical variable

• A binary indicator from a continuous variable

• A new continuous variable that combines 2+ continuous variables

• A mathematical transformation of a continuous variable

A categorical version of a continuous variable

• Original variable
– Age in years (continuous)

• Needed variable
– Age group (categorical)
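A minimal sketch of this pattern in Python. The cutoffs (under 18, 18 to 64, 65 and older), the numeric codes, and the names `age_group` and `AGE_GROUP_LABELS` are illustrative assumptions, not taken from the slides; substitute the age ranges that are conventional for your topic.

```python
from typing import Optional

# Hypothetical codes and value labels for the new categorical variable.
AGE_GROUP_LABELS = {1: "Child", 2: "Working-age adult", 3: "Elderly"}


def age_group(age_years: Optional[float]) -> Optional[int]:
    """Classify age in years into an assumed three-category age group code."""
    if age_years is None:
        return None  # a missing age stays missing in the new variable
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years < 18:  # assumed cutoff for children
        return 1
    if age_years < 65:  # assumed cutoff for working-age adults
        return 2
    return 3


ages = [4, 17, 18, 64, 65, None]
codes = [age_group(a) for a in ages]
```

Writing the cutoffs as "less than" comparisons means every possible age, including fractional ages, falls into exactly one category.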

A simplified (collapsed) categorical variable

• Original variable
– Ten-category ethnicity variable

• Needed variable
– Three-category ethnicity variable
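Collapsing categories amounts to a lookup from each original code to a new code. The sketch below assumes the original variable is coded 1 through 10 and the new one 1 through 3; the particular grouping in `COLLAPSE_MAP` is a placeholder, since the slides do not say which categories are combined.

```python
from typing import Optional

# Hypothetical mapping: original codes 1-10 -> new codes 1-3.
# Replace with the grouping justified by your research question and the literature.
COLLAPSE_MAP = {
    1: 1, 2: 1, 3: 1,
    4: 2, 5: 2, 6: 2, 7: 2,
    8: 3, 9: 3, 10: 3,
}


def collapse_category(original_code: Optional[int]) -> Optional[int]:
    """Map an original category code to its collapsed category code."""
    if original_code is None:
        return None  # missing stays missing
    if original_code not in COLLAPSE_MAP:
        # Fail loudly rather than silently dropping an unplanned value.
        raise ValueError(f"no mapping defined for code {original_code}")
    return COLLAPSE_MAP[original_code]
```

Raising an error for an unmapped code is one way to confirm that every value of the original variable has been assigned to a new category.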

A binary indicator from a continuous variable

• Original variable
– Birth weight in grams (continuous)

• Needed variable
– Indicator of low birth weight status (yes or no)
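As a sketch, a binary indicator is a single comparison against a threshold. The example assumes the conventional definition of low birth weight as under 2,500 grams and a 1 = yes, 0 = no coding; the slide itself does not state the cutoff or the codes, so confirm both against the literature for your topic. The function name is hypothetical.

```python
from typing import Optional

LBW_THRESHOLD_GRAMS = 2500  # assumed cutoff: low birth weight is below 2,500 g


def low_birth_weight(birth_weight_grams: Optional[float]) -> Optional[int]:
    """Return 1 if low birth weight, 0 if not, None if birth weight is missing."""
    if birth_weight_grams is None:
        return None
    return 1 if birth_weight_grams < LBW_THRESHOLD_GRAMS else 0
```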

A new continuous variable that aggregates 2+ continuous variables

Original variable(s)                                    New variable
Separate measures of income for each family member      Total family income
Multiple attitudinal items                              A composite attitudinal scale

A new continuous variable calculated from 2+ continuous variables

Original variable(s)                                              New variable
Separate measures of county-level population and poverty rate    Number of poor persons in the county = population × (% poor / 100)
Separate measures of weight (kg) and height (meters)              Body Mass Index = weight / height²
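The two calculations above can be sketched as follows. The function names are hypothetical, and the code assumes the poverty rate is stored as a percentage (0 to 100) rather than a proportion, which is why it divides by 100; check the units in your own data set before reusing it.

```python
from typing import Optional


def number_poor(population: Optional[float], pct_poor: Optional[float]) -> Optional[float]:
    """Number of poor persons = population x (% poor / 100)."""
    if population is None or pct_poor is None:
        return None  # missing input gives a missing result
    return population * pct_poor / 100


def body_mass_index(weight_kg: Optional[float], height_m: Optional[float]) -> Optional[float]:
    """Body Mass Index = weight in kilograms / (height in meters) squared."""
    if weight_kg is None or height_m is None:
        return None
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2
```

For example, `number_poor(200000, 12.5)` gives 25,000 persons, and `body_mass_index(70, 1.75)` gives roughly 22.9.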

A mathematical transformation of a continuous variable

Original variable(s)     New variable
Income in dollars        Logged income
Income in dollars        Income in thousands of dollars
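A short sketch of both transformations. It assumes the natural logarithm (the slide does not say which base is intended) and treats zero or negative incomes as missing in the logged variable, because the logarithm is undefined there; both choices, and the function names, are assumptions to adapt.

```python
import math
from typing import Optional


def log_income(income_dollars: Optional[float]) -> Optional[float]:
    """Natural log of income in dollars; None if missing or not positive."""
    if income_dollars is None or income_dollars <= 0:
        return None  # log is undefined for zero or negative values
    return math.log(income_dollars)


def income_thousands(income_dollars: Optional[float]) -> Optional[float]:
    """Rescale income from dollars to thousands of dollars."""
    if income_dollars is None:
        return None
    return income_dollars / 1000
```

Note that the units change with the transformation: the rescaled variable is in thousands of dollars, while the logged variable is in log-dollars.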

Planning steps for creating new variables

• Finding relevant variables in the original data set

• Becoming acquainted with the units and categories for available variables

• Consulting the published literature on the topic to see how those concepts have been measured or classified by other researchers

• Identifying pertinent formulas and thresholds

• Writing out the logic or math needed to create the new variables from existing variables

Steps toward creating a new variable

1. Identify the name(s) of the original variable(s) in the data set that contain the data needed to create the new variable.

2. For the new variable, devise
– A name (acronym) to convey
  • Content (meaning) of the new variable
  • The dates or survey rounds when the data were collected, if pertinent
– A label (short descriptive phrase) for the new variable
  • Mention units, if pertinent

For new continuous variables

• Write the formula to calculate the value of the new variable from the original variables.

• Specify the units of the original variable(s) and the new variable.

Example: Calculating course grades from component test scores

• For a hypothetical college course, the overall course grade is based on three exam scores
– Two mid-term exams (EXAM1 and EXAM2)
  • Each scored from 0 to 25 points
– A final exam (FINAL)
  • Scored from 0 to 50 points

• For each student, the instructor wants to calculate
– The percentage of questions s/he got correct on exam 1
– Total numeric course grade
– Course letter grade, based on standard grade cutoffs

Calculating percentage of exam questions correct from number of questions correct

• Logic: From the information in the data set, how does one calculate the percentage of questions correct?

• Concepts: Percentage of questions correct is number of questions correct divided by the total number of questions on the exam, multiplied by 100.

• Formula: Replace concepts with names of variables:

PCCOREX1 = (EXAM1/25) * 100

STEP 1: Identify existing variables, already in data set, from which new variable will be calculated (EXAM1).

STEP 2: Name for new variable, not yet in data set (PCCOREX1).

STEP 3: Write the mathematical formula.

Creating a variable for total numeric course grade from exam scores

• Logic: From the information in the data set, how does one calculate total numeric course grade?

• Concepts: Overall numeric course grade is the sum of the three exam scores.

• Formula: Replace concepts with names of variables:

TOTGRADE = EXAM1 + EXAM2 + FINAL

STEP 1: Identify existing variables, already in data set, from which new variable will be calculated (EXAM1, EXAM2, FINAL).

STEP 2: Name for new variable, not yet in data set (TOTGRADE).

STEP 3: Write the mathematical formula.
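Once written out, the formulas for PCCOREX1 and TOTGRADE translate almost directly into code. The sketch below uses the variable names from the slides as dictionary keys; the dictionary layout, the function names, and the range checks are assumptions added for illustration.

```python
def pct_correct_exam1(exam1: float) -> float:
    """PCCOREX1 = (EXAM1 / 25) * 100, with EXAM1 scored from 0 to 25 points."""
    if not 0 <= exam1 <= 25:
        raise ValueError("EXAM1 must be between 0 and 25")
    return exam1 / 25 * 100


def total_grade(exam1: float, exam2: float, final: float) -> float:
    """TOTGRADE = EXAM1 + EXAM2 + FINAL, on a 0 to 100 point scale."""
    if not (0 <= exam1 <= 25 and 0 <= exam2 <= 25 and 0 <= final <= 50):
        raise ValueError("exam score outside its allowed range")
    return exam1 + exam2 + final


# One hypothetical student record.
student = {"EXAM1": 20, "EXAM2": 22, "FINAL": 41}
student["PCCOREX1"] = pct_correct_exam1(student["EXAM1"])
student["TOTGRADE"] = total_grade(student["EXAM1"], student["EXAM2"], student["FINAL"])
```

For this student, PCCOREX1 is 80 percent and TOTGRADE is 83 points.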

For new categorical variables

• Write the logical steps to classify the values of the original variable into the values of the new variable.

• Show how every possible value of the original variable maps into a value of the new variable.

• List the
– Value label (descriptive phrase) for each value (category) of the new variable;
– Code (numeric value) that the new variable will take on for each value or set of values of the original variable.

Classifying numeric course grades into letter grade ranges

TOTGRADE. Variable label: Numeric course grade

LETTRGRD. Variable label: Final letter grade

Values of original variable   Values (codes) of new variable   Value labels
<60                           1                                F
60 to 69                      2                                D
70 to 79                      3                                C
80 to 89                      4                                B
90 or higher                  5                                A

STEP 1: Identify existing variables from which new variable will be created (TOTGRADE).

STEP 2: Name for new variable, not yet in data set (LETTRGRD).

STEP 3: Write the logic for classifying the numeric scores into letter grade ranges, based on the university’s standard grade cutoffs. E.g., scores below 60 are classified as an “F.”
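The classification grid can be sketched as a chain of comparisons. The codes and value labels come from the grid; the function name and the handling of fractional scores are assumptions. The grid lists ranges such as "60 to 69", so the code treats each range as running up to (but not including) the next cutoff, which places a score like 69.5 in the D range.

```python
from typing import Optional

# Value labels for each code of the new variable LETTRGRD.
LETTRGRD_LABELS = {1: "F", 2: "D", 3: "C", 4: "B", 5: "A"}


def lettrgrd(totgrade: Optional[float]) -> Optional[int]:
    """Classify the numeric course grade (TOTGRADE) into a letter grade code."""
    if totgrade is None:
        return None  # missing numeric grade gives a missing letter grade
    if totgrade < 60:
        return 1  # F
    if totgrade < 70:
        return 2  # D
    if totgrade < 80:
        return 3  # C
    if totgrade < 90:
        return 4  # B
    return 5      # A
```

A usage example: `LETTRGRD_LABELS[lettrgrd(83)]` evaluates to `"B"`.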

Missing values for the new variable

• Provide instructions to ensure that cases that have missing values on the original variables will also have missing values for new variables that are based on them.

• Needed whether the new variable was created using
– A formula
– Classification instructions
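A sketch of what such instructions can look like in code, for both a formula and a classification. It assumes missing values are represented by Python's `None`; real data sets often use special numeric codes or blanks instead, so adapt the check to however your data set flags missing values. The function names are hypothetical.

```python
from typing import Optional


def total_grade(exam1: Optional[float], exam2: Optional[float],
                final: Optional[float]) -> Optional[float]:
    """Formula case: the sum is missing if any component score is missing."""
    if exam1 is None or exam2 is None or final is None:
        return None
    return exam1 + exam2 + final


def letter_code(totgrade: Optional[float]) -> Optional[int]:
    """Classification case: a missing numeric grade gives a missing letter code."""
    if totgrade is None:
        return None
    for upper_cutoff, code in [(60, 1), (70, 2), (80, 3), (90, 4)]:
        if totgrade < upper_cutoff:
            return code
    return 5


complete = total_grade(20, 22, 41)      # all three scores present
incomplete = total_grade(20, None, 41)  # second mid-term is missing
```

Without the explicit check, a student with a missing exam could be silently treated as scoring zero on it and assigned a misleadingly low total and letter grade.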

Summary

• It is often necessary to create new variables to answer one’s research question.

• Planning steps for creating new variables include
– Identifying source variables available in a data set
– Finding references about how such variables are conventionally analyzed
– Becoming familiar with units or categories of the variables
– Writing formulas or classification instructions to create the new variables from the original variables
– Providing instructions about missing values for the original and new variables

Summary, cont.

• With the formulas and classification instructions for creating the new variables, one can then use a spreadsheet or statistical software to create those variables within an electronic data set.

• Separate
– The researching and planning steps
– The programming steps

Suggested resources

• Miller, J. E. 2015. The Chicago Guide to Writing about Numbers, 2nd Edition. University of Chicago Press, chapter 10.

Suggested practice exercises

NAME of original variable ______________________

LABEL for original variable ______________________

NAME of new variable _______________________

LABEL for new variable _______________________

Values of original variable   Values (codes) of new variable   Value labels of new variable

Instructions and a planning template can be downloaded from the supplemental online materials at http://press.uchicago.edu/books/miller/numbers/index.htm

Suggested online appendixes

• How to Create the Variables You Need from the Variables You Have
– Exercise includes
  • Step-by-step instructions
  • A template planning grid for a new categorical variable
– Paper for instructors on how to teach the concepts and skills

• Getting to Know Your Variables
– Exercise to familiarize researchers with the concepts, units, categories of variables in their data set
– Paper for instructors on how to teach the concepts and skills

Contact information

Jane E. Miller, PhD

jmiller@ifh.rutgers.edu

Online materials available at http://press.uchicago.edu/books/miller/numbers/index.html
