Survey Documentation and Analysis (SDA)

Upload: thelma

Post on 12-Jan-2016


DESCRIPTION

Survey Documentation and Analysis (SDA). Workshop Agenda: Overview; What is online analysis?; Available SDA data sets; Statistical procedures (Frequencies, Crosstabs, Regression); Recoding, subsetting, downloading; Teaching resources for SDA and developing instructional materials. SSRIC.

TRANSCRIPT

Page 1: Survey Documentation and Analysis (SDA)

Survey Documentation and Analysis (SDA)

Page 2: Survey Documentation and Analysis (SDA)

Workshop Agenda

Overview
What is online analysis?
Available SDA data sets
Statistical procedures (Frequencies, Crosstabs, Regression)
Recoding, subsetting, downloading
Teaching resources for SDA and developing instructional materials

Page 3: Survey Documentation and Analysis (SDA)

SSRIC
Social Science Research & Instructional Council

http://www.ssric.org

Page 4: Survey Documentation and Analysis (SDA)

The Council

Oldest CSU discipline council
  Founded in 1972
Representatives from CSU campuses meet three times per year
Negotiates with data providers for access to data
Promotes use of data analysis in research and teaching

Page 5: Survey Documentation and Analysis (SDA)

The Council

Annual student research conference
  at CSU Long Beach in 2008
  at CSU Sacramento in 2009
Sponsors travel to ICPSR summer workshops in Ann Arbor, Michigan
  http://www.ssric.org/participate/icpsr_summer
Works with Field Research
  Question credits to California Field Poll
  Selects faculty fellow

Page 6: Survey Documentation and Analysis (SDA)

What is Online Analysis?

“Online data analysis” refers to the ability to perform statistical analysis using special Web-based software as an alternative to downloading data into a standalone statistical package on your computer.

The software we’re using is called Survey Documentation and Analysis (SDA), which was developed at the University of California, Berkeley.

Page 7: Survey Documentation and Analysis (SDA)

Alternative Statistical Packages

You can get a complete list of available online statistical packages at http://statpages.org/

Some of these include:
  OpenStat
  ViSta
  Statext
  SISA

Page 8: Survey Documentation and Analysis (SDA)

Advantages

Many, like SDA, are free and don’t require a site license
Only require a computer with an internet connection
Some, like SDA, are easy to learn
  Can show students how to use some of them in 30 minutes or less

Page 9: Survey Documentation and Analysis (SDA)

Disadvantages

Some online statistical packages (certainly not all) are limited in what they can do statistically
Documentation is not very good for some
Some (like SDA) can only be used with data sets that have already been created in a format that can be read by that package

Page 10: Survey Documentation and Analysis (SDA)

Available SDA Data Sets

Page 11: Survey Documentation and Analysis (SDA)

SDA Data Sets

While SDA is an extremely easy statistical package to learn to use, it’s difficult to create SDA data sets.

You have to purchase an SDA site license to create a data set and then learn how to use it.

So we typically use SDA data sets that have been created for us.

Page 12: Survey Documentation and Analysis (SDA)

Sources for SDA Data Sets

SDA Archive located at UC Berkeley (http://sda.berkeley.edu/archive.htm)
ICPSR Topical Archives (http://www.icpsr.org/cocoon/ICPSR/all/archives.xml)
Field data located at UC Berkeley (http://ucdata.berkeley.edu/data_record.php?recid=3#analyze)
List of SDA data sets at CSU Long Beach (http://www.csulb.edu/library/eref/datasets.html)
University of Denver’s IDEA project (http://www.du.edu/idea/data.htm)

Page 13: Survey Documentation and Analysis (SDA)

SDA Archive at UC Berkeley (http://sda.berkeley.edu/archive.htm)

GSS Cumulative Datafile (1972-2008; 2008 is a preliminary version).
ANES Cumulative Datafile (1948-2000) and ANES datafiles for 1996, 2000, and 2004.
Census microdata including 2000-2003 American Community Surveys and 1990 and 2000 U.S. 1% PUMS with separate files for 2000 and 1990 California PUMS.

Page 14: Survey Documentation and Analysis (SDA)

ICPSR

National Archive of Computerized Data on Aging (http://www.icpsr.umich.edu/NACDA/)
National Archive of Criminal Justice Data (http://www.icpsr.umich.edu/NACJD/)
Substance Abuse and Mental Health Data Archive (http://www.icpsr.umich.edu/SAMHDA/)
International Archive of Education Data (http://www.icpsr.umich.edu/IAED/)

Page 15: Survey Documentation and Analysis (SDA)

Field Data
http://ucdata.berkeley.edu/data_record.php?recid=3#analyze

Field Polls from 1956 through 2006 are available as publicly accessible SDA data sets
More recent Field Polls are available as SPSS data sets (through FTP) for CSU faculty, staff, and students.

Page 16: Survey Documentation and Analysis (SDA)

Other Sources of SDA Data Sets at ICPSR

Voting Behavior: The 2004 Election by Charles Prysby and Carmine Scavo (http://www.icpsr.umich.edu/SETUPS/)
Investigating Community and Social Capital by Lori Weber (http://www.icpsr.umich.edu/ICSC/index.htm)

Page 17: Survey Documentation and Analysis (SDA)

Statistical Procedures

Page 18: Survey Documentation and Analysis (SDA)

Available Statistical Procedures

Frequencies and crosstabulation (discussed in this workshop)
Comparison of means
Correlation matrix
Comparison of correlations
Multiple regression (discussed in this workshop)
Logit/Probit regression

Page 19: Survey Documentation and Analysis (SDA)

Using SDA

Select the data set
Look at the codebook
Decide what statistical procedure to use
Fill in what you want to do
Run it

Page 20: Survey Documentation and Analysis (SDA)

Data Set

We’re going to use the GSS 1972-2008 Cumulative Data File (2008 is preliminary data)
  http://sda.berkeley.edu/archive.htm
We’re going to use three variables
  SEX
  RELITEN
  PORNLAW

Page 21: Survey Documentation and Analysis (SDA)

Frequencies

List the variables you want to use
  ROW: SEX,RELITEN,PORNLAW
Click on “Run the Table”
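SDA itself is point-and-click, so no code is needed to follow the workshop. For readers who want to see what a frequency table computes, here is a minimal Python sketch. The response list is invented toy data (not GSS results), the codes 1 and 2 are assumed for illustration, and `frequencies` is a hypothetical helper, not part of SDA.

```python
from collections import Counter

# Toy responses invented for illustration; these are NOT GSS data.
# Codes are assumed: 1 = male, 2 = female.
sex = [1, 2, 2, 1, 2, 2, 1, 2]

def frequencies(values):
    """Return {code: (count, percent)} for a list of category codes."""
    counts = Counter(values)
    total = len(values)
    return {code: (n, round(100 * n / total, 1))
            for code, n in sorted(counts.items())}

freq = frequencies(sex)
for code, (n, pct) in freq.items():
    print(code, n, pct)
```

Each row of an SDA frequency table carries the same two pieces of information: how many cases fall in a category and what share of the total that is.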

Page 22: Survey Documentation and Analysis (SDA)
Page 23: Survey Documentation and Analysis (SDA)
Page 24: Survey Documentation and Analysis (SDA)

Crosstabs

Now let’s use RELITEN as our independent variable and PORNLAW as our dependent variable to create two bivariate crosstabulations.

List the variables
  ROW: PORNLAW
  COLUMN: RELITEN

Page 25: Survey Documentation and Analysis (SDA)

Crosstabulation Continued

Options
  Percentaging: column
  Statistics
  Question text
  Color coding
Run the Table
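Column percentaging means each cell is divided by its column total, so the categories of the independent variable can be compared directly. The sketch below illustrates the arithmetic in plain Python. The cases and the labels "strong", "weak", "illegal" and "legal" are invented stand-ins, not the actual GSS codes or results, and `column_percents` is a hypothetical helper.

```python
from collections import Counter

# Toy (column, row) pairs invented for illustration; NOT GSS data.
# column = independent variable, row = dependent variable.
cases = [
    ("strong", "illegal"), ("strong", "illegal"), ("strong", "legal"),
    ("weak", "illegal"), ("weak", "legal"), ("weak", "legal"), ("weak", "legal"),
]

def column_percents(pairs):
    """Return {(column, row): percent of that column's cases}."""
    cell = Counter(pairs)
    col_total = Counter(col for col, _ in pairs)
    return {(col, row): round(100 * n / col_total[col], 1)
            for (col, row), n in cell.items()}

table = column_percents(cases)
for key in sorted(table):
    print(key, table[key])
```

Reading across a row then shows how the dependent variable changes from one column to the next.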

Page 26: Survey Documentation and Analysis (SDA)
Page 27: Survey Documentation and Analysis (SDA)
Page 28: Survey Documentation and Analysis (SDA)

Your Turn

Let’s run two more bivariate crosstabs
  Independent variable: SEX
  Dependent variables: RELITEN and PORNLAW
Go ahead and run these crosstabs

Page 29: Survey Documentation and Analysis (SDA)

What Did We Discover?

RELITEN is strongly related to PORNLAW.
SEX is also related to both RELITEN and PORNLAW.
Could the relationship between RELITEN and PORNLAW be spurious? SEX is related to both RELITEN and PORNLAW and could be creating the relationship between RELITEN and PORNLAW.
How do we test this possibility? Let’s run a three-variable crosstabulation with RELITEN as our independent variable, PORNLAW as our dependent variable, and SEX as our control variable.

Page 30: Survey Documentation and Analysis (SDA)

Multivariate Crosstabulation

List the variables
  ROW: PORNLAW
  COLUMN: RELITEN
  CONTROL: SEX
Options
  Percentaging: column
  Statistics
  Question text
  Color coding
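Adding a control variable produces one column-percentaged table per category of the control. A rough Python sketch of that idea follows. The triples are invented toy data with made-up labels (not GSS codes or results), and `partial_tables` is a hypothetical helper rather than anything SDA exposes.

```python
from collections import Counter, defaultdict

# Toy (control, column, row) triples invented for illustration; NOT GSS data.
cases = [
    ("male", "strong", "illegal"), ("male", "strong", "legal"),
    ("male", "weak", "legal"), ("male", "weak", "legal"),
    ("female", "strong", "illegal"), ("female", "strong", "illegal"),
    ("female", "weak", "illegal"), ("female", "weak", "legal"),
]

def partial_tables(triples):
    """Build one column-percentaged table for each value of the control."""
    by_control = defaultdict(list)
    for control, col, row in triples:
        by_control[control].append((col, row))
    tables = {}
    for control, pairs in by_control.items():
        cell = Counter(pairs)
        col_total = Counter(col for col, _ in pairs)
        tables[control] = {key: round(100 * n / col_total[key[0]], 1)
                           for key, n in cell.items()}
    return tables

tables = partial_tables(cases)
for control, table in tables.items():
    print(control, table)
```

The comparison of interest is whether the column-to-column differences seen in the two-variable table still appear inside each partial table.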

Page 31: Survey Documentation and Analysis (SDA)
Page 32: Survey Documentation and Analysis (SDA)
Page 33: Survey Documentation and Analysis (SDA)
Page 34: Survey Documentation and Analysis (SDA)

Spuriousness

Was the relationship between RELITEN and PORNLAW spurious due to SEX?
  How do you know?
Does that mean that the relationship can never be spurious?

Page 35: Survey Documentation and Analysis (SDA)

Regression

Crosstabulation is used when all the variables are categorical.

What do we do when our variables are continuous (i.e., interval and/or ratio)?

Regression is the answer.

Page 36: Survey Documentation and Analysis (SDA)

Bivariate Regression

Let’s look at the relationship between the respondent’s socioeconomic status (SEI) and the amount of television one watches (TVHOURS).

List the variables
  Dependent: TVHOURS
  Independent: SEI
Options
  T-Tests
  Correlation matrix
  Color coding
  Question Text
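To make the regression output less of a black box, here is a small sketch of the ordinary least squares arithmetic behind a bivariate regression line. The numbers are invented toy values (not GSS data), `ols` is a hypothetical helper, and the sketch computes only the slope and intercept, not the t-tests or correlation matrix that SDA also reports.

```python
def ols(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy values invented for illustration; NOT GSS data.
sei = [20, 40, 60, 80]      # independent variable
tvhours = [5, 4, 3, 2]      # dependent variable

slope, intercept = ols(sei, tvhours)
print(slope, intercept)
```

In this toy data the slope is negative: each additional SEI point goes with slightly fewer predicted hours of television.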

Page 37: Survey Documentation and Analysis (SDA)
Page 38: Survey Documentation and Analysis (SDA)
Page 39: Survey Documentation and Analysis (SDA)

Multivariate Regression

Now let’s add in another variable: SEX
But sex is not a continuous variable. How do we enter a variable like SEX into the regression analysis? Answer: create a dummy variable.
Dummy variables take on the values of 1 and 0.

Page 40: Survey Documentation and Analysis (SDA)

Creating a Dummy Variable

SEX (d:1)
  SEX is the name of the variable you want to make into a dummy variable
  d indicates that you want to create a dummy variable
  1 indicates that the original value 1 will be assigned the value 1. All other values will be assigned the value 0.
Run the table
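The recoding that `SEX (d:1)` asks for can be pictured with a few lines of Python. This is only an illustration of the 1/0 logic described above: the data are toy values, `make_dummy` is a hypothetical helper, and passing missing values (`None`) through unchanged is an assumption of this sketch, not a statement about how SDA treats missing data.

```python
def make_dummy(values, one_value):
    """Return 1 where the original code equals one_value, 0 otherwise.

    None stands for missing data here and is passed through unchanged
    (an assumption of this sketch).
    """
    return [None if v is None else int(v == one_value) for v in values]

# Toy codes invented for illustration; NOT GSS data.
sex = [1, 2, 2, None, 1]
sex_dummy = make_dummy(sex, 1)
print(sex_dummy)
```

With the dummy in place, the regression coefficient for the variable is the predicted difference between the category coded 1 and everyone coded 0, holding the other predictors constant.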

Page 41: Survey Documentation and Analysis (SDA)
Page 42: Survey Documentation and Analysis (SDA)
Page 43: Survey Documentation and Analysis (SDA)

Recoding, Subsetting, Downloading

Page 44: Survey Documentation and Analysis (SDA)

Recoding Existing Variables

Example (from GSS Cumulative File): ATTEND (How often Respondent attends religious services)

ATTEND
0 Never
1 Less than once a year
2 Once a year
3 Several times a year
4 Once a month
5 2 to 3 times a month
6 Nearly Every Wk
7 Every week
8 More than once a week
9 DK/NA (Missing)

ATTENDR
1 Seldom (0 to 3)
2 Sometimes (4 to 5)
3 Often (6 to 8)
9 Missing (9)
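The ATTEND to ATTENDR mapping above is a range recode. As a sketch of the same rule outside SDA, the Python function below applies the ranges listed in the table; `recode_attend` is a hypothetical name, and treating any code outside 0 to 8 as missing (9) is an assumption of this sketch.

```python
def recode_attend(code):
    """Collapse ATTEND (0-8) into ATTENDR (1-3); anything else becomes 9."""
    if 0 <= code <= 3:
        return 1   # Seldom
    if 4 <= code <= 5:
        return 2   # Sometimes
    if 6 <= code <= 8:
        return 3   # Often
    return 9       # Missing

recoded = [recode_attend(c) for c in range(10)]
print(recoded)
```

Running frequencies on both the original and the recoded variable is a quick check that the category counts add up as expected.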

Page 45: Survey Documentation and Analysis (SDA)
Page 46: Survey Documentation and Analysis (SDA)
Page 47: Survey Documentation and Analysis (SDA)
Page 48: Survey Documentation and Analysis (SDA)
Page 49: Survey Documentation and Analysis (SDA)

Your Turn

Recode AGE into the following categories:

1 = 18-29

2 = 30-64

3 = 65 and older

Obtain FREQUENCIES for the result

Page 50: Survey Documentation and Analysis (SDA)

For More Information, See:

http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#recode

Page 51: Survey Documentation and Analysis (SDA)

Compute a New Variable

Example (from GSS Cumulative File): Alienation Index

Create measure of ALIENATION from these variables asked in 1978 only (all coded as 1=agree, 2=disagree, other = missing data)

ALIENAT1  PEOPLE RUNNING COUNTRY DONT CARE
ALIENAT2  RICH GET RICHER, POOR POORER
ALIENAT3  WHAT YOU THINK DOESNT COUNT
ALIENAT4  YOU'RE LEFT OUT OF THINGS
ALIENAT5  POWERFUL PEOPLE TAKE ADVANTAGE OF YOU
ALIENAT6  PEOPLE IN WASH D.C. ARE OUT OF TOUCH
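The transcript does not preserve the expression the workshop used to build the index, so the following is only one plausible way to do it: count how many of the six items a respondent agreed with. The scoring rule, the `alienation_score` helper, and the decision to return `None` when any item is missing are all assumptions of this sketch, and the answers are toy values.

```python
def alienation_score(answers):
    """Count 'agree' answers (code 1) across the six ALIENAT items.

    Codes: 1 = agree, 2 = disagree, anything else = missing.
    Returns None if any item is missing (an assumption of this sketch).
    """
    if any(a not in (1, 2) for a in answers):
        return None
    return sum(1 for a in answers if a == 1)

# Toy answers for ALIENAT1..ALIENAT6, invented for illustration.
respondent = [1, 1, 2, 1, 2, 2]
score = alienation_score(respondent)
print(score)
```

Under this rule the index runs from 0 (agreed with none) to 6 (agreed with all).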

Page 52: Survey Documentation and Analysis (SDA)
Page 53: Survey Documentation and Analysis (SDA)
Page 54: Survey Documentation and Analysis (SDA)
Page 55: Survey Documentation and Analysis (SDA)

Your Turn

Create an index of parental education: (MAEDUC + PAEDUC)/2
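For comparison, the same formula written as a small Python function. The name `parental_education` is hypothetical, the values are toy data, and returning `None` when either parent's education is missing is an assumption of this sketch rather than a description of SDA's missing-data rules.

```python
def parental_education(maeduc, paeduc):
    """Average of mother's and father's years of education.

    Returns None if either value is missing (an assumption of this sketch).
    """
    if maeduc is None or paeduc is None:
        return None
    return (maeduc + paeduc) / 2

print(parental_education(12, 16))
print(parental_education(None, 12))
```

Note the parentheses: without them, only PAEDUC would be divided by 2.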

Page 56: Survey Documentation and Analysis (SDA)

For More Information, See:

http://sda.berkeley.edu/HELPDOCS/helpnewv.htm#compute

Page 57: Survey Documentation and Analysis (SDA)

Subsetting and Downloading

Example: create and download a subset of the GSS cumulative file, selecting only cases from 2008, all Case Identification variables and some Personal and Family Information variables (MARITAL, AGEWED, DIVORCE, WIDOWED).

At the end of each intermediate step, click on the “Continue” button.
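Conceptually, subsetting does two things: it keeps only the cases that pass a filter and only the variables you ask for. A minimal Python sketch of that idea follows. The rows are invented toy records, the `ID` and `YEAR` field names are assumed for illustration, and `subset` is a hypothetical helper unrelated to SDA's own download procedure.

```python
# Toy records invented for illustration; NOT GSS data.
rows = [
    {"ID": 1, "YEAR": 2006, "MARITAL": 1, "AGEWED": 22, "DIVORCE": 2, "WIDOWED": 2},
    {"ID": 2, "YEAR": 2008, "MARITAL": 3, "AGEWED": 25, "DIVORCE": 1, "WIDOWED": 2},
    {"ID": 3, "YEAR": 2008, "MARITAL": 1, "AGEWED": 30, "DIVORCE": 2, "WIDOWED": 2},
]

def subset(records, year, keep):
    """Keep only cases from the given year and only the listed variables."""
    return [{name: rec[name] for name in keep}
            for rec in records if rec["YEAR"] == year]

extract = subset(rows, 2008, ["ID", "YEAR", "MARITAL", "AGEWED"])
for rec in extract:
    print(rec)
```

A smaller extract like this is what makes the downloaded file manageable in a desktop package.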

Page 58: Survey Documentation and Analysis (SDA)
Page 59: Survey Documentation and Analysis (SDA)
Page 60: Survey Documentation and Analysis (SDA)
Page 61: Survey Documentation and Analysis (SDA)
Page 62: Survey Documentation and Analysis (SDA)

SPSS Syntax File

Page 63: Survey Documentation and Analysis (SDA)

Creating an SPSS system file

Run SPSS (syntax) file against data (ASCII) file.
For more information, see
  http://www.ssric.org/data/icpsr_direct (scroll down to “Syntax Files”)
  http://www.icpsr.com/cocoon/ICPSR/FAQ/0062.xml
  http://web.pdx.edu/~stipakb/download/Data/SDA_data_to_SPSS.pdf (portions outdated)

Page 64: Survey Documentation and Analysis (SDA)

File Directory

Page 65: Survey Documentation and Analysis (SDA)
Page 66: Survey Documentation and Analysis (SDA)
Page 67: Survey Documentation and Analysis (SDA)

Your Turn

Subset and download your own custom GSS SPSS system file.

Page 68: Survey Documentation and Analysis (SDA)

Sample Instructional Applications: Crosstabs With a Control Variable

Page 69: Survey Documentation and Analysis (SDA)

Example 1

GSS Cumulative File (selecting 2002 and 2004 only):

1. Crosstab Voting in 2000 election (VOTE00) by computer usage (COMPUSE).

2. Repeat, but with a control for respondent’s education level (DEGREE).

Page 70: Survey Documentation and Analysis (SDA)

Example 2

ANES 2004 Study:
Instructor’s note: In addition to using this example in teaching use of control variables, I also use it in teaching about reactivity in interviewing.

1. Run frequency distribution for V5205 (Working mother can have warm relationship with kids).

2. Crosstab V5205 with V1109a (Respondent gender). Weight by Post-election weight

3. Repeat, but use V4103 (Interviewer gender) as independent variable

4. Run frequency distribution for V4103
5. Repeat #1 with a control for V4103
6. Repeat #2 with a control for V1109a
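Step 2 asks for a weighted table. As a rough illustration of what weighting does to percentages, the sketch below sums each respondent's weight instead of counting each respondent once. The answers and weights are invented toy values (not ANES data), and `weighted_percents` is a hypothetical helper.

```python
from collections import defaultdict

def weighted_percents(values, weights):
    """Percent of total weight falling in each category."""
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight
    grand_total = sum(totals.values())
    return {value: round(100 * t / grand_total, 1)
            for value, t in totals.items()}

# Toy answers and weights invented for illustration; NOT ANES data.
answers = ["agree", "agree", "disagree", "disagree"]
weights = [2.0, 1.0, 0.5, 0.5]

result = weighted_percents(answers, weights)
print(result)
```

Unweighted, these four answers split evenly; with the weights applied, the split shifts toward the categories whose respondents carry more weight.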

Page 71: Survey Documentation and Analysis (SDA)

Teaching Resources for SDA and Developing Instructional Materials

Page 72: Survey Documentation and Analysis (SDA)

ICPSR Web-Based Instructional Materials
http://www.icpsr.umich.edu/ICPSR/training/index.html#instructional

Page 73: Survey Documentation and Analysis (SDA)

Investigating Community & Social Capital
http://www.icpsr.umich.edu/ICSC/index.html

Page 74: Survey Documentation and Analysis (SDA)

Voting Behavior: the 2004 Election

http://www.icpsr.umich.edu/SETUPS/index.html