INFO 7470/ECON 7400/ILRLE 7400 Solutions to Lab 5 John M. Abowd and Lars Vilhuber March 25, 2013


© John M. Abowd and Lars Vilhuber 2013, all rights reserved

LESSONS TO BE LEARNED

3/4/2013

Lessons

• Answering data-driven questions
• Identify tools to answer the question
• Correctly use available metadata
• Not all data on the same topic provide the same answer

Required tools

• SAS, Stata, R, Python, etc.
• Web browser
• Search engine…

NAICS

NAICS sub-sectors (NAICS3)

QCEW

After downloading ZIP file

• For historical data, BLS has packaged an entire year into a single ZIP file (151MB)

• We only need one file from there: county file for Pennsylvania

• What is the state code for PA?
  – PA -> FIPS=42

• We thus need cn42pa10.enb (note the unusual extension; there is no choice, as only .enb files are available)

• Extract it from the ZIP file (38MB unpacked)
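
The extraction step can be scripted rather than done by hand. The following is a minimal sketch, not the course solution: the file-name pattern (cn + state FIPS + state abbreviation + two-digit year) is inferred from the single example cn42pa10.enb, the year 2010 is an assumption read off the "10" in that name, and the archive contents in the demonstration are made up.

```python
import io
import zipfile


def county_file_name(fips: str, abbr: str, year: int) -> str:
    """Build a name like 'cn42pa10.enb' (pattern inferred from one example)."""
    return f"cn{fips}{abbr.lower()}{year % 100:02d}.enb"


def extract_member(archive, wanted: str) -> bytes:
    """Return the bytes of the archive member whose base name equals `wanted`."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Members may sit inside sub-directories of the archive.
            if info.filename.rsplit("/", 1)[-1] == wanted:
                return zf.read(info)
    raise FileNotFoundError(wanted)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the large BLS archive (contents are made up).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("county/cn42pa10.enb", "dummy record\n")
        zf.writestr("county/cn36ny10.enb", "other state\n")
    name = county_file_name("42", "PA", 2010)
    print(name, len(extract_member(buf, name)), "bytes")
```

Reading only the one member avoids unpacking the whole year of data to disk; `archive` may be a path to the downloaded ZIP file or any file-like object.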

How to read it in?

• No information in the ZIP file, but…
  – On the same FTP server: DOCUMENT/
  – On the Web page: “Flat file formatters”
  – On the Web page: “Tools and tutorials”

• Use the template files to construct a SAS program
  – For Stata: construct a dictionary file
  – For R: read a fixed format file
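
For readers working in Python rather than SAS, Stata, or R, a fixed-format file can be read by slicing each line at known column positions. The sketch below only illustrates the technique: the field names and positions in `LAYOUT` are hypothetical and must be replaced with the positions given in the BLS layout documentation.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL layout: (field name, start, end) as 0-based, end-exclusive
# offsets. The real positions come from the BLS layout documentation.
LAYOUT: List[Tuple[str, int, int]] = [
    ("area_fips", 0, 5),
    ("industry_code", 5, 11),
    ("year", 11, 15),
    ("employment", 15, 24),
]


def parse_line(line: str, layout=LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width record into named, whitespace-stripped strings."""
    return {name: line[start:end].strip() for name, start, end in layout}


if __name__ == "__main__":
    # Made-up record that matches the hypothetical layout above.
    sample = "42001" + "311".ljust(6) + "2010" + "1234".rjust(9)
    print(parse_line(sample))
```

Keeping every field as a string at read-in time preserves leading zeros in codes; numeric fields can be converted afterwards.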

Solution to QCEW

• http://www.vrdc.cornell.edu/info7470/Data/lab5-qcew.sas.txt

• Compare it to the template program provided in BLS’ makesas.zip

Minor modifications

Computations

QCEW Pitfalls

• Industry coding: ftp://ftp.bls.gov/pub/special.requests/cew/DOCUMENT/industry.map

• “Industry Code Map: This is for NAICS based Quarterly Census of Employment and Wages (QCEW) data.”

Mixed industry coding

Industry Code   Industry Title
10              10 Total, all industries
101             101 Goods-producing
1011            1011 Natural resources and mining
11              NAICS 11 Agriculture, forestry…
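
Because roll-up codes and NAICS codes share one column, selecting sub-sectors by code length alone would also pick up a code such as 101. The sketch below shows one way to separate the two. It assumes that every QCEW roll-up code starts with "10", which is consistent with the rows above and with the fact that no NAICS sector begins with 10, but it should be checked against industry.map before being relied on. The code 311 in the demonstration is an illustrative sub-sector code, not a row from the table.

```python
def is_qcew_aggregate(code: str) -> bool:
    """True for roll-up codes such as 10, 101, 1011 (assumed to start with '10')."""
    return code.strip().startswith("10")


def is_naics3(code: str) -> bool:
    """True for a three-digit NAICS sub-sector code, excluding the roll-ups."""
    code = code.strip()
    return len(code) == 3 and code.isdigit() and not is_qcew_aggregate(code)


if __name__ == "__main__":
    # First four codes are from the table above; 311 is added for illustration.
    for code in ["10", "101", "1011", "11", "311"]:
        print(code, is_qcew_aggregate(code), is_naics3(code))
```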

QWI

• Challenge: very large files
• http://www.vrdc.cornell.edu/qwipu/R2012Q2/pa/wia/qwi_pa_wia_county_naicssec_pri.csv.bz2 : 81MB compressed, 2.3GB uncompressed

• Read-in requires 8GB of RAM for R…
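
One way around the memory requirement is to never hold the whole file in memory: decompress and filter it row by row, keeping only the records of interest. The following is a sketch of that idea using the Python standard library. The column names (`geography`, `industry`, `Emp`) and the sample rows are assumptions made for the demonstration, not the documented QWI layout.

```python
import bz2
import csv
import io


def stream_rows(source, keep):
    """Yield CSV rows (as dicts) from a .bz2 file, one at a time, where keep(row) is true."""
    with bz2.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            if keep(row):
                yield row


if __name__ == "__main__":
    # Made-up miniature file; column names are hypothetical.
    raw = "geography,industry,Emp\n42001,311,120\n42003,311,450\n42003,722,900\n"
    packed = io.BytesIO(bz2.compress(raw.encode("utf-8")))
    for row in stream_rows(packed, lambda r: r["industry"] == "311"):
        print(row["geography"], row["Emp"])
```

`source` may be a path to the downloaded .csv.bz2 file, so the 2.3GB uncompressed version never has to be written to disk either.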

Metadata and data

• “How many data rows does the file you downloaded have?”
  – QCEW: as many as the .enb file has (no embedded metadata) (88,093)
  – QWI: count of lines minus 1: the header row is metadata, not data (8,482,131)
  – Same reasoning for CBP (2,155,389)
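
The counting rule can be written down in a few lines. This is a minimal sketch with made-up sample data. It counts physical lines, so it assumes that no quoted CSV field contains an embedded line break.

```python
import io


def count_data_rows(lines, has_header: bool) -> int:
    """Count records in an iterable of lines, excluding one header line if present."""
    total = sum(1 for _ in lines)
    return max(total - 1, 0) if has_header else total


if __name__ == "__main__":
    # A fixed-format file has no header: every line is data.
    print(count_data_rows(io.StringIO("rec1\nrec2\n"), has_header=False))
    # A CSV file with a header row: the first line is metadata.
    print(count_data_rows(io.StringIO("a,b\n1,2\n3,4\n"), has_header=True))
```

Passing an open file object works the same way and reads the file line by line, so the approach also suits the very large QWI file.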

Reading in QWI

• http://www.vrdc.cornell.edu/qwipu/R2012Q2/pa/wia/sas_import_wia.sas in the same directory

• Very long program, but the very first section is for the file we want: qwi_pa_wia_county_naics3

• Alternatively, use “proc import”, but it may not yield correct results.

After read-in, same as for QCEW

• http://www.vrdc.cornell.edu/info7470/Data/lab5_qwi.sas.txt :

Solution for QWI

County Business Patterns

• Straight CSV file, but for entire year (15.2MB ZIP file)

• But: employment refers to mid-March (the pay period including March 12), so it is comparable to the other two

• Caution: file contains all levels of NAICS, right-filled with “////”
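
Since every level of NAICS detail sits in the same column, the padding has to be stripped before selecting sub-sectors. The sketch below derives the level from the number of characters left after removing the fill. The slide mentions only "/" as fill; treating "-" as a second fill character (for totals and sectors) is an assumption, and the sample codes are illustrative.

```python
def naics_level(code: str) -> int:
    """Number of meaningful digits in a padded NAICS code (0 = all industries)."""
    # "/" is the fill named on the slide; "-" is an assumed second fill character.
    return len(code.strip().rstrip("/-"))


def is_subsector(code: str) -> bool:
    """True when the padded code identifies a three-digit NAICS sub-sector."""
    return naics_level(code) == 3 and code.strip()[:3].isdigit()


if __name__ == "__main__":
    # Illustrative padded codes, not rows copied from the CBP file.
    for code in ["------", "31----", "311///", "3111//", "311110"]:
        print(code, naics_level(code), is_subsector(code))
```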

Solution for CBP

Results

• Not all sources give the same answer…
  – Differences in source data
    • Count of individual wage records
    • Firm-level report of employment at a particular point in time to state reporting system
    • Establishment-level report of employment at a particular point in time to federal reporting system
  – Differences in data cleaning
  – Other…

Now that you know how

• Try it on Lewis and Clark County, MT
• Try it for earlier time periods
• Drill down
