

Page 1

1 — U.S. BUREAU OF LABOR STATISTICS • bls.gov

R: Innovating at the Bureau of Labor Statistics

Arcenis Rojas, Economist

Division of Consumer Expenditure Surveys

Federal Committee on Statistical Methodology, March 2018

Page 2

Overview

IPP: Division of International Prices

PPI: Division of Industrial Prices and Price Indexes

CE: Division of Consumer Expenditure Surveys

OCWC: Office of Compensation and Working Conditions

OSMR: Office of Survey Methods and Research

Page 3

Overview

Automation (IPP)

Quality control (PPI)

Real-time response rates (OCWC)

Data visualization (CE)

Other R Shiny applications

R packages

Page 4

R Shiny Applications

Page 5

Sample Refinement Automation

International Prices Program

Receives data from Census and Customs

Must verify Establishment ID Number (EIN), name, and address before passing them to field economists

1,700 export collection units per sample

2,400 import collection units per sample

6 IPP sample team members

16 copies, 20 pastes, and 46 clicks per unit
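To put the per-unit figures in perspective, here is a back-of-the-envelope sketch of the manual workload in one sample. It assumes, as a simplification, that every export and import collection unit needed the full 16 copies, 20 pastes, and 46 clicks.

```python
# Back-of-the-envelope manual workload per IPP sample.
# Assumption: every collection unit needed the full per-unit action counts.
EXPORT_UNITS = 1_700
IMPORT_UNITS = 2_400
ACTIONS_PER_UNIT = {"copies": 16, "pastes": 20, "clicks": 46}


def manual_actions_per_sample(units: int, actions: dict[str, int]) -> dict[str, int]:
    """Scale each per-unit action count by the number of collection units."""
    totals = {name: count * units for name, count in actions.items()}
    totals["all"] = sum(totals.values())
    return totals


units = EXPORT_UNITS + IMPORT_UNITS  # 4,100 collection units per sample
totals = manual_actions_per_sample(units, ACTIONS_PER_UNIT)
for name, total in totals.items():
    print(f"{name:>6}: {total:,}")
```

Under that assumption a single sample comes to 188,600 clicks and 336,200 manual actions in all, which is the kind of repetitive work the automation targets.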

Page 6

Data Sources

Page 7

(Screenshot with numbered callouts 1 to 5)

Page 8

Left Side

Page 9

Right Side

Page 10

Search Results

Page 11

Export Addresses at a Glance

Page 12

Benefits of Automation

80 to 100 hours of time savings per sample

Much less clicking

Better and more thorough sample review

More time to review the more problematic collection units

Page 13

Sample Refinement Automation

Ara Khatchadourian: [email protected]

Rob Sutton: [email protected]

Page 14

Industrial Prices Visualization Dashboard

Page 15

Index Comparisons

Page 16

Index Review and Revision

Page 17

Visualization Dashboard

Neil Wagner: [email protected]

Steve York: [email protected]

Page 18

Interactive CE Visualization Tool

Page 19

CE Public-Use Microdata (PUMD)

Public-Use Microdata

Family-level characteristics

Expenditures by Universal Classification Code (UCC)

Member-level characteristics

Expenditures and their characteristics by type of expenditure (EXPN… > 50 files each year!)

And more!

Page 20

Files Required for Analysis

Family Characteristics File (34,177 Observations)

Expenditures File (1,720,755 Observations)
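Working with these two files means linking each expenditure record to its consumer unit's weight before aggregating. The sketch below illustrates that join on toy rows. It assumes the files share a consumer-unit key named `NEWID`, that the family file carries a weight column `FINLWT21`, and that expenditure rows carry `UCC` and `COST`; treat that layout as an assumption, and note the codes and values are made up. It also omits the additional adjustments used for published CE estimates.

```python
from collections import defaultdict

# Hypothetical toy rows standing in for the two PUMD files. Assumed layout:
# NEWID links a consumer unit across files, FINLWT21 is its weight, and
# each expenditure row carries a UCC and a COST. Values are made up.
family = [
    {"NEWID": 101, "FINLWT21": 1500.0},
    {"NEWID": 102, "FINLWT21": 2500.0},
    {"NEWID": 103, "FINLWT21": 1000.0},
]
expenditures = [
    {"NEWID": 101, "UCC": "UCC_A", "COST": 40.0},
    {"NEWID": 101, "UCC": "UCC_A", "COST": 20.0},
    {"NEWID": 102, "UCC": "UCC_A", "COST": 100.0},
    {"NEWID": 103, "UCC": "UCC_B", "COST": 75.0},
]


def weighted_mean_spending(family, expenditures, ucc):
    """Weighted mean spending on one UCC across all consumer units.

    Units with no matching expenditure rows count as zero spenders, so
    every family record enters the denominator.
    """
    spent = defaultdict(float)
    for row in expenditures:
        if row["UCC"] == ucc:
            spent[row["NEWID"]] += row["COST"]
    total_weight = sum(f["FINLWT21"] for f in family)
    weighted_total = sum(f["FINLWT21"] * spent[f["NEWID"]] for f in family)
    return weighted_total / total_weight


print(weighted_mean_spending(family, expenditures, "UCC_A"))
```

On these toy rows the result for UCC_A is (1,500 × 60 + 2,500 × 100 + 1,000 × 0) / 5,000 = 68.0. Counting non-spenders as zeros is what keeps the mean "per consumer unit" rather than "per purchaser".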

Page 21

Required Resources / Skills

Page 22

Interactive CE Visualization Tool

Page 23

Interactive CE Visualization Tool

(Screenshot with numbered callouts 1 to 4)

Page 24

Interactive CE Visualization Tool

(Detail of callout 1)

Page 25

Interactive CE Visualization Tool

(Detail of callout 2)

Page 26

Interactive CE Visualization Tool

(Detail of callouts 3 and 4)

Page 27

Interactive CE Visualization Tool

Error Bars

Mean = $30,040.00

CV = 24.12%

Sample Size = 3

Lower Bound = $15,548.70

Upper Bound = $44,531.30
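The bounds shown match error bars of two standard errors on either side of the mean, with the standard error recovered from the coefficient of variation as SE = CV × mean. That reading is inferred from the numbers on the slide, not documented for the tool. The check below reproduces them; the function name and the default multiplier `z = 2` are assumptions.

```python
def error_bar(mean: float, cv_percent: float, z: float = 2.0) -> tuple[float, float]:
    """Return (lower, upper) = mean -/+ z * SE, where SE = CV * mean."""
    se = mean * cv_percent / 100.0  # standard error implied by the CV
    return mean - z * se, mean + z * se


lower, upper = error_bar(30_040.00, 24.12)
print(f"${lower:,.2f} to ${upper:,.2f}")
```

Here SE = 0.2412 × 30,040 = 7,245.65, so the bars run from 30,040 - 14,491.30 to 30,040 + 14,491.30, matching the slide to the cent.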

Page 28

Benefits to the User

Accessibility: The user can access the app for free from any device with a web browser and internet access

Usability: The user needs only the clean, user-friendly UI to get data, results, and visualizations

Page 29

Interactive CE Visualization Tool

Arcenis Rojas: [email protected]

Page 30

Real-time Response Rate Tool

Office of Compensation and Working Conditions

Provides real-time response rates to field offices

Lets field offices focus on problem collection areas

Improves sample representativeness

Page 31

Real-time Response Rate Tool

Response rates by region and/or establishment size

Detailed summaries for each region
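Response rates by region and/or establishment size are grouped ratios of responding units to eligible units. A minimal sketch, assuming hypothetical records with `region`, `size_class`, and a boolean `responded` field, and computing unweighted rates; the production tool may weight units or define eligibility differently.

```python
from collections import defaultdict

# Hypothetical sample-unit records; field names and values are assumptions.
units = [
    {"region": "Northeast", "size_class": "small", "responded": True},
    {"region": "Northeast", "size_class": "small", "responded": False},
    {"region": "Northeast", "size_class": "large", "responded": True},
    {"region": "West", "size_class": "small", "responded": False},
    {"region": "West", "size_class": "large", "responded": True},
]


def response_rates(records, *keys):
    """Unweighted response rate (responded / eligible) for each group.

    Pass one key for rates by region or by size, or several keys for
    rates by their combination. Groups are returned as tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [responded, eligible]
    for r in records:
        group = tuple(r[k] for k in keys)
        counts[group][0] += int(r["responded"])
        counts[group][1] += 1
    return {g: resp / total for g, (resp, total) in counts.items()}


print(response_rates(units, "region"))
print(response_rates(units, "region", "size_class"))
```

Because the grouping keys are variadic, the same function covers "by region", "by establishment size", and "by region and size" without separate code paths.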

Page 32

Real-time Response Rate Tool

Brandon Kopp (OSMR): [email protected]

Randall Powers (OSMR): [email protected]

Arcenis Rojas (CE): [email protected]

Page 33

Other Shiny Applications

Choropleth maps of unemployment data (OSMR)

Energy Information Administration analyzer (PPI)

Text analysis Shiny App (Survey Methods)

Page 34

R Packages

Page 35

R Packages

rpms: Recursive Partitioning for Modeling Survey Data package (Survey Methods)

growfunctions: Bayesian Non-Parametric Dependent Models for Time-Indexed Functional Data package (Survey Methods)

Page 36

rpms

Fits a linear model to survey data in each node obtained by recursively partitioning the data.

Adjusts for complex sample design features used to obtain the data.

Produces design-consistent estimates of the coefficients of the least-squares linear model relating the dependent and independent variables.
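The idea of fitting a survey-weighted linear model in every node of a recursive partition can be sketched in a few dozen lines. This is not the rpms algorithm. It assumes a single split variable `z`, a single predictor `x`, and survey weights as the only design feature, and it chooses splits by minimizing weighted squared error as a stand-in criterion; rpms handles more design features and uses its own split-selection procedure. All function names and the data are hypothetical.

```python
# Toy sketch: a weighted linear model in every node of a recursive partition.
# Not rpms: one split variable z, one predictor x, weights only, and splits
# chosen by minimizing weighted squared error (a stand-in criterion).

def wls_line(xs, ys, ws):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, weighted SSE).

    Weighting by survey weights makes (a, b) estimate the population
    least-squares line; design-based variance estimation is omitted.
    """
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = ybar - b * xbar
    sse = sum(w * (y - a - b * x) ** 2 for w, x, y in zip(ws, xs, ys))
    return a, b, sse


def subset(values, idx):
    return [values[i] for i in idx]


def grow(zs, xs, ys, ws, depth=0, max_depth=2, min_node=4):
    """Return a nested dict: internal nodes hold a cut on z, leaves hold (a, b)."""
    a, b, sse = wls_line(xs, ys, ws)
    leaf = {"n": len(xs), "coef": (a, b)}
    if depth >= max_depth or len(xs) < 2 * min_node:
        return leaf
    best = None
    for cut in sorted(set(zs))[1:]:  # candidate rule: z < cut goes left
        left = [i for i, z in enumerate(zs) if z < cut]
        right = [i for i, z in enumerate(zs) if z >= cut]
        if len(left) < min_node or len(right) < min_node:
            continue
        cost = sum(wls_line(subset(xs, s), subset(ys, s), subset(ws, s))[2]
                   for s in (left, right))
        if best is None or cost < best[0]:
            best = (cost, cut, left, right)
    if best is None or best[0] >= sse:  # no split improves the fit
        return leaf
    _, cut, left, right = best
    children = [grow(subset(zs, s), subset(xs, s), subset(ys, s), subset(ws, s),
                     depth + 1, max_depth, min_node) for s in (left, right)]
    return {"split": cut, "left": children[0], "right": children[1]}


# Hypothetical data: y rises with x when z < 5 and falls with x otherwise.
zs = list(range(10))
xs = [1, 2, 3, 4, 5] * 2
ys = [2, 4, 6, 8, 10, 9, 8, 7, 6, 5]
ws = [1.0, 2.0] * 5  # made-up survey weights
tree = grow(zs, xs, ys, ws)
print(tree)
```

On this toy data the tree splits at z = 5 and recovers y = 2x in the left end node and y = 10 - x in the right one, the "binary tree with a linear model at every end node" shape described here.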

Page 37

rpms

The main function returns the resulting binary tree with the linear model fit at every end-node.

Daniell Toth (OSMR): [email protected]

Page 38

growfunctions

Bayesian Non-Parametric Dependent Models for Time-Indexed Functional Data package (Survey Methods)

Estimates a collection of time-indexed functions under either a Gaussian process (GP) or an intrinsic Gaussian Markov random field (iGMRF) prior formulation
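An iGMRF prior for a time-indexed function is commonly built from differences. For a second-order random walk (RW2), the precision matrix is Q = kappa * D^T D, where D takes second differences of consecutive values. A minimal stdlib sketch of that construction, assuming an RW2 structure and a scalar precision `kappa`; growfunctions' own iGMRF specification may differ.

```python
def second_difference_matrix(n):
    """(n-2) x n matrix D whose rows are (..., 1, -2, 1, ...)."""
    D = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        D[i][i], D[i][i + 1], D[i][i + 2] = 1.0, -2.0, 1.0
    return D


def rw2_precision(n, kappa=1.0):
    """Precision Q = kappa * D^T D of an RW2 intrinsic GMRF on n time points."""
    D = second_difference_matrix(n)
    return [
        [kappa * sum(D[k][i] * D[k][j] for k in range(n - 2)) for j in range(n)]
        for i in range(n)
    ]


def matvec(Q, v):
    return [sum(q * x for q, x in zip(row, v)) for row in Q]


Q = rw2_precision(6, kappa=2.0)
# Q is singular ("intrinsic"): constant and linear trends are not penalized,
# so both products below are vectors of zeros.
print(matvec(Q, [1.0] * 6))
print(matvec(Q, [float(t) for t in range(6)]))
```

The zero products show why the prior is called intrinsic: it penalizes curvature only, leaving the level and slope of each function to the data.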

Page 39

growfunctions

A Dirichlet process mixture allows subgroups of the functions to share the same covariance or precision parameters

The GP and iGMRF formulations both support any number of additive covariance or precision terms, respectively, expressing multiple trend terms, seasonality terms, or both.
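Additive covariance terms let one GP express trend and seasonality together, because the covariance of a sum of independent components is the sum of their covariances. A minimal sketch, assuming a squared-exponential term for trend and a periodic term for seasonality with hypothetical hyperparameters (period 12 suits monthly data). These are standard kernel choices, not necessarily the ones growfunctions uses.

```python
import math


def squared_exponential(s, t, scale=1.0, length=6.0):
    """Smooth trend component."""
    return scale**2 * math.exp(-((s - t) ** 2) / (2 * length**2))


def periodic(s, t, scale=0.5, length=1.0, period=12.0):
    """Seasonal component that repeats every `period` time steps."""
    return scale**2 * math.exp(-2 * math.sin(math.pi * abs(s - t) / period) ** 2 / length**2)


def additive_cov(times, terms):
    """Covariance matrix whose entries sum every kernel in `terms`."""
    return [[sum(k(s, t) for k in terms) for t in times] for s in times]


months = list(range(24))
K = additive_cov(months, [squared_exponential, periodic])
print(round(K[0][0], 4))
```

Adding more trend or seasonal terms is just a matter of appending kernels to `terms`; the diagonal entry here is 1.0 + 0.25 = 1.25, the sum of the two components' variances.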

Page 40

growfunctions

Terrance Savitsky (OSMR): [email protected]

Page 41

Challenges

Page 42

Challenges

Data confidentiality

Need for an R server to make apps and programs public

Shiny apps can only be put on a webpage via iframes or by setting up an account on a cloud server (e.g., DigitalOcean, RStudio)

Page 43

Contact Information


Arcenis Rojas, Economist

Division of Consumer Expenditure Surveys

www.bls.gov/cex

[email protected]