Data access in North America
Current state and future consequences
William C. Block and Lars Vilhuber
Disclaimer:
The opinions expressed in this presentation are those of the authors and not of the National Science Foundation, the U.S. Census Bureau, or any other government agency.
No confidential, restricted-access data was used to prepare this presentation.
Caveats
• Economist
• Labor economist
• Micro-data preferred
• US bias
Classifying North American data
• Access type
  – Public-use data
  – Contractual access
  – Restricted-access data
• Data source
  – Survey data
  – Administrative data
• Strength of statistical disclosure limitation (SDL)
[Figure: ease of access vs. degree of detail]
RA (restricted access): Contractual restriction
• Examples:
  – NLSY (detailed geography)
  – HRS (additional data)
• Some restrictions on usage in exchange for more detail
• Few constraints on combining with other data
RA: Remote controlled access from anywhere
• Examples:
  – CRADC @ Cornell
  – Data enclave @ NORC
  – Synthetic data server @ Cornell
• Cross-dataset access restrictions typically still apply, even within the same environment
• Reduced ability to combine with other data
RA: Remote execution
• From anywhere
• Examples:
  – NCHS microdata ($)
  – Statistics Canada
  – (implicit in Synthetic Data Server)
• May be limited in the complexity of models that can be estimated
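With remote execution, the researcher never works with the microdata directly. Code is submitted, it runs next to the data, and only vetted output comes back. The sketch below shows that loop. It assumes a hypothetical in-memory confidential table and a hypothetical minimum cell size of 10. Grouped means are the only permitted analysis, which also illustrates why the complexity of models can be limited.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical confidential microdata held by the agency; the researcher
# never receives these rows, only the vetted aggregates.
CONFIDENTIAL = [
    {"state": "NY", "earnings": 41000 + 250 * i} for i in range(12)
] + [
    {"state": "VT", "earnings": 39000 + 300 * i} for i in range(4)
]

MIN_CELL_SIZE = 10  # hypothetical disclosure threshold, not an agency rule


def remote_execute(group_var, value_var):
    """Run a grouped mean on the server side and release only safe cells."""
    cells = defaultdict(list)
    for row in CONFIDENTIAL:
        cells[row[group_var]].append(row[value_var])
    released = {}
    for key, values in sorted(cells.items()):
        if len(values) < MIN_CELL_SIZE:
            released[key] = None  # suppressed: too few contributors
        else:
            released[key] = {"n": len(values), "mean": round(mean(values), 2)}
    return released


print(remote_execute("state", "earnings"))
```

Real services typically add human or automated disclosure review on top of simple rules like this one. The threshold here is illustrative only.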
Remote access from controlled location
• Examples:
  – Census, BLS, Canadian RDCs
  – Even IAB data (from Cornell)
• Limited access (few locations)
• Long application process
• Limited ability to add other data
Detail and access
• As detail increases, access restrictions also increase
• What other methods are used?
Trade-off: geographic detail vs. timeliness
• Decennial Census
  – Tract level
  – Limited characteristics
• American Community Survey
  – More person/household characteristics
  – Precision increases with multi-year estimates
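The precision gain from multi-year estimates is largely a sample-size effect. The rough sketch below assumes simple random sampling, a hypothetical annual sample of 400 households in a small area, and a hypothetical true proportion of 0.2. The actual ACS design and its published margins of error are more complex.

```python
import math


def se_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


p = 0.2          # hypothetical true share in the small area
annual_n = 400   # hypothetical one-year sample size

one_year = se_proportion(p, annual_n)
five_year = se_proportion(p, 5 * annual_n)  # pooling five annual samples

print(f"1-year SE: {one_year:.4f}")
print(f"5-year SE: {five_year:.4f}")
print(f"ratio: {one_year / five_year:.3f}")  # about sqrt(5)
```

This is the trade-off in the slide title. The five-year figure is more precise, but it describes a period rather than a single year.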
Trade-off: geographic detail vs. timeliness
• Current Population Survey
  – Monthly estimates
  – No sub-state estimates (exception: 12 large MSAs)
Data without Boundaries
• Increased access to restricted-access data
• Access to data from multiple jurisdictions
• Access to data from multiple “access domains”
• Increasingly detailed public-use data
Increased access to restricted-access data
• Expansion of the RDC network
  – USA
  – Canada
• Expansion of data accessible in the RDC network
  – Agency for Healthcare Research and Quality (AHRQ)
  – National Center for Health Statistics (NCHS)
Access to data from multiple jurisdictions
• Long-standing access
  – IRS and SSA data in Census RDCs can be combined with Census data sources
• New
  – Multi-state access (education-oriented longitudinal data warehouses)
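Combining IRS or SSA records with Census data inside an RDC depends on a shared person identifier that researchers never see in the clear. The minimal sketch below illustrates the idea and assumes a keyed hash (HMAC-SHA256) as the pseudonymization step. The key, field names, and records are all hypothetical, and this is not the Census Bureau's actual procedure.

```python
import hashlib
import hmac

# Hypothetical secret held only by the data custodian, never by researchers.
LINKAGE_KEY = b"custodian-secret-key"


def protect(ssn):
    """Replace a raw identifier with a keyed hash usable only for linking."""
    return hmac.new(LINKAGE_KEY, ssn.encode(), hashlib.sha256).hexdigest()


# Hypothetical administrative (tax) and survey records sharing an identifier.
tax_records = [{"ssn": "123-45-6789", "wages": 52000},
               {"ssn": "987-65-4321", "wages": 31000}]
survey_records = [{"ssn": "123-45-6789", "education": "BA"},
                  {"ssn": "555-00-1111", "education": "HS"}]

# The custodian replaces raw identifiers before researchers see the files.
tax = {protect(r["ssn"]): {"wages": r["wages"]} for r in tax_records}
survey = {protect(r["ssn"]): {"education": r["education"]}
          for r in survey_records}

# Researchers join on the protected key (an inner join).
linked = [{**survey[k], **tax[k]} for k in survey.keys() & tax.keys()]
print(linked)
```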
Not everything is advancement
• BLS, Census, other agencies remain distinct and separate (despite CIPSEA)
• No cross-border access (Canadian data in the US or vice versa)
• Multi-jurisdiction access for research purposes may be reduced, not increased (state employment agencies at the Census Bureau)
Access to data from multiple “access domains”
• How do we get MUCH public-use data into
  – Census RDCs?
  – CRADC?
• No data curation other than of own data
  – → CCBMR (see our presentation at WDA)
• Synthetic data, more detailed geographic data
  – Increased ease of combining data
Other methods
• Increasingly detailed public-use statistics, through the use of
  – synthetic data
  – new methods of SDL
• Examples:
  – Quarterly Workforce Indicators
  – Business Dynamics Statistics
  – Synthetic SIPP
  – Synthetic LBD
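One family of newer SDL methods perturbs the confidential inputs before tabulating, instead of suppressing cells. The Quarterly Workforce Indicators are documented as using a form of multiplicative noise infusion. The sketch below is a generic version of that idea, not the published QWI algorithm. The distortion range (2% to 10%), the establishment data, and the function names are all hypothetical.

```python
import random


def distortion_factor(rng, low=0.02, high=0.10):
    """Draw a factor away from 1: in [1-high, 1-low] or [1+low, 1+high]."""
    delta = rng.uniform(low, high)
    return 1 + delta if rng.random() < 0.5 else 1 - delta


# Hypothetical establishment-level employment in one county, two quarters.
employment = {
    "estab_A": [120, 125],
    "estab_B": [40, 38],
    "estab_C": [8, 9],
}

rng = random.Random(2012)  # fixed seed so the sketch is reproducible
# One permanent factor per establishment, reused in every quarter so that
# changes over time are not swamped by fresh noise.
factors = {e: distortion_factor(rng) for e in employment}

for q in range(2):
    true_total = sum(v[q] for v in employment.values())
    noisy_total = sum(factors[e] * v[q] for e, v in employment.items())
    print(f"quarter {q}: true={true_total}, published={noisy_total:.1f}")
```

Every published total stays within the distortion range of the true total. No individual establishment's value can be read off exactly, even when it dominates a cell.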
Example: Abowd and Vilhuber (2012)
• “Did the Housing Price Bubble Clobber Local Labor Market Job and Worker Flows When It Burst?” (American Economic Review: Papers & Proceedings, 2012)
• Data sources:
  – FHFA's House Price Index
  – BLS' national and local unemployment statistics
  – Census Bureau's Quarterly Workforce Indicators
  – Our own national aggregation of those
Why do we do this?
[Figure: research lifecycle (modelling, critique)]
Why?
• Accelerate the research cycle
• Increase the body of research for any given data source
• Improve economic/social/demographic/etc. models through more detailed data
Public-use data very successful
Restricted-access data less so
Richness of data is an incredible asset
• Macroeconomic CGE models rely on a multitude of parameters: dozens, maybe hundreds
• Microeconomic (partial equilibrium) models rely on feasible estimation
• New modeling strategies: networking, micro-simulation
Goal of research
• Understanding of economic and social phenomena
  – Better model-based predictions
  – Better experimental analysis
Modelling
Weather modelling
Behind this:
• A set of models
• Computed using observed data and simulations
• National Centers for Environmental Prediction has two 156-node compute clusters running 24/7
• Precision of predictions?
Experiments
• Experiments provide useful data under controlled circumstances
• They are sometimes frowned upon...
Nuclear experiments nowadays
ASC computing environment
• Sequoia, a next-generation Blue Gene/Q compute cluster:
  – 98,304 compute nodes
  – 1.6 million processor cores
  – 1.6 PB memory
Bad policy and “experiments” have bad outcomes
Berlin 1923
Zimbabwe
The logical next step?
• If we can simulate...
  – atomic bombs
  – weather
• Given the right input data (integrated DwB!)...
• Can we provide (better) simulations of economic phenomena and policy?
Let's consider... labor market mobility
Sometimes only very little mobility
Sometimes a lot of mobility
Sometimes opportunities next door
May not be included in the data!
… almost certainly for immigrants
Presenting Mr. Data-truncation, the bane of integrated data
[Figure: locations of current workplace, current residence, historical workplaces, higher education, primary education]
Not just me.
[Figure: locations of current workplace, current residence, historical workplaces, higher education, parents' workplaces, siblings, current colleagues, past colleagues]
It gets worse...
• Siblings in Montana (works in Silicon Valley) and Grenoble (used to live in Egypt)
• Parents somewhere in Europe (long live retirement), with retirement income from two state retirement systems (US and Germany)
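The build-up figures and the example above describe a person-centred network whose links are recorded by different agencies in different countries. The toy sketch below uses only the links named on this slide. It assumes, hypothetically, that each link is observable only in the jurisdiction where it is recorded. Under that assumption, it counts how much of the network a single-country dataset truncates.

```python
from collections import Counter

# Links from the slide's example; the third field is a hypothetical
# assumption about which jurisdiction holds the record.
NETWORK = [
    ("sibling 1 residence", "Montana", "US"),
    ("sibling 1 workplace", "Silicon Valley", "US"),
    ("sibling 2 residence", "Grenoble", "France"),
    ("sibling 2 past residence", "Egypt", "Egypt"),
    ("parents' residence", "somewhere in Europe", "Europe (unspecified)"),
    ("parents' pension", "US public retirement system", "US"),
    ("parents' pension", "German public retirement system", "Germany"),
]


def visible_links(network, jurisdictions):
    """Links observable to a researcher with access to these jurisdictions."""
    return [link for link in network if link[2] in jurisdictions]


by_jurisdiction = Counter(j for _, _, j in NETWORK)
us_only = visible_links(NETWORK, {"US"})
print(by_jurisdiction)
print(f"US-only data observe {len(us_only)} of {len(NETWORK)} links")
```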
Historical data offers some insights
• We can link Tor Janson from Oslo (1880) to his records in the United States
• But we cannot link the 21st-century Lars Vilhuber
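Historical linkage of this kind usually works without a common identifier. Instead, it compares names, birth years, and places across sources. The minimal sketch below uses hypothetical records and a simple string-similarity score from Python's `difflib`. The blocking rule and the 0.8 score cutoff are illustrative assumptions. Real historical linkage projects use more careful name standardization, blocking, and validation.

```python
from difflib import SequenceMatcher

# Hypothetical 1880 Norwegian census entry and later US census candidates.
source = {"name": "Hans Olsen", "birth_year": 1855, "birthplace": "Norway"}
candidates = [
    {"id": 1, "name": "Hans Olson", "birth_year": 1855, "birthplace": "Norway"},
    {"id": 2, "name": "Hans Nilsen", "birth_year": 1856, "birthplace": "Norway"},
    {"id": 3, "name": "Anna Olsen", "birth_year": 1855, "birthplace": "Sweden"},
]


def name_similarity(a, b):
    """Similarity in [0, 1] between two lower-cased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(record, pool, max_year_gap=2, min_score=0.8):
    """Block on birthplace and birth year, then pick the most similar name."""
    block = [c for c in pool
             if c["birthplace"] == record["birthplace"]
             and abs(c["birth_year"] - record["birth_year"]) <= max_year_gap]
    scored = [(name_similarity(record["name"], c["name"]), c) for c in block]
    scored = [(s, c) for s, c in scored if s >= min_score]
    return max(scored, key=lambda sc: sc[0], default=(None, None))[1]


match = best_match(source, candidates)
print(match["id"] if match else "no match")
```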
Hourly data available...
And I didn't even mention...
• F...b..k
• G....l.
• Tw.....
This is not the end
• Suppose we solve most of the data access issues
• What kind of data usage models will we see?
Example: mobility
• Kennan and Walker (2003, 2011)
• Model determinants of individual location and employment choices along a mobility path
• Computational limitations:
  – 500 HS dropouts
  – State-level choices
  – Only two at any time
  – More than 1 day on 50 CPUs to estimate
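Why do such models hit computational limits so quickly? The back-of-the-envelope sketch below offers one illustration. It assumes, as a hypothetical simplification rather than Kennan and Walker's actual specification, that the state space is the ordered sequence of the last k locations visited among 50 states.

```python
N_STATES = 50  # U.S. states, ignoring DC (a simplifying assumption)


def location_histories(k, n=N_STATES):
    """Number of ordered sequences of the last k locations (repeats allowed)."""
    return n ** k


for k in range(1, 5):
    print(f"remember {k} location(s): {location_histories(k):,} states")

# "More than 1 day on 50 CPUs" from the slide, expressed as CPU-hours.
cpu_hours = 1 * 24 * 50
print(f"lower bound on estimation cost: {cpu_hours:,} CPU-hours")
```

Each additional remembered location multiplies the state space by 50. That growth comes before adding more individuals, more time periods, or county-level detail.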
Models are always a simplification
• But:
  – 5.6 million Americans moved to a different state (IRS SOI, 2008-2009)
  – 7.4 million moved to a different county in the same state
  – 300,000 entered the US, 198,000 left the US
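Putting these figures side by side makes the scale gap explicit. The arithmetic below uses only the numbers on this slide (IRS SOI, 2008-2009) and the 500-person estimation sample from the Kennan and Walker example.

```python
interstate = 5_600_000         # moved to a different state
intrastate_county = 7_400_000  # moved to a different county, same state
entered, left = 300_000, 198_000
kw_sample = 500                # HS dropouts in the Kennan and Walker example

county_changers = interstate + intrastate_county
net_international = entered - left
print(f"county-changing moves within the US: {county_changers:,}")
print(f"net international inflow: {net_international:,}")
print(f"interstate movers per person in the estimation sample: "
      f"{interstate // kw_sample:,}")
```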
Resources are still limited in RA
… but resources exist where the data is not
Some attempts get close
• The “Exploring New Methods for Protecting and Distributing Confidential Research Data” project at Michigan (Felicia LeClere) already works in the cloud
• The Census Bureau is working with a network of researchers; a working group focuses on a next-generation, flexible compute architecture within the restricted-access environment
Outlook
Consequences of successful DwB
• If you create it (the integrated data environment), they will come
• … but they may wish for more than you can provide
• Successful data integration must also provide the tools for new (pent-up) modelling strategies
The next frontier
• Tera-scale compute resources for the social sciences, using integrated confidential data