Introduction Application Architecture
A Tool for Distributed Meta-Analysis
Sustainable Development 10-Year Research Conference
James Rising, Solomon Hsiang
Columbia SIPA, Berkeley GSPP
February 28, 2014
http://shackleton.gspp.berkeley.edu
The need for empirical meta-analysis
[Figure: Damage to USA agriculture. Percent change in yield vs. change in global mean temperature (°C) for cotton, wheat, maize, soy, and their average.]
Black: Projected changes in yields using estimated models and the projected exposure of USA croplands. Changes averaged by cropland planted in a given crop. Comparable damage functions from ENVISAGE, FUND and RICE are colored. From Kopp et al. (2013).
The problem with meta-analysis
Essential (for scientists, policy, modelers), but...
Time-consuming: Expensive with hundreds of papers
Infrequent: For many questions, once a generation
Shallow: Drop all extra factors in relationships
Reductive: Need to ignore or limit methodologies, interests
Sweeping: General results, missing targeted questions
Weak Methods: Unweighted averages common in some fields
Biased: Publication, reference, and inclusion bias
Essential Elements
Collecting collective collections
An intelligent database of results
Statistical (Bayesian) tools
Remixing, comparing, combining
But not:
Not asking for or analyzing data
Article authors not required
No computational models
The Site
Finding Models
Performing Meta-analyses
How to use it
1. Optional: Find a relevant collection, if one exists.
2. Create new models defining:
   Publication: author; journal; year; location in article; published/working paper/private
   Variables: dependent and independent variables (e.g. Yield (MT/Ha) vs. Temperature (C))
   Population: Optional: methodology, observation count, study region and years, population characteristics, ...
   Parameter Estimates
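As a rough illustration of the fields listed above, a model entry could be represented as a small record. This is only a sketch: the class names, field names, and validation rules are hypothetical and are not taken from the site itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical set of allowed statuses, mirroring the slide's
# "published/working paper/private" options.
ALLOWED_STATUS = {"published", "working paper", "private"}


@dataclass
class Publication:
    authors: str
    journal: str
    year: int
    location_in_article: str      # e.g. "Table 2, column 3"
    status: str = "published"


@dataclass
class ModelRecord:
    publication: Publication
    dependent: str                # e.g. "Yield (MT/Ha)"
    independent: str              # e.g. "Temperature (C)"
    # Parameter estimates: x value -> (mean, standard error)
    estimates: Dict[float, Tuple[float, float]]
    # Optional population details: methodology, region, years, ...
    population: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the record is usable."""
        problems = []
        if self.publication.status not in ALLOWED_STATUS:
            problems.append("unknown publication status")
        if not self.dependent or not self.independent:
            problems.append("both variables must be named")
        if not self.estimates:
            problems.append("at least one parameter estimate is required")
        for x, (_mean, se) in self.estimates.items():
            if se <= 0:
                problems.append(f"non-positive standard error at x={x}")
        return problems
```

Only the publication, variables, and parameter estimates are required here; the population block stays optional, as on the slide.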
Creating Models
Models are parameter estimates, represented as conditional distributions.
Using structured model files:
Specifying summary statistics:
Generating stock model types:
Filling out a collection scaffold:
See appendix: Model types
Merging Methods
With multiple models: compare them, combine them, regroup them, reweight them; distribute them, share them
(a) Pooled estimate:
$p(\theta) = p\left(\bigcap_{i=1}^{N} \theta = \theta_i\right) \propto \prod_{i=1}^{N} p(\theta_i)$
(b) Bayesian hierarchical estimate:
$\theta_i \sim \mathcal{N}(\mu, \tau^2)$
$y_i \sim \mathcal{N}(\theta_i, \sigma_i^2)$
Log mortality as a function of temperature (Deschenes and Greenstone, 2011; Barreca et al., 2013).
See appendix: Model weights
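The two merging methods can be sketched in a few lines for the simplest case, where every model is a Gaussian estimate with a known standard error. This is an illustrative sketch, not the site's implementation. The function names are hypothetical. The hierarchical version assumes flat priors on µ and τ and approximates the posterior on a finite, evenly spaced grid of τ values.

```python
import math


def pooled_gaussian(means, sds):
    """Pooled estimate: the product of Gaussian densities is again Gaussian,
    with a precision-weighted mean and summed precisions."""
    precisions = [1.0 / s ** 2 for s in sds]
    total = sum(precisions)
    mean = sum(w * m for w, m in zip(precisions, means)) / total
    return mean, math.sqrt(1.0 / total)


def hierarchical_gaussian(y, sigma, tau_grid):
    """Hierarchical estimate for theta_i ~ N(mu, tau^2), y_i ~ N(theta_i, sigma_i^2).

    Assumes flat priors on mu and tau. mu is integrated out analytically and
    tau is handled on the supplied grid. Returns (E[mu | y], E[tau | y]).
    """
    log_post, mu_hats = [], []
    for tau in tau_grid:
        w = [1.0 / (s ** 2 + tau ** 2) for s in sigma]
        v_mu = 1.0 / sum(w)                       # Var(mu | tau, y)
        mu_hat = v_mu * sum(wi * yi for wi, yi in zip(w, y))
        # log p(tau | y) up to a constant
        lp = 0.5 * math.log(v_mu)
        for wi, yi in zip(w, y):
            lp += 0.5 * math.log(wi) - 0.5 * wi * (yi - mu_hat) ** 2
        log_post.append(lp)
        mu_hats.append(mu_hat)
    peak = max(log_post)                          # subtract for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    z = sum(weights)
    weights = [wt / z for wt in weights]
    mu_mean = sum(wt * m for wt, m in zip(weights, mu_hats))
    tau_mean = sum(wt * t for wt, t in zip(weights, tau_grid))
    return mu_mean, tau_mean
```

The pooled estimate treats all studies as measuring one shared θ, so its uncertainty shrinks with every added model. The hierarchical estimate lets each study have its own θ_i and learns the between-study spread τ from the data.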
Smarter Crowdsourcing
Permissions: Full control over who can view and edit
Curation: Scientists can publish their view of the literature
Moderation: Others can submit additions for review
Forms: Custom forms to distribute to others
Discussion: Every model and collection is a forum
Agriculture Example
[Figure: Relative yield vs. mean temperature during growing season (C) for wheat, maize, soybeans, and cotton, with additional exposure time for USA and global agriculture. Top four panels: empirically derived responses of four major crops to growing-season temperature in United States counties, from Hsiang et al. (2012). Lower panels: the additional amount of time that croplands in the United States and the world will be exposed to these temperatures under 2°C warming in global mean temperature.]
Temperature Yield Models
Growing Season Average Temperatures
Yield Impacts by Crop
Baseline Production by County (2007)
Agriculture Example
Total Yield Impacts, averaged by production
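Averaging by production amounts to a weighted mean of the per-county yield impacts, with baseline production as the weights. A minimal sketch follows; the function name and the county labels and numbers in the usage note are made up for illustration and are not results from the talk.

```python
def production_weighted_impact(impacts, production):
    """Average fractional yield impacts across counties, weighting each
    county by its baseline production.

    impacts:    county -> fractional yield change (e.g. -0.10 for a 10% loss)
    production: county -> baseline production (any consistent unit)
    """
    total = sum(production.values())
    if total <= 0:
        raise ValueError("total baseline production must be positive")
    return sum(impacts[county] * production[county] for county in production) / total
```

For example, with hypothetical counties A (impact -0.10, production 300) and B (impact -0.20, production 100), the production-weighted impact is -0.125, closer to county A because it produces more.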
The benefits to you
Performing meta-analyses
Science as advancement of knowledge
Building a public good
Ready-made gold-standard
Ready-made literature comparison
A platform for variable requests
Open-Source Library
Ticket System
New visions
Forge better connections between research, policy, and modelers
Incorporate meta-analysis into all research
Counterbalance publication, reference, and inclusion bias
Thank you!
http://shackleton.gspp.berkeley.edu
Model types
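Delta Model: $p(y \mid x) = \delta(y - g(x))$
Discrete-Discrete Model: $p(y_j \mid x_i) = p_{ij}$
Spline Model: $p(y \mid x) = \begin{cases} e^{a_0 + b_0 y + c_0 y^2} & \text{for } y_0 \le y < y_1 \\ e^{a_1 + b_1 y + c_1 y^2} & \text{for } y_1 \le y < y_2 \\ \cdots & \cdots \end{cases}$
Bin Model: $p(y \mid x) = \begin{cases} p_1(y \mid x) & \text{for } x_0 < x \le x_1 \\ p_2(y \mid x) & \text{for } x_1 < x \le x_2 \\ \cdots & \cdots \end{cases}$
Mean-Size Model: $\mathbb{E}[Y \mid X = x_i] = y_i$ with population $s_i$
Multivariate Model: $p(y \mid x, \ldots, z) = q(y \mid (x, \ldots, z))$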
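To make the spline model concrete, the sketch below evaluates a piecewise log-quadratic density at a fixed x and normalizes it numerically. The function names, the knot and coefficient layout, and the midpoint-rule integration are assumptions made for illustration, not the library's actual interface.

```python
import math
from bisect import bisect_right


def spline_density(y, knots, coeffs):
    """Unnormalized spline-model density at y.

    knots:  increasing breakpoints [y0, y1, ..., yK]
    coeffs: one (a, b, c) triple per segment, so that
            p(y) = exp(a_k + b_k*y + c_k*y^2) for knots[k] <= y < knots[k+1]
    Returns 0 outside [y0, yK).
    """
    if y < knots[0] or y >= knots[-1]:
        return 0.0
    k = bisect_right(knots, y) - 1
    a, b, c = coeffs[k]
    return math.exp(a + b * y + c * y * y)


def normalizing_constant(knots, coeffs, steps=10000):
    """Integrate the unnormalized density over its support (midpoint rule)."""
    lo, hi = knots[0], knots[-1]
    h = (hi - lo) / steps
    return h * sum(
        spline_density(lo + (i + 0.5) * h, knots, coeffs) for i in range(steps)
    )
```

A single segment with a = 0, b = 0, c = -1/2 is a standard Gaussian kernel, so its normalizing constant over a wide interval should come out close to the square root of 2π. That gives a quick sanity check on the representation.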
Model weights
Pooled:
$p(y \mid x, f) \propto \prod_m f_m^{\alpha_m}(y \mid x)$
$\propto f_1(y \mid x)\, f_2(y \mid x)\, f_2(y \mid x)$ for $\alpha_1 = 1$ and $\alpha_2 = 2$
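Hierarchical Bayesian:
$p(\theta, \mu, \tau \mid y) \propto p(\mu, \tau) \prod_m p(\theta_m \mid \mu, \tau)\, f^{\alpha_m}(y \mid \theta_m)$
$\propto p(\mu, \tau)\, p(\theta_1 \mid \mu, \tau)\, p(\theta_2 \mid \mu, \tau)\, f(y \mid \theta_1)\, f(y \mid \theta_2)\, f(y \mid \theta_2)$ for $\alpha_1 = 1$ and $\alpha_2 = 2$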
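The effect of the weights in the pooled case can be checked numerically: raising a model's density to the power α_m is the same as including that model α_m times. The sketch below does this on an evenly spaced grid of y values. The function names and the choice of Gaussian inputs in the usage note are illustrative assumptions.

```python
import math


def gaussian_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def weighted_pool(densities, alphas, grid):
    """Weighted pooled density p(y) proportional to prod_m f_m(y)**alpha_m,
    normalized on an evenly spaced grid of y values."""
    step = grid[1] - grid[0]
    raw = []
    for y in grid:
        p = 1.0
        for f, alpha in zip(densities, alphas):
            p *= f(y) ** alpha
        raw.append(p)
    z = sum(raw) * step                    # Riemann-sum normalizing constant
    return [p / z for p in raw]


def grid_mean(pooled, grid):
    """Mean of a density tabulated on an evenly spaced grid."""
    step = grid[1] - grid[0]
    return sum(y * p for y, p in zip(grid, pooled)) * step
```

For instance, pooling N(0, 1) with weight 1 and N(1, 1) with weight 2 gives a Gaussian with mean 2/3, pulled toward the model that carries twice the weight.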