Distributed Meta-Analysis System

A Tool for Distributed Meta-Analysis. Sustainable Development 10-Year Research Conference. James Rising, Solomon Hsiang (Columbia SIPA, Berkeley GSPP). February 28, 2014. http://shackleton.gspp.berkeley.edu

Upload: jarising

Post on 04-Jul-2015


DESCRIPTION

Introduces a tool for meta-analysis, along with a crowd-sourced database of results.

TRANSCRIPT

Page 1: Distributed Meta-Analysis System

Introduction Application Architecture

A Tool for Distributed Meta-Analysis

Sustainable Development 10-Year Research Conference

James Rising, Solomon Hsiang

Columbia SIPA, Berkeley GSPP

February 28, 2014

http://shackleton.gspp.berkeley.edu

Page 2: Distributed Meta-Analysis System


The need for empirical meta-analysis

[Figure: "Damage to USA agriculture". Percent change in yield versus change in global mean temperature (°C) for cotton, wheat, maize, soy, and their average. The empirical projections are shown alongside damage functions from ENVISAGE, FUND, and RICE.]

Black: Projected changes in yields using estimated models and the projected exposure of USA croplands. Changes are averaged by cropland planted in a given crop. Comparable damage functions from ENVISAGE, FUND, and RICE are colored. From Kopp et al. (2013).

Page 3: Distributed Meta-Analysis System


The problem with meta-analysis

Essential (for scientists, policy, modelers), but...

Time-consuming: Expensive with hundreds of papers

Infrequent: For many questions, once a generation

Shallow: Drops all extra factors in relationships

Reductive: Needs to ignore or limit methodologies and interests

Sweeping: General results that miss targeted questions

Weak methods: Unweighted averages are common in some fields

Biased: Publication, reference, and inclusion bias

Page 4: Distributed Meta-Analysis System


Essential Elements

Collecting collective collections

An intelligent database of results

Statistical (Bayesian) tools

Remixing, comparing, combining

But not:

Not asking for or analyzing data

Article authors not required

No computational models

Page 5: Distributed Meta-Analysis System


The Site

Page 6: Distributed Meta-Analysis System


Finding Models

Page 7: Distributed Meta-Analysis System


Performing Meta-analyses

Page 8: Distributed Meta-Analysis System


How to use it

1. Optional: Find a relevant collection, if one exists.

2. Create new models defining:

Publication: author; journal; year; location in article; published/working paper/private

Variables: dependent and independent variables (e.g. Yield (MT/Ha) vs. Temperature (C))

Population (optional): methodology, observation count, study region and years, population characteristics, ...

Parameter Estimates
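
As an illustration of the fields listed above, a model entry could be represented as a small record. This is only a sketch: the class names, field names, and placeholder values are hypothetical and are not taken from the site's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Publication:
    authors: str
    journal: str
    year: int
    location: str               # where in the article the estimate appears
    status: str = "published"   # or "working paper" / "private"


@dataclass
class ModelEntry:
    publication: Publication
    dependent: str              # e.g. "Yield (MT/Ha)"
    independent: str            # e.g. "Temperature (C)"
    # parameter name -> (point estimate, standard error)
    estimates: Dict[str, Tuple[float, float]]
    # optional descriptors: methodology, observation count, region, years, ...
    population: Dict[str, str] = field(default_factory=dict)
    collection: Optional[str] = None   # name of the collection, if any


# Placeholder values only; they do not come from any real study.
entry = ModelEntry(
    publication=Publication("A. Author", "Example Journal", 2000,
                            "Table 1, column 2"),
    dependent="Yield (MT/Ha)",
    independent="Temperature (C)",
    estimates={"slope": (-0.1, 0.05)},
)
```

Keeping the population descriptors in an open-ended dictionary mirrors the slide's point that they are optional and vary between studies.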

Page 9: Distributed Meta-Analysis System


Creating Models

Models are parameter estimates, represented as conditional distributions.
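
To make "a parameter estimate as a conditional distribution" concrete, here is a minimal sketch for one simple case. It assumes a linear relationship y = beta * x whose only uncertainty is a Gaussian standard error on beta. The function name and the numbers are hypothetical, and the system's own representation may differ.

```python
import math


def linear_model(beta, se):
    """Return p(y | x) for y = beta * x with Gaussian uncertainty in beta.

    Assumption: the only uncertainty is the standard error of beta, so
    y | x is Normal(beta * x, (se * |x|)^2).
    """
    def pdf(y, x):
        mean = beta * x
        sd = se * abs(x)
        if sd == 0.0:
            raise ValueError("the distribution is degenerate at x = 0")
        z = (y - mean) / sd
        return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))
    return pdf


# Hypothetical estimate: slope of -0.5 with standard error 0.1.
p = linear_model(beta=-0.5, se=0.1)
density_at_mean = p(-1.0, 2.0)   # density of y = -1.0 given x = 2.0
```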

Using structured model files:

Specifying summary statistics:

Generating stock model types:

Filling out a collection scaffold:

Model types

Page 10: Distributed Meta-Analysis System


Merging Methods

With multiple models: compare them, combine them, regroup them, reweight them; distribute them, share them

(a) Pooled estimate:

\[ p(\theta) = p\left(\bigcap_{i=1}^{N} \theta = \theta_i\right) \propto \prod_{i=1}^{N} p(\theta_i) \]
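
When every estimate is summarized as a Gaussian, the product of densities in (a) has a closed form: precisions add, and the pooled mean is the precision-weighted average. The sketch below assumes that Gaussian summary. The function name and inputs are illustrative and are not part of the tool.

```python
import math


def pool_gaussian(estimates):
    """Pool Gaussian estimates by multiplying their densities.

    estimates: list of (mean, standard_error) pairs.
    Returns the (mean, standard_error) of the normalized product.
    """
    precisions = [1.0 / se ** 2 for _, se in estimates]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, estimates)) / total
    return mean, math.sqrt(1.0 / total)


# Two hypothetical estimates with equal standard errors pool to their midpoint.
pooled_mean, pooled_se = pool_gaussian([(1.0, 1.0), (3.0, 1.0)])
```

The pooled standard error is always smaller than the smallest input, because pooling treats the estimates as independent measurements of one shared parameter.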

(b) Bayesian hierarchical estimate:

\[ \theta_i \sim \mathcal{N}(\mu, \tau^2), \qquad y_i \sim \mathcal{N}(\theta_i, \sigma_i^2) \]
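
One way to fit the hierarchical model in (b) without sampling is a grid approximation over tau, using the standard result that mu has a Gaussian posterior once tau is fixed. The sketch below assumes a flat prior on mu and a flat prior on tau truncated at `tau_max`. Those priors, the grid size, and the function name are assumptions made for illustration, not the system's documented method.

```python
import math


def hierarchical_normal(y, sigma, tau_max=None, n_grid=2000):
    """Posterior means of (mu, tau) for theta_i ~ N(mu, tau^2), y_i ~ N(theta_i, sigma_i^2)."""
    if tau_max is None:
        # Hypothetical default: a grid wide enough to cover plausible tau values.
        tau_max = 5.0 * max(max(y) - min(y), max(sigma))
    taus = [tau_max * (k + 0.5) / n_grid for k in range(n_grid)]

    log_post, mu_hats = [], []
    for tau in taus:
        # Marginally, y_i ~ N(mu, sigma_i^2 + tau^2).
        total_var = [s * s + tau * tau for s in sigma]
        weights = [1.0 / v for v in total_var]
        v_mu = 1.0 / sum(weights)
        mu_hat = v_mu * sum(w * yi for w, yi in zip(weights, y))
        # Log of p(tau | y) up to a constant, with mu integrated out.
        lp = 0.5 * math.log(v_mu)
        for v, yi in zip(total_var, y):
            lp -= 0.5 * math.log(v) + 0.5 * (yi - mu_hat) ** 2 / v
        log_post.append(lp)
        mu_hats.append(mu_hat)

    top = max(log_post)
    p_tau = [math.exp(lp - top) for lp in log_post]
    norm = sum(p_tau)
    p_tau = [p / norm for p in p_tau]

    mu_mean = sum(p * m for p, m in zip(p_tau, mu_hats))
    tau_mean = sum(p * t for p, t in zip(p_tau, taus))
    return mu_mean, tau_mean


# Hypothetical study estimates and standard errors.
mu_mean, tau_mean = hierarchical_normal([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

Unlike the pooled estimate, this model lets the study-level effects differ, so between-study disagreement widens the uncertainty on mu rather than being averaged away.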

Log mortality as a function of temperature (Deschenes and Greenstone, 2011; Barreca et al., 2013).

Model weights

Page 11: Distributed Meta-Analysis System


Smarter Crowdsourcing

Permissions: Full control over who can view and edit

Curation: Scientists can publish their view of the literature

Moderation: Others can submit additions for review

Forms: Custom forms to distribute to others

Discussion: Every model and collection is a forum

Page 12: Distributed Meta-Analysis System


Agriculture Example

Cotton Maize Soybeans Wheat

[Figure: Relative yield versus mean temperature during the growing season (C, 0 to 40) for wheat, maize, soybeans, and cotton. Two lower panels show additional exposure time for USA agriculture (versus mean temperature during the growing season) and for global agriculture (versus mean temperature during all months). Caption of the source figure: "Figure 4: Top four panels: Empirically-derived responses of four major crops to growing season temperature in United States counties, from Hsiang et al. (2012). Lower panels: The additional amount of time that croplands in the United States and the world will be exposed to these temperatures under 2°C warming in global mean temperature."]

Temperature Yield Models

Growing Season Average Temperatures

Yield Impacts by Crop

Baseline Production by County (2007)

Page 13: Distributed Meta-Analysis System


Agriculture Example

Total Yield Impacts, averaged by production
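
The final aggregation step, averaging yield impacts by production, amounts to a weighted mean. A minimal sketch follows. The function name, county labels, and numbers are hypothetical and serve only to show the arithmetic.

```python
def production_weighted_impact(impacts, production):
    """Average yield impacts across counties, weighting each by baseline production.

    impacts: county -> percent change in yield.
    production: county -> baseline production, in any consistent unit.
    """
    total = sum(production[county] for county in impacts)
    if total <= 0:
        raise ValueError("total baseline production must be positive")
    return sum(impacts[c] * production[c] for c in impacts) / total


# Hypothetical counties: the larger producer dominates the average.
impacts = {"County A": -10.0, "County B": -2.0}
production = {"County A": 100.0, "County B": 300.0}
total_impact = production_weighted_impact(impacts, production)
```

Here the result is (-10 × 100 + -2 × 300) / 400 = -4 percent, closer to County B's impact because it produces three times as much.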

Page 14: Distributed Meta-Analysis System


The benefits to you

Performing meta-analyses

Science as advancement of knowledge

Building a public good

Ready-made gold-standard

Ready-made literature comparison

A platform for variable requests

Page 15: Distributed Meta-Analysis System


Open-Source Library

Page 16: Distributed Meta-Analysis System


Ticket System

Page 17: Distributed Meta-Analysis System


New visions

Forge better connections among research, policy, and modelers

Incorporate meta-analysis into all research

Counterbalance publication, reference, and inclusion bias

Thank you!

http://shackleton.gspp.berkeley.edu

Page 18: Distributed Meta-Analysis System


Model types

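Delta Model: \( p(y \mid x) = \delta(y - g(x)) \)

Discrete-Discrete Model: \( p(y_j \mid x_i) = p_{ij} \)

Spline Model:

\[ p(y \mid x) = \begin{cases} e^{a_0 + b_0 y + c_0 y^2} & \text{for } y_0 \le y < y_1 \\ e^{a_1 + b_1 y + c_1 y^2} & \text{for } y_1 \le y < y_2 \\ \cdots & \cdots \end{cases} \]

Bin Model:

\[ p(y \mid x) = \begin{cases} p_1(y \mid x) & \text{for } x_0 < x \le x_1 \\ p_2(y \mid x) & \text{for } x_1 < x \le x_2 \\ \cdots & \cdots \end{cases} \]

Mean-Size Model: \( \mathbb{E}[Y \mid X = x_i] = y_i \) with population \( s_i \)

Multivariate Model: \( p(y \mid x, \ldots, z) = q(y \mid (x, \ldots, z)) \)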
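
The bin model can be sketched as a lookup: find the bin with x_{k-1} < x ≤ x_k, then evaluate that bin's distribution over y. In the sketch below each bin's distribution is taken to be Gaussian purely for illustration. The function names, bin edges, and parameters are hypothetical.

```python
import bisect
import math


def bin_index(edges, x):
    """Return k such that edges[k] < x <= edges[k + 1]; edges must be sorted."""
    i = bisect.bisect_left(edges, x)
    if i == 0 or i == len(edges):
        raise ValueError("x lies outside the binned range")
    return i - 1


def bin_model_pdf(edges, components, y, x):
    """Evaluate p(y | x), where components[k] = (mean, sd) applies within bin k."""
    mean, sd = components[bin_index(edges, x)]
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical temperature bins (C) and per-bin yield distributions.
edges = [0.0, 10.0, 20.0]
components = [(1.0, 0.1), (0.8, 0.2)]
density = bin_model_pdf(edges, components, y=0.8, x=15.0)
```

`bisect_left` matches the half-open intervals on the slide: a value exactly on an upper edge falls in the lower bin, and a value equal to the first edge is out of range.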


Page 19: Distributed Meta-Analysis System


Model weights

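Pooled:

\[ p(y \mid x, f) \propto \prod_m f_m^{\alpha_m}(y \mid x) \propto f_1(y \mid x)\, f_2(y \mid x)\, f_2(y \mid x) \quad \text{for } \alpha_1 = 1 \text{ and } \alpha_2 = 2 \]

Hierarchical Bayesian:

\[ p(\theta, \mu, \tau \mid y) \propto p(\mu, \tau) \prod_m p(\theta_m \mid \mu, \tau)\, f^{\alpha_m}(y \mid \theta_m) \propto p(\mu, \tau)\, p(\theta_1 \mid \mu, \tau)\, p(\theta_2 \mid \mu, \tau)\, f(y \mid \theta_1)\, f(y \mid \theta_2)\, f(y \mid \theta_2) \quad \text{for } \alpha_1 = 1 \text{ and } \alpha_2 = 2 \]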
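
The weighted pooled form can be checked numerically: raising a density to the power alpha_m is the same as including that model alpha_m times. The sketch below evaluates the weighted product on a grid of y values. The Gaussian component models, the grid, and the function names are assumptions chosen for illustration.

```python
import math


def normal_logpdf(y, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((y - mean) / sd) ** 2


def weighted_pool(log_densities, alphas, grid):
    """Normalize prod_m f_m(y)^alpha_m over the grid (returns probability masses)."""
    log_p = [sum(a * f(y) for f, a in zip(log_densities, alphas)) for y in grid]
    top = max(log_p)                       # subtract the max for numerical stability
    p = [math.exp(v - top) for v in log_p]
    total = sum(p)
    return [v / total for v in p]


grid = [-10.0 + 0.01 * k for k in range(2001)]


def f1(y):
    return normal_logpdf(y, 0.0, 1.0)      # hypothetical model 1


def f2(y):
    return normal_logpdf(y, 3.0, 1.0)      # hypothetical model 2


weighted = weighted_pool([f1, f2], [1, 2], grid)         # alpha_1 = 1, alpha_2 = 2
repeated = weighted_pool([f1, f2, f2], [1, 1, 1], grid)  # model 2 listed twice
pooled_mean = sum(y * p for y, p in zip(grid, weighted))
```

With these inputs the pooled mean lands at 2, two thirds of the way toward the model that carries twice the weight.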
