Data Driven Model Development

David LeBauer, Mike Dietze, Deepak Jaiswal, Rob Kooper, Stephen P. Long, Shawn Serbin, Dan Wang

Uploaded by questrcn on 27-Jul-2015



Objective: Useful Predictions

Clark et al. 2001. Ecological Forecasts: An Emerging Imperative. Science

[Figure: Precision, Accuracy vs. Information]

[Image: Windows fatal error screen, "An error has occurred ... Error: 0E : 016F : BFF9B3D4"]

Technical Uncertainty: A Cautionary Tale

[Figure, built up over successive slides: Yield, comparing Observed values with predictions labeled Priors, + Trait Data, + Flux Data, and + Latest Version; an Annual Merge is marked on the final slides]

Best Practices

Write programs for people, not computers

Automate repetitive tasks

Use the computer to record history

Make incremental changes

Use version control

Don't repeat yourself (or others)

Plan for mistakes

Optimize software only after it works correctly

Document the design and purpose of code

Conduct code reviews

Wilson et al. 2012. Best Practices for Scientific Computing. arXiv:1210.0530v3


Best Practices 1: Automation

Altintas et al. 2004. Kepler: an extensible system for design and execution of scientific workflows. Proc. 16th International Conference on Scientific and Statistical Database Management (SSDBM)

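The automation practice can be illustrated with a short workflow script that runs every combination of inputs without manual steps and can be safely re-run. This is a minimal sketch, not how Kepler or PEcAn work: `run_model` is a hypothetical stand-in for a real model, and the (plant functional type, site) grid and the one-JSON-file-per-run layout are assumptions.

```python
import itertools
import json
from pathlib import Path


def run_model(pft: str, site: str, params: dict) -> dict:
    # Hypothetical placeholder model: returns a made-up "yield" so the workflow runs.
    return {"pft": pft, "site": site, "yield": params["vcmax"] * 0.1}


def run_all(pfts, sites, params: dict, outdir: Path) -> list:
    """Run every (pft, site) combination and write one JSON file per run.

    Existing output files are skipped, so re-running after a failure only
    repeats the missing work instead of redoing everything by hand.
    """
    outdir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for pft, site in itertools.product(pfts, sites):
        out = outdir / f"{pft}_{site}.json"
        if not out.exists():
            out.write_text(json.dumps(run_model(pft, site, params)))
        outputs.append(out)
    return outputs
```

For example, `run_all(["switchgrass", "willow"], ["site_a", "site_b"], {"vcmax": 40.0}, Path("runs"))` writes four files; the names here are illustrative only. Skipping completed runs keeps the script cheap to re-run, which is what makes it practical to rerun the whole analysis after every change.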

Parameter Uncertainty: Test Case

Single Analysis:

Contribution of parameter uncertainty to uncertainty in Switchgrass Yield prediction.

LeBauer, Wang, Richter, Davidson, and Dietze 2013. Facilitating Feedbacks between ecological models and data. Ecological Monographs
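For a single analysis like the switchgrass example, parameter uncertainty can be propagated to the prediction by Monte Carlo sampling: draw parameters from their priors, run the model for each draw, and summarize the spread of predictions. A minimal sketch under stated assumptions: `toy_yield`, the trait names `sla` and `vcmax`, and their normal priors are hypothetical, not the model or priors used in the study.

```python
import random
import statistics


def toy_yield(sla: float, vcmax: float) -> float:
    # Hypothetical stand-in for a crop model: yield increases linearly with both traits.
    return 0.05 * sla + 0.2 * vcmax


def propagate(n: int = 10_000, seed: int = 0) -> dict:
    """Sample parameters from assumed priors and summarize the predicted yield."""
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        sla = rng.gauss(15.0, 3.0)    # assumed prior: Normal(mean 15, sd 3)
        vcmax = rng.gauss(40.0, 5.0)  # assumed prior: Normal(mean 40, sd 5)
        ys.append(toy_yield(sla, vcmax))
    cuts = statistics.quantiles(ys, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return {
        "mean": statistics.fmean(ys),
        "sd": statistics.stdev(ys),
        "ci95": (cuts[0], cuts[-1]),
    }
```

Because the toy model is linear, the sampled standard deviation should be close to the analytic value sqrt((0.05 * 3)^2 + (0.2 * 5)^2), about 1.01, which is a quick sanity check on the sampler.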

Parameter Uncertainty: Automated

Contribution of parameter uncertainty to model uncertainty.

* 17 Plant functional types

* 6 biomes

* 8 scientists

* 6 Months

Dietze, Serbin, Davidson, Desai, Feng, Kelly, Kooper, LeBauer, Mantooth, McHenry, and Wang. Submitted. A quantitative assessment of a terrestrial biosphere model's data needs across North American biomes. JGR

[Figure: % SD Explained]
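The share of output uncertainty attributable to each parameter can be estimated with a variance decomposition. This sketch uses a first-order (delta-method) approximation, var_i ≈ (∂f/∂θ_i)^2 · Var(θ_i), with derivatives from central finite differences at the parameter means, and reports each parameter's share of the summed variance. That approximation is an assumption chosen for illustration; the exact method behind the "% SD Explained" figure is not stated here, and `variance_shares` is a hypothetical name.

```python
from typing import Callable, Dict


def variance_shares(
    f: Callable[[Dict[str, float]], float],
    means: Dict[str, float],
    sds: Dict[str, float],
    rel_step: float = 1e-4,
) -> Dict[str, float]:
    """First-order share of output variance contributed by each parameter."""
    partial = {}
    for name, mu in means.items():
        h = rel_step * (abs(mu) or 1.0)  # step size; falls back to rel_step if mu == 0
        up = dict(means, **{name: mu + h})
        dn = dict(means, **{name: mu - h})
        slope = (f(up) - f(dn)) / (2 * h)  # central-difference estimate of df/dtheta
        partial[name] = (slope * sds[name]) ** 2
    total = sum(partial.values())
    return {name: v / total for name, v in partial.items()}
```

With a linear toy model such as `lambda p: 0.05 * p["sla"] + 0.2 * p["vcmax"]`, means of 15 and 40 and standard deviations of 3 and 5, `vcmax` accounts for about 98% of the variance. For strongly nonlinear models a sampling-based decomposition would be more appropriate than this linearization.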

Best Practices 2: Iteration with Testing

Wilson et al. 2012. Best Practices for Scientific Computing. arXiv:1210.0530v3

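Iteration with testing can be made concrete as a check that runs after each incremental change: the new model version must still produce plausible output and must fit the benchmark data at least as well as the previous version. In this sketch `model_v1`, `model_v2`, `check_change`, and the `OBS` data are hypothetical placeholders, not the talk's models or observations.

```python
import math

# Hypothetical benchmark pairs: (driver value, observed biomass).
OBS = [(10.0, 12.0), (20.0, 18.5), (30.0, 29.0)]


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def model_v1(x):
    return 0.8 * x + 3.0  # hypothetical previous version


def model_v2(x):
    return 0.9 * x + 1.5  # hypothetical version after one incremental change


def check_change(old, new, data, tol=0.0):
    """Fail if the new version gives implausible output or a worse benchmark fit."""
    xs = [x for x, _ in data]
    obs = [y for _, y in data]
    pred_new = [new(x) for x in xs]
    assert all(math.isfinite(p) and p >= 0 for p in pred_new), "implausible output"
    err_old = rmse([old(x) for x in xs], obs)
    err_new = rmse(pred_new, obs)
    assert err_new <= err_old + tol, f"RMSE got worse: {err_old:.3f} -> {err_new:.3f}"
    return err_old, err_new
```

In practice a check like this would sit in a test suite (for example under pytest) and run on every commit. Plain `assert` suits test runners but is skipped under `python -O`, so it should not guard production code paths.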

Case Study: C4 Crop to Coppice Willow

C3 Photosynthesis

Perennial Stem

Leaf Senescence

Benchmark Data

Aboveground Biomass

23 Calibration Sites

72 Observations

[Figure: Observed aboveground biomass (Mg/ha)]

Results:

[Figure, built up over successive slides: Taylor diagram of model performance for Start (C4 Grass), + C3 Photosynthesis, + Perennial Stem, + Fixed Respiration, and + Leaf Senescence; axes RMSE*, Correlation, and Standard Deviation* (*scaled to sd_data = 1)]

[Figure: Predicted vs. Observed aboveground biomass (Mg/ha)]
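The benchmark statistics in the results (RMSE*, Correlation, and Standard Deviation*, scaled so the observed standard deviation is 1) can be computed from paired predictions and observations. This sketch assumes the usual Taylor-diagram conventions, in which RMSE* is the centered (bias-removed) RMS difference; the function name `taylor_stats` is hypothetical.

```python
import math
from typing import Sequence


def taylor_stats(pred: Sequence[float], obs: Sequence[float]) -> dict:
    """Taylor-diagram statistics of `pred` against `obs`, scaled so sd(obs) == 1.

    Returns the normalized standard deviation, the Pearson correlation, and the
    normalized centered RMS difference (mean bias removed).
    """
    n = len(obs)
    if n != len(pred) or n < 2:
        raise ValueError("pred and obs must have the same length (at least 2)")
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    dp = [p - mean_p for p in pred]
    do = [o - mean_o for o in obs]
    # Same 1/n normalization throughout so the geometric identity below holds.
    sd_p = math.sqrt(sum(d * d for d in dp) / n)
    sd_o = math.sqrt(sum(d * d for d in do) / n)
    if sd_p == 0 or sd_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    r = sum(a * b for a, b in zip(dp, do)) / (n * sd_p * sd_o)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dp, do)) / n)
    # Identity behind the diagram: crmse^2 = sd_p^2 + sd_o^2 - 2 * sd_p * sd_o * r
    return {"sd": sd_p / sd_o, "r": r, "crmse": crmse / sd_o}
```

A perfect model scores `sd` = 1, `r` = 1, `crmse` = 0. Because the three numbers are tied by the identity in the final comment, one point on the diagram shows all three at once, which is why successive model versions can be compared on a single plot.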

Conclusions

* Best practices lead to more effective and efficient modeling

* Applied integration tests to support model development

* Controlling technical error produces more robust and accurate inference

Future Directions

* Track benchmark metrics for specific model runs

* Maintain ability to reproduce published results

* Automated testing with each code commit or major release

* Current Metrics to define limits of model credibility
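Tracking benchmark metrics for specific runs and keeping published results reproducible both come down to recording each run's metrics and comparing later runs against a stored reference. A minimal sketch, assuming an append-only JSON-lines log; the file layout, the `run_id` field (for example a version-control commit hash), the function names, and the tolerance are hypothetical choices, not taken from the talk.

```python
import json
import math
from pathlib import Path


def record_run(log: Path, run_id: str, metrics: dict) -> None:
    # Append one JSON object per line so earlier history is never overwritten.
    with log.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"run_id": run_id, "metrics": metrics}) + "\n")


def load_run(log: Path, run_id: str) -> dict:
    # Return the metrics from the most recent record with this run_id.
    found = None
    with log.open(encoding="utf-8") as fh:
        for line in fh:
            rec = json.loads(line)
            if rec["run_id"] == run_id:
                found = rec["metrics"]
    if found is None:
        raise KeyError(run_id)
    return found


def reproduces(reference: dict, current: dict, rel_tol: float = 1e-6) -> bool:
    # True only if every reference metric is present and matches within rel_tol.
    return all(
        k in current and math.isclose(current[k], v, rel_tol=rel_tol)
        for k, v in reference.items()
    )
```

Calling `reproduces(load_run(log, "published"), new_metrics)` after each commit or release turns "maintain ability to reproduce published results" into an automated check rather than a manual one.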

More Information

Email: [email protected]

Web: pecanproject.org

Development: github.com/pecanproject