quantifying uncertainty in the uk carbon flux
DESCRIPTION
Quantifying uncertainty in the UK carbon flux. Tony O’Hagan University of Sheffield. Outline. The carbon flux problem Quantifying input uncertainties Propagating uncertainty Results. Computer models. - PowerPoint PPT PresentationTRANSCRIPT
29 May 2008 IMA Scottish Branch 1
Quantifying uncertainty in the UK carbon flux
Tony O’HaganUniversity of Sheffield
29 May 2008 IMA Scottish Branch 2
Outline
The carbon flux problem
Quantifying input uncertainties
Propagating uncertainty
Results
29 May 2008 IMA Scottish Branch 3
Computer models
In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes
For understanding, prediction, control
There is a growing realisation of the importance of uncertainty in model predictions
Can we trust them?Without any quantification of output uncertainty, it’s easy to dismiss them
29 May 2008 IMA Scottish Branch 4
Examples
Climate prediction
Molecular dynamics
Nuclear waste disposal
Oil fields
Engineering design
Hydrology
29 May 2008 IMA Scottish Branch 5
Carbon flux
Vegetation can be a major factor in mitigating the increase of CO2 in the atmosphere
And hence reducing the greenhouse effect
Through photosynthesis, plants take atmospheric CO2
Carbon builds new plant material and O2 is released
But some CO2 is released again Respiration, death and decay
The net reduction of CO2 is called Net Biosphere Production (NBP)
I will refer to it as the carbon flux
Complex processes modelled in SDGVMSheffield Global Dynamic Vegetation Model
29 May 2008 IMA Scottish Branch 6
CTCD
The Centre for Terrestrial Carbon Dynamics was a NERC Centre of Excellence
Now part of National Centre for Earth Observation
One major exercise was to estimate the carbon flux from vegetation in England and Wales in 2000
SDGVM run at each of 707 pixels over England & Wales4 plant functional types (PFTs)Principal output is NBPMany inputs
29 May 2008 IMA Scottish Branch 7
SDGVM C flux outputs for 2000
Map of SDGVM estimatesshows positive flux (C sink)in North, but negative(C source) in Midlands
Total estimated flux is9.06 Mt C
Highly dependent onweather, so will varygreatly between years
29 May 2008 IMA Scottish Branch 8
Accounting for uncertainty
There are several sources of uncertaintyUncertain inputs
PFT parameters, defining plant growth etc
Soil structure
Land cover types
Weather
Model structureAll models are wrong!
Two main challengesFormally quantifying these uncertainties
Propagating input uncertainty through the model
29 May 2008 IMA Scottish Branch 9
Progress to date
A paper dealing with uncertainty in plant functional inputs and soil inputs
Kennedy, O'Hagan, Anderson et al (2008). Quantifying uncertainty in the biospheric carbon flux for England and Wales. J. Royal Statistical Society A 171, 109--135.
A paper showing how to quantify uncertainty in land cover
Cripps, O'Hagan, Quaife and Anderson (2008). Modelling uncertainty in satellite derived land cover maps. http://tonyohagan.co.uk/academic/pub.html
Recent work combines these
Still need to account for uncertainty in weather and model structure
29 May 2008 IMA Scottish Branch 10
Quantifying input uncertainties
Plant functional type parametersExpert elicitation
Soil compositionSimple analysis from extensive data
Land coverMore complex analysis of ‘confusion matrix’ data
29 May 2008 IMA Scottish Branch 11
Elicitation
Beliefs of expert (developer of SDGVMd) regarding plausible values of PFT parameters
Four PFTs – Deciduous broadleaf (DBL), evergreen needleleaf (ENL), crops, grass
Many parameters for each PFTKey ones identified by preliminary sensitivity analysis
Important to allow for uncertainty about mix of species in a pixel and role of parameter in the model
In the case of leaf life span for ENL, this was more complex
29 May 2008 IMA Scottish Branch 12
ENL leaf life span
29 May 2008 IMA Scottish Branch 13
Correlations
PFT parameter value at one site may differ from its value in another
Because of variation in species mix
Common uncertainty about average over all species induces correlation
Elicit beliefs about average over whole UKENL joint distributions are mixtures of 25 components, with correlation both between and within years
29 May 2008 IMA Scottish Branch 14
Soil composition
Percentages of sand, clay and silt, plus bulk density
Soil map available at high resolution
Multiple values in each SDGVM siteUsed to form average (central estimate)
And to assess uncertainty (variance)
Augmented to allow for uncertainty in original data (expert judgement)
Assumed independent between pixels
29 May 2008 IMA Scottish Branch 15
Land cover map
LCM2000 is another high resolution mapObtained from satellite imagesVegetation in each pixel assigned to one of 26 classesAggregated to give proportions of each PFT at each siteBut data are uncertain
Field data are available at a sample of pixelsCountryside Survey 2000Table of CS2000 class versus LCM2000 class is called the confusion matrix
29 May 2008 IMA Scottish Branch 16
CS2000 versus LCM2000 matrix
Not symmetric
Rather small numbers
LCM2000
CS2000
DBL ENL Grass Crop Bare
DBL 66 3 19 4 5
Enl 8 20 1 0 0
Grass 31 5 356 22 15
Crop 7 1 41 289 9
Bare 2 0 3 8 81
29 May 2008 IMA Scottish Branch 17
Modelling land cover
The matrix tells us about the probability distribution of LCM2000 class given the true (CS2000) class
Subject to sampling errors
But we need the probability distribution of true PFT given observed PFT
Posterior probabilities as opposed to likelihoodsWe need a prior distribution for land coverWe used observations in a neighbourhood
Implicitly assuming an underlying smooth random field
And the confusion matrix says nothing about spatial correlation of LCM2000 errors
We again relied on expert judgementUsing a notional equivalent number of independent pixels per site
29 May 2008 IMA Scottish Branch 18
Overall proportions
Red lines show LCM2000 proportionsClear overall biases
Analysis gives estimates for all PFTs in each SDGVM siteWith variances and correlations
29 May 2008 IMA Scottish Branch 19
Propagating uncertainty
Uncertainty analysisProblems with simple Monte Carlo approach
EmulationGaussian process emulation
The MUCM project
29 May 2008 IMA Scottish Branch 20
Uncertainty analysis
We have a computer model that produces output y = f (x) when given input x
But for a particular application we do not know x precisely
So X is a random variable, and so therefore is Y = f (X )
We are interested in the uncertainty distribution of Y
How can we compute it?
29 May 2008 IMA Scottish Branch 21
Monte CarloThe usual approach is Monte Carlo
Sample values of x from its distributionRun the model for all these values to produce sample values yi = f (xi)These are a sample from the uncertainty distribution of Y
Typically requires thousands of samples of input parameters
And in this case we would need to run SDGVM 4x707 times for each sample!
Neat but impractical if it takes minutes or hours to run the model
29 May 2008 IMA Scottish Branch 22
Emulation
A computer model encodes a function, that takes inputs and produces outputs
An emulator is a statistical approximation of that function
Estimates what outputs would be obtained from given inputs
With statistical measure of estimation error
Given enough training data, estimation error variance can be made small
29 May 2008 IMA Scottish Branch 23
So what?
A good emulator estimates the model output accurately
with small uncertainty
and runs “instantly”
So we can do uncertainty analysis etc fast and efficiently
Conceptually, weuse model runs to learn about the function
then derive any desired properties of the model
29 May 2008 IMA Scottish Branch 24
GP solution
Treat f (.) as an unknown function with Gaussian process (GP) prior distribution
Use available runs as observations without error, to derive posterior distribution (also GP)
Make inference about the uncertainty distributionE.g. The mean of Y is the integral of f (x) with respect to the distribution of X
Its posterior distribution is normal conditional on GP parameters
29 May 2008 IMA Scottish Branch 25
Why GP emulation?
Simple regression models can be thought of as emulators
But error estimates are invalid
We use Gaussian process emulationNonparametric, so can fit any function
Error measures can be validated
Analytically tractable, so can often do uncertainty analysis etc analytically
Highly efficient when many inputs
Reproduces training data correctly
29 May 2008 IMA Scottish Branch 26
2 code runs
Consider one input and one output
Emulator estimate interpolates data
Emulator uncertainty grows between data points
29 May 2008 IMA Scottish Branch 27
3 code runs
Adding another point changes estimate and reduces uncertainty
29 May 2008 IMA Scottish Branch 28
5 code runs
And so on
29 May 2008 IMA Scottish Branch 29
BACCO
This has led to a wide ranging body of tools for inference about all kinds of uncertainties in computer models
All based on building the GP emulator of the model from a set of training runs
This area is now known as BACCOBayesian Analysis of Computer Code Output
29 May 2008 IMA Scottish Branch 30
BACCO includes
Uncertainty analysis
Sensitivity analysis
Calibration
Data assimilation
Model validation
Optimisation
Etc…
All within a single coherent framework
29 May 2008 IMA Scottish Branch 31
MUCM
Managing Uncertainty in Complex ModelsLarge 4-year research grant
Started in June 2006
7 postdoctoral research assistants
4 PhD studentships
Based in Sheffield, Durham, Aston, Southampton, LSE
Objective: to develop BACCO methods into a robust technology that is widely applicable across the spectrum of modelling applications
29 May 2008 IMA Scottish Branch 32
Emulation of SDGVM
We built GP emulators of all 4 PFTs at 30 of the 707 sites
Estimates (posterior means) and uncertainties (variances and covariances) inter-/extrapolated to the other sites by kriging
Uncertainty due to both emulation and kriging separately accounted for
29 May 2008 IMA Scottish Branch 33
Sensitivity analysis for one site/PFT
Used to identify the most important inputs. These are the ones we needed to formulate uncertainty about carefully.
29 May 2008 IMA Scottish Branch 34
Results
29 May 2008 IMA Scottish Branch 35
Mean NBP corrections
29 May 2008 IMA Scottish Branch 36
NBP standard deviations
29 May 2008 IMA Scottish Branch 37
Aggregate across 4 PFTs
Mean NBP Standard deviation
29 May 2008 IMA Scottish Branch 38
England & Wales aggregate
PFTPlug-in estimate
(Mt C)Mean(Mt C)
Variance (Mt C2)
Grass 5.28 4.37 0.2453
Crop 0.85 0.43 0.0327
Deciduous 2.13 1.80 0.0221
Evergreen 0.80 0.86 0.0048
Covariances -0.0081
Total 9.06 7.46 0.2968
29 May 2008 IMA Scottish Branch 39
Sources of uncertainty
The total variance of 0.2968 is made up as follows
Variance due to PFT and soil inputs = 0.2642
Variance due to land cover uncertainty = 0.0105
Variance due to interpolation/emulation = 0.0222
Land cover uncertainty much larger for individual PFT contributions
Dominates for ENL
But overall tends to cancel out
29 May 2008 IMA Scottish Branch 40
Conclusions
Bayesian methods offer a powerful basis for computation of uncertainties in model predictionsAnalysis of E&W aggregate NBP in 2000
Good case study for uncertainty and sensitivity analyses
But needs to take account of more sources of uncertainty
Involved several technical extensionsHas important implications for our understanding of C fluxesPolicy implications
29 May 2008 IMA Scottish Branch 41
Finally
This was joint work with many othersPlant, soil and earth observation – Shaun Quegan, Ian Woodward, Mark Lomas, Tristan Quaife, Andreas Heinemeyer, Phil Ineson
Statistics – Marc Kennedy, John Paul Gosling, Ed Cripps, Keith Harris, Clive Anderson
Linkshttp://www.shef.ac.uk/ctcd
http://mucm.group.shef.ac.uk
http://tonyohagan.co.uk/academic