Using Bayesian hierarchical modelling to produce high resolution maps of air pollution in the EU
Post on 22-Feb-2016
USING BAYESIAN HIERARCHICAL MODELLING TO PRODUCE HIGH RESOLUTION MAPS OF AIR POLLUTION IN THE EU
Gavin Shaddick
University of Bath
RSS Avon Local Group October 2006
Air Pollution Modelling for Support to Policy on Health, Environment and Risk Management in Europe
APMoSPHERE
APMoSPHERE is a thematic project, funded under the Global Monitoring for Environment
and Security initiative, as part of the European Union’s Fifth Research Framework
Programme.
Its aim is to compile high resolution maps of air pollution across the EU, as a basis for scientific
research and policy support.
Partners
•Department of Epidemiology and Public Health, Imperial College
•Institute for Risk Assessment Sciences, University of Utrecht
•Institute for Environmental Research and Sustainable Development, National Observatory of Athens
•Centre for International Climate and Environmental Research, Oslo
•Department of Mathematical Sciences, University of Bath
•AEA Technology Netcen
What APMoSPHERE will do
Key objectives of APMoSPHERE are:
to produce a detailed (1km) inventory of atmospheric emissions by major sector for the EU
to develop and test a range of different methods for mapping air pollution on the basis of these emissions estimates, in combination with other routinely available data sets (including air pollution monitoring data)
to use these methods and data sets to generate detailed (1km) and updatable maps of air pollution, together with a set of policy-related indicators on potential ecological and health risks
based on these results, to provide an assessment of the air pollution situation in the EU, and implications for future air quality monitoring and policy
The pollutants:
•Particulates (PM10 and black smoke)
•Nitrogen oxides (NOx and NO2)
•Carbon monoxide
•Sulphur dioxide
•Ozone
Geographic Information System
Study Area: EU15 + Norway
Concentration data: AIRBASE & EMEP *
1 km predictors: Topography, Meteorology, Roads *, Land cover *, Light intensity *, Modelled Emissions *
Population data: 1 km modelled population *
Aims
Provide modelled exposures (and measures of uncertainty).
• Impute missing values
• Unmeasured locations
Combine information from multiple sources
Investigate the spatio-temporal modelling of pollutants.
Assess the contribution of spatial, temporal and random variability.
Data dependencies
Relationship with covariates
• Climate, e.g. temperature
• Local emissions, e.g. land cover
• Topography, e.g. altitude
Temporal dependencies.
Spatial dependencies.
• Distance between monitoring sites.
• Site type (e.g. background, traffic).
Model framework
Bayesian Hierarchical Model.
• Pollutants (log) modelled as a function of the ‘true’ underlying level with unstructured error.
• Incorporate covariate information
• True underlying level is a function of the previous year’s level.
• Missing values treated as unknown parameters within the Bayesian framework and can be estimated.
Priors
• Information from previous studies or years
• Expert opinion
• Physical science
• ‘vague’
Posteriors and parameter estimates
In simple cases, e.g. where the prior is conjugate to the likelihood, exact expressions for the posterior distributions can be found
In more complex cases, the posterior may be intractable
• Can use simulation to ‘build up’ the posterior
• MCMC (WinBUGS)
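As a minimal illustration of the two routes to a posterior, the sketch below compares the exact conjugate result for a normal mean (known variance, normal prior) with a random-walk Metropolis sampler that 'builds up' the same posterior by simulation. The data, prior settings, step size and function names are illustrative assumptions; the talk itself used WinBUGS, not this code.

```python
import numpy as np

def conjugate_normal_posterior(y, sigma2, prior_mean, prior_var):
    """Exact posterior (mean, variance) for a normal mean with known variance sigma2."""
    n = len(y)
    post_prec = 1.0 / prior_var + n / sigma2
    post_mean = (prior_mean / prior_var + np.sum(y) / sigma2) / post_prec
    return post_mean, 1.0 / post_prec

def metropolis_normal_mean(y, sigma2, prior_mean, prior_var,
                           n_iter=20000, step=0.5, seed=0):
    """Random-walk Metropolis draws from the same posterior (first quarter discarded)."""
    rng = np.random.default_rng(seed)

    def log_post(mu):
        # Log-likelihood plus log-prior, up to an additive constant.
        return (-0.5 * np.sum((y - mu) ** 2) / sigma2
                - 0.5 * (mu - prior_mean) ** 2 / prior_var)

    mu = float(np.mean(y))
    lp = log_post(mu)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = mu + step * rng.normal()
        lp_prop = log_post(prop)
        # Accept the proposal with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            mu, lp = prop, lp_prop
        draws[i] = mu
    return draws[n_iter // 4:]

# Illustrative data: 30 observations with unit variance and a vague N(0, 100) prior.
y = np.random.default_rng(1).normal(2.0, 1.0, size=30)
exact_mean, exact_var = conjugate_normal_posterior(y, 1.0, 0.0, 100.0)
draws = metropolis_normal_mean(y, 1.0, 0.0, 100.0)
print(exact_mean, draws.mean())
print(exact_var, draws.var())
```

The simulated mean and variance should sit close to the exact values; for the hierarchical models in this talk no such closed form exists, which is why simulation is needed.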
Model stages
Level 1: Observed data stage.
$Y_t = \theta_t + \text{covariates} + \text{site effect} + v_t, \quad v_t \sim N(0, \sigma_v^2)$
Level 2(a): Temporal/system stage.
$\theta_t = \alpha\,\theta_{t-1} + w_t, \quad w_t \sim N(0, \sigma_w^2)$
Level 2(b): Spatial stage.
Site random effects modelled as multivariate normal with correlations that decay with the distance, d, between sites:
$f(d) = \exp(-\Phi d)$
Site effects can be estimated at unmeasured locations conditional on the measured values.
Level 3: Hyperparameters.
Assign prior distributions to covariate effects and variances.
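To make the three levels concrete, here is a small forward simulation from a model of this shape. It is a sketch under assumptions, not the fitted model: covariates are omitted, the temporal stage is taken to be a first-order autoregression with coefficient `alpha` starting from zero, the site effects share a variance `sigma_s**2` with exponential correlation `exp(-phi * d)`, and all parameter values are made up.

```python
import numpy as np

def simulate_hierarchical(coords, n_years, alpha, phi,
                          sigma_v, sigma_w, sigma_s, seed=0):
    """Simulate (log) pollutant levels: years in rows, sites in columns."""
    rng = np.random.default_rng(seed)
    n_sites = coords.shape[0]

    # Level 2(a): temporal stage, theta_t = alpha * theta_{t-1} + w_t (theta_0 = 0 assumed).
    theta = np.zeros(n_years)
    for t in range(1, n_years):
        theta[t] = alpha * theta[t - 1] + rng.normal(0.0, sigma_w)

    # Level 2(b): spatial stage, site effects ~ MVN(0, sigma_s^2 * exp(-phi * d)).
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sigma_s ** 2 * np.exp(-phi * d)
    site = rng.multivariate_normal(np.zeros(n_sites), cov)

    # Level 1: observed data = temporal level + site effect + unstructured error.
    v = rng.normal(0.0, sigma_v, size=(n_years, n_sites))
    y = theta[:, None] + site[None, :] + v
    return y, theta, site

# 20 hypothetical sites on a 500 km square, with coordinates in units of 100 km.
coords = np.random.default_rng(42).uniform(0.0, 5.0, size=(20, 2))
y, theta, site = simulate_hierarchical(coords, n_years=5, alpha=0.8, phi=3.0,
                                       sigma_v=0.3, sigma_w=0.2, sigma_s=0.5)
print(y.shape)
```

Fitting reverses this direction: given `y`, MCMC estimates the temporal levels, site effects and hyperparameters jointly.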
Prior information
For the spatial effect
• Φ given a uniform (1.3, 4) distribution
• Corresponds to correlations falling to between 0.13 and 0.52 at a distance of 50km
Normal distributions for covariate effects
Gamma distributions for (inverse of) variances [precisions]
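The quoted correlation range can be checked with a few lines of arithmetic. The check assumes the exponential correlation function exp(-Φd) with distance measured in units of 100 km, so that 50 km is d = 0.5; the slides do not state the unit, but that choice reproduces the quoted figures.

```python
import math

def correlation(phi, d):
    """Exponential correlation at distance d for decay parameter phi."""
    return math.exp(-phi * d)

d_50km = 0.5  # assumption: distances in units of 100 km

upper = correlation(1.3, d_50km)  # slowest decay allowed by the prior
lower = correlation(4.0, d_50km)  # fastest decay allowed by the prior
print(round(lower, 3), round(upper, 3))
```

This gives roughly 0.135 and 0.522, in line with the 0.13 to 0.52 range stated for the prior.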
Results
UK data for SO2, 1997-2001
Components of variation
• Random (unstructured) error: 26%
• Temporal: 13%
• Spatial: 61%
Random error
Posterior estimates – temporal components
[Figure: box plot of theta, components [1] to [5]; vertical axis from -0.5 to 0.5]
Spatial effects
[Figure: map of spatial effect; axes X Coord and Y Coord; legend class breaks: -1.413, -0.15, -0.064, -0.024, 0, 0.001, 0.011, 0.042, 0.089, 0.192, 0.854]
[Figure: SO2 median spatial effect for UK; axes X Coord and Y Coord; legend classes: [-1.75, -0.14], (-0.14, 0.06], (0.06, 0.23], (0.23, 0.48], (0.48, 0.97]]
Posterior median for Φ: 3.79, 95% CrI (2.95-4.00)
Predictions for UK
[Figure: map of predictions; axes X Coord and Y Coord; legend class breaks: 7.679, 22.18, 26.874, 31.371, 35.765, 40.031, 44.798, 50.841, 61.687, 91.881, 328.153]
[Figure: length of 95% credible intervals for predictions; axes X Coord and Y Coord; legend class breaks: 1.89, 6.404, 6.831, 7.12, 7.445, 7.742, 8.076, 8.508, 9.138, 10.267, 23.707]
Predictions
Overall mean + temporal (2001) effect + covariate effect + spatial effect
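The spatial-effect term at a location without a monitor has to be inferred from the monitored sites. One standard way to do this, sketched below for fixed parameter values, is the conditional mean and covariance of a multivariate normal with exponential correlation. The function names, coordinates and parameter values are illustrative assumptions; in the Bayesian model this step would be repeated over posterior draws rather than done once.

```python
import numpy as np

def exp_corr(a, b, phi):
    """Exponential correlation matrix between two sets of coordinates."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return np.exp(-phi * d)

def predict_site_effects(obs_xy, obs_effects, new_xy, phi, sigma_s2):
    """Conditional mean and covariance of site effects at new locations."""
    c_oo = sigma_s2 * exp_corr(obs_xy, obs_xy, phi)   # observed vs observed
    c_no = sigma_s2 * exp_corr(new_xy, obs_xy, phi)   # new vs observed
    c_nn = sigma_s2 * exp_corr(new_xy, new_xy, phi)   # new vs new
    weights = np.linalg.solve(c_oo, c_no.T).T         # equals c_no @ inv(c_oo)
    mean = weights @ obs_effects
    cov = c_nn - weights @ c_no.T
    return mean, cov

obs_xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
obs_effects = np.array([0.3, -0.1, 0.2])
new_xy = np.array([[0.0, 0.0], [10.0, 10.0]])  # one monitored point, one remote point
mean, cov = predict_site_effects(obs_xy, obs_effects, new_xy, phi=3.79, sigma_s2=0.25)
print(mean, np.diag(cov))
```

At a monitored location the prediction reproduces the observed effect with essentially zero variance; far from every monitor it falls back to a mean of zero with the full prior variance, which is why the credible intervals widen away from the network.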
Extending methodology to EU level
Increased number of sites brings large computational burden
Following analysis performed on NO2 in 2001
• 75% of dataset (sites) used to build models
• 25% for validation
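A hold-out split of this kind can be sketched as follows. The slides do not say how sites were allocated, so random allocation, the seed and the function name are assumptions made purely for illustration.

```python
import numpy as np

def split_sites(site_ids, validation_fraction=0.25, seed=0):
    """Randomly split site identifiers into modelling and validation sets."""
    rng = np.random.default_rng(seed)
    ids = np.asarray(site_ids)
    shuffled = rng.permutation(ids)
    n_val = int(round(validation_fraction * len(ids)))
    # First n_val shuffled sites are held out; the rest are used to build models.
    return shuffled[n_val:], shuffled[:n_val]

model_ids, validation_ids = split_sites(np.arange(400))
print(len(model_ids), len(validation_ids))
```

Splitting by site rather than by observation keeps every measurement from a validation site out of model building.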
Modelling at different scales
Based on theoretical and empirical environmental models
• Variograms
Scales defined by site type and associated covariates
• Global (climate and topological)
• Rural (transport, population density, agriculture)
• Urban (transport, population density, urban greenery)

Site type           Modelling set   Validation set   Total
Global background   225             87               312
Rural               225             87               312
Urban               459             141              600

Eases computational burden
Covariates
Scales: Global Background (1 km), Rural scale (5 km, 21 km), Urban scale (1 km)
Pollutants at each scale: NO2, CO, PM, SO2, O3
Sign of association by variable (column alignment lost in extraction):
Motorways
A roads: + + + + - + + + + -
B roads
Minor roads: + + + + - + + + + -
Railways
HD residential: + + + + + + + + + -
LD residential: + + + + + + + + +
Industry: + + + +
Transport: + + +
Airports: + + +
Seaports: + + + +
Construction: + + + + - + + + +
Urban greenery: - - - - +
Forestry: + - - - - +
Agriculture: + +
Altitude: - - - - + - - - + +
Topex: - - - - + - - - - +
Distance to sea: + + - + -
Climate variables (combined into 5 factors):
Seasonal temperature: + + + + +
Annual radiation: + + + + +
Seasonal wind speed: + + + + +
Seasonal calm: + + + + +
Annual vapour pressure: + + + + +
Annual rainfall: + + + + +
Model stages
Global model
$Y_{Gs} = \theta_G + \text{global covariates}_s + \text{site effects} + v_{Gs}, \quad v_{Gs} \sim N(0, \sigma_G^2)$
Rural model
$(Y_{Rs} - \text{predicted}(Y_{Rs})) = \theta_R + \text{rural covariates}_s + v_{Rs}, \quad v_{Rs} \sim N(0, \sigma_R^2)$
Urban model
$(Y_{Us} - \text{predicted}(Y_{Us})) = \theta_U + \text{urban covariates}_s + v_{Us}, \quad v_{Us} \sim N(0, \sigma_U^2)$
Predictions were made using the global models for every one of the 1km x 1km cells (2854116)
• additional effects of rural (2788454 cells) and urban (65662 cells) used to create a further two sets of predictions
• which were then combined to create a composite map.
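Because the rural and urban models are fitted to what the global model leaves unexplained, one plausible way to form the composite is to add the scale-specific adjustment to the global prediction cell by cell. The sketch below assumes exactly that, on a toy 2 x 2 grid with made-up values; the array names and the urban mask are hypothetical, and the slides do not spell out the combination rule.

```python
import numpy as np

def composite_map(global_pred, rural_adj, urban_adj, is_urban):
    """Global prediction plus the rural or urban adjustment for each cell."""
    adjustment = np.where(is_urban, urban_adj, rural_adj)
    return global_pred + adjustment

# Toy grid: three rural cells and one urban cell (bottom right).
global_pred = np.array([[10.0, 12.0], [8.0, 20.0]])
rural_adj = np.array([[1.0, -1.0], [0.5, 2.0]])
urban_adj = np.array([[5.0, 5.0], [5.0, 5.0]])
is_urban = np.array([[False, False], [False, True]])

composite = composite_map(global_pred, rural_adj, urban_adj, is_urban)
print(composite)
```

The rural and urban cell counts quoted above sum to the full grid (2788454 + 65662 = 2854116), so every cell receives exactly one of the two adjustments.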
Results – global model
Increases with distance from sea and for climate variables 2 & 5 (areas with warm or hot summers)
Decreases with altitude
Posterior median for Φ, 0.037, corresponds to fall in correlation to 0.024 at 100km
Without any geographical covariates, Φ much smaller (by factor of ten), indicating much more ‘spatial’ residual error
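The effect of the covariates on the residual spatial structure can be put in numbers. The sketch assumes the exponential correlation function exp(-Φd) with d in kilometres, which reproduces the quoted fall to about 0.024 at 100 km, and reads 'smaller by a factor of ten' as exactly one tenth of the posterior median, which is only an approximation.

```python
import math

phi_with_covariates = 0.037                       # posterior median from the slide
phi_without_covariates = phi_with_covariates / 10  # 'factor of ten', taken literally

corr_with = math.exp(-phi_with_covariates * 100.0)
corr_without = math.exp(-phi_without_covariates * 100.0)
print(round(corr_with, 3), round(corr_without, 3))
```

Under these assumptions the residual correlation at 100 km would be about 0.69 without geographical covariates, against about 0.025 with them, so the covariates account for most of the large-scale spatial pattern.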
Results – rural and urban
Rural: significant effect of major roads
Urban: clear overall increase (intercept term)
• transport (major, minor roads)
• population density
• negative association with altitude
Pollutant: NO2
Scale: Composite of global, rural and urban background
Time period: 2001, annual average
Geographic extent: Excludes Norway and Sweden
Statistics (ug/m3): Min 0.45, Max 139.06, Mean 12.47, Std dev 5.64
Modelling method: Bayesian Hierarchical Modelling
[Map: Model; scale bar 0 to 600 kilometres]
Pollutant: NO2
Scale: Composite of global, rural and urban background
Time period: 2001, annual average
Geographic extent: Excludes Norway and Sweden
Statistics (ug/m3): Min 1.66, Max 287.36, Mean 19.19, Std dev 9.04
[Map: Length of 95% credible interval; scale bar 0 to 600 kilometres]
Validation
Performed at each scale (global, rural, urban)
• RMSE, MAbsE, R2, etc.
Best results for NO2, PM10 and O3
Best results for urban scale (relationships with covariates)
• exception of O3
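The validation statistics named above can be computed from observed and predicted values at the held-out sites as in the sketch below. The slides do not define R2, so it is taken here to be the squared Pearson correlation, and the numbers are made up for illustration.

```python
import numpy as np

def validation_metrics(observed, predicted):
    """Root mean squared error, mean absolute error and squared correlation."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - observed
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    r2 = float(np.corrcoef(observed, predicted)[0, 1] ** 2)  # assumed definition of R2
    return {"rmse": rmse, "mae": mae, "r2": r2}

# Hypothetical concentrations at four validation sites.
metrics = validation_metrics([10, 20, 30, 40], [12, 18, 33, 39])
print(metrics)
```

RMSE penalises large misses more heavily than the mean absolute error, so reporting both gives a sense of whether errors are dominated by a few badly predicted sites.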
[Figure: observed against predicted NO2, three panels (NO2 global, NO2 rural, NO2 urban); axes Predicted and Observed, both 0 to 50]
Summary
Applied spatial-temporal model to ca. 200 sites measuring SO2 in UK (1997-2001).
• Assessed proportions of spatial, temporal and random variation
Applied spatial model to entire EU
• Produced predicted levels at 1km resolution for different scales
• Produced composite maps with measures of uncertainty
Future work/considerations
Combined spatial models
• different site types modelled simultaneously
Computational aspects
• Estimation and (joint) prediction
• Sensitivity analysis (to priors)
Conditional modelling
• Neighbouring sites
Other pollutants
• multi-pollutant models
More information on APMoSPHERE
http://www.apmosphere.org
Alternative approach – conditional modelling
Problems handling large spatial matrices at such a high resolution.
Define sites as having ‘neighbours’ (may include distance cut-off).
Allows feasibility of different resolutions to be examined.
Can be much, much faster!
Prediction and estimation may be performed together during the MCMC.
Conditional model
$Y_s \sim N(S_s, \sigma_v^2)$
$S_s = \beta + W_s$
$W_s \sim N\left(\rho \sum_{i \in \delta_s} W_i / n_s,\; n_s \tau\right)$
where $\sum_{i \in \delta_s} W_i / n_s$ is the average of the neighbours of point s.
The number of points that constitute the neighbourhood can be varied
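The neighbourhood idea can be sketched in a few lines: pick the k nearest points for each location, then take rho times the average of the neighbours' effects as the conditional mean. Choosing neighbours by nearest distance, the coordinates and the value of rho are assumptions for illustration. Nearest-neighbour sets are not necessarily symmetric, which a properly specified conditional autoregressive model would need to address, and the second argument of the normal is read here as a precision, as in WinBUGS.

```python
import numpy as np

def nearest_neighbours(coords, k):
    """Indices of the k nearest other points for every point."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    return np.argsort(d, axis=1)[:, :k]

def conditional_mean(w, neighbours, rho):
    """rho times the average spatial effect over each point's neighbours."""
    return rho * w[neighbours].mean(axis=1)

# Four hypothetical points along a line, with k = 2 neighbours each.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
w = np.array([1.0, 2.0, 3.0, 4.0])
neighbours = nearest_neighbours(coords, k=2)
cond_mean = conditional_mean(w, neighbours, rho=0.9)
print(cond_mean)
```

Each update touches only k neighbours instead of a full covariance matrix over all sites, which is the source of the speed-up over the joint model.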
A 100km resolution structure with 10 neighbours
372 unknown points
Predicted SO2
[Figure: map of predicted median SO2; axes X Coord and Y Coord; legend class breaks: 4.06, 5.85, 6.17, 6.25, 6.7, 6.97, 7.24, 7.65, 8.07, 9.04, 11.97]
Variability
[Figure: map of length of the 95% prediction interval; axes X Coord and Y Coord; legend class breaks: 2.88, 3.34, 3.67, 4.02, 4.32, 4.54, 4.73, 5.03, 5.95, 7.36, 15.66]
Higher resolutions
• Example of 50km resolution
418 known, 1469 unknown points
Computational aspects
100,000 iterations with ca. 400 sites
• Joint model: 5 days
• Conditional model: 30 minutes
Using 2.5GHz PC with 1GB RAM
Using conditional model with observed and prediction points together at 20km
• 1 day (1000 iterations: 15 minutes)
Higher resolutions computationally feasible (but problems writing the file!)