bayesian hierarchical modelling for data assimilation of past...

27
Bayesian hierarchical modelling for data assimilation of past observations and numerical model forecasts Stan Yip Exeter Climate Systems, University of Exeter [email protected] Joint work with Sujit Sahu in University of Southampton MPI-PKS, Dresden, 31st July 2009 Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 1 / 27

Upload: others

Post on 25-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Bayesian hierarchical modelling for dataassimilation of past observations and numerical

model forecasts

Stan Yip

Exeter Climate Systems, University of [email protected]

Joint work with Sujit Sahu in University of Southampton

MPI-PKS, Dresden, 31st July 2009

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 1 / 27

Page 2: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Motivation

Fusing ground level ozone concentration observations withcomputer deterministic model output.

Improving biased forecast.

Capturing spatio-temporal variation.

Quantifying uncertainty through Bayesian probabilistic forecast.

Producing high resolution maps.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 2 / 27

Page 3: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

ForecastEPA’s www.airnow.gov website

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 3 / 27

Page 4: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Ground Level Ozone

Ground level ozone: bad health effects: primarily respiratory, lungfunction, coughing, throat irritation, congestion, bronchitis,emphysema, asthma.

Ozone is a secondary pollutant.

VOC’s (Volatile Organic Compounds) - organic gases but really“chemicals that participate in the formation of ozone.”

Sunlight + VOC + NOx = Ozone.

Meteorological conditions - sunlight, high temperature (soprimarily from April to September), wind direction and wind speed.High spatial-temporal correlation.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 4 / 27

Page 5: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Observations

409 spatial point locations arein the area.

Recorded hourly.

Measured by unattendedphotometers.

About 20 percent data ismissing over 15 days.

Sparse data.-90°

-80°

-80°

-70°

30

°

30

°

40

°

40

°

0 375 750 1,125 1,500187.5Kilometers

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 5 / 27

Page 6: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

CMAQ modelling system - Computer model output

National Oceanic and Atmospheric Administration (NOAA) havedesigned the Community Multi-scale Air Quality (CMAQ)modelling system.

The model is used by Environmental Protection Agency (EPA)

CMAQ consists of a set of deterministic physical models from firstprinciple.

The forecasts are biased.

Computer model outputs are in grid cell, but in the real siuation,we want point location prediction.

Uncertainty has not been taken into account.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 6 / 27

Page 7: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

CMAQ

time/hour

ozon

e co

ncen

trat

ion/

ppb

0 50 100 150 200 250 300

020

4060

8010

0

Location in NY State, MSE = 299

CMAQOzone

time/hour

ozon

e co

ncen

trat

ion/

ppb

0 50 100 150 200 250 300

020

4060

8010

0

Location in MD State, MSE = 754

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 7 / 27

Page 8: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

CMAQ modulesChing and Byun,1999

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 8 / 27

Page 9: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

CMAQ modules

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 9 / 27

Page 10: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

ProblemDaily 8-hour maximum Prediction

8-hour average ozone concentration is an important indicator forenvironmental monitoring.

Measuring the daily 8-hour average maximum ozoneconcentration is required by the law.

One day ahead 8-hour average maximum ozone concentration atan arbitrary location is needed.

High resolution map can be produced from the prediction outputs.

Obtaining forecasts within few hours.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 10 / 27

Page 11: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Work done by others

Fuentes and Raftery (2005) combine the computer model andobservation by joint multivariate normal distribution.

Zimmerman and Holland (2005) use different data sources withdifferent measurement error and bias.

Jun and Stein (2004) compare the correlation structure ofcomputer model and observations.

None of them deal with space-time forecast at the same time.

The measurement is not ground truth.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 11 / 27

Page 12: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Why do we adopt Bayesian approach?

Probabilistic forecast addresses the uncertainty throughdistribution (pdf).

Modelling becomes more flexible.

Linear regression model doesn’t work here, it cannot capturespatial correlation.

The approach distinguish "ground truth", "measurement" and"biased forecast".

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 12 / 27

Page 13: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Model StructureHistorical Data Forecasts

Observation

Ground Truth

CMAQ

Z (s, t)

O(s, t)

x(s, t)

O(s, t − 1) O(s, t + 1)

x(s, t + 1)

ǫ(s, t)6

6 6

- -

Measurement Equation: Z (si , t) = O(si , t) + ǫ(si , t)

System Equation: O(si , t) = ξt + ρ O(si , t − 1) + β0 x(si , t) + η(si , t)

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 13 / 27

Page 14: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Model SpecificationHistorical Data Forecasts

Observation

Ground Truth

CMAQ

Z (s, t)

O(s, t)

x(s, t)

O(s, t − 1) O(s, t + 1)

x(s, t + 1)

ǫ(s, t)6

6 6

- -

Measurement Equation: Z (si , t) ∼ N(O(si , t), σ2ǫ),

System Equation: O(t) ∼ N(ξt + ρ O(t − 1) + β0 x(t), σ2ωΣ),

where O(t) = (O(s1, t), . . . , O(sn, t))′,x(t) = (x(s1, t), . . . , x(sn, t))′.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 14 / 27

Page 15: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

How do we forecast?Posterior Predictive Distribution

The posterior predictive distribution of Z (s′, t ′) is obtained byintegrating over the unknown quantities with respect to the jointposterior distribution, i.e.,

π (Z (s′, t ′)|z) =∫

π(

Z (s′, t ′)|O(s′, [t ′]), σ2ǫ

)

π (O(s′, [t ′])|θ, w)dO(s′, [t ′]) dθ dw .

It can be done by Monte Carlo integration in the Markov chain MonteCarlo routine.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 15 / 27

Page 16: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Prediction MapsThe 1-day ahead forecast surfaces on 11th Aug: Bayes and CMAQ

87

59

59

62

69

57

65

57

64

62

72

36

77

28

50

9787

6962

8782

60

Bayes forecast map for the following day: 11th Aug

30

50

70

90

87

59

59

62

69

57

65

57

64

62

72

36

77

28

50

9787

6962

8782

60

CMAQ forecast map for the following day: 11th Aug

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 16 / 27

Page 17: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Prediction MapsThe 1-day ahead forecast surfaces on 11th Aug: Bayes and its uncertainty

30

50

70

90

Bayes forecast map for the following day: 11th Aug

30507090

110130

Length of 95% predictive interval for the following day: 11th Aug

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 17 / 27

Page 18: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Prediction QualityComparison of root mean square error (ppb) (MSE) and relative bias (ppb) (rBIAS)

RMSE rBIASValidation Days CMAQ Bayes CMAQ BayesAug 2–9 15.15 7.47 0.1588 -0.0042Aug 3–10 15.70 7.20 0.1687 -0.0070Aug 4–11 16.14 8.03 0.1732 -0.0174Aug 5–12 15.92 7.51 0.1728 -0.0215Aug 6–13 15.51 6.53 0.1724 -0.0083

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 18 / 27

Page 19: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Validation PlotValidation plot for one day ahead forecast on 11th Aug

Observation/ppb

For

ecas

t/ppb

20 40 60 80 100

4060

8010

0

CMAQBayes

Validation plot of one day ahead prediction on 11th

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 19 / 27

Page 20: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Hit and error percentages for O3 exceeding 80 ppb.

Period CMAQ Hit Error Bayes Hit ErrorAug 2-9 84.76 15.24 95.12 4.88

Aug 3-10 82.20 17.80 94.24 5.76Aug 4-11 82.05 17.95 94.36 5.64Aug 5-12 84.78 15.22 94.92 5.08Aug 6-13 83.92 16.08 93.97 6.03

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 20 / 27

Page 21: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Extreme Value Theory Extention

Not accurate to predict high values (> 80ppb).

Non-normal distribution.

1 Measurement Equation: Z (s, t) ∼ GEV (µ(s, t), σg , ν).2 Second Equation: µ(s, t) = O(s, t) + ǫ(s, t).3 System Equation:

O(si , t) = ξt + ρ O(si , t − 1) + β0 x(si , t) + η(si , t).

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 21 / 27

Page 22: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Validation of the upper tail on Aug 13th

Observation/ppb

For

ecas

ts/p

pb

70 75 80 85 90 95

6070

8090

EVTDLMDLM(1)

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 22 / 27

Page 23: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

RMSE of the upper tail on Aug 13th.

Observed Value DLM(1) EVTDLMAll 6.94 7.37> 50 7.26 7.30> 60 7.64 7.61> 70 8.59 8.28> 80 10.53 9.45

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 23 / 27

Page 24: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Conclusion

The forecast is consistent, more accurate, faster than runninganother computer model.

Maps of probability statement could be produced.

The approach is general. We also forecast hourly data under thesame framework.

C language code is developed and a simplifed version S-pluspackage for a faster hourly model has been developed.

Future work will focus on using monitoring data from different datasources.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 24 / 27

Page 25: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Future Work

Modelling the whole USA is also needed.

Using other non-normal distributions.

Other types of spatial correlation structure could be used.

The speed of forecast could be further improved which is atrade-off between accuracy and time.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 25 / 27

Page 26: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

Acknowledgements

EPSRC Doctoral Training Account in University of Southampton.

Data provided by Dave Holland in USEPA.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 26 / 27

Page 27: Bayesian hierarchical modelling for data assimilation of past ...dswc09/CONTRIBUTIONS/yip_talk.pdf · Stan Yip Exeter Climate Systems, University of Exeter c.y.yip@ex.ac.uk Joint

References

Fuentes and Raftery (2005) Model evaluation and spatial interpolation byBayesian combination of observations with outputs from numerical models.Biometrics, 61 (1).Zimmerman and Holland (2005) Complementary co-kriging: spatialprediction using data combined from several environmental monitoringnetworks. Environmetrics, 16.Jun and Stein (2004) Statistical comparison of observed and CMAQmodeled daily sulfate levels. Atmospheric Environment, 38.Sahu, Yip and Holland (2009) Improved space-time prediction of dailyozone concentration levels in the eastern U.S. Atmospheric Environment, 43.Sahu, Yip and Holland (2008) A fast Bayesian method for updating andforecasting hourly ozone levels. Univesity of Southampton , Technical Report.Harrison and West (1997) Bayesian Forecasting and Dynamic Models.Springer.

Stan CY Yip (University of Exeter) Improved space-time Bayesian forecasting 31st July 2009 27 / 27