
Diagnostics and optimization of analysis by cross-validation

Richard Ménard and Martin Deshaies-Jacques

Air Quality Research Division – Environment and Climate Change Canada


Abstract. We examine how observations can be used to evaluate an air quality analysis by verifying against passive observations (i.e. cross-validation) that are not used to create the analysis, and we compare these verifications to those made against the same set of (active) observations that were used to generate the analysis. The results show that both active and passive observations can be used to evaluate first-moment metrics (e.g. bias), but only passive observations are useful for evaluating second-moment metrics such as the variance of observed-minus-analysis and the correlation between observations and analysis. We derive a set of diagnostics based on passive observation-minus-analysis residuals and show that the true analysis error variance can be estimated without relying on any statistical optimality assumption. This diagnostic is used to obtain near-optimal analyses that are then used to evaluate the analysis error using several different methods. We compare the estimates according to the method of Hollingsworth-Lönnberg, the method of Desroziers, a diagnostic we introduce, and the perceived analysis error computed from the analysis scheme, and conclude that as long as the analysis is optimal, all estimates agree within a certain error margin. The analysis error variance at passive observation sites is also obtained.

Cross-validation design

• Randomly divide the observations into 3 sets of equal size

• Perform an analysis using 2 sets, and use the third one to validate the analysis. Do 3 permutations of these sets, so that all observation sites are used to validate the analysis

[Figure panels: analysis with sets 2 & 3; analysis with sets 1 & 3; analysis with sets 1 & 2]
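The three-way split described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `three_fold_split` and `cross_validation_folds`, the seed, and the site count are hypothetical and do not come from the poster.

```python
import numpy as np

def three_fold_split(n_sites, seed=0):
    """Randomly assign site indices to 3 subsets of (nearly) equal size."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_sites)
    return np.array_split(shuffled, 3)

def cross_validation_folds(n_sites, seed=0):
    """Yield (active, passive) index arrays for the 3 permutations."""
    subsets = three_fold_split(n_sites, seed)
    for k in range(3):
        passive = subsets[k]  # withheld set, used only for verification
        active = np.concatenate([subsets[j] for j in range(3) if j != k])
        yield np.sort(active), np.sort(passive)

# Hypothetical network of 12 sites
for active, passive in cross_validation_folds(n_sites=12):
    print("active:", active, "passive:", passive)
```

Each site is passive in exactly one of the three analyses, so verification statistics can be pooled over all sites.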

Surface aerosols – fine particles (PM2.5)

[Figure: Modified Normalized Mean Bias. Legend: verification against the excluded set (passive observations); ● mean of the three verification subsets; ▪ verification against the same observations (active observations)]

Note: with respect to passive observations, the variance has a minimum; with respect to active observations, the variance is monotonically decreasing with $\sigma_o^2/\sigma_b^2$

Theory

Assuming that observation errors are spatially uncorrelated and uncorrelated with background errors, the passive observation errors are uncorrelated with the cross-validation analysis error.

Hilbert space representation of random variables

$$\langle x, y \rangle := \mathrm{E}\big[(x - \mathrm{E}[x])(y - \mathrm{E}[y])\big]$$

$\langle (O-T), (B-T) \rangle = 0$: active obs error uncorrelated with background error

$\langle (O-T)_c, (B-T) \rangle = 0$: passive obs error uncorrelated with background error

$\langle (O-T)_c, (O-T) \rangle = 0$: passive obs error uncorrelated with active obs error

thus $(O-T)_c \perp$ analysis plane, so that $\langle (O-T)_c, (A-T)_c \rangle = 0$

$$\mathrm{E}\big[(O-A)_c (O-A)_c^T\big] = R_c + H_c A H_c^T$$

is true whether the analysis is optimal or not [3]

$A$ is the analysis error covariance, $H_c$ is the operator for the passive obs

$$\min \big\{ \mathrm{E}\big[(O-A)_c (O-A)_c^T\big] \big\} \;\Rightarrow\; H_c A H_c^T = H_c \hat{A} H_c^T$$

searching for the minimum gives the optimal analysis
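A small Monte Carlo experiment can illustrate the relation between the passive residual variance and the analysis error variance. The sketch below is not the poster's system. It assumes a hypothetical 1-D network, a second-order autoregressive correlation, illustrative error variances ($\sigma_o^2 = 1$, $\sigma_b^2 = 4$) and a truth set to zero. It compares var[(O-A)_c] with $\sigma_o^2$ plus the sampled analysis error variance at the passive sites for several prescribed ratios.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 1-D network (km): 20 active sites, 10 passive sites in between.
x_act = np.arange(0.0, 1000.0, 50.0)
x_pas = np.arange(25.0, 1000.0, 100.0)
n_a, n_c, n_s = x_act.size, x_pas.size, 20000

def soar(xa, xb, L):
    """Second-order autoregressive correlation between two sets of sites."""
    r = np.abs(xa[:, None] - xb[None, :])
    return (1.0 + r / L) * np.exp(-r / L)

# Illustrative "true" error statistics (assumed, not values from the poster).
sig_o2, sig_b2, L_c = 1.0, 4.0, 100.0

# Sample background and observation errors at all sites; the truth T is 0.
x_all = np.concatenate([x_act, x_pas])
B_all = sig_b2 * soar(x_all, x_all, L_c)
chol = np.linalg.cholesky(B_all + 1e-9 * np.eye(n_a + n_c))
e_b = chol @ rng.standard_normal((n_a + n_c, n_s))             # B - T
e_o = np.sqrt(sig_o2) * rng.standard_normal((n_a + n_c, n_s))  # O - T

def passive_diagnostic(gamma):
    """Return (var[(O-A)_c], sampled analysis error variance), site-averaged.

    gamma is the prescribed ratio sig_o2/sig_b2; the prescribed variances
    are constrained to sum to the true innovation variance.
    """
    sb2 = (sig_o2 + sig_b2) / (1.0 + gamma)
    so2 = gamma * sb2
    S = sb2 * soar(x_act, x_act, L_c) + so2 * np.eye(n_a)
    HcBHt = sb2 * soar(x_pas, x_act, L_c)
    K_c = np.linalg.solve(S, HcBHt.T).T                # gain to passive sites
    e_a = e_b[n_a:] + K_c @ (e_o[:n_a] - e_b[:n_a])    # (A - T)_c
    oma = e_o[n_a:] - e_a                              # (O - A)_c
    return oma.var(axis=1).mean(), (e_a ** 2).mean()

for gamma in (0.02, 0.1, 0.25, 1.0, 5.0):
    var_oma, err_var = passive_diagnostic(gamma)
    print(f"gamma={gamma:5.2f}  var[(O-A)_c]={var_oma:.3f}  "
          f"sig_o2 + analysis error var={sig_o2 + err_var:.3f}")
```

In this setup the two printed columns should agree to within sampling noise for every prescribed ratio, optimal or not, and the passive residual variance should be smallest near the true ratio of 0.25.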

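In the case of optimal analysis

There are a number of methods to estimate the analysis error covariance

using active observations

• Hollingsworth-Lönnberg [1], since ΔOAT is a right triangle

$$\mathrm{E}\big[(O-\hat{A})(O-\hat{A})^T\big] = R - H \hat{A}_{HL} H^T$$

• This study (MD), since ΔBAT is a right triangle

$$\mathrm{E}\big[(\hat{A}-B)(\hat{A}-B)^T\big] = H B H^T - H \hat{A}_{MD} H^T$$

• Desroziers [2], since ΔOTA is similar to ΔTBA

$$\mathrm{E}\big[(O-\hat{A})(\hat{A}-B)^T\big] = H \hat{A}_{D} H^T$$

• Analysis error covariance calculated by the analysis scheme – perceived analysis error

$$H \hat{A}_{P} H^T = H \tilde{B} H^T - H \tilde{B} H^T \big(H \tilde{B} H^T + \tilde{R}\big)^{-1} H \tilde{B} H^T$$

[Figure: analysis error variance for PM2.5]

using passive observations

• This study (MD), since ΔO_cAB is also a right triangle

$$\mathrm{E}\big[(\hat{A}-B)_c (\hat{A}-B)_c^T\big] = H_c B H_c^T - H_c \hat{A}_{MD} H_c^T$$

• General result in cross-validation space [3]

$$\min \big\{ \mathrm{E}\big[(O-A)_c (O-A)_c^T\big] \big\} = R_c + H_c \hat{A} H_c^T$$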
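The agreement of the four active-space estimates for an optimal analysis, and their disagreement otherwise, can be checked with expected covariances instead of samples. The sketch below rests on several assumptions that are not from the poster: a hypothetical 1-D network, a second-order autoregressive correlation, illustrative true variances, and the use of the prescribed $R$ and $HBH^T$ in the HL and MD formulas. It reports site-averaged variances.

```python
import numpy as np

def soar(x, L):
    """Second-order autoregressive correlation matrix for sites x (km)."""
    r = np.abs(x[:, None] - x[None, :])
    return (1.0 + r / L) * np.exp(-r / L)

def expected_diagnostics(x, sig_o2, sig_b2, L_true, gamma, L_presc):
    """Site-averaged expected value of each analysis error variance estimate."""
    n = x.size
    I = np.eye(n)
    # True covariances in active observation space
    HBHt = sig_b2 * soar(x, L_true)
    R = sig_o2 * I
    S = HBHt + R                                  # E[(O-B)(O-B)^T]
    # Prescribed covariances, constrained so that the variances sum to var(O-B)
    sb2 = (sig_o2 + sig_b2) / (1.0 + gamma)
    so2 = gamma * sb2
    HBHt_p = sb2 * soar(x, L_presc)
    R_p = so2 * I
    K = np.linalg.solve(HBHt_p + R_p, HBHt_p).T   # gain in observation space
    cov_oma = (I - K) @ S @ (I - K).T             # E[(O-A)(O-A)^T]
    cov_amb = K @ S @ K.T                         # E[(A-B)(A-B)^T]
    cross = (I - K) @ S @ K.T                     # E[(O-A)(A-B)^T]
    true_A = (I - K) @ HBHt @ (I - K).T + K @ R @ K.T
    return {
        "HL": np.mean(np.diag(R_p - cov_oma)),
        "MD": np.mean(np.diag(HBHt_p - cov_amb)),
        "D": np.mean(np.diag(cross)),
        "P": np.mean(np.diag(HBHt_p - K @ HBHt_p)),
        "true": np.mean(np.diag(true_A)),
    }

x = np.arange(0.0, 1000.0, 50.0)   # hypothetical active sites
for gamma in (0.25, 2.0):          # the true ratio here is 1/4 = 0.25
    d = expected_diagnostics(x, 1.0, 4.0, 100.0, gamma, 100.0)
    print(gamma, {k: round(v, 3) for k, v in d.items()})
```

With the prescribed statistics equal to the true ones, all four estimates coincide with the true analysis error variance. With a wrong ratio they separate, because the innovation covariance is then consistent only in its variance and not in its off-diagonal terms.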

Experiment design

• Hourly analyses for PM2.5 and O3 for a period of 60 days

(June 14 to August 12, 2014)

Step 1 - First guess experiment

• Maximum likelihood estimation of $L_c$ using a second-order autoregressive correlation model and error variances obtained from a local Hollingsworth-Lönnberg fit [4]

• Analysis performed over the same period of 60 days

o Using all observations as “active”

o 3 cross-validation analyses using 2 subsets out of the 3, with permutations

• Conduct a series of analyses with prescribed uniform error variances with different ratios $\sigma_o^2/\sigma_b^2$ but such that $\mathrm{var}[(O-B)] = \sigma_o^2 + \sigma_b^2$, so that we have innovation variance consistency

Step 2 - Optimization

• Obtain the optimal $\hat{\sigma}_o^2/\hat{\sigma}_b^2$ by minimizing $\mathrm{var}[(O-A)_c]$, that is, minimizing the analysis error variance

• Re-evaluate $L_c$ by maximum likelihood using the new error variances

Step 3 - Optimal analysis experiment

• Re-conduct an optimization for $\hat{\sigma}_o^2/\hat{\sigma}_b^2$

• As in Step 1, perform a series of analyses
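The innovation variance constraint used in these steps fixes both error variances once the ratio is chosen. The sketch below shows that arithmetic, together with a chi-square per observation computed on synthetic innovations. The helper names are hypothetical, the chi-square definition $d^T (H\tilde{B}H^T + \tilde{R})^{-1} d / p$ is the usual one and is assumed here rather than taken from the poster, and the 1-D network is invented for illustration.

```python
import numpy as np

def partition_variances(innov_var, gamma):
    """Split var(O-B) into (sig_o2, sig_b2) with sig_o2/sig_b2 = gamma."""
    sig_b2 = innov_var / (1.0 + gamma)
    return gamma * sig_b2, sig_b2

def soar(dist, L_c):
    """Second-order autoregressive correlation as a function of distance."""
    return (1.0 + dist / L_c) * np.exp(-dist / L_c)

def chi2_per_obs(innov, dist, sig_o2, sig_b2, L_c):
    """Mean of d^T S^-1 d / p over the columns d of innov (p x n_times)."""
    p = innov.shape[0]
    S = sig_b2 * soar(dist, L_c) + sig_o2 * np.eye(p)
    return np.mean(np.sum(innov * np.linalg.solve(S, innov), axis=0)) / p

# Arithmetic check with the O3 iter 0 row: var(O-B) = 101.25, ratio = 0.22
sig_o2, sig_b2 = partition_variances(101.25, 0.22)
print(round(sig_o2, 1), round(sig_b2, 1))   # 18.3 83.0

# Synthetic innovations on a hypothetical 1-D network, drawn with L_c = 45 km
rng = np.random.default_rng(0)
x = np.arange(0.0, 1000.0, 50.0)
dist = np.abs(x[:, None] - x[None, :])
S_true = sig_b2 * soar(dist, 45.0) + sig_o2 * np.eye(x.size)
innov = np.linalg.cholesky(S_true) @ rng.standard_normal((x.size, 5000))

print(chi2_per_obs(innov, dist, sig_o2, sig_b2, 45.0))    # consistent statistics
print(chi2_per_obs(innov, dist, sig_o2, sig_b2, 124.0))   # mis-specified length
```

When the prescribed statistics match those used to draw the innovations, the chi-square per observation should be close to one. A mis-specified correlation length moves it away from one even though the innovation variance is still matched.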

Results of first guess experiment (iter0) and optimization (iter1)

Input error statistics

| Experiment | $L_c$ (km) | $(O-B)^2$ | $\hat{\sigma}_o^2/\hat{\sigma}_b^2$ | $\hat{\sigma}_o^2$ | $\hat{\sigma}_b^2$ | $\chi^2/p$ |
|---|---|---|---|---|---|---|
| O3 iter 0 | 124 | 101.25 | 0.22 | 18.3 | 83 | 2.23 |
| O3 iter 1 | 45 | 101.25 | 0.25 | 20.2 | 81 | 1.36 |
| PM2.5 iter 0 | 196 | 93.93 | 0.17 | 13.6 | 80.3 | 2.04 |
| PM2.5 iter 1 | 86 | 93.93 | 0.22 | 16.9 | 77 | 1.25 |

Estimation of active analysis error variance

| Experiment | Active $\mathrm{diag}(H\hat{A}_{MD}H^T)$ | Active $\mathrm{diag}(H\hat{A}_{D}H^T)$ | Active $\mathrm{diag}(H\hat{A}_{HL}H^T)$ | Active $\mathrm{diag}(H\hat{A}_{P}H^T)$ |
|---|---|---|---|---|
| O3 iter 0 | 22.69 | 9.61 | -6.03 | 5.77 |
| O3 iter 1 | 13.32 | 13.68 | 8.94 | 11.60 |
| PM2.5 iter 0 | 17.98 | 7.71 | -3.18 | 4.37 |
| PM2.5 iter 1 | 10.68 | 9.51 | 7.33 | 8.21 |

Estimation of analysis error variance at the passive observation sites

| Experiment | Passive $\mathrm{diag}(H_c\hat{A}_{MD}H_c^T)$ | Passive $\mathrm{var}[(O-\hat{A})_c] - \sigma_{o_c}^2$ |
|---|---|---|
| O3 iter 0 | 26.03 | 32.72 |
| O3 iter 1 | 28.95 | 28.75 |
| PM2.5 iter 0 | 22.65 | 24.49 |
| PM2.5 iter 1 | 24.62 | 21.38 |

Remark: A sensitivity analysis of the different diagnostics to a lack of innovation covariance consistency shows that the MD and D estimates are the least sensitive and the HL estimate is the most sensitive

Summary and Conclusions

• We examine how passive observations can be used to evaluate analyses, estimate the analysis error

variance and optimize the analysis

• Assuming that observation errors are horizontally uncorrelated gives the following central diagnostic [3]

$$\mathrm{E}\big[(O-A)_c (O-A)_c^T\big] = R_c + H_c A H_c^T$$

which is valid whether or not the analysis is optimal.

• By minimizing the central diagnostic we minimize the analysis error variance, thus providing a means to optimize the analysis

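• Optimizing the ratio of observation error variance to background error variance and re-evaluating the correlation length give near-optimal analyses with chi-square values closer to one

• We have introduced a new diagnostic for analysis error that can be applied in both active and passive observation space

$$\mathrm{E}\big[(\hat{A}-B)(\hat{A}-B)^T\big] = H B H^T - H \hat{A}_{MD} H^T$$

$$\mathrm{E}\big[(\hat{A}-B)_c (\hat{A}-B)_c^T\big] = H_c B H_c^T - H_c \hat{A}_{MD} H_c^T$$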

• We have compared different diagnostics of analysis error variance in active observation space (Hollingsworth-Lönnberg [1], Desroziers [2], our diagnostic, and the computed analysis error provided by the analysis scheme) and showed that they roughly agree for near-optimal analyses. Strong disagreement between the estimates is found when the analysis is not optimal.

• We have compared different diagnostics of analysis error variance in passive observation space (our diagnostic and the central diagnostic) and showed strong agreement in the estimates when the analysis is optimal

• The method introduced here is general and could be used in other geophysical applications and in

particular in surface analyses

This study has been submitted to the Atmosphere Special Issue: Air Quality Forecasting and Monitoring

References

1. Hollingsworth, A. and P. Lönnberg. The verification of objective analyses: Diagnostics of analysis system performance. Meteorol. Atmos. Phys. 1989.

2. Desroziers, G., L. Berre, B. Chapnik, and P. Poli. Diagnosis of observation-, background-, and analysis-error statistics in observation space. Q. J. R. Meteorol. Soc. 2005, 131, 3385-3396.

3. Marseille, G.-J., J. Barkmeijer, S. de Haan, and W. Verkley. Assessment and tuning of data assimilation systems using passive observations. Q. J. R. Meteorol. Soc. 2016, 142, 3001-3014, DOI:10.1002/qj.2882.

4. Ménard, R., M. Deshaies-Jacques, and N. Gasset. A comparison of correlation-length estimation methods for the objective analysis of surface pollutants at Environment and Climate Change Canada. Journal of the Air & Waste Management Association. 2016, 66(9), 874-895.

