
Page 1

Quantile regression as a means of calibrating and verifying a mesoscale NWP ensemble

Tom Hopson1, Josh Hacker1, Yubao Liu1, Gregory Roux1, Wanli Wu1, Jason Knievel1, Tom Warner1, Scott Swerdlin1, John Pace2, Scott Halvorson2

2U.S. Army Test and Evaluation Command

Page 2

Outline
I. Motivation: ensemble forecasting and post-processing
II. E-RTFDDA for Dugway Proving Ground
III. Introduce Quantile Regression (QR; Koenker and Bassett, 1978)
IV. Post-processing procedure
V. Verification results
VI. Warning: dynamically increasing ensemble dispersion can put ensemble-mean utility at risk
VII. Conclusions

Page 3

Goals of an EPS

• Predict the observed distribution of events and atmospheric states
• Predict uncertainty in the day's prediction
• Predict the extreme events that are possible on a particular day
• Provide a range of possible scenarios for a particular forecast

Page 4

1. Greater accuracy of the ensemble-mean forecast (half the error variance of a single forecast)
2. Likelihood of extremes
3. Non-Gaussian forecast PDFs
4. Ensemble spread as a representation of forecast uncertainty
=> All rely on the forecasts being calibrated

Further … calibration is essential for tailoring to the local application:
-- NWP provides spatially- and temporally-averaged gridded forecast output
-- Applying gridded forecasts to point locations requires location-specific calibration to account for local spatial and temporal scales of variability (=> increasing ensemble dispersion)

More technically …

Page 5

Dugway Proving Ground, Utah: e.g., T thresholds

• Includes random and systematic differences between members.

• Not an actual chance of exceedance unless calibrated.

Page 6

Challenges in probabilistic mesoscale prediction

• Model formulation
  - Bias (marginal and conditional)
  - Lack of variability caused by truncation and approximation
  - Non-universality of closure and forcing
• Initial conditions
  - Small scales are damped in analysis systems, and the model must develop them
  - Perturbation methods designed for medium-range systems may not be appropriate
• Lateral boundary conditions
  - After short time periods the lateral boundary conditions can dominate
  - Representing uncertainty in lateral boundary conditions is critical
• Lower boundary conditions
  - Dominate the boundary-layer response
  - Difficult to estimate uncertainty in lower boundary conditions

Page 7

RTFDDA and Ensemble-RTFDDA

Liu et al. 2010: AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA, January 18-23.

Page 8

The Ensemble Execution Module

[Schematic: N ensemble members, each an RTFDDA cycle driven by perturbations and observations, produce 36-48 h forecasts that feed postprocessing, archiving and verification, and input to decision-support tools.]

Liu et al. 2010: AMS Annual Meeting, 14th IOAS-AOLS, Atlanta, GA, January 18-23.

Page 9

Real-time Operational Products for DPG

Operated at US Army DPG since Sep. 2007, on three nested domains (D1, D2, D3).

[Example products: surface fields and cross-sections (mean, spread, exceedance probability, spaghetti, …), e.g., likelihood of wind speed > 10 m/s, mean T and wind, T mean and SD, wind speed, 2-m temperature, wind roses; pin-point surface and profile products (mean, spread, exceedance probability, spaghetti, wind roses, histograms, …).]

Page 10

Forecast "calibration" or "post-processing"

[Schematic: forecast PDFs and the observation on a flow-rate axis [m3/s], before and after calibration, illustrating the "bias" and the "spread" or "dispersion" of the forecast PDF.]

Post-processing has corrected:
• the "on average" bias
• the under-representation of the 2nd moment of the empirical forecast PDF (i.e., corrected its "dispersion" or "spread")

Our approach:
• an under-utilized "quantile regression" approach
• a probability distribution function that "means what it says"
• daily variations in the ensemble dispersion relate directly to changes in forecast skill => an informative ensemble skill-spread relationship
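For reference, the QR cost function referred to on the following slides is the standard asymmetric "check" (pinball) loss of Koenker and Bassett (1978); the notation below is ours, not the slide's:

$$\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{t} \rho_{\tau}\!\left(y_t - \mathbf{x}_t^{\top}\beta\right), \qquad \rho_{\tau}(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right),$$

where $y_t$ is the observed value, $\mathbf{x}_t$ the vector of regressors (e.g., ensemble members, mean, spread, persistence), and $\tau \in (0,1)$ the quantile level being fit.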

Page 11

Example of Quantile Regression (QR)

Our application

Fitting T quantiles using QR conditioned on:
1) ranked forecast ensemble
2) ensemble mean
3) ensemble median
4) ensemble stdev
5) persistence
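A minimal sketch of this kind of fit, assuming synthetic data and using statsmodels' QuantReg; the variable names and the toy forecast/observation model are illustrative, not the operational DPG setup:

```python
# Minimal sketch (synthetic data): fit a temperature quantile with quantile
# regression, conditioning on the ensemble-derived regressors listed above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_days, n_members = 300, 10

# Toy ensemble temperature forecasts [K] and matching "observations"
ens = 285.0 + rng.normal(0.0, 1.5, (n_days, 1)) + rng.normal(0.0, 2.0, (n_days, n_members))
obs = ens.mean(axis=1) - 1.0 + rng.normal(0.0, 1.0, n_days)

# Candidate regressors: ranked members, ensemble mean/median/stdev, persistence
ranked = np.sort(ens, axis=1)
X = np.column_stack([
    ranked,                    # 1) ranked forecast ensemble
    ens.mean(axis=1),          # 2) ensemble mean
    np.median(ens, axis=1),    # 3) ensemble median
    ens.std(axis=1, ddof=1),   # 4) ensemble stdev
    np.roll(obs, 1),           # 5) persistence (previous observation)
])
X, y = sm.add_constant(X[1:]), obs[1:]   # drop day 0, which has no persistence value

# Fit, e.g., the 0.90 quantile of observed temperature given the regressors
fit_q90 = sm.QuantReg(y, X).fit(q=0.90)
print(fit_q90.params)
```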

Page 12

[Schematic: time series of the forecast ensemble and observations (T [K] vs. time), alongside the climatological PDF (probability per K vs. temperature [K]).]

Regressor set:
1. reforecast ensemble
2. ensemble mean
3. ensemble stdev
4. persistence
5. LR quantile (not shown)

Step 1: Determine climatological quantiles.

Step 2: For each quantile, use "forward step-wise cross-validation" to iteratively select the best regressor subset. Selection requirements: (a) QR cost function minimum; (b) satisfy the binomial distribution at 95% confidence. If the requirements are not met, retain the climatological "prior". (A sketch of this selection is given below.)

Step 3: Segregate forecasts into differing ranges of ensemble dispersion and refit the models (Step 2) uniquely for each range.

[Schematic: forecasts vs. time segregated into dispersion ranges I, II, III, and the resulting prior vs. posterior forecast PDFs (probability per K vs. temperature [K]).]

Final result: a "sharper" posterior PDF, represented by the interpolated quantiles.
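A sketch of the Step 2 selection loop described above: greedy forward selection over candidate regressors, scored by the cross-validated QR cost function and gated by a binomial consistency check at 95% confidence. The helper names (pinball_loss, forward_select), the K-fold split, and the exact gating logic are illustrative assumptions, not the operational code:

```python
# Sketch of forward step-wise selection for one quantile level tau.
import numpy as np
import statsmodels.api as sm
from scipy import stats

def pinball_loss(y, q_hat, tau):
    """QR cost function (check loss) for quantile level tau."""
    u = y - q_hat
    return np.mean(u * (tau - (u < 0)))

def forward_select(y, candidates, tau, n_folds=5):
    """Greedy forward selection of regressors by cross-validated pinball loss.

    y: (n,) observations; candidates: dict of name -> (n,) regressor arrays.
    """
    folds = np.array_split(np.arange(len(y)), n_folds)
    clim_q = np.quantile(y, tau)                   # climatological "prior"
    best_cols = []
    best_loss = pinball_loss(y, np.full_like(y, clim_q), tau)

    improved = True
    while improved:
        improved = False
        for name in candidates:
            if name in best_cols:
                continue
            cols = best_cols + [name]
            pred = np.empty_like(y)                # out-of-sample predictions
            for test in folds:
                train = np.setdiff1d(np.arange(len(y)), test)
                X_tr = sm.add_constant(np.column_stack([candidates[c][train] for c in cols]))
                X_te = sm.add_constant(np.column_stack([candidates[c][test] for c in cols]),
                                       has_constant="add")
                pred[test] = sm.QuantReg(y[train], X_tr).fit(q=tau).predict(X_te)
            loss = pinball_loss(y, pred, tau)
            # binomial check: is the observed non-exceedance count consistent with tau?
            k = int(np.sum(y <= pred))
            lo, hi = stats.binom.interval(0.95, len(y), tau)
            if loss < best_loss and lo <= k <= hi:
                best_loss, best_new = loss, name
                improved = True
        if improved:
            best_cols.append(best_new)
    # if nothing beats climatology, the climatological quantile is retained
    return best_cols if best_cols else "climatology"
```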

Page 13

Utilizing verification measures in near-real-time …

Measures used:
1) Rank histogram (converted to a scalar measure)
2) Root mean square error (RMSE)
3) Brier score
4) Ranked Probability Score (RPS)
5) Relative Operating Characteristic (ROC) curve
6) New measure of ensemble skill-spread utility

=> These are used for automated calibration-model selection via a weighted sum of the skill scores of each.
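As an illustration of measure 1), a rank histogram can be reduced to a scalar in several ways; the slide does not say which is used operationally, so the summary below (RMS departure of bin frequencies from uniformity) is just one common, hypothetical choice:

```python
# Sketch: build a verification rank histogram and reduce it to a scalar.
import numpy as np

def rank_histogram(obs, ens):
    """obs: (n,) observations; ens: (n, m) ensemble forecasts."""
    ranks = np.sum(ens < obs[:, None], axis=1)      # observation's rank, 0..m
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

def flatness_score(counts):
    """0 for a perfectly flat (uniform) rank histogram; larger = less flat."""
    freq = counts / counts.sum()
    expected = 1.0 / len(counts)
    return np.sqrt(np.mean((freq - expected) ** 2)) / expected

# usage: score = flatness_score(rank_histogram(obs, ens))
```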

Page 14

Problems with spread-skill correlation …

The ECMWF spread-skill correlation (black) is << 1. Even the "perfect model" correlation (blue) is << 1, and it varies with forecast lead time.

[Figure: spread-skill correlations at 1-, 4-, 7-, and 10-day lead times; ECMWF values r ≈ 0.33-0.39, "perfect model" values r ≈ 0.49-0.68.]
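The spread-skill correlations quoted above are, in essence, linear correlations between a spread measure and an error measure at each lead time; a minimal sketch, assuming spread is measured by the ensemble standard deviation and "skill" by the absolute error of the ensemble mean (the exact ECMWF processing is not shown here):

```python
# Sketch: spread-skill correlation at a single lead time.
import numpy as np

def spread_skill_corr(obs, ens):
    """obs: (n,) observations; ens: (n, m) ensemble forecasts at one lead time."""
    spread = ens.std(axis=1, ddof=1)
    abs_err = np.abs(ens.mean(axis=1) - obs)
    return np.corrcoef(spread, abs_err)[0, 1]
```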

Page 15


3-hr dewpoint time series: Before Calibration / After Calibration

Station DPG S01

Page 16

42-hr dewpoint time series: Before Calibration / After Calibration

Station DPG S01

Page 17

Blue is the "raw" ensemble, black is the calibrated ensemble, red is the observed value.

Notice: significant change in both “bias” and dispersion of final PDF

(also notice PDF asymmetries)

PDFs: raw vs. calibrated

Page 18


3-hr dewpoint rank histograms, Station DPG S01

Page 19


Station DPG S01

42-hr dewpoint rank histograms

Page 20

Skill Scores

• Single value to summarize performance
• Reference forecast: the best naive guess (persistence, climatology)
• A perfect forecast implies that the object can be perfectly observed
• Positively oriented: positive is good

$$SS = \frac{A_{forc} - A_{ref}}{A_{perf} - A_{ref}}$$
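A minimal sketch of this skill score applied to RMSE against a persistence (or climatology) reference; for error-type measures the perfect score is zero, so SS reduces to 1 - A_forecast/A_reference. Function and array names are illustrative:

```python
# Sketch of the generic skill score above, applied to RMSE.
import numpy as np

def rmse(pred, obs):
    return np.sqrt(np.mean((pred - obs) ** 2))

def skill_score(a_forecast, a_reference, a_perfect=0.0):
    return (a_forecast - a_reference) / (a_perfect - a_reference)

# usage, given arrays obs, ens_mean, persistence of equal length:
# ss_rmse = skill_score(rmse(ens_mean, obs), rmse(persistence, obs))
```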

Page 21


Skill Score Verification: RMSE Skill Score and CRPS Skill Score

Reference forecasts: black, raw ensemble; blue, persistence.
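For reference, the CRPS underlying the CRPS skill score is the standard continuous ranked probability score (not defined on the slide):

$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \left[ F(x) - \mathbf{1}\{x \ge y\} \right]^2 dx, \qquad \mathrm{CRPSS} = 1 - \frac{\overline{\mathrm{CRPS}}_{forecast}}{\overline{\mathrm{CRPS}}_{reference}},$$

where $F$ is the forecast CDF (here, the calibrated ensemble's CDF) and $y$ the verifying observation.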

Page 22

Computational Resource Questions:

How best to utilize multi-model simulations (forecasts), especially if they are under-dispersive?

a) Should more dynamical variability be sought? Or
b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?

Page 23


3-hr dewpoint rank histograms, Station DPG S01

Page 24


RMSE of ensemble members

3hr Lead-time 42hr Lead-time

Station DPG S01

Page 25


Significant calibration regressors

3hr Lead-time 42hr Lead-time

Station DPG S01

Page 26

Questions revisited: How best to utilize multi-model simulations (forecasts), especially if they are under-dispersive?

a) Should more dynamical variability be sought? Or
b) Is it better to balance post-processing with multi-model utilization to create a properly dispersive, informative ensemble?

Warning: adding more models can lead to decreasing utility of the ensemble mean (even if the ensemble is under-dispersive)

Page 27

Summary

Quantile regression provides a powerful framework for improving the whole (potentially non-Gaussian) PDF of an ensemble forecast, with different regressors for different quantiles and lead times.

This framework provides an umbrella for blending multiple statistical correction approaches (logistic regression, etc., not shown) as well as multiple regressors.

As well, "step-wise cross-validation"-based calibration provides a method to ensure forecast skill no worse than climatology and persistence for a variety of cost functions.

As shown here, significant improvements were made to the forecast's ability to represent its own potential forecast error (while improving sharpness):
-- uniform rank histogram
-- significant spread-skill relationship (new skill-spread measure)

Care should be used before “throwing more models” at an “under-dispersive” forecast problem

Further questions: [email protected] or [email protected]

Page 28

Page 29

Dugway Proving Ground

Page 30

Page 31

Other options … assign dispersion bins, then:
2) average the error values in each bin, then correlate
3) calculate individual rank histograms for each bin, and convert each to a scalar measure
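A sketch of option 2): bin the forecasts by ensemble dispersion, average the errors in each bin, then correlate the bin averages. The bin construction (spread quantiles) and the error measure (absolute ensemble-mean error) are illustrative assumptions:

```python
# Sketch: binned spread-error correlation.
import numpy as np

def binned_spread_error_corr(obs, ens, n_bins=10):
    spread = ens.std(axis=1, ddof=1)
    abs_err = np.abs(ens.mean(axis=1) - obs)
    edges = np.quantile(spread, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.digitize(spread, edges[1:-1]), 0, n_bins - 1)
    mean_spread = np.array([spread[bin_idx == b].mean() for b in range(n_bins)])
    mean_err = np.array([abs_err[bin_idx == b].mean() for b in range(n_bins)])
    return np.corrcoef(mean_spread, mean_err)[0, 1]
```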

Page 32

Page 33

Example: French Broad River. Before calibration => under-dispersive.

The black curve shows the observations; the colored curves are the ensemble members.

Page 34

Rank Histogram Comparisons

After quantile regression, the rank histogram is more uniform (although now slightly over-dispersive).

Raw full ensemble vs. after calibration.

Page 35

Frequency used for quantile fitting of Method I:
Best Model = 76%
Ensemble StDev = 13%
Ensemble Mean = 0%
Ranked Ensemble = 6%

What Nash-Sutcliffe (RMSE) implies about Utility

Page 36

Note:

Take home message:

For a “calibrated ensemble”, error variance of the ensemble mean is 1/2 the error variance of any ensemble member (on average), independent of the distribution being sampled

[Schematic: forecast PDF and observation (probability vs. discharge).]

With the overbar denoting an ensemble average, compare the mean squared error of an individual member with that of the ensemble mean:

$$\text{eq. 1:}\ \ \overline{(f_i - o)^2} \qquad \text{versus} \qquad \text{eq. 2:}\ \ (\bar{f} - o)^2 .$$

Expanding both:

$$\text{eq. 1:}\ \ \overline{f^2} - 2o\bar{f} + o^2 , \qquad\qquad \text{eq. 2:}\ \ \bar{f}^2 - 2o\bar{f} + o^2 .$$

For a calibrated ensemble the observation is statistically exchangeable with a member, so substitute $o \to f_j$ and average over $j$:

$$\text{eq. 1:}\ \ 2\left(\overline{f^2} - \bar{f}^2\right) , \qquad\qquad \text{eq. 2:}\ \ \overline{f^2} - \bar{f}^2$$

$$\Rightarrow\ \ \text{eq. 1} = 2 \times \text{eq. 2}$$
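A quick numerical check of this result under the stated assumption, using synthetic data in which the observation is one more draw from the same distribution as the members; the printed ratio approaches 2 as the ensemble grows (it is 2m/(m+1) in expectation for m members):

```python
# Numerical check (synthetic data): for a "calibrated" ensemble, the error
# variance of the ensemble mean is about half the mean member error variance.
import numpy as np

rng = np.random.default_rng(1)
n_days, n_members = 100_000, 100

signal = rng.normal(0.0, 3.0, n_days)                    # day-to-day signal
noise = rng.normal(0.0, 1.0, (n_days, n_members + 1))    # member/obs scatter
obs = signal + noise[:, 0]                               # obs = one more draw
ens = signal[:, None] + noise[:, 1:]

member_mse = np.mean((ens - obs[:, None]) ** 2)          # eq. 1 (member errors)
ens_mean_mse = np.mean((ens.mean(axis=1) - obs) ** 2)    # eq. 2 (ensemble-mean error)
print(member_mse / ens_mean_mse)                         # ~2 (2m/(m+1) in expectation)
```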

Page 37

Sequentially-averaged models (ranked based on NS Score) and their resultant NS Score

=> Notice the degradation of NS with an increasing number of models (with a peak at 2 models)

=> For an equitable multi-model, NS should rise monotonically

=> Maybe a smaller subset of models would have more utility? (A contradiction for an under-dispersive ensemble?)

What Nash-Sutcliffe (RMSE) implies about Utility (cont)

-- degradation with increased ensemble size
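A sketch of the sequential-averaging diagnostic behind this slide, as we read it: rank the members by individual NS score, then score the running mean of the top-k members as k grows. The ranking and averaging details are our assumptions, not necessarily the exact procedure used for the figure:

```python
# Sketch: NS score of sequentially averaged, NS-ranked ensemble members.
import numpy as np

def nash_sutcliffe(pred, obs):
    return 1.0 - np.sum((pred - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def sequential_ns(obs, ens):
    """obs: (n,) observations; ens: (n, m) member forecasts."""
    scores = [nash_sutcliffe(ens[:, j], obs) for j in range(ens.shape[1])]
    order = np.argsort(scores)[::-1]                       # best member first
    cum_mean = np.cumsum(ens[:, order], axis=1) / np.arange(1, ens.shape[1] + 1)
    return np.array([nash_sutcliffe(cum_mean[:, k], obs) for k in range(ens.shape[1])])
```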

Page 38

What Nash-Sutcliffe implies about Utility (cont)

Initial frequency used for quantile fitting (… earlier results …):
Best Model = 76%
Ensemble StDev = 13%
Ensemble Mean = 0%
Ranked Ensemble = 6%

Reduced-set frequency used for quantile fitting (… using only the top 1/3 of models to rank and form the ensemble mean …):
Best Model = 73%
Ensemble StDev = 3%
Ensemble Mean = 32%
Ranked Ensemble = 29%

=> There appear to be significant gains in the utility of the ensemble after "filtering" (except for the drop in StDev) … however, the "proof is in the pudding" …
=> Examine the verification skill measures …

Page 39

Skill Score Comparisons between full and "filtered" ensemble sets

Points:

-- quite similar results for a variety of skill scores
-- both approaches give appreciable benefit over the original raw multi-model output
-- however, only in the CRPSS is there improvement of the "filtered" ensemble set over the full set

=> the post-processing method is fairly robust
=> more work (more filtering?)!

GREEN -- full calibrated multi-model
BLUE -- "filtered" calibrated multi-model
Reference -- uncalibrated set