verification and diagnoses of ensemble qpf forecasts during … · dtc/hmt collaboration goals...

Verification and Diagnoses of Ensemble QPF Forecasts during Extreme Events in California during the

HMT Winter Exercises

Edward Tollerud1,2, Tara Jensen1,3, John Halley Gotway1,3,, Paul Oldenburg1,3, Wally Clark2 , Tressa Fowler1,3 , Stanislav Stoytchev 1,2,

Barb Brown1,3, Ellen Sukovich2, Randy Bullock1,3 , and Isidora Jankov 4

1Developmental Testbed Center (DTC) 2Earth Systems Research Laboratory, NOAA, 3 Research Applications Laboratory, NCAR,

4 CIRA/Colorado State University

3rd NOAA Testbeds and Proving Ground Workshop, 1-3 May 2012, Boulder, CO

2

DTC/HMT Collaboration Goals

  Evaluation and Diagnoses for HMT-West Ensemble Forecasts of Extreme Precipitation Events (e.g., real-time web product for HMT)

  Motivate, Develop, and Evaluate new verification strategies (MET, MODE, and METViewer in particular; e.g., roc, auc, rank histogram, performance diagram,…)

 Assess Model and Verification Configuration Options (Resolution, Initialization, Domain, Event Selection, etc.)

 Inter-compare Forecasting Systems in high-precipitation scenarios, including storm-scale research and EMC operational models

 Assess Impacts of Verification dataset selection (analyses, point obs, etc.) – not covered here

*MODE, Neighborhood, etc

MODELS

OBS

REGIONS

DTC Model

Evaluation Tools (MET)

Traditional Statistics

Output

Spatial* Statistics

Output

Web

Testbed Collaboration Methodology

MET is a set of NWP evaluation tools developed by the Developmental Testbed Center (DTC) to help them assess and evaluate the skill of their model predictions. It is free to download and there is a helpdesk available.

4

ESRL/GSD and HMT Ensemble Modeling System

  WRF model 8-member ensemble; 1 control

  Outer domain 9km; Nested domain 3 km

  Hybrid members: Multi physics packages, two model cores, and different GFS initial conditions

  Model runs to 5 day lead time; DTC evaluated first 72 hours

  DTC built demonstration real-time web display

 Evaluation focus on QPF with addition of state variables in 2011

5

HRRR (3km) HMT-Ens Mean (9km) NMM-B parallel (4km)

NAM (12km) GFS (0.5 deg)

Model Intercomparison for 2010-2011 HMT-West

6 12 18 24 30 36 42 48 54 60 66 72

Gil

bert

Ski

ll S

core

(or

ET

S)

6hr accum. > 1.0”

6

Relationships among scores •  CSI is a nonlinear function of POD and FAR •  CSI depends on base rate (event frequency) and Bias

1CSI 1 1 1POD 1 FAR

=+ −

−

POD

CSI

Very different combinations of FAR and POD lead to the same CSI value

PODBias1 FAR

=−

Freq Bias

9km - Ensemble Mean – 6h Precip

6HR>0.1 in.

6HR>1.0 in.

6HR>2.0 in.

HMT Performance Diagram

All on same plot

  POD   1-FAR (aka Success Ratio)   CSI   Freq Bias Dots: Scores Aggregated Over Lead Time

Colors: Different Thresholds

Best

Success Ratio (1-FAR)

Freq

Bia

s

Equal lines of CSI

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

Here we see: • Decreasing skill with higher thresholds even with multiple metrics • Highest skill at 18-24h leads

Roberts et al. (2011), Roebber (WAF, 2009), Wilson (presentation, 2008)

Equal lines of Freq. Bias

Hook pattern suggests degraded skill at those leads

More Optimal

06 12

18 24 30

36 42 48 54

60 66

72 6HR>0.5 in.

Impact of Microphysics on 2010-2011 Results

Ferrier

Schultz

Thompson

  No systematic microphysics impacts last season

  Performance diagrams similar

  Total Intensity distributions similar for most HMT members

  90% Intensity show some differences, especially at higher thresholds

  HMT Ens Mean does not have same performance as ind. members

T

Impact of Microphysics on 2010-2011

  No systematic microphysics impacts last season

  Performance diagrams similar

  Total Intensity distributions similar for most HMT members

  90% Intensity show some differences, especially at higher thresholds

  HMT Ens Mean does not have same performance as ind. members

Dif

fere

nce

(F

cst-

Ob

s) i

n

To

tal

Inte

nsi

ty i

n M

OD

E O

bj

(mm

)

Object Threshold (mm)

NMM ARW HMT

NAM GFS

Using Attributes from MODE Objects

Color groups Different Microphysics

Ferrier Schultz

Thompson

Broad Precip Area Intense Cores

> 6.35 > 25.4

Dif

fere

nce

(F

cst-

Ob

s) i

n

90

% I

nte

nsi

ty i

n M

OD

E O

bj

(mm

)

Ensemble Reliabilty 10

PROB(APCP_06)>1 inch PROB(APCP_06)>2 inch

11

Valuable Insights and Lessons Learned: Some gained, Some still in process

 Resolution improves performance

 Scores for ensemble means are generally different from the mean score of the ensemble members – understanding how to “ensemble” scores is an area of research

 Model Core - Microphysical Impacts -Initialization impacts all need more investigation but we are now have a more effective set of tools to do this

 Performance diagrams may be helpful in diagnosing model performance problems

12

Year 3 (2011-2012) Season Emphasis

 Continued evaluation of QPF  Expansion to state variable (T, SPFH, U/V, HGT) and

critical moisture variables for HMT (IWV, Freezing Level)  Inclusion of AFWA Ensemble (at the request of EMC) –

Thanks to Evan Kuchera and Scott Rentsler  Just finished final evaluation runs of season (yesterday)

 Will be presenting results at:

WAF/NWP – CMOS conference at end of May WRF Users Workshop – end of June

Thanks for your attention 13

 Thanks to the DTC collaborators: ESRL/GSD, ESRL/PSD, EMC, and AFWA

  This DTC/HMT work was funded by USWRP

For more information

  Edward Tollerud ([email protected])

  Tara Jensen ([email protected])

  Brian Etherton ([email protected])   http:/www.dtcenter.org/eval/hmt

14

Impact of Domain on 2010-2011 Results

9kmLand

9kmNest

3km

Best

9kmLand 9kmNest

0.1” 3km

9kmLand

9kmNest

0.5”

3km

9kmLand

9kmNest

1.0”

3km Eval of 9km domain over Nest footprint (9kmNest) appears to have greater skill at short leads 3km domain has more skillful Performance Diagrams at 6-12 hr leads

Impact of Model Cores 0n 2010-2011 Results

ARW 0.1” (solid)

NMM 0.1” (dash)

ARW 0.5” (solid)

ARW 1.0” (solid)

NMM 0.5” (dash)

NMM 1.0” (dash)

NMM does not seem to experience a 6-12 hour degraded skill but ARW generally does

Area under ROC Curve 16

verification and diagnoses of ensemble qpf forecasts during … · dtc/hmt collaboration goals...

Documents