Verification and Diagnoses of Ensemble QPF Forecasts during Extreme Events in California during the
HMT Winter Exercises
Edward Tollerud1,2, Tara Jensen1,3, John Halley Gotway1,3, Paul Oldenburg1,3, Wally Clark2, Tressa Fowler1,3, Stanislav Stoytchev1,2, Barb Brown1,3, Ellen Sukovich2, Randy Bullock1,3, and Isidora Jankov4
1 Developmental Testbed Center (DTC); 2 Earth Systems Research Laboratory, NOAA; 3 Research Applications Laboratory, NCAR; 4 CIRA/Colorado State University
3rd NOAA Testbeds and Proving Ground Workshop, 1-3 May 2012, Boulder, CO
DTC/HMT Collaboration Goals
Evaluation and Diagnoses for HMT-West Ensemble Forecasts of Extreme Precipitation Events (e.g., real-time web product for HMT)
Motivate, Develop, and Evaluate new verification strategies (MET, MODE, and METViewer in particular; e.g., ROC, AUC, rank histogram, performance diagram, …)
Assess Model and Verification Configuration Options (Resolution, Initialization, Domain, Event Selection, etc.)
Inter-compare Forecasting Systems in high-precipitation scenarios, including storm-scale research and EMC operational models
Assess Impacts of Verification dataset selection (analyses, point obs, etc.) – not covered here
Testbed Collaboration Methodology
[Diagram boxes: MODELS; OBS; REGIONS; DTC Model Evaluation Tools (MET); Traditional Statistics Output; Spatial* Statistics Output; Web]
*MODE, Neighborhood, etc.
MET is a set of NWP evaluation tools developed by the Developmental Testbed Center (DTC) to help the NWP community assess and evaluate the skill of its model predictions. It is free to download, and a helpdesk is available.
ESRL/GSD and HMT Ensemble Modeling System
WRF model 8-member ensemble; 1 control
Outer domain 9km; Nested domain 3 km
Hybrid members: Multi physics packages, two model cores, and different GFS initial conditions
Model runs to 5 day lead time; DTC evaluated first 72 hours
DTC built demonstration real-time web display
Evaluation focus on QPF with addition of state variables in 2011
Model Intercomparison for 2010-2011 HMT-West
Models: HRRR (3km), HMT-Ens Mean (9km), NMM-B parallel (4km), NAM (12km), GFS (0.5 deg)
[Figure: Gilbert Skill Score (or ETS) for 6hr accum. > 1.0”; horizontal axis ticks 6 to 72 in steps of 6]
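For readers unfamiliar with the score on this plot, the following is a minimal sketch of how a Gilbert Skill Score (ETS) can be computed from a 2x2 contingency table at a precipitation threshold. The function names and the sample forecast and observed values are hypothetical illustrations, not HMT data and not the MET implementation.

```python
def contingency_counts(forecast, observed, threshold):
    """Count hits, misses, false alarms, and correct negatives for one threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        forecast_yes = f > threshold
        observed_yes = o > threshold
        if forecast_yes and observed_yes:
            hits += 1
        elif observed_yes:
            misses += 1
        elif forecast_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def gilbert_skill_score(hits, misses, false_alarms, correct_negatives):
    """Equitable Threat Score: threat score adjusted for hits expected by chance."""
    total = hits + misses + false_alarms + correct_negatives
    # Hits expected by chance from the forecast and observed event frequencies.
    hits_random = (hits + false_alarms) * (hits + misses) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return float("nan")
    return (hits - hits_random) / denominator


# Hypothetical 6-h accumulations (inches) at six verification points.
forecast = [0.2, 1.4, 1.1, 0.0, 2.3, 0.9]
observed = [0.1, 1.2, 0.6, 1.5, 2.0, 0.3]
counts = contingency_counts(forecast, observed, threshold=1.0)
print(counts, gilbert_skill_score(*counts))
```

A score of 1 indicates a perfect forecast, and 0 indicates no skill beyond chance.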
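Relationships among scores
• CSI is a nonlinear function of POD and FAR
• CSI depends on base rate (event frequency) and Bias
$$\mathrm{CSI} = \frac{1}{\dfrac{1}{\mathrm{POD}} + \dfrac{1}{1 - \mathrm{FAR}} - 1}$$
$$\mathrm{Bias} = \frac{\mathrm{POD}}{1 - \mathrm{FAR}}$$
Very different combinations of FAR and POD lead to the same CSI value
[Figure: plot with axes labeled POD, CSI, and Freq Bias]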
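The relationships among POD, FAR, CSI, and frequency bias on this slide can be checked numerically. The sketch below uses two hypothetical contingency tables (invented counts, not HMT results) in which POD and FAR are very different yet CSI is identical, while the frequency bias separates the two cases.

```python
def categorical_scores(hits, misses, false_alarms):
    """Return POD, FAR, CSI, and frequency bias from contingency counts."""
    pod = hits / (hits + misses)
    far = false_alarms / (hits + false_alarms)
    csi = hits / (hits + misses + false_alarms)
    bias = (hits + false_alarms) / (hits + misses)
    return pod, far, csi, bias


def csi_from_pod_far(pod, far):
    """CSI as a nonlinear function of POD and FAR (requires pod > 0, far < 1)."""
    return 1.0 / (1.0 / pod + 1.0 / (1.0 - far) - 1.0)


def bias_from_pod_far(pod, far):
    """Frequency bias as POD divided by the success ratio (1 - FAR)."""
    return pod / (1.0 - far)


# Hypothetical case A: under-forecasting (no false alarms, half the events missed).
# Hypothetical case B: over-forecasting (no misses, half the forecasts false alarms).
for name, counts in {"A": (50, 50, 0), "B": (50, 0, 50)}.items():
    pod, far, csi, bias = categorical_scores(*counts)
    print(name, pod, far, csi, bias,
          csi_from_pod_far(pod, far), bias_from_pod_far(pod, far))
```

Both cases give CSI = 0.5, but case A has a bias of 0.5 and case B a bias of 2.0, which is why a performance diagram shows all four quantities together.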
HMT Performance Diagram
9km - Ensemble Mean – 6h Precip
Thresholds: 6HR>0.1 in., 6HR>0.5 in., 6HR>1.0 in., 6HR>2.0 in.
All on same plot: POD, 1-FAR (aka Success Ratio), CSI, Freq Bias
Dots: Scores Aggregated Over Lead Time
Colors: Different Thresholds
[Figure: performance diagram. Horizontal axis: Success Ratio (1-FAR). Equal lines of CSI (0.1 to 0.9) and equal lines of Freq. Bias. Annotations: "Best", "More Optimal". Points labeled by lead time (06 to 72 in steps of 6) for 6HR>0.5 in.]
Here we see:
• Decreasing skill with higher thresholds even with multiple metrics
• Highest skill at 18-24h leads
Hook pattern suggests degraded skill at those leads
Roberts et al. (2011), Roebber (WAF, 2009), Wilson (presentation, 2008)
Impact of Microphysics on 2010-2011 Results
Microphysics schemes: Ferrier, Schultz, Thompson
No systematic microphysics impacts last season
Performance diagrams similar
Total Intensity distributions similar for most HMT members
90% Intensity shows some differences, especially at higher thresholds
HMT Ens Mean does not have the same performance as individual members
Using Attributes from MODE Objects
Color groups: Different Microphysics (Ferrier, Schultz, Thompson)
Models: NMM, ARW, HMT, NAM, GFS
[Figure, two panels versus Object Threshold (mm): Difference (Fcst-Obs) in Total Intensity in MODE Obj (mm); Difference (Fcst-Obs) in 90% Intensity in MODE Obj (mm). Thresholds: > 6.35 (Broad Precip Area), > 25.4 (Intense Cores)]
Ensemble Reliability
[Figure panels: PROB(APCP_06)>1 inch; PROB(APCP_06)>2 inch]
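As a rough illustration of what a reliability plot summarizes, the sketch below bins probabilistic forecasts and compares the mean forecast probability in each bin with the observed event frequency. It assumes the exceedance probability is taken as the fraction of ensemble members above the threshold, which the slide does not state; the function names and all values are hypothetical.

```python
def exceedance_probability(member_values, threshold):
    """Fraction of ensemble members exceeding the threshold (an assumed definition)."""
    return sum(1 for v in member_values if v > threshold) / len(member_values)


def reliability_table(probabilities, outcomes, n_bins=4):
    """Return (mean forecast probability, observed frequency, count) per non-empty bin."""
    counts = [0] * n_bins
    events = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for p, o in zip(probabilities, outcomes):
        b = min(int(p * n_bins), n_bins - 1)  # a probability of 1.0 goes in the last bin
        counts[b] += 1
        events[b] += int(o)
        prob_sums[b] += p
    return [
        (prob_sums[b] / counts[b], events[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Hypothetical forecast probabilities and binary outcomes (1 = event observed).
probabilities = [0.0, 0.0, 0.25, 0.25, 0.5, 0.5, 1.0, 1.0]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
for row in reliability_table(probabilities, outcomes):
    print(row)
```

A perfectly reliable system would have the observed frequency equal to the mean forecast probability in every bin.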
Valuable Insights and Lessons Learned: Some gained, Some still in process
Resolution improves performance
Scores for ensemble means are generally different from the mean score of the ensemble members – understanding how to “ensemble” scores is an area of research
Model core, microphysics, and initialization impacts all need more investigation, but we now have a more effective set of tools to do this
Performance diagrams may be helpful in diagnosing model performance problems
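The point that the score of the ensemble mean differs from the mean of the member scores can be seen in a toy case. In the sketch below (hypothetical values, two members, four grid points), each member places one heavy-precipitation core correctly, but averaging the members smooths both cores below the threshold.

```python
def csi(forecast, observed, threshold):
    """Critical Success Index for one forecast field at one threshold."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        if f > threshold and o > threshold:
            hits += 1
        elif o > threshold:
            misses += 1
        elif f > threshold:
            false_alarms += 1
    denominator = hits + misses + false_alarms
    return hits / denominator if denominator else float("nan")


# Hypothetical observed and member precipitation (inches) at four grid points.
observed = [1.5, 0.2, 1.2, 0.1]
members = [
    [1.8, 0.0, 0.0, 0.0],  # member 1 captures only the first core
    [0.0, 0.0, 1.8, 0.0],  # member 2 captures only the second core
]
threshold = 1.0

ensemble_mean = [sum(values) / len(values) for values in zip(*members)]
member_scores = [csi(m, observed, threshold) for m in members]
mean_of_member_scores = sum(member_scores) / len(member_scores)
score_of_ensemble_mean = csi(ensemble_mean, observed, threshold)

print(mean_of_member_scores, score_of_ensemble_mean)
```

Here the mean of the member CSI values is 0.5 while the CSI of the ensemble mean is 0.0. The difference arises because thresholding is nonlinear, so averaging fields and averaging scores do not commute.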
Year 3 (2011-2012) Season Emphasis
Continued evaluation of QPF
Expansion to state variables (T, SPFH, U/V, HGT) and critical moisture variables for HMT (IWV, Freezing Level)
Inclusion of AFWA Ensemble (at the request of EMC) – Thanks to Evan Kuchera and Scott Rentsler
Just finished final evaluation runs of season (yesterday)
Will be presenting results at:
WAF/NWP – CMOS conference at end of May
WRF Users Workshop – end of June
Thanks for your attention
Thanks to the DTC collaborators: ESRL/GSD, ESRL/PSD, EMC, and AFWA
This DTC/HMT work was funded by USWRP
For more information
Edward Tollerud ([email protected])
Tara Jensen ([email protected])
Brian Etherton ([email protected])
http://www.dtcenter.org/eval/hmt
Impact of Domain on 2010-2011 Results
[Figure: performance diagrams for the 9kmLand, 9kmNest, and 3km domains at thresholds 0.1”, 0.5”, and 1.0”; annotation: "Best"]
Eval of 9km domain over Nest footprint (9kmNest) appears to have greater skill at short leads
3km domain has more skillful Performance Diagrams at 6-12 hr leads
Impact of Model Cores on 2010-2011 Results
[Figure legend: ARW (solid) and NMM (dash) at thresholds 0.1”, 0.5”, and 1.0”]
NMM does not seem to experience degraded skill at 6-12 hour leads, but ARW generally does
Area under ROC Curve
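For context, the area under the ROC curve can be estimated by computing POD and the false alarm rate (probability of false detection) at several probability thresholds and integrating with the trapezoidal rule. The sketch below is one simple way to do this, with hypothetical probabilities and outcomes; it is not the MET or METViewer implementation.

```python
def roc_points(probabilities, outcomes, thresholds):
    """Return sorted (false alarm rate, POD) points, including the (0,0) and (1,1) corners.

    Assumes the sample contains at least one event and one non-event.
    """
    n_events = sum(outcomes)
    n_non_events = len(outcomes) - n_events
    points = {(0.0, 0.0), (1.0, 1.0)}
    for t in thresholds:
        hits = sum(1 for p, o in zip(probabilities, outcomes) if p >= t and o)
        false_alarms = sum(1 for p, o in zip(probabilities, outcomes) if p >= t and not o)
        points.add((false_alarms / n_non_events, hits / n_events))
    return sorted(points)


def roc_area(points):
    """Trapezoidal area under a ROC curve given points sorted by false alarm rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical forecast probabilities and binary outcomes (1 = event observed).
probabilities = [0.1, 0.4, 0.35, 0.8]
outcomes = [0, 0, 1, 1]
points = roc_points(probabilities, outcomes, thresholds=[0.1, 0.35, 0.4, 0.8])
print(points, roc_area(points))
```

An area of 0.5 corresponds to no discrimination between events and non-events, and 1.0 to perfect discrimination.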