

1

VERIFICATION OF OPERATIONAL PROBABILISTIC & ENSEMBLE FORECASTS

Zoltan Toth

Environmental Modeling Center, NOAA/NWS/NCEP

Ackn.: Yuejian Zhu, Olivier Talagrand (1), Steve Lord, Geoff DiMego, John Huddleston

(1): Ecole Normale Supérieure and LMD, Paris, France

http://wwwt.emc.ncep.noaa.gov/gmb/ens/index.html


2

OUTLINE / SUMMARY

• FORECAST OPERATIONS
  – 24/7 PROVISION OF FORECAST INFORMATION
  – CONTINUAL IMPROVEMENTS

• ATTRIBUTES OF FORECAST SYSTEMS
  – RELIABILITY: Look like nature
  – RESOLUTION: Ability to see into the future

• GENERATION OF PROBABILISTIC FORECASTS
  – NUMERICAL FORECASTING: Single or ensemble
  – IMPROVING RELIABILITY: Statistical corrections

• VERIFICATION OF PROBABILISTIC & ENSEMBLE FORECASTS
  – UNIFIED PROBABILISTIC MEASURES: Dimensionless
  – ENSEMBLE MEASURES: Evaluate finite sample

• ROLE OF DTC
  – SHARE VERIFICATION ALGORITHMS
    • Make operationally used algorithms available to the research community
    • Facilitate transition of new measures for use by the operational community


3

FORECAST OPERATIONS

• Definition
  – Services related to information on future environmental conditions
    • Production
    • Delivery

• Main objectives
  – Short-term
    • Maintain uninterrupted service - 24/7
  – Long-term
    • Improve information
      – Quality - Production
      – Utility - Delivery


4

FORECAST EVALUATION

• Statistical approach
  – Evaluates a set of forecasts, not a single forecast
    • Interest in comparing forecast systems
  – Forecasts generated by the same procedure
  – Sample size affects how fine a stratification is possible
    • Level of detail is limited
  – Size of sample limited by the available obs. record (even for hindcasts)

• Types
  – Forecast statistics
    • Depend only on forecast properties
  – Verification
    • Comparison of forecast and a proxy for "truth" in a statistical sense
      – Depends on both the natural and the forecast systems
      – Nature represented by a "proxy"
        » Observations (including observational error)
        » Numerical analysis (including analysis error)


5

FORECAST VERIFICATION

• Types
  – Measures of quality
    • Environmental science issues - main focus here
  – Measures of utility
    • Multidisciplinary
      – Social & economic issues, beyond the environmental sciences
      – Socio-economic value of forecasts is the ultimate measure
        » Approximate measures can be constructed

• Quality vs. utility
  – Improved quality
    • Generally permits enhanced utility (assumption)
  – How to improve utility if quality is fixed?
    • Providers make all information that can be made available known
      – E.g., offer probabilistic or other information on forecast uncertainty
      – Engage in education, training
    • Users identify forecast aspects important to them
      – Can providers selectively improve certain aspects of forecasts?
        » E.g., improve precipitation forecasts without improving circulation forecasts?


6

EVALUATING QUALITY OF FORECAST SYSTEMS

• Goal
  – Infer comparative information about forecast systems
    • Value added by
      – New methods
      – Subsequent steps in the end-to-end forecast process (e.g., manual changes)
    • Critical for monitoring and improving operational forecast systems

• Attributes of forecast systems
  – Traditionally, forecast attributes defined separately for each forecast format
  – General definition needed
    • Need to compare forecasts
      – From any system &
      – Of any type / format
        » Single, ensemble, categorical, probabilistic, etc.
    • Supports systematic evaluation of
      – End-to-end (provider-user) forecast process
        » Statistical post-processing as an integral part of the system


7

FORECAST SYSTEM ATTRIBUTES

• Abstract concept (like length)
  – Reliability and Resolution
    • Both can be measured through different statistics

• Statistical property
  – Interpreted for a large set of forecasts
    • Describes behavior of the forecast system, not a single forecast

• For their definition, assume that
  – Forecasts
    • Can be of any format
      – Single value, ensemble, categorical, probabilistic, etc.
    • Take a finite number of different "classes" Fa
  – Observations
    • Can also be grouped into a finite number of "classes" like Oa


8

STATISTICAL RELIABILITY – TEMPORAL AGGREGATE

STATISTICAL CONSISTENCY OF FORECASTS WITH OBSERVATIONS

BACKGROUND:
• Consider a particular forecast class - Fa
• Consider the frequency distribution of observations that follow forecasts Fa - fdoa

DEFINITION:
• If forecast Fa has the exact same form as fdoa, for all forecast classes, the forecast system is statistically consistent with observations =>
  The forecast system is perfectly reliable

MEASURES OF RELIABILITY:
• Based on different ways of comparing Fa and fdoa

EXAMPLES (figures): CONTROL FCST, ENSEMBLE
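As a concrete illustration of this definition, the following is a minimal sketch (not an operational NCEP routine) that, for a binary event, groups probability forecasts into classes Fa and compares each class with the observed frequency of the event that followed it (fdoa). The bin count and the toy data are assumptions made only for the example.

```python
import numpy as np

def reliability_table(prob_fcst, obs_event, n_classes=10):
    """For each forecast class Fa, compare the mean issued probability
    with the observed relative frequency of the event (fdoa).
    prob_fcst: forecast probabilities in [0, 1]; obs_event: 0/1 outcomes."""
    prob_fcst = np.asarray(prob_fcst, dtype=float)
    obs_event = np.asarray(obs_event, dtype=float)
    edges = np.linspace(0.0, 1.0, n_classes + 1)
    # assign each forecast to a probability class (bin)
    idx = np.clip(np.digitize(prob_fcst, edges[1:-1]), 0, n_classes - 1)
    rows = []
    for a in range(n_classes):
        in_class = idx == a
        if not np.any(in_class):
            continue
        rows.append((edges[a], edges[a + 1],
                     prob_fcst[in_class].mean(),   # forecast class value Fa
                     obs_event[in_class].mean(),   # observed frequency fdoa
                     int(in_class.sum())))
    return rows

# toy example: a reliable system has fcst prob ~ observed frequency in each class
rng = np.random.default_rng(0)
p = rng.uniform(0, 1, 10000)
o = (rng.uniform(0, 1, 10000) < p).astype(int)
for lo, hi, pbar, obar, n in reliability_table(p, o):
    print(f"class [{lo:.1f},{hi:.1f}): fcst={pbar:.2f} obs_freq={obar:.2f} n={n}")
```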


9

STATISTICAL RESOLUTION – TEMPORAL EVOLUTION

ABILITY TO DISTINGUISH, AHEAD OF TIME, AMONG DIFFERENT OUTCOMES

BACKGROUND:
• Assume observed events are classified into a finite number of classes, like Oa

DEFINITION:
• If all observed classes (Oa, Ob, ...) are preceded by
  – Distinctly different forecasts (Fa, Fb, ...)
  – The forecast system "resolves" the problem =>
  The forecast system has perfect resolution

MEASURES OF RESOLUTION:
• Based on the degree of separation of the fdo's that follow various forecast classes
• Measured by the difference between the fdo's & the climate distribution
• Measures differ by how differences between distributions are quantified

EXAMPLES (figures): FORECASTS, OBSERVATIONS


10

CHARACTERISTICS OF RELIABILITY & RESOLUTION

• Reliability
  – Related to the form of the forecast, not the forecast content
    • Fidelity of the forecast - reproduce nature: when resolution is perfect, the forecast looks like nature
  – Not related to the time sequence of the forecast/observed systems
  – How to improve?
    • Make the model more realistic
      – Also expected to improve resolution
    • Statistical bias correction: can be statistically imposed at one time level
      – If both the natural & forecast systems are stationary in time &
      – If there is a large enough set of observed-forecast pairs
      – Link with verification:
        » Replace the forecast with the corresponding fdo

• Resolution
  – Related to the inherent predictive value of the forecast system
  – Not related to the form of the forecasts
    • Statistical consistency at one time level (reliability) is irrelevant
  – How to improve?
    • Enhanced knowledge about the time sequence of events
      – A more realistic numerical model should help
        » May also improve reliability


11

CHARACTERISTICS OF FORECAST SYSTEM ATTRIBUTES

RELIABILITY AND RESOLUTION ARE

• General forecast attributes
  – Valid for any forecast format (single, categorical, probabilistic, etc.)

• Independent attributes
  – For example
    • A climate pdf forecast is perfectly reliable, yet has no resolution
    • A reversed rain / no-rain forecast can have perfect resolution and no reliability
  – To separate them, they must be measured according to the general definition
    • If measured according to the traditional definition
      – Reliability & resolution can be mixed

• Function of forecast quality
  – There is no other relevant forecast attribute
    • Perfect reliability and perfect resolution = perfect forecast system =
      – "Deterministic" forecast system that is always correct

• Both needed for utility of forecast systems
  – Need both reliability and resolution
    • Especially if no observed/forecast pairs are available (e.g., extreme forecasts, etc.)


12

FORMAT OF FORECASTS – PROBABILISTIC FORMAT

• Do we have a choice?
  – When forecasts are imperfect
    • Only the probabilistic format can be reliable/consistent with nature

• Abstract concept
  – Related to forecast system attributes
    • Space of probability - dimensionless pdf or similar format
      – For environmental variables (not those variables themselves)

• Definition
  1. Define the event
    • Function of concrete variables, features, etc.
      – E.g., "temperature above freezing"; "thunderstorm"
  2. Determine the probability of the event occurring in the future
    – Based on knowledge of the initial state and evolution of the system
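A minimal sketch of step 1, assuming a hypothetical 2 m temperature array in Kelvin; the variable names and the freezing threshold are purely illustrative:

```python
import numpy as np

# Hypothetical 2 m temperature field in Kelvin (any shape: grid, stations, ...)
t2m = np.array([268.4, 271.9, 273.2, 274.6, 280.1])

# Step 1: define the event as a function of the concrete variable
def event_above_freezing(temperature_k):
    """Binary event: 'temperature above freezing' (> 273.15 K)."""
    return temperature_k > 273.15

occurred = event_above_freezing(t2m)   # True/False per point
print(occurred.astype(int))            # -> [0 0 1 1 1]
```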


13

GENERATION OF PROBABILISTIC FORECASTS

• How to determine forecast probability?
  – Fully statistical methods - losing relevance
  – Numerical modeling
    • Liouville equations provide pdf's
      – Not practical (computationally intractable)
    • Finite sample of the pdf
      – Single or multiple (ensemble) integrations
        » Increasingly finer resolution estimate of the probabilities

• How to make (probabilistic) forecasts reliable?
  – Construct the pdf
  – Assess reliability
    • Construct the frequency distribution of observations following each forecast class
  – Replace the form of the forecast with the associated frequency distribution of observations
    • Production and verification of forecasts are connected in operations


14

ENSEMBLE FORECASTS

• Definition
  – Finite sample to estimate the full probability distribution
    • Full solution (Liouville eqs.) computationally intractable

• Interpretation (assignment of probabilities)
  – Narrow
    • Step-wise increase in the cumulative forecast probability distribution
      – Performance dependent on the size of the ensemble
  – Enhanced
    • Inter- & extrapolation (dressing)
      – Performance improvement depends on the quality of the inter- & extrapolation
        » Based on assumptions
          Linear interpolation (each member equally likely)
        » Based on verification statistics
          Kernel or other methods (inclusion of some statistical bias correction)
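A sketch of both interpretations for an "above freezing" event, assuming a hypothetical 5-member sample of 2 m temperatures; the Gaussian kernel width sigma is an assumed tuning constant, not a documented operational setting:

```python
import numpy as np
from math import erf, sqrt

def event_prob_narrow(members, threshold):
    """Narrow interpretation: each member equally likely, probability is the
    fraction of members exceeding the threshold (step-wise CDF)."""
    members = np.asarray(members, dtype=float)
    return np.mean(members > threshold)

def event_prob_dressed(members, threshold, sigma=0.5):
    """Enhanced interpretation: dress each member with a Gaussian kernel of
    width sigma and average the per-member exceedance probabilities."""
    members = np.asarray(members, dtype=float)
    tail = 0.5 * (1.0 - np.array([erf((threshold - m) / (sigma * sqrt(2.0)))
                                  for m in members]))
    return tail.mean()

members = np.array([272.1, 272.8, 273.4, 274.0, 275.2])   # hypothetical 2 m temps (K)
print(event_prob_narrow(members, 273.15))                 # 0.6 with this 5-member sample
print(round(event_prob_dressed(members, 273.15), 3))      # smoother estimate
```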


15

OPERATIONAL PROB/ENSEMBLE FORECAST VERIFICATION

• Requirements
  – Use the same general dimensionless probabilistic measures for verifying
    • Any event
    • Against either
      – Observations or
      – Numerical analysis

• Measures used at NCEP
  – Probabilistic forecast measures - ensemble interpreted probabilistically
    • Reliability
      – Component of BSS, RPSS, CRPSS
      – Attributes & Talagrand diagrams
    • Resolution
      – Component of BSS, RPSS, CRPSS
      – ROC, attributes diagram, potential economic value
  – Special ensemble verification procedures
    • Designed to assess performance of a finite set of forecasts
      – Most likely member statistics, PECA

• Missing components include
  – General event definition - spatial/temporal/cross-variable considerations
  – Routine testing of statistical significance
  – Other "spatial" and/or "diagnostic" measures?


16

VERIFICATION SYSTEM DEVELOPMENT AT NCEP

• FVS, VSDB – Geoff DiMego, Keith Brill
  – Implement in 2007 for traditional forecasts
    • Comprehensive set of basic functionalities with some limitations

• FVIS, VISDB – John Huddleston
  – Implement in 2008
  – Expanded capabilities
    • Probabilistic/ensemble measures added
    • Flexibility added
    • Interface with the newly designed GSD verification system

• Basis for a NOAA-wide unified verification system
  – NCEP, GSD collaboration – Jennifer Mahoney


17

ROLE OF DTC IN VERIFICATION “ENTERPRISE”

• Share verification algorithms across the forecasting enterprise
  – Researchers (at DTC) must be able to use
    • The exact same measures as those used in operations
  – Operations must be able to easily incorporate
    • New measures used by researchers

• NOAA/NWS is transitioning toward probabilistic forecasting
  – NRC report on "Completing the Forecast"

• DTC needs to coordinate with
  – Evolving NOAA/NWS operations in probabilistic forecast
    • Generation
    • Verification
  – For the benefit of research, operations, and R2O
    » Interoperable subroutines
    » Leveraging
      Web-based user interfaces
      Database management procedures


18

REFERENCES

http://wwwt.emc.ncep.noaa.gov/gmb/ens/ens_info.html

Toth, Z., O. Talagrand, and Y. Zhu, 2005: The Attributes of Forecast Systems: A Framework for the Evaluation and Calibration of Weather Forecasts. In: Predictability Seminars, 9-13 September 2002, Ed.: T. Palmer, ECMWF, pp. 584-595.

Toth, Z., O. Talagrand, G. Candille, and Y. Zhu, 2003: Probability and ensemble forecasts. In: Environmental Forecast Verification: A practitioner's guide in atmospheric science. Ed.: I. T. Jolliffe and D. B. Stephenson. Wiley, pp. 137-164.


19

BACKGROUND


20

VERIFICATION STATISTICS – SINGLE FORECAST

• Pointwise (can be aggregated in space / time)
  – RMS error & its decomposition into
    • Time mean error
    • Random error
  – Phase vs. amplitude decomposition

• Multivariate (cannot be aggregated)
  – PAC correlation
  – Temporal correlation
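A toy sketch of the pointwise RMS error decomposition into its time-mean and random components, with synthetic forecast-observation pairs standing in for real data:

```python
import numpy as np

def rms_decomposition(fcst, obs):
    """RMS error and its split into time-mean (systematic) and random parts:
    rmse**2 = bias**2 + random_rms**2."""
    err = np.asarray(fcst, float) - np.asarray(obs, float)
    bias = err.mean()                     # time mean error
    random_rms = err.std()                # random error (std of the error)
    rmse = np.sqrt(np.mean(err ** 2))     # total RMS error
    return rmse, bias, random_rms

rng = np.random.default_rng(7)
obs = rng.normal(size=1000)
fc = obs + 0.5 + rng.normal(scale=1.0, size=1000)   # biased, noisy toy forecast
print(tuple(round(v, 2) for v in rms_decomposition(fc, obs)))
```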


21

VERIFICATION STATISTICS – ENSEMBLE

• Point-wise
  – Ensemble mean statistics
    • RMS error & decomposition
    • Spread around the mean
  – Best member frequency statistics
  – Outlier statistics (Talagrand)

• Multivariate (cannot be aggregated)
  – PAC correlation
  – Temporal correlation
  – Perturbation vs. Error Correlation Analysis (PECA)
  – Independent degrees of freedom (DOF)
  – Explained error variance


22

VERIFICATION STATISTICS – PROBABILISTIC

• Point-wise (computed point by point, then aggregated in space and time)
  – Brier Skill Score (incl. reliability & resolution components)
  – Ranked Probability Skill Score (incl. reliability & resolution components)
  – Continuous Ranked Probability Skill Score (incl. reliability & resolution components)
  – Relative Operating Characteristics (ROC)
  – Potential Economic Value
  – Information content

• Feature-based verification done using the same scores
  – After event definition
    • Storm strike probability


23

REQUIREMENTS – CFS

• NINO 3.4 anomaly correlation (CFS)
• Bias-corrected US 2-meter temperature (AC, RMS)
• Bias-corrected US precipitation (AC, RMS)
• Weekly, monthly, seasonal, annual, inter-annual stats

REQUIREMENTS – GDAS

• All statistics segregated by instrument type
• Observation counts and quality mark counts
• Guess fits to observations by instrument type
• Bias correction statistics
• Contributions to penalty


24

REQUIREMENTS – GFS

• Feature tracking
  – Hurricane tracks
    • Raw track errors and comparison to CLIPER
    • Frequency of being the best
    • By storm and basin
  – Hurricane intensity
  – Extra-tropical storm statistics

• Verification against observations
  – Support interpolation both from pressure levels and from native model levels
  – Horizontal bias and error maps
  – Vertical bias and error by region
  – Time series of error fits
  – Fits by month and year

• Verification against analyses
  – All fields in the master pressure GRIB file can be compared
    • All kinds of fields, including tracers
    • All kinds of levels, including iso-IPV
  – Single-field diagnostics (without a verifying field)
    • Mean, mode, median, range, variance
  – Masking capability
    • Only over snow-covered areas, etc.
  – Region selection
  – Anomaly correlation
  – RMS error
  – FHO statistics by threshold
  – Count of differences and largest difference
  – Superanalysis verification


25

REQUIREMENTS – THORPEX / NAEFS

• Measures
  – CRPS for continuous variables?
  – BSS for extreme temperature, winds, severe weather?

• Forecasts
  – 500 hPa height (legacy measure, indicator of the general level of predictability, to assess long-term evolution of skill)
  – 2 m temperature - heating/cooling degree?
  – 10 m winds
  – Tropical storm strike probability
  – Severe weather related measure (that can be verified against either analysis or observations?)
  – PQPF
  – Probabilistic natural river flow



27

EXAMPLE – FLOW OF ENSEMBLE VERIFICATION

• Define the desired verification: choose verification statistic (continuous ranked probability score against climatology), variable (2 m temp), event (above 0 C), for a particular week, one particular lead time and area, verified against a set of observations
• Set up a script to be run based on the info above; statistics to be computed in a loop going over each day
• For each day in the loop, read in verification data; read in the forecast grid; read in climate info; interpolate forecast data to observations (time/space interpolation)
• Compute intermediate statistics for CRPSS for each day by comparing observations to the interpolated forecast, for both the forecast system and the climate forecast
• Aggregate intermediate statistics either
  – Over the selected domain (possibly with latitudinal weighting) for each day (for a time plot of scores) OR
  – In time (averaging with equal or decaying weights) for each verification point (for spatial display of scores)
• Store intermediate and final statistics in a database
• Display results in either tabular or graphical format
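A condensed sketch of this flow, with synthetic arrays standing in for the read/interpolate steps and a standard ensemble-based CRPS estimator; the function names and toy data are illustrative and are not the operational NCEP scripts:

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS estimated from an ensemble sample (per point):
    CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|."""
    members = np.asarray(members, dtype=float)   # shape (n_members, n_points)
    obs = np.asarray(obs, dtype=float)           # shape (n_points,)
    term1 = np.mean(np.abs(members - obs), axis=0)
    term2 = 0.5 * np.mean(np.abs(members[:, None, :] - members[None, :, :]), axis=(0, 1))
    return term1 - term2

# Synthetic stand-ins for the I/O steps in the loop (read obs, forecast, climate)
rng = np.random.default_rng(1)
n_days, n_members, n_points = 7, 20, 50
crps_fcst, crps_clim = [], []
for day in range(n_days):
    truth = rng.normal(0.0, 3.0, n_points)                       # "observations"
    fcst = truth + rng.normal(0.0, 1.0, (n_members, n_points))   # ensemble, already interpolated
    clim = rng.normal(0.0, 3.0, (n_members, n_points))           # climatological "ensemble"
    crps_fcst.append(crps_ensemble(fcst, truth).mean())          # aggregate over the domain
    crps_clim.append(crps_ensemble(clim, truth).mean())

# Skill score relative to climatology, aggregated over the week
crpss = 1.0 - np.mean(crps_fcst) / np.mean(crps_clim)
print(f"CRPSS over the week: {crpss:.2f}")
```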


28

FORECAST METHODS

• Empirically based
  – Based on a record of observations =>
    • Possibly very good reliability
    • Will fail in "new" (not yet observed) situations (e.g., climate trend, etc.)
  – Resolution (forecast skill) depends on the length of the observational record
    • Useful for now-casting and climate applications
    • Not practical for typical weather forecasting

• Theoretically based
  – Based on general scientific principles
    • Incomplete/approximate knowledge in NWP models =>
      – Prone to statistical inconsistency
      – Run-of-the-mill cases can be statistically calibrated to ensure reliability
      – For forecasting rare/extreme events, the statistical consistency of the model must be improved
  – Predictability limited by
    • Gaps in knowledge about the system
    • Errors in the initial state of the system


29

SCIENTIFIC BACKGROUND:
WEATHER FORECASTS ARE UNCERTAIN

Buizza 2002


30

USER REQUIREMENTS:
PROBABILISTIC FORECAST INFORMATION IS CRITICAL


31

FORECASTING IN A CHAOTIC ENVIRONMENT –
PROBABILISTIC FORECASTING BASED ON A SINGLE FORECAST –

One integration with an NWP model, combined with past verification statistics

DETERMINISTIC APPROACH - PROBABILISTIC FORMAT

• Does not contain all forecast information
• Not the best estimate for the future evolution of the system
• UNCERTAINTY CAPTURED IN A TIME-AVERAGE SENSE -
• NO ESTIMATE OF CASE-DEPENDENT VARIATIONS IN FCST UNCERTAINTY


32

FORECASTING IN A CHAOTIC ENVIRONMENT - 2
DETERMINISTIC APPROACH - PROBABILISTIC FORMAT

PROBABILISTIC FORECASTING -

Based on the Liouville equations

Continuity equation for probabilities, given the dynamical eqs. of motion

• Initialize with the probability distribution function (pdf) at analysis time
• Dynamical forecast of the pdf based on conservation of probability values
• Prohibitively expensive -
  • Very high dimensional problem (state space x probability space)
  • Separate integration for each lead time
  • Closure problems when a simplified solution is sought


33

FORECASTING IN A CHAOTIC ENVIRONMENT - 3
DETERMINISTIC APPROACH - PROBABILISTIC FORMAT

MONTE CARLO APPROACH – ENSEMBLE FORECASTING

• IDEA: Sample sources of forecast error
  • Generate initial ensemble perturbations
  • Represent model-related uncertainty

• PRACTICE: Run multiple NWP model integrations
  • Advantage of perfect parallelization
  • Use lower spatial resolution if short on resources

• USAGE: Construct forecast pdf based on the finite sample
  • Ready to be used in real-world applications
  • Verification of forecasts
  • Statistical post-processing (remove bias in 1st, 2nd, higher moments)

CAPTURES FLOW-DEPENDENT VARIATIONS IN FORECAST UNCERTAINTY


34

NCEP GLOBAL ENSEMBLE FORECAST SYSTEM

MARCH 2004 CONFIGURATION


35

MOTIVATION FOR ENSEMBLE FORECASTING

• FORECASTS ARE NOT PERFECT - IMPLICATIONS FOR:

– USERS:
  • Need to know how often / by how much forecasts fail
  • Economically optimal behavior depends on
    – Forecast error characteristics
    – User-specific application
      » Cost of weather-related adaptive action
      » Expected loss if no action is taken
  • EXAMPLE: Protect or not your crop against possible frost
    Cost = 10k, Potential Loss = 100k => Will protect if P(frost) > Cost/Loss = 0.1 (see the sketch after this slide)
  • NEED FOR PROBABILISTIC FORECAST INFORMATION

– DEVELOPERS:
  • Need to improve performance - reduce error in the estimate of the first moment
    – Traditional NWP activities (i.e., model, data assimilation development)
  • Need to account for uncertainty - estimate higher moments
    – New aspect - how to do this?
  • Forecast is incomplete without information on forecast uncertainty
  • NEED TO USE PROBABILISTIC FORECAST FORMAT
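A tiny sketch of the cost/loss decision rule in the frost example above (numbers as on the slide):

```python
def should_protect(p_frost, cost, loss):
    """Economically optimal action: protect when p_frost > cost / loss."""
    return p_frost > cost / loss

def expected_expense(p_frost, cost, loss, protect):
    """Expected expense of the chosen strategy for a given event probability."""
    return cost if protect else p_frost * loss

cost, loss = 10_000, 100_000          # the slide's example: Cost/Loss = 0.1
for p in (0.05, 0.10, 0.25):
    act = should_protect(p, cost, loss)
    print(p, "protect" if act else "do nothing",
          expected_expense(p, cost, loss, act))
```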


36

• No matter what / how sophisticated forecast methods we use
  – Forecast skill is limited
  – Skill varies from case to case

• Forecast uncertainty must be assessed by meteorologists

Do users need to know about uncertainty in forecasts?
HOW TO DEAL WITH FORECAST UNCERTAINTY?
How can forecast uncertainty be communicated?

THE PROBABILISTIC APPROACH [figure; vertical axis: Probability]


37

SOCIO-ECONOMIC BENEFITS OF SEAMLESS WEATHER/CLIMATE FORECAST SUITE

[Figure: schematic of the seamless forecast suite]
• Lead time axis: Minutes - Hours - Days - Weeks - Months - Seasons - Years
• Type of guidance: Warnings & Alert Coordination, Watches, Forecasts, Threat Assessments, Guidance, Outlook
• Also labeled: Forecast Uncertainty, Initial Condition Sensitivity, Boundary Condition Sensitivity
• Benefiting sectors: Protection of Life/Property, Flood mitigation, Navigation, Transportation, Fire weather, Hydropower, Agriculture, Reservoir control, Recreation, Ecosystem, Health, Commerce, Energy


40

[Figure panels: 144 hr forecast vs. verification]
• Poorly predictable large-scale wave - Eastern Pacific / Western US
• Highly predictable small-scale wave - Eastern US


44

FORECAST PERFORMANCE MEASURES

MEASURES OF RELIABILITY:
DESCRIPTION: Statistically compares any sample of forecasts with the sample of corresponding observations
GOAL: To assess the similarity of the samples (e.g., whether 1st and 2nd moments match)
EXAMPLES: Reliability component of the Brier Score and Ranked Probability Score, Analysis Rank Histogram, Spread vs. Ens. Mean error, etc.

MEASURES OF RESOLUTION:
DESCRIPTION: Compares the distribution of observations that follows different classes of forecasts with the climate distribution (as reference)
GOAL: To assess how well the observations are separated when grouped by different classes of preceding forecasts
EXAMPLES: Resolution component of the Brier Score and Ranked Probability Score, Information content, Relative Operating Characteristics, Relative Economic Value, etc.

COMMON CHARACTERISTIC: Function of both forecast and observed values

COMBINED (REL+RES) MEASURES: Brier and Ranked Probability Scores, RMS error, PAC, etc.


45

EXAMPLE – PROBABILISTIC FORECASTS

RELIABILITY: Forecast probabilities for a given event match the observed frequencies of that event (with the given prob. fcst)

RESOLUTION: Many forecasts fall into classes corresponding to high or low observed frequency of the given event
(Occurrence and non-occurrence of the event is well resolved by the forecast system)



47

PROBABILISTIC FORECAST PERFORMANCE MEASURES
TO ASSESS THE TWO MAIN ATTRIBUTES OF PROBABILISTIC FORECASTS: RELIABILITY AND RESOLUTION

Univariate measures: Statistics accumulated point by point in space
Multivariate measures: Spatial covariance is considered

BRIER SKILL SCORE (BSS)
COMBINED MEASURE OF RELIABILITY AND RESOLUTION

EXAMPLE:
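In code, a minimal sketch of the Brier score, its binned reliability and resolution components (Murphy-style decomposition, exact only when forecasts take the bin values), and the skill score against climatology; the bin count and toy data are assumptions:

```python
import numpy as np

def brier_decomposition(prob_fcst, obs_event, n_classes=10):
    """Brier score with binned reliability and resolution components:
    BS ~ reliability - resolution + uncertainty."""
    p = np.asarray(prob_fcst, float)
    o = np.asarray(obs_event, float)
    bs = np.mean((p - o) ** 2)
    obar = o.mean()                                   # climatological frequency
    edges = np.linspace(0, 1, n_classes + 1)
    idx = np.clip(np.digitize(p, edges[1:-1]), 0, n_classes - 1)
    rel = res = 0.0
    for a in range(n_classes):
        sel = idx == a
        n_a = int(sel.sum())
        if n_a == 0:
            continue
        p_a, o_a = p[sel].mean(), o[sel].mean()
        rel += n_a * (p_a - o_a) ** 2                 # reliability term (lower is better)
        res += n_a * (o_a - obar) ** 2                # resolution term (higher is better)
    rel, res = rel / p.size, res / p.size
    unc = obar * (1.0 - obar)
    bss = 1.0 - bs / unc if unc > 0 else float("nan") # skill relative to climatology
    return bs, rel, res, unc, bss

rng = np.random.default_rng(2)
p = rng.uniform(0, 1, 20000)
o = (rng.uniform(0, 1, 20000) < p).astype(int)        # reliable toy forecasts
print(brier_decomposition(p, o))
```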


48

BRIER SKILL SCORE (BSS)

METHOD: Compares the pdf against the analysis
• Resolution (random error)
• Reliability (systematic error)

EVALUATION:
BSS - higher is better
Resolution - higher is better
Reliability - lower is better

RESULTS:
Resolution dominates initially; reliability becomes important later
• ECMWF best throughout
  – Good analysis/model?
• NCEP good days 1-2
  – Good initial perturbations?
  – No model perturbations hurts later?
• CANADIAN good days 8-10
  – Model diversity helps?

May-June-July 2002 average Brier skill score for the EC-EPS (grey lines with full circles), the MSC-EPS (black lines with open circles) and the NCEP-EPS (black lines with crosses). Bottom: resolution (dotted) and reliability (solid) contributions to the Brier skill score. Values refer to the 500 hPa geopotential height over the northern hemisphere latitudinal band 20º-80ºN, and have been computed considering 10 equally-climatologically-likely intervals (from Buizza, Houtekamer, Toth et al., 2004).

COMBINED MEASURE OF RELIABILITY AND RESOLUTION


49

BRIER SKILL SCORE
COMBINED MEASURE OF RELIABILITY AND RESOLUTION


50

RANKED PROBABILITY SCORE
COMBINED MEASURE OF RELIABILITY AND RESOLUTION


51

ANALYSIS RANK HISTOGRAM (TALAGRAND DIAGRAM)

MEASURE OF RELIABILITY
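A minimal sketch of how a rank (Talagrand) histogram can be accumulated from ensemble members and verifying values; the toy data are synthetic:

```python
import numpy as np

def rank_histogram(members, obs):
    """Rank of the verifying analysis/observation within the ordered ensemble.
    members: (n_members, n_cases), obs: (n_cases,).
    A reliable ensemble gives an approximately flat histogram over n_members+1 ranks."""
    members = np.asarray(members, float)
    obs = np.asarray(obs, float)
    # number of members below the verifying value -> rank in 0..n_members
    ranks = np.sum(members < obs, axis=0)
    return np.bincount(ranks, minlength=members.shape[0] + 1)

rng = np.random.default_rng(3)
truth = rng.normal(size=5000)
ens = rng.normal(size=(10, 5000))     # drawn from the same distribution -> ~flat
print(rank_histogram(ens, truth))
```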


52

ENSEMBLE MEAN ERROR VS. ENSEMBLE SPREAD

Statistical consistency between the ensemble and the verifying analysis means that the verifying analysis should be statistically indistinguishable from the ensemble members =>

Ensemble mean error (distance between ens. mean and analysis) should be equal to ensemble spread (distance between ensemble mean and ensemble members)

MEASURE OF RELIABILITY

In the case of a statistically consistent ensemble, ens. spread = ens. mean error, and both are a MEASURE OF RESOLUTION. In the presence of bias, both RMS error and PAC are a combined measure of reliability and resolution.
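A sketch comparing RMS ensemble-mean error with RMS spread on synthetic, statistically consistent data; for such data the two numbers should come out close, apart from a small finite-ensemble factor:

```python
import numpy as np

def spread_and_error(members, analysis):
    """RMS ensemble-mean error vs. RMS spread around the ensemble mean."""
    members = np.asarray(members, float)      # (n_members, n_cases)
    analysis = np.asarray(analysis, float)    # (n_cases,)
    ens_mean = members.mean(axis=0)
    rmse = np.sqrt(np.mean((ens_mean - analysis) ** 2))
    spread = np.sqrt(np.mean(members.var(axis=0, ddof=1)))
    return rmse, spread

# toy, statistically consistent setup: analysis and members drawn from the same pdf
rng = np.random.default_rng(4)
center = rng.normal(size=5000)                          # case-dependent "signal"
truth = center + rng.normal(scale=1.0, size=5000)       # verifying analysis
ens = center + rng.normal(scale=1.0, size=(20, 5000))   # ensemble members
print(spread_and_error(ens, truth))                     # both values close to 1.0
```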


53

INFORMATION CONTENT
MEASURE OF RESOLUTION


54

RELATIVE OPERATING CHARACTERISTICS
MEASURE OF RESOLUTION


55

ECONOMIC VALUE OF FORECASTS
MEASURE OF RESOLUTION


56

PERTURBATION VS. ERROR CORRELATION ANALYSIS (PECA)

METHOD: Compute the correlation between ensemble perturbations and the error in the control forecast for
  – Individual members
  – Optimal combination of members
  – Each ensemble
  – Various areas, all lead times

EVALUATION: Large correlation indicates the ensemble captures the error in the control forecast
  – Caveat - errors defined by analysis

RESULTS:
  – Canadian best on large scales
    • Benefit of model diversity?
  – ECMWF gains most from combinations
    • Benefit of orthogonalization?
  – NCEP best on small scales, short term
    • Benefit of breeding (best estimate of initial error)?
  – PECA increases with lead time
    • Lyapunov convergence
    • Nonlinear saturation
      – Higher values on small scales

MULTIVARIATE COMBINED MEASURE OF RELIABILITY & RESOLUTION
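A rough, illustrative reconstruction of the PECA idea (pattern correlation of individual perturbations with the control error, plus a least-squares "optimal combination" of members); this is a sketch under stated assumptions, not the published PECA code:

```python
import numpy as np

def peca(perturbations, control_error):
    """Perturbation vs. Error Correlation Analysis (sketch).
    perturbations: (n_members, n_points) ensemble perturbations (member - control)
    control_error: (n_points,) control forecast minus verifying analysis.
    Returns per-member pattern correlations and the correlation achieved by a
    least-squares optimal linear combination of the perturbations."""
    P = np.asarray(perturbations, float)
    e = np.asarray(control_error, float)
    # anomaly (pattern) correlation of each perturbation with the error
    Pc = P - P.mean(axis=1, keepdims=True)
    ec = e - e.mean()
    indiv = Pc @ ec / (np.linalg.norm(Pc, axis=1) * np.linalg.norm(ec))
    # optimal combination: least-squares fit of the error onto the perturbations
    coef, *_ = np.linalg.lstsq(P.T, e, rcond=None)
    fit = P.T @ coef
    fitc = fit - fit.mean()
    combo = fitc @ ec / (np.linalg.norm(fitc) * np.linalg.norm(ec))
    return indiv, combo

rng = np.random.default_rng(5)
err = rng.normal(size=2000)                        # toy control-forecast error pattern
perts = 0.6 * err + rng.normal(size=(10, 2000))    # perturbations partly aligned with it
ind, comb = peca(perts, err)
print(ind.round(2), round(float(comb), 2))         # combination correlates more strongly
```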


57

WHAT DO WE NEED FOR POSTPROCESSING TO WORK?

• LARGE SET OF FCST – OBS PAIRS
  • Consistency is defined over a large sample - the same is needed for post-processing
  • The larger the sample, the more detailed the corrections that can be made

• BOTH FCST AND REAL SYSTEMS MUST BE STATIONARY IN TIME
  • Otherwise calibration can make things worse
  • Subjective forecasts are difficult to calibrate

HOW DO WE MEASURE STATISTICAL INCONSISTENCY?

• MEASURES OF STATISTICAL RELIABILITY
  • Time mean error
  • Analysis rank histogram (Talagrand diagram)
  • Reliability component of Brier and similar scores
  • Reliability diagram


58

SOURCES OF STATISTICAL INCONSISTENCY

• TOO FEW FORECAST MEMBERS
  • Single forecast - inconsistent by definition, unless perfect
    • MOS fcst hedged toward climatology as fcst skill is lost
  • Small ensemble - sampling error due to limited ensemble size (Houtekamer 1994?)

• MODEL ERROR (BIAS)
  • Deficiencies due to various problems in NWP models
  • Effect is exacerbated with increasing lead time

• SYSTEMATIC ERRORS (BIAS) IN ANALYSIS
  • Induced by observations
    • Effect dies out with increasing lead time
  • Model related
    • Bias manifests itself even in the initial conditions

• ENSEMBLE FORMATION (IMPROPER SPREAD)
  • Inappropriate initial spread
  • Lack of representation of model-related uncertainty in the ensemble
    • I.e., use of a simplified model that is not able to account for model-related uncertainty


59

HOW TO IMPROVE STATISTICAL CONSISTENCY?

• MITIGATE SOURCES OF INCONSISTENCY
  • TOO FEW MEMBERS
    • Run a large ensemble
  • MODEL ERRORS
    • Make models more realistic
  • INSUFFICIENT ENSEMBLE SPREAD
    • Enhance models so they can represent model-related forecast uncertainty

• OTHERWISE =>
  • STATISTICALLY ADJUST FCST TO REDUCE INCONSISTENCY (see the sketch below)
    • Not the preferred way of doing it
    • What we learn can feed back into development to mitigate the problem at its sources
    • Can have a LARGE impact on (inexperienced) users
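One simple way to make such a statistical adjustment of the first moment is a decaying-average bias estimate, sketched below; the weight is an assumed tuning constant and this sketch is not necessarily the adjustment used operationally:

```python
import numpy as np

def update_bias(prev_bias, fcst, anal, weight=0.02):
    """Decaying-average estimate of the first-moment bias:
    b_new = (1 - w) * b_old + w * (forecast - analysis)."""
    return (1.0 - weight) * prev_bias + weight * (fcst - anal)

# toy sequence: forecasts with a constant +1.5 bias relative to the analysis
rng = np.random.default_rng(6)
bias = 0.0
for day in range(300):
    anal = rng.normal()
    fcst = anal + 1.5 + rng.normal(scale=0.5)
    bias = update_bias(bias, fcst, anal)

corrected = fcst - bias                   # apply the current bias estimate
print(round(bias, 2))                     # estimate approaches 1.5
```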



61

SUMMARY

• WHY DO WE NEED PROBABILISTIC FORECASTS?
  – Isn't the atmosphere deterministic? YES, but it's also CHAOTIC
    FORECASTER'S PERSPECTIVE: Ensemble techniques
    USER'S PERSPECTIVE: Probabilistic description

• WHAT ARE THE MAIN ATTRIBUTES OF FORECAST SYSTEMS?
  – RELIABILITY: Statistical consistency with the distribution of corresponding observations
  – RESOLUTION: Different events are preceded by different forecasts

• WHAT ARE THE MAIN TYPES OF FORECAST METHODS?
  – EMPIRICAL: Good reliability, limited resolution (problems in "new" situations)
  – THEORETICAL: Potentially high resolution, prone to inconsistency

• ENSEMBLE METHODS
  – Only practical way of capturing fluctuations in forecast uncertainty due to
    • Case-dependent dynamics acting on errors in
      – Initial conditions
      – Forecast methods

• HOW CAN PROBABILISTIC FORECAST PERFORMANCE BE MEASURED?
  – Various measures of reliability and resolution

• STATISTICAL POSTPROCESSING
  – Based on verification statistics - reduce statistical inconsistencies


62

BACKGROUND


63

http://wwwt.emc.ncep.noaa.gov/gmb/ens/ens_info.html

Toth, Z., O. Talagrand, and Y. Zhu, 2005: The Attributes of Forecast Systems: A Framework for the Evaluation and Calibration of Weather Forecasts. In: Predictability Seminars, 9-13 September 2002, Ed.: T. Palmer, ECMWF, pp. 584-595.

Toth, Z., O. Talagrand, G. Candille, and Y. Zhu, 2003: Probability and ensemble forecasts. In: Environmental Forecast Verification: A Practitioner's Guide in Atmospheric Science. Ed.: I. T. Jolliffe and D. B. Stephenson. Wiley, pp. 137-164.