Statistical Weather Forecasting 3 Daria Kluver Independent Study From Statistical Methods in the Atmospheric Sciences By Daniel Wilks


TRANSCRIPT

Statistical Weather Forecasting 3
Daria Kluver, Independent Study
From Statistical Methods in the Atmospheric Sciences by Daniel Wilks

Let's review a few concepts that were introduced last time on forecast verification.

Purposes of Forecast Verification
Forecast verification is the process of assessing the quality of forecasts. Any given verification data set consists of a collection of forecast/observation pairs whose joint behavior can be characterized in terms of the relative frequencies of the possible combinations of forecast/observation outcomes. This is an empirical joint distribution.

The Joint Distribution of Forecasts and Observations
Forecast = y_i, observation = o_j. The joint distribution of the forecasts and observations is denoted p(y_i, o_j), i = 1, ..., I; j = 1, ..., J.

This is a discrete bivariate probability distribution function associating a probability with each of the IxJ possible combinations of forecast and observation.

The joint distribution can be factored in two ways; the one used in a forecasting setting is

p(y_i, o_j) = p(o_j | y_i) p(y_i),   i = 1, ..., I; j = 1, ..., J,

called the calibration-refinement factorization. The refinement of a set of forecasts refers to the dispersion of the distribution p(y_i).

The first factor, p(o_j | y_i), is the probability of o_j occurring given that the forecast y_i was issued. It specifies how often each possible weather event occurred on those occasions when the single forecast y_i was issued, i.e. how well each forecast is calibrated. The second factor, the unconditional distribution p(y_i), specifies the relative frequencies of use of each of the forecast values y_i and is sometimes called the refinement of the forecasts.

Scalar Attributes of Forecast Performance
Accuracy: the average correspondence between individual forecasts and the events they predict.
Bias: the correspondence between the average forecast and the average observed value of the predictand.
Reliability: pertains to the relationship of the forecast to the average observation, for specific values of the forecast.
Resolution: the degree to which the forecasts sort the observed events into groups that are different from each other.
Discrimination: the converse of resolution; pertains to differences between the conditional averages of the forecasts for different values of the observation.
Sharpness: characterizes the unconditional distribution (relative frequencies of use) of the forecasts.

Forecast Skill
Forecast skill is the relative accuracy of a set of forecasts with respect to some set of standard control, or reference, forecasts (such as the climatological average, persistence forecasts, or random forecasts based on climatological relative frequencies). A skill score expresses the percentage improvement over the reference forecasts.

The skill score with respect to a reference is

SS_ref = (A - A_ref) / (A_perf - A_ref) x 100%,

where A is the accuracy of the forecasts being evaluated, A_ref is the accuracy of the reference forecasts, and A_perf is the accuracy that would be achieved by a perfect forecast.

On to new material:
2x2 contingency tables
Scalar attributes of contingency tables
Tornado example
NWS vs Weather.com vs climatology
Skill scores
Probabilistic forecasts
Multicategory discrete predictands
Continuous predictands: plots and scores
Probability forecasts for multicategory events
Non-probabilistic field forecasts

Nonprobabilistic Forecasts of Discrete Predictands
A nonprobabilistic forecast contains an unqualified statement that a single outcome will occur; it contains no expression of uncertainty.

The 2x2 Contingency Table
The simplest joint distribution arises when I = J = 2, i.e. nonprobabilistic yes/no forecasts.
I = 2 possible forecasts: i = 1 (y1), the event will occur; i = 2 (y2), the event will not occur.
J = 2 outcomes: j = 1 (o1), the event subsequently occurs; j = 2 (o2), the event does not subsequently occur.

The n = a + b + c + d forecast/observation pairs fall into four cells:
The a forecast/observation pairs are called hits; their relative frequency a/n is the sample estimate of the corresponding joint probability p(y1, o1).
The b occasions are called false alarms; the relative frequency b/n estimates the joint probability p(y1, o2).
The c occasions are called misses; the relative frequency c/n estimates the joint probability p(y2, o1).
The d occasions are called correct rejections (correct negatives); the relative frequency d/n estimates the joint probability p(y2, o2).

Dividing by n gives the refinement distribution: p(y1) = (a+b)/n and p(y2) = (c+d)/n. The calibration-refinement factorization in the 2x2 verification setting then consists of the conditional probabilities
p(o1|y1) = a/(a+b)
p(o2|y1) = b/(a+b)
p(o1|y2) = c/(c+d)
p(o2|y2) = d/(c+d)

Similarly, the marginal distribution p(o_j), with elements p(o1) = (a+c)/n and p(o2) = (b+d)/n, is the base-rate (i.e., sample climatological) distribution in the likelihood-base rate factorization. The remainder of that factorization consists of the four conditional probabilities
p(y1|o1) = a/(a+c)
p(y2|o1) = c/(a+c)
p(y1|o2) = b/(b+d)
p(y2|o2) = d/(b+d)
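
As an illustration (not part of the original presentation), a minimal Python sketch of these quantities, using placeholder counts:

# Sketch: factorizations of a 2x2 verification table (placeholder counts).
a, b, c, d = 50, 30, 20, 900          # hits, false alarms, misses, correct rejections
n = a + b + c + d

# Joint distribution p(y_i, o_j)
joint = {("y1", "o1"): a / n, ("y1", "o2"): b / n,
         ("y2", "o1"): c / n, ("y2", "o2"): d / n}

# Calibration-refinement factorization: p(o_j | y_i) and the refinement p(y_i)
p_y1, p_y2 = (a + b) / n, (c + d) / n
calibration = {("o1", "y1"): a / (a + b), ("o2", "y1"): b / (a + b),
               ("o1", "y2"): c / (c + d), ("o2", "y2"): d / (c + d)}

# Likelihood-base rate factorization: p(y_i | o_j) and the base rate p(o_j)
p_o1, p_o2 = (a + c) / n, (b + d) / n
likelihood = {("y1", "o1"): a / (a + c), ("y2", "o1"): c / (a + c),
              ("y1", "o2"): b / (b + d), ("y2", "o2"): d / (b + d)}

print(joint, p_y1, calibration, p_o1, likelihood)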

Scalar Attributes Characterizing 2x2 Contingency Tables
Accuracy: proportion correct, threat score (TS), odds ratio.
Bias: comparison of the average forecast with the average observation.
Reliability and resolution: false alarm ratio, FAR = b/(a+b).
Discrimination: hit rate H = a/(a+c) and false alarm rate F = b/(b+d).

Accuracy: proportion correct
PC = (a+d)/n. It penalizes both kinds of errors equally, so it is not desirable when the "yes" event is rare. Expressed as a percentage, PC is sometimes called the hit rate.

Threat Score (TS), or Critical Success Index (CSI)
TS = CSI = a/(a+b+c). Useful when the event to be forecast occurs substantially less frequently than the nonoccurrence: it is a proportion correct for the quantity being forecast after removing correct "no" forecasts from consideration. It is used not just for different forecast occasions but also to assess simultaneously issued spatial forecasts.

Odds ratio
Odds are the ratio of a probability to its complementary probability, p/(1-p). The odds ratio is the ratio of the conditional odds of a hit, given that the event occurs, to the conditional odds of a false alarm, given that the event does not occur: theta = (a/c)/(b/d) = ad/(bc). Larger values indicate more accurate forecasts; an odds ratio greater than 1 suggests better-than-random performance.

Bias
A comparison of the average forecast with the average observation: B = (a+b)/(a+c), the ratio of the number of "yes" forecasts to the number of "yes" observations. Unbiased forecasts have B = 1; B greater than 1 means the event was forecast more often than it occurred (overforecasting), and B less than 1 means it was forecast less often than it occurred (underforecasting).
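
A small Python sketch of these scalar attributes, using the standard definitions above and placeholder counts (not from the presentation):

# Sketch: scalar attributes of a 2x2 contingency table.
def contingency_attributes(a, b, c, d):
    n = a + b + c + d
    pc  = (a + d) / n               # proportion correct
    ts  = a / (a + b + c)           # threat score / critical success index
    odds_ratio = (a * d) / (b * c)  # odds of a hit vs. odds of a false alarm
    bias = (a + b) / (a + c)        # "yes" forecasts per "yes" observation
    far = b / (a + b)               # false alarm ratio
    hit_rate = a / (a + c)          # H, fraction of observed events forecast
    false_alarm_rate = b / (b + d)  # F, fraction of non-events forecast "yes"
    return dict(PC=pc, TS=ts, theta=odds_ratio, B=bias,
                FAR=far, H=hit_rate, F=false_alarm_rate)

print(contingency_attributes(50, 30, 20, 900))   # placeholder counts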

Tornado example
The bias ratio is B = 1.96, indicating that approximately twice as many tornadoes were forecast as actually occurred.

FAR = 0.720, which expresses the fact that a fairly large fraction of the forecast tornadoes did not eventually occur.

H = 0.549 and F = 0.0262, indicating that more than half of the actual tornadoes were forecast to occur, whereas only a very small fraction of the non-tornado cases were falsely warned of a tornado.

Skill scores: HSS = 0.355, PSS = 0.523, CSS = 0.271, GSS = 0.216, Q = 0.957.
The 2x2 joint distribution has I x J - 1 = 3 dimensions, so any scalar measure involves a loss of information. Verification statistics should be equitable: they rate random forecasts and all constant forecasts equally, and correct forecasts of less frequent events are weighted more strongly.
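
The scores quoted above are consistent with the classic Finley (1884) tornado verification counts; as a check (an assumption on my part, since the counts themselves are not given in the slides), here they are recomputed from those counts:

# Check of the tornado-example scores, assuming the Finley counts
# a=28 hits, b=72 false alarms, c=23 misses, d=2680 correct rejections.
a, b, c, d = 28, 72, 23, 2680
n = a + b + c + d

B   = (a + b) / (a + c)          # bias ratio, ~1.96
FAR = b / (a + b)                # false alarm ratio, ~0.720
H   = a / (a + c)                # hit rate, ~0.549
F   = b / (b + d)                # false alarm rate, ~0.0262

a_ref = (a + b) * (a + c) / n    # hits expected by chance
HSS = 2 * (a*d - b*c) / ((a + c)*(c + d) + (a + b)*(b + d))   # Heidke, ~0.355
PSS = H - F                                                    # Peirce, ~0.523
CSS = a / (a + b) - c / (c + d)                                # Clayton, ~0.271
GSS = (a - a_ref) / (a - a_ref + b + c)                        # Gilbert, ~0.216
Q   = (a*d - b*c) / (a*d + b*c)                                # Yule's Q, ~0.957

print(B, FAR, H, F, HSS, PSS, CSS, GSS, Q)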

What if your data are probabilistic?
For a dichotomous predictand, converting from a probabilistic to a nonprobabilistic format requires selecting a threshold probability above which the forecast will be "yes". The choice ends up somewhat arbitrary; possibilities include:

the climatological probability of precipitation;
the threshold that would maximize the threat score;
the threshold that produces unbiased forecasts (B = 1);
nonprobabilistic forecasts of the more likely of the two events.

Multicategory Discrete Predictands
These can be collapsed into 2x2 tables.

(Example: a 3x3 rain/mix/snow contingency table can be collapsed into 2x2 tables such as rain vs. non-rain.)

Nonprobabilistic Forecasts of Continuous Predictands
It is informative to graphically represent aspects of the joint distribution of nonprobabilistic forecasts for continuous variables.

These plots are examples of a diagnostic verification technique, allowing diagnosis of the particular strengths and weaknesses of a set of forecasts through exposition of the full joint distribution.

Conditional Quantile Plots
(Figure panels: a) performance of MOS forecasts, b) performance of subjective forecasts.)
Conditional distributions of the observations given the forecasts are represented in terms of selected quantiles, with respect to the perfect 1:1 line. The plots contain two parts, representing the two factors in the calibration-refinement factorization of the joint distribution of forecasts and observations. For the MOS forecasts, the observed temperatures are consistently colder than the forecasts; the subjective forecasts are essentially unbiased. The subjective forecasts are also somewhat sharper, or more refined, with more extreme temperatures being forecast more frequently.

Scalar Accuracy Measures
Only two scalar measures of forecast accuracy for continuous predictands are in common use: the mean absolute error and the mean squared error.

Mean Absolute Error
The arithmetic average of the absolute values of the differences between the members of each forecast/observation pair: MAE = (1/n) sum |y_k - o_k|. MAE = 0 if the forecasts are perfect. It is often used to verify temperature forecasts.

Mean Squared Error
The average squared difference between the forecast and observed pairs: MSE = (1/n) sum (y_k - o_k)^2. It is more sensitive to larger errors than the MAE, and therefore more sensitive to outliers. MSE = 0 for perfect forecasts. The root-mean-squared error, RMSE = sqrt(MSE), has the same physical dimensions as the forecasts and observations. To calculate the bias of the forecasts, compute the mean error: ME = (1/n) sum (y_k - o_k).
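
A minimal Python sketch of these measures, with hypothetical forecast/observation values:

# Sketch: scalar accuracy measures for continuous forecast/observation pairs.
forecasts    = [12.0, 15.5, 9.0, 20.0, 18.5]   # hypothetical values
observations = [11.0, 16.0, 10.5, 19.0, 21.0]
n = len(forecasts)

mae  = sum(abs(y - o) for y, o in zip(forecasts, observations)) / n    # mean absolute error
mse  = sum((y - o) ** 2 for y, o in zip(forecasts, observations)) / n  # mean squared error
rmse = mse ** 0.5                                                      # same units as the data
me   = sum(y - o for y, o in zip(forecasts, observations)) / n         # mean error (bias)

print(mae, mse, rmse, me)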

Skill Scores
Skill scores can be computed with MAE, MSE, or RMSE as the underlying accuracy statistic, e.g. SS_clim = 1 - MSE/MSE_clim, where the reference MSE_clim is computed using the climatological value for day k as the forecast.
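
A small sketch of the generic skill-score formula from earlier, applied to hypothetical accuracy values:

# Sketch: SS = (A - A_ref) / (A_perf - A_ref); for MAE, MSE, or RMSE, A_perf = 0.
def skill_score(acc, acc_ref, acc_perf=0.0):
    return (acc - acc_ref) / (acc_perf - acc_ref)

# e.g. a hypothetical forecast MSE of 2.1 against a climatological-reference MSE of 6.0
print(skill_score(2.1, 6.0))   # ~0.65, i.e. 65% improvement over climatology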

Probability Forecasts of Discrete Predictands
The joint distribution for dichotomous events: the forecasts are no longer restricted to probabilities of 0 and 1. For each possible forecast probability we see the relative frequency with which that forecast value was used, and the probability that the event o1 occurred given the forecast y_i.

The Brier Score
A scalar accuracy measure for verification of probabilistic forecasts of dichotomous events:

BS = (1/n) sum (y_k - o_k)^2

This is the mean squared error of the probability forecasts, where o_k = 1 if the event occurs and o_k = 0 if it does not occur. A perfect forecast has BS = 0; less accurate forecasts receive higher BS. The Brier skill score is BSS = 1 - BS/BS_ref, where BS_ref is the Brier score of the reference (e.g. climatological) forecasts.
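
A minimal sketch of the Brier score and Brier skill score, using hypothetical forecasts and a climatological reference:

# Sketch: Brier score and Brier skill score for probability forecasts of a
# dichotomous event (hypothetical values; o = 1 if the event occurred, else 0).
probs = [0.9, 0.1, 0.7, 0.2, 0.4, 0.8]
obs   = [1,   0,   1,   0,   1,   1  ]
n = len(obs)

bs = sum((p - o) ** 2 for p, o in zip(probs, obs)) / n

# Reference: constant forecasts of the sample climatological relative frequency
clim = sum(obs) / n
bs_ref = sum((clim - o) ** 2 for o in obs) / n

bss = 1.0 - bs / bs_ref   # 1 = perfect, 0 = no skill over climatology
print(bs, bss)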

The Reliability Diagram
A graphical device that shows the full joint distribution of forecasts and observations for probability forecasts of a binary predictand, in terms of its calibration-refinement factorization. It allows diagnosis of particular strengths and weaknesses in a verification set.

Possible diagnoses:
Well calibrated: the conditional event relative frequency is essentially equal to the forecast probability.
Underforecasting: forecasts are consistently too small relative to the conditional event relative frequencies; the average forecast is smaller than the average observation.
Overforecasting: forecasts are consistently too large relative to the conditional event relative frequencies; the average forecast is larger than the average observation.
Overconfident: extreme probabilities are forecast too often.
Underconfident: extreme probabilities are forecast too infrequently.

Well-calibrated probability forecasts mean what they say, in the sense that subsequent event relative frequencies are equal to the forecast probabilities.
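
A small sketch of the two ingredients of a reliability diagram (calibration and refinement), grouping hypothetical probability forecasts by the value issued:

# Sketch: conditional relative frequencies and usage frequencies for a reliability diagram.
from collections import defaultdict

probs = [0.1, 0.1, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9, 0.1]   # hypothetical forecasts
obs   = [0,   0,   1,   0,   1,   1,   1,   1,   0,   1  ]   # 1 if the event occurred

counts, hits = defaultdict(int), defaultdict(int)
for p, o in zip(probs, obs):
    counts[p] += 1
    hits[p]   += o

for p in sorted(counts):
    rel_freq = hits[p] / counts[p]      # calibration: p(o1 | y_i)
    usage    = counts[p] / len(probs)   # refinement: p(y_i)
    print(f"forecast {p:.1f}: observed relative frequency {rel_freq:.2f}, used {usage:.0%} of the time")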

The ROC Diagram
Another diagram, not sensitive to conditional or unconditional forecast biases, is the Relative Operating Characteristic (or Receiver Operating Characteristic) diagram. It is another discrimination-based graphical forecast verification display, but it does not include the full information contained in the joint distribution of forecasts and observations. It asks whether there is sufficient probability to warrant a nonprobabilistic forecast of an event: ROC diagrams are constructed by evaluating the resulting contingency tables using the hit rate (H) and false alarm rate (F). The curve can be described with a single scalar value, the area A under the ROC curve, where A_perf = 1 for perfect forecasts and A_rand = 0.5 for random forecasts. It can be expressed as a skill score:

SS_ROC = (A - A_rand)/(A_perf - A_rand) = (A - 0.5)/(1 - 0.5) = 2A - 1

To construct the ROC diagram, move the threshold and recalculate F and H at each step. You want values as close to H = 1 as possible, with the smallest possible F.
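
A minimal sketch of this construction, sweeping the threshold over hypothetical probability forecasts and estimating the area under the curve with the trapezoidal rule:

# Sketch: ROC curve from probability forecasts (hypothetical values).
probs = [0.05, 0.2, 0.35, 0.5, 0.65, 0.8, 0.9, 0.1, 0.4, 0.7]
obs   = [0,    0,   0,    1,   1,    1,   1,   0,   1,   0  ]

points = [(1.0, 1.0)]                       # threshold 0: everything forecast "yes"
for thr in sorted(set(probs)):
    yes = [p >= thr for p in probs]
    a = sum(1 for y, o in zip(yes, obs) if y and o)           # hits
    b = sum(1 for y, o in zip(yes, obs) if y and not o)       # false alarms
    c = sum(1 for y, o in zip(yes, obs) if not y and o)       # misses
    d = sum(1 for y, o in zip(yes, obs) if not y and not o)   # correct rejections
    H = a / (a + c) if (a + c) else 0.0
    F = b / (b + d) if (b + d) else 0.0
    points.append((F, H))
points.append((0.0, 0.0))                   # threshold above 1: everything "no"

points.sort()                               # order by increasing F for integration
area = sum((f2 - f1) * (h1 + h2) / 2 for (f1, h1), (f2, h2) in zip(points, points[1:]))
print(area, 2 * area - 1)                   # area under the ROC curve and SS_ROC = 2A - 1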

Hedging and Strictly Proper Scoring Rules
If a forecaster is just trying to get the best score, they may improve their scores by hedging, or gaming: forecasting something other than their true belief in order to achieve a better score. A forecast evaluation procedure is strictly proper if it awards a forecaster's best expected score only when his or her true beliefs are forecast; such a procedure cannot be hedged. The Brier score is strictly proper (you can derive this, but I won't here).

Probability Forecasts for Multiple-Category Events
For multiple-category ordinal probability forecasts, verification should penalize forecasts increasingly as more probability is assigned to event categories further removed from the actual outcome, and it should be strictly proper.

The most commonly used such measure is the ranked probability score (RPS).

Ranked Probability Score (RPS)
An extension of the Brier score to the many-event situation. The components of the forecast and observation vectors are accumulated, Y_m = sum of y_1..y_m and O_m = sum of o_1..o_m, with Y_J = O_J = 1 (the probabilities sum to 1). The RPS is the sum of squared differences between the components of the cumulative forecast and observation vectors:

RPS = sum over m of (Y_m - O_m)^2

A perfect forecast has RPS = 0, because all probability would be assigned to the y_j that is correct.
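
A minimal sketch of the RPS for a single hypothetical three-category forecast:

# Sketch: ranked probability score for one multi-category ordinal forecast.
forecast = [0.2, 0.5, 0.3]   # probabilities over J ordered categories (hypothetical)
observed = [0.0, 1.0, 0.0]   # 1 for the category that occurred, 0 elsewhere

rps = 0.0
Y = O = 0.0
for y, o in zip(forecast, observed):
    Y += y                   # cumulative forecast, Y_m
    O += o                   # cumulative observation, O_m
    rps += (Y - O) ** 2      # squared difference of the cumulative vectors
print(rps)                   # 0 only when all probability sits on the observed category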

Probability Forecasts for Continuous Predictands
For an infinite number of predictand classes, the ranked probability score can be extended to the continuous case, giving the continuous ranked probability score:

CRPS = integral of [F(y) - F_o(y)]^2 dy,

where F is the forecast cumulative distribution function and F_o is the step function that jumps from 0 to 1 at the observed value.

The CRPS is strictly proper; smaller values are better. It rewards concentration of probability around the step function located at the observed value.
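
A minimal numerical sketch of the CRPS (an illustrative assumption: a Gaussian forecast distribution evaluated on a grid, not anything prescribed in the presentation):

# Sketch: CRPS computed by numerical integration for a hypothetical Gaussian forecast.
import math

def crps_gaussian_numeric(mu, sigma, obs, lo=-20.0, hi=40.0, steps=20000):
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        F = 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))  # forecast CDF
        Fo = 1.0 if x >= obs else 0.0                                    # step function at the observation
        total += (F - Fo) ** 2 * dx
    return total

print(crps_gaussian_numeric(mu=12.0, sigma=3.0, obs=14.0))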


Central Credible Interval Forecasts (discussed in the second presentation)
Fixed-width CCI: the interval width is constant on every forecast occasion, but the location of the interval and the probability it subtends are allowed to vary. These can be scored with the ranked probability score over 3 categories (below, within, and above the forecast interval):
RPS = (p - 1)^2 / 2 if the observation falls within the interval
RPS = (p^2 + 1) / 2 if the observation falls outside the interval
Fixed-probability CCI: the probability within the interval is constant on every forecast occasion, but the interval location and width may both change. These can be scored with Winkler's score, W = (b - a + 1) + k(a - o) when the observation o falls below the interval.
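
A tiny sketch of the two-case RPS formula for a fixed-width CCI forecast, where p is the probability assigned to the interval:

# Sketch: RPS for a fixed-width central credible interval forecast,
# using the two-case formula above.
def cci_rps(p, inside):
    return (p - 1) ** 2 / 2 if inside else (p ** 2 + 1) / 2

print(cci_rps(0.6, inside=True), cci_rps(0.6, inside=False))   # 0.08 and 0.68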