
Verification of high resolution precipitation forecast by radar-based data

D. Rezacova [1], B. Szintai [2], B. Jakubiak [3], J. I. Yano [4] and S. Turner [5]

[1]{Institute of Atmospheric Physics AS CR, Prague, Czech Republic}
[2]{Hungarian Meteorological Service, Budapest, Hungary}
[3]{Warsaw University, ICM, Warsaw, Poland}
[4]{CNRM/GAME, Météo France and CNRS, Toulouse, France}
[5]{ATMOSPHERE, Toulouse, France}

Abstract

The state of the art of precipitation-forecast verification in the high-resolution limit is reviewed. Forecast verification depends on the observational data used for verification as well as on the verification technique adopted. The basics of deriving radar-based verification data are summarized first. Both traditional verification techniques and more recent spatially based approaches are then reviewed, and the potential of dual-polarization radars for verifying simulated microphysics is emphasized. The review stresses the importance of spatially based verification techniques that take into consideration the uncertainties of the forecasts, above all of the existing quantitative precipitation forecasts (QPF). A basic principle of spatial verification is to relax the condition of an exact match to the observation at fine scales, in analogy with visual verification.

The review examines the various types of spatial verification techniques. Three examples (CRA, SAL and FSS) are scrutinized in particular. In the first example, the CRA (Contiguous Rain Area) technique is applied to forecasts produced by the Unified Model (UK Met Office) over Poland in July 2009. In the second example, the SAL (Structure-Amplitude-Location) technique is applied to evaluate the impact of boundary layer parameterizations on resolved deep convection in the AROME non-hydrostatic model. In the third example, forecasts of a flash flood period with heavy local convective rainfalls are considered with the FSS (Fractions Skill Score) technique.

The review concludes by summarizing the perspectives for applying radar data to QPF and microphysical verification. The importance of QPF verification in the context of convection parameterization studies is emphasized. We also emphasize the need to develop more physically based verification methods as a future direction.

1 Introduction

Precipitation is probably the most important information required by users of numerical weather forecasts. For this purpose, a key task is to infer the expected amount of precipitation accumulated over a specified time period and area. This is the issue of quantitative precipitation forecasting (QPF). Though QPF has improved significantly over the last decades, it still remains the most difficult problem in operational forecasting, with large uncertainties. This is particularly the case for convective precipitation, with its rapid evolution and strong spatial variability. Here, the importance of a proper understanding of convective dynamics and of improvements to the associated model physics can hardly be overemphasized. Remember that much of the model physics is represented only in a parametric manner (i.e., by parameterizations).


Thus, model verification techniques must be developed in such a manner that they help both with the dynamical interpretation of forecasts and with further development and improvement of model physics.

Forecast verification quantifies the uncertainties associated with QPF. Present operational high resolution (HR) numerical weather prediction (NWP) models can produce time-space evolving information on rainfall and microphysics with a horizontal resolution of the order of 1 km. This scale happens to correspond to the typical horizontal resolution of operational radar data. For this reason, radar-based rainfall values become extremely relevant for the verification of HR precipitation forecasts.

The present review summarizes the state of the art of QPF verification in which radar data are included as part of the verification data sets. Two aspects are especially important when considering QPF verification by radar data: 1. the types of radar data available for verification, as well as their basic properties and uncertainties; 2. verification techniques suitable for HR QPF, taking into account QPF uncertainties with the goal of providing results useful for forecast users. Verification needs to meet the demands of many diverse groups, including modellers, forecasters, and end users. For example, modellers may not wish to see simple-minded forecast quantities such as the user-oriented scores. More detailed aspects of precipitation forecasts, such as timing, localization, and the structure of precipitation fields, are more important for modellers.

Convective rainfalls with short durations, heavy local rainfalls, and quick hydrological responses are the most difficult to forecast quantitatively. The forecast performance strongly depends on the convection parameterization adopted in an HR NWP model. In this very respect, QPF verification must be designed in such a manner that it provides baseline information for developing and improving convection parameterizations. In particular, it must provide an objective tool for better quantifying the sensitivity of forecasts to convection-parameterization parameters. The present review is primarily aimed at modellers who develop and test new convection parameterizations. For this reason, we focus on QPF verification techniques suitable for NWP model developers and on the verification of convective precipitation forecasts.

The review begins, in the second section, by recalling the principles of radar measurements as well as the radar-based products used as verification data sets. The following sections examine both traditional and spatial verification techniques and their application to verifying HR QPF. Section 3 summarizes the traditional techniques, which are common in the verification of HR precipitation predictions. Spatial verification, which relaxes the condition of an exact match to the observation at fine scales, is the topic of Section 4; three examples of spatial verification are considered more specifically in Section 5. Section 6 reviews the use of polarimetric measurements in the verification of microphysics. The concluding Section 7 summarizes the fundamental aspects of the problems discussed in the review and formulates an outlook on HR verification.


2 Verification data

Two basic data sources are used in the verification of precipitation forecasts: point-wise ground measurements by gauges and volume-based radar reflectivity data. Gauge measurements provide direct information about point-wise precipitation. The main limits of gauge data in QPF verification are (i) the limited gauge-station distribution and (ii) the representativeness of single-point measurements. On the other hand, in order to use radar measurements as rainfall data, (i) the measured radar reflectivity (Z) must be transformed into a rainfall rate (R) by using a suitable form of the Z-R relationship, and (ii) errors and imperfections in radar precipitation values, which follow from the principle of radar volume measurement and from the radar scan strategy, must be eliminated. A merger of ground-based gauge and radar data has been recognized as a way of compiling verification data.

2.1 Ground precipitation measurements

Gauge measurements traditionally provide direct information about point-wise values of rain intensity and rain amount for various accumulation periods. In connection with gauge data, the term "ground truth" is often used. Comprehensive monographs cover the state of the art of gauge-based precipitation measurements (e.g. Sevruk, 2004, and Strangeways, 2007).

There are many problems related to rain-gauge measurements: a limited collector size, evaporative losses, out-splashing, and wind effects. A basic problem in using gauge data in QPF verification is related to estimating the areal distribution of rainfall values. The spatial density of gauges can easily be too low to capture the convective rain distribution well, because single-day rainfall can differ significantly even over a distance of several km. Typically, there is only one operational gauge per 50-100 km2 in central Europe, and the gauge density is much lower over the continents of the southern hemisphere.

It is difficult to define a universal gauge representativeness because it depends on the gauge type, precipitation type, orography, and other factors. Nevertheless, it is commonly assumed that a gauge measurement represents the true value in the radar pixel that covers the gauge position. Pairing radar pixel values with gauge data is the basis for statistical corrections of radar-based rainfalls.

Several experiments have measured precipitation with high-density local gauge networks (e.g. Wood et al., 2000a), studying the differences in 15-min rainfalls among 8 gauges located over an area of 2 km x 2 km. The standard deviation about the mean of the 8 measurements increased with the rainfall value and reached about 4 mm for a 10 mm rainfall at a single rain gauge. Quantification of the spatial variability of precipitation is a particular domain that requires more intensive study, especially for verifying the representation of convection in models more objectively.

2.2 Determination of radar-based rainfalls from radar reflectivity measurements

Weather radar is commonly considered capable of capturing the spatial distribution of precipitation well, but in a relative sense. A radar-scanned area is covered by a 3D reflectivity field obtained from Plan Position Indicator (PPI) scans, which is typically transformed into horizontal distributions at several altitude levels (Constant Altitude Plan Position Indicator - CAPPI). The horizontal resolution of radar reflectivity data is typically of the order of 1 km, and typical radar pixel areas range from 1 km x 1 km to 5 km x 5 km.

Weather radar operates by emitting pulses of microwaves and sampling the backscattered power. We can use a general radar equation (e.g. Doviak and Zrnic, 1984; Meischner et al., 2004) to deduce the radar reflectivity from the average received power. The radar reflectivity, which is the sum of all backscattering cross sections in a unit volume, can be related to the radar reflectivity factor Z, which is the sum of the sixth powers of the diameters of the individual drops within a unit volume. The radar reflectivity factor, being a meteorologically more meaningful way of expressing the radar reflectivity, is often referred to simply as radar reflectivity (Meischner et al., 2004). With a given drop-size distribution, the radar reflectivity factor Z is defined by

Z = \sum_{V} D^6 = \int_0^{\infty} N(D) \, D^6 \, dD ,    (1)

where the sum is taken over a unit volume V, D is the diameter of a spherical particle, and N(D) is the particle-size distribution. Unfortunately, Rayleigh scattering is a coarse approximation for hydrometeors, and it is generally not valid in the atmosphere. For this reason, the convention (1) is used to compare the measured return power against the equivalent radar reflectivity factor Ze, which would be equal to the radar reflectivity factor for a population of liquid, spherical particles satisfying the Rayleigh approximation. A conventional unit for Z and Ze is mm6 m-3. However, it is often expressed on a logarithmic scale (10 log Z) with the unit of dBZ.

The reflectivity factor (Z or Ze) can be converted to a radar-based rainfall rate estimate (R) by using an empirical Z-R relationship. The most common form of the Z-R relationship is

Z = a R^b ,    (2)

where Z is in mm6 m-3, R is in mm h-1, and a and b are empirical constants. The parameters a and b primarily depend on the type of drop size distribution (DSD). The Z-R relation follows from a negative exponential rain DSD of the Marshall-Palmer type or from a more general gamma distribution. The historical first Z-R relation suggested by Marshall and Palmer (1948) gives the parameter values a = 200 and b = 1.6, which are often regarded as operationally acceptable. However, there are many other forms of the Z-R relation, and their parameters vary with region, storm structure, and cloud microphysical properties.

In order to obtain an R value representative of the surface rain intensity, the Z value at low altitude levels (typically CAPPI 1-1.5 km) is used. Rainfall for a given accumulation period can be estimated by time integration over all radar grid points inside a verification domain. However, this initial estimate must be corrected for errors arising from the radar measurement principle and from other sources.
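To make the conversion in Eq. (2) concrete, the sketch below (Python with numpy; not part of the original study) converts reflectivity in dBZ to a rain rate with the Marshall-Palmer coefficients a = 200 and b = 1.6 quoted above, and accumulates a sequence of 15-min CAPPI scans into a rainfall total. It deliberately omits all of the corrections discussed next (clutter, calibration, gauge adjustment), so it illustrates the Z-R step only.

```python
import numpy as np

def dbz_to_rain_rate(dbz, a=200.0, b=1.6):
    """Convert radar reflectivity (dBZ) to rain rate (mm/h) via Z = a * R**b.

    The Marshall-Palmer coefficients a=200, b=1.6 are defaults only;
    operational values vary with region, storm structure and microphysics.
    """
    z = 10.0 ** (np.asarray(dbz) / 10.0)   # dBZ -> Z in mm^6 m^-3
    return (z / a) ** (1.0 / b)            # invert Z = a * R^b

def accumulate_rainfall(dbz_scans, dt_hours=0.25):
    """Accumulate rainfall (mm) from CAPPI reflectivity scans taken every
    dt_hours (15-min scans -> dt_hours = 0.25)."""
    rates = np.stack([dbz_to_rain_rate(scan) for scan in dbz_scans])
    return rates.sum(axis=0) * dt_hours

# Illustrative example: four identical 15-min scans of a 2 x 2 pixel field
scans = [np.array([[30.0, 35.0], [20.0, 45.0]])] * 4
print(accumulate_rainfall(scans))
```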

Villarini and Krajewski (2010) provide an extensive literature survey on the principal sources of error affecting single-polarization radar-based rainfall estimates. These include radar miscalibration, attenuation, ground clutter, anomalous propagation, beam blockage, variability of the Z-R relation, range degradation, vertical variability of the precipitation system, vertical air motion, precipitation drift, and temporal sampling errors. Analyses of meteorological and non-meteorological error sources in radar measurements are also found in Collier (1996) and Meischner (2004), as well as in the conference proceedings of ERAD (European Conference on Radar in Meteorology and Hydrology). Meteorological services, based on the meteorological radars they operate, provide products in which the basic technical errors (clutter, radar calibration, etc.) are eliminated as well as possible over the territory covered.


2.3 Polarimetric radar measurement

Hydrometeors in the form of raindrops and ice particles are characterized by different shapes, different orientations during fall, and different dielectric constants. Thus, they backscatter signals of different polarization differently. A polarimetric (or dual-polarization) radar is able to control the polarization of the transmitted as well as of the returned signal. Most polarimetric radars use horizontal and vertical polarizations for transmission and reception. Polarimetric weather radars have a significant advantage over single-polarization systems because they allow multi-parameter measurements useful for estimating the DSD and the rainfall rate. Thus, they can lead to overall improvements of quantitative precipitation estimation (QPE). A comprehensive description of all aspects of polarimetric radar measurements is found e.g. in Doviak and Zrnic (1984), Bringi and Chandrasekar (2001), Meischner (2004), and Giangrande (2007). The most common additional parameters from polarimetric radar measurements are:

● The differential reflectivity ZDR = 10 log(ZHH / ZVV), where ZHH (ZVV) is the reflectivity for a horizontally (vertically) polarized pulse. ZDR depends on the asymmetry of the particles; it is positive for oblate raindrops and zero or slightly negative for hail and graupel.

● The linear depolarization ratio LDR = 10 log(ZVH / ZHH), with ZVH (ZHH) the vertically (horizontally) received return for transmission with horizontal polarization. Depolarization of a horizontally polarized pulse is normally small for rain but high for melting snow and for water-coated hail and graupel.

● The specific differential phase shift KDP, which is the difference of phase shifts between horizontally and vertically polarized radiation, measured in degrees per km. It results from the different propagation characteristics of the two polarizations.

● The co-polar correlation coefficient ρHV, evaluated from time series of ZH and ZV, which indicates the variability of the scattering particles in shape, size, and thermodynamic phase.

Extended discussions of parameters derived from polarimetric measurements are found e.g. in Meischner (2004); see particularly Part 5 of that monograph, compiled by A. Illingworth (2004).

Dual-polarimetric measurements can improve the rain rate estimate by considering the relationship between polarimetric variables and the parameters of the drop size spectrum. The rain drop size distribution model used in many polarimetric radar rainfall studies (Testud et al., 2000; Bringi and Chandrasekar, 2001; Illingworth and Blackman, 2002) is the normalized gamma distribution:

N(D) = N_W \, f(\mu) \left( \frac{D}{D_0} \right)^{\mu} \exp\left[ -(3.67 + \mu) \frac{D}{D_0} \right] ,    (3)

where N(D) in m-3 mm-1 is the volume density, D0 in mm is the median volume drop diameter, NW in m-3 mm-1 is the intercept parameter, and μ (dimensionless) is the shape parameter of the DSD. When μ = 0, the definition (3) reduces to a simple exponential Marshall-Palmer DSD with concentration parameter NW = N0. From the polarimetric parameters one can derive the parameters D0, NW and μ, and hence a rainfall rate R.
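As an illustration of Eq. (3), the following sketch evaluates the normalized gamma DSD for given D0, NW and μ; the normalization factor f(μ) is written in the form commonly quoted with this distribution (e.g. Testud et al., 2000), and the parameter values in the example are purely hypothetical.

```python
import numpy as np
from math import gamma

def normalized_gamma_dsd(D, D0, Nw, mu):
    """Normalized gamma drop size distribution N(D) [m^-3 mm^-1], Eq. (3).

    D  : drop diameter(s) in mm
    D0 : median volume diameter in mm
    Nw : intercept parameter in m^-3 mm^-1
    mu : dimensionless shape parameter (mu = 0 gives the exponential DSD)
    """
    # normalization factor f(mu); f(0) = 1, so Eq. (3) reduces to the
    # Marshall-Palmer exponential form with Nw = N0 when mu = 0
    f_mu = (6.0 / 3.67**4) * (3.67 + mu) ** (mu + 4) / gamma(mu + 4)
    return Nw * f_mu * (D / D0) ** mu * np.exp(-(3.67 + mu) * D / D0)

# Hypothetical parameters: D0 = 1.5 mm, Nw = 8000 m^-3 mm^-1, mu = 3
D = np.linspace(0.1, 6.0, 120)
N = normalized_gamma_dsd(D, D0=1.5, Nw=8000.0, mu=3.0)
# Consistency check: the 6th moment of the DSD is the reflectivity factor Z
Z = np.trapz(N * D**6, D)                     # mm^6 m^-3
print(f"Z = {Z:.0f} mm^6 m^-3 ({10*np.log10(Z):.1f} dBZ)")
```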


Various relationships between polarimetric parameters and rain rate have been derived (e.g., Bringi and Chandrasekar, 2001; Illingworth, 2004; Giangrande, 2007). As an example used by several authors, Illingworth (2004) presents the equation

R(KDP, ZDR) = c \, KDP^{a} \, ZDR^{b} ,    (4)

with the values of a, b, and c obtained by a regression analysis of values obtained by scanning over the N0, D0 and μ ranges given by Ulbrich (1983). A table of the coefficients a, b, and c in (4) was provided by Bringi and Chandrasekar (2001).

The efficiency of dual-polarization radar for QPE has been demonstrated in a number of recent studies designed to test and compare various algorithms for rain rate estimation from polarimetric measurements. For example, Anagnostou et al. (2013) evaluated the new SCOP-ME microphysics estimation algorithm using long-term X-band dual-polarization measurements and disdrometer DSD parameter data acquired in Athens (Greece). The retrievals of the median volume diameter D0 and the intercept parameter NW are compared with two existing rain microphysical estimation algorithms, and the retrievals of rain rate with three available radar rainfall estimation algorithms. The relations Z-R, R-KDP, and R-ZH,ZDR,KDP are used here with coefficients obtained by multiple regression. Error statistics for the rain rate estimates, in terms of relative mean and root-mean-square error, show that SCOP-ME has a low relative error compared to the other three methods, which systematically underestimate rainfall. Rainfall rate estimates with SCOP-ME mostly depend on D0, which is estimated much more efficiently than the intercept parameter NW.

There is a large number of studies documenting methods for retrieving meteorological information from polarimetric radar measurements. An important issue in polarimetric QPE is determining which method to employ for a given set of observed polarimetric parameters. At Colorado State University (CSU), an optimization algorithm has been developed and used for a number of years to estimate rainfall based on thresholds of Zh, Zdr, and Kdp. In the study by Cifelli et al. (2011), a new rainfall algorithm using hydrometeor identification (HID) was presented to guide the choice of the particular rainfall estimation algorithm. Data collected from an S-band radar together with a network of rain gauges were used to evaluate the performance of the new algorithm in mixed rain and hail in Colorado. Results showed that the new CSU HID-based algorithm performed well for the Colorado case studies presented in that work.

Dual-polarization radars have also been increasingly used for investigations of cloud microphysics. According to Straka et al. (2000), who provide a basic review of microphysical interpretations of polarimetric data, hydrometeor types are identified from polarimetric radar data by associating different bulk hydrometeor characteristics with unions of subsets of values of the various polarimetric variables. Consequently, the polarimetric radar-based microphysics categories can be applied as verification data for prognostic microphysics schemes. The retrieval of cloud microphysical structure from polarimetric characteristics has been the topic of many studies over the last several decades, and interest is increasing as dual-polarization radars become more common in operations. The techniques for extracting microphysical categories and their use in NWP models are discussed further in Sect. 6.

2.4 Quantitative precipitation estimate by merging rain-gauge and radar measurements

Many different approaches have been developed to improve the accuracy of radar-based precipitation estimates. A key common strategy is the merging of radar-based and gauge-based data, combining the strengths of both so that systematic errors in radar-based QPE can be substantially reduced.


The main aim is to reduce radar biases with additional rain gauge data. However, particularly for highly variable convective precipitation, the use of a complex gauge adjustment can be detrimental, depending upon the density of the rain gauge network (e.g. Collier et al., 2010). For this reason, many procedures simply employ correction factors based on the radar/rain-gauge ratio of rainfalls, in which the radar rainfall averaged over an area covered by a number of gauges is compared with the average of the gauges over the same area. A main role of gauge adjustment is to make sure that the radar-based precipitation estimates are unbiased against gauge measurements on a long-term basis.

At a more general level, the issue of merging rain-gauge and radar data is one of data assimilation. Along this line, various more complex techniques can be developed, based for example on kriging and other types of weighted interpolation, and some of them are used operationally. Adjustment functions are derived from historical as well as real-time data; alternatively, data representing a certain time and area window are used. See Gjertsen et al. (2004) for further details. Regression-based techniques are considered by e.g. Gabella et al. (2001), Kracmar et al. (1999), Sokol (2003), Zacharov et al. (2004), and Morin and Gabella (2007). A technique based on weighting the radar-based and gauge-measured rain rates, where the weights depend on the distance of the gauges from a radar pixel (Seo and Breidenbach, 2001), was applied operationally in the Czech Republic (Salek et al., 2004) and was furthermore combined with a kriging-based technique (Salek, 2010). The Probability Matching Method, proposed by Rosenfeld et al. (1994), derives an adjustment function from an analysis of historical radar and gauge data. It can provide a statistically stable relationship, and it has formed the basis for other products and procedures (Collier et al., 2010).

Several operational products provide modellers with radar-based QPE that can be used in the verification of HR forecasts; the RANIE product of the DWD, Germany (Pfeifer et al., 2008), and the MERGE product of the CHMI, Czech Republic (Salek, 2010), are examples. In order to merge two different types of data sets with different spatial coverage more properly, statistical distributions of the variables in question must first be established, as already suggested in Sect. 2.1. This procedure greatly facilitates merging various rainfall measurement data sets together.
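As a minimal sketch of the simplest of these procedures, the snippet below applies a mean-field bias adjustment based on the gauge/radar ratio described above. It assumes the gauges have already been mapped to radar pixel indices and uses purely illustrative numbers; operational products such as those cited here use considerably more elaborate (distance-weighted or kriging-based) schemes.

```python
import numpy as np

def mean_field_bias_adjust(radar_rain, gauge_values, gauge_pixels, min_rain=0.1):
    """Adjust a radar-based rainfall field by the mean gauge/radar ratio.

    radar_rain   : 2-D array of radar-based rainfall (mm)
    gauge_values : 1-D array of gauge accumulations (mm)
    gauge_pixels : list of (row, col) radar pixels containing the gauges
    """
    radar_at_gauges = np.array([radar_rain[r, c] for r, c in gauge_pixels])
    valid = (radar_at_gauges >= min_rain) & (gauge_values >= min_rain)
    if not valid.any():
        return radar_rain            # nothing to adjust against
    bias = gauge_values[valid].sum() / radar_at_gauges[valid].sum()
    return radar_rain * bias         # unbiased against gauges in the mean

# Illustrative example with two gauges on a 2 x 2 radar grid
radar = np.array([[2.0, 4.0], [8.0, 1.0]])
gauges = np.array([5.0, 9.0])
pixels = [(0, 1), (1, 0)]
print(mean_field_bias_adjust(radar, gauges, pixels))
```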

2.5 Use of direct radar data for verification by comparison with radar operator results

A difficulty in evaluating NWP model forecasts against observations is that the latter, such as radar measurements, are not directly linked to any model parameters. Two major approaches have been proposed. The first is the "observation to model" approach, which converts the observations into model variables. The second is the "model to observation" approach, which simulates observation variables from model outputs; in this case, comparisons are performed in terms of observation characteristics. Several operational NWP models include radar forward operators, which make it possible to calculate radar reflectivity from the model outputs; the simulated radar data are then compared with radar observations for evaluation. This second approach was applied to single-polarization radar by e.g. Haase and Crewell (2000) and to polarimetric radar by Pfeifer et al. (2008). The present review mostly focuses on the first approach, whilst the use of a polarimetric radar forward operator by Pfeifer et al. (2010) is reviewed in Sect. 6.


3 Traditional verification techniques and their limits

Forecast verification is indispensable in meteorological research as well as in operational forecasting. A properly designed verification method helps to identify model shortcomings and systematic errors. It also provides quantitative assessments of the improvement of forecasts over time.

Various international research projects (e.g. the Sydney 2000 and Beijing 2008 Olympic Forecast and Research Demonstration Projects, the Mesoscale Alpine Programme (MAP) studies, and the Forecast Verification Method Intercomparison Project) have provided good frameworks for developing and testing new verification strategies. A Joint Working Group on Verification (JWGV) under the WMO/WWRP was established in January 2003 in order to promote verification practices and research. International conferences and workshops on verification are organized by the WWRP/WGNE JWG on Forecast Verification Research. The group initiated special issues of Meteorological Applications on forecast verification (Casati et al., 2008; Ebert et al., 2013). QPF verification is critical for improving model performance. However, the intensive verification efforts are not restricted to QPF; they also include issues related to the verification of extreme events and verification strategies relevant to operational forecasting. Furthermore, verification efforts must somehow be linked to an improved understanding of the physical processes within a model, and more specifically of convection parameterization. The latter is a wide open question still to be addressed systematically (cf. Sect. 7).

The WWRP/WGNE JWG on Verification prepared a fundamental document focused on QPF and probabilistic QPF (PQPF) verification, containing many recommendations related to these spatially based verification techniques (WWRP 2009-1, 2009). One of the recommendations states that "where possible, combined radar-gauge rainfall analyses be used to verify model QPFs and PQPFs at high spatial and temporal resolution". The present review emphatically supports this view.

3.1 Traditional skill scores

Deterministic precipitation forecasts can be verified in two distinct manners: (i) categorical (dichotomous, binary, yes/no) and (ii) continuous. Various verification measures (scores) can be adopted for both approaches. As an example of a categorical score, QPF is often judged categorically by whether a rainfall amount exceeds a threshold. On the other hand, continuous variables, such as the rainfall amount, can be adopted more directly as a score measure. Quality measures such as the RMSE, MSE, and correlations are also defined in continuous terms. Note, however: "Because rainfall amount is not normally distributed and can have very large values, the continuous verification scores (especially those involving squared errors) which are sensitive to large errors may give less meaningful information for precipitation verification than categorical verification scores" (WWRP 2009-1, 2009). This again points to the importance of a careful statistical quantification of rainfall variability both in time and in space, as already emphasized in Sects. 2.1 and 2.5.

The report of the WWRP/WGNE JWG on Verification (WWRP 2009-1, 2009) lists the recommended verification measures for the following forecast categories: (i) forecasts of rain occurrence meeting or exceeding specific thresholds, (ii) forecasts of rain amount, (iii) probability forecasts of rain meeting or exceeding specific thresholds, and (iv) verification of the ensemble probability distribution. As any verification score must be regarded as a sample estimate of the "true" value, it is recommended to estimate confidence intervals in order to set bounds on the expected value of the score. In the following, we summarize the recommended measures for categories (i) and (ii), including more comprehensive information about their applications.

A traditional categorical verification of grid-point precipitation defines an event as the accumulated grid-point precipitation being greater than or equal to a threshold. An alternative approach is to consider the spatial rainfall pattern over an elementary area covered by a finite number of grid points. A radar either observes an event (o=1) or not (o=0). A model forecast either predicts the event (f=1) or not (f=0). The contingency table (Table 1) counts the number of grid points (elementary areas) with hits (o=1, f=1), false alarms (o=0, f=1), misses (o=1, f=0), and correct rejections (o=0, f=0). There are a number of scores based on a contingency table (e.g. Jolliffe and Stephenson, 2003; Wilks, 2006). Examples of this category of QPF verification are found e.g. in Damrath et al. (2000) for Germany and Ebert et al. (2003) for the United States, Australia, and Germany. Categorical scores recommended by WWRP 2009-1 (2009) are summarized in Table 2. During the COST 717 Action, a survey of traditional verification measures was compiled by C. Wilson (available at http://www.smhi.se/hfa_coord/cost717/doc/WDF_02_200109_1.pdf). Table 3 lists the measures not included in WWRP 2009-1 (2009).
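As a sketch of the categorical approach, the snippet below builds the contingency-table counts for a single threshold and derives a few of the standard scores (POD, FAR, CSI and the frequency bias). It assumes the forecast and the radar-based observation are already on the same grid; it is a generic illustration and not tied to the specific tables referenced in the text.

```python
import numpy as np

def contingency_counts(forecast, observed, threshold):
    """Hits, false alarms, misses and correct rejections for one threshold."""
    f = forecast >= threshold
    o = observed >= threshold
    hits   = np.sum(f & o)
    false  = np.sum(f & ~o)
    misses = np.sum(~f & o)
    correct_rejections = np.sum(~f & ~o)
    return hits, false, misses, correct_rejections

def categorical_scores(hits, false, misses, cr):
    """A few widely used scores derived from the contingency table."""
    pod  = hits / (hits + misses)            # probability of detection
    far  = false / (hits + false)            # false alarm ratio
    csi  = hits / (hits + false + misses)    # critical success index
    bias = (hits + false) / (hits + misses)  # frequency bias
    return {"POD": pod, "FAR": far, "CSI": csi, "BIAS": bias}

# Illustrative 1-h accumulations (mm) on a small grid, threshold 1 mm
fcst = np.array([[0.0, 2.5, 1.2], [0.3, 4.0, 0.0]])
obs  = np.array([[1.5, 2.0, 0.0], [0.0, 3.0, 0.0]])
print(categorical_scores(*contingency_counts(fcst, obs, 1.0)))
```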

In verifying a forecast of a continuous variable (rain amount), we have to take into account the sensitivity of continuous verification scores to outliers. According to WWRP 2009-1 (2009), this sensitivity can be reduced if we normalize the rain amount values using a square-root transformation (Stephenson, 1999). If necessary, an inverse transformation by squaring is applied to return to the physical units. An overview of the continuous scores suggested by WWRP 2009-1 (2009) is reproduced as Table 4.
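A short sketch of this recommendation, assuming numpy: the RMSE is evaluated on square-root-transformed rain amounts, which damps the influence of a single large outlier; squaring transforms the result back to physical units if needed.

```python
import numpy as np

def rmse_sqrt_transformed(forecast, observed):
    """RMSE of square-root-transformed rain amounts, reducing the
    influence of large outliers on the continuous score."""
    err = np.sqrt(forecast) - np.sqrt(observed)
    return np.sqrt(np.mean(err ** 2))

# Illustrative accumulations (mm); the raw RMSE is dominated by the outlier
fcst = np.array([1.0, 2.0, 5.0, 60.0])
obs  = np.array([2.0, 1.0, 4.0, 20.0])
print("raw RMSE:", np.sqrt(np.mean((fcst - obs) ** 2)))
print("sqrt-transformed RMSE:", rmse_sqrt_transformed(fcst, obs))
```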

Verification adopting categorical scores is a multi-parametric task. Graphical representations of verification results are useful for analysing specific effects on QPF quality. Various choices are possible for the graphical representation of multiple measures of traditional yes-no forecast quality; see Roebber (2009) for a summary. An example related to the HR forecast of heavy local convective rainfalls is given in Fig. 1.

3.2 The double penalty problem

Grid-point-related error measures are problematic for phenomena, such as convective precipitation, which are characterized by complex structures on scales of less than 100 km. A classic example illustrating the limits of grid-point-based scores is the well-known "double penalty problem". A prediction of a precipitation feature with the correct size and structure might yield very poor verification scores if, for example, the feature is displaced slightly in space, because categorical error scores penalize such a situation heavily: from the point of view of traditional verification, the displacement simply produces a false alarm at one location and a miss at another. A displaced forecast is also rated very poorly by a large RMSE. From these points of view, a forecast with a misplaced precipitation structure is just as bad as a forecast that misses the event entirely (Davis et al., 2006a). However, the very fact that an event is somehow predicted should be evaluated positively, because from a physical point of view it is clearly a better forecast than one that misses the event entirely.


This issue becomes increasingly acute as model resolution increases (towards horizontal grid spacings of 1-4 km). These HR models can produce precipitation fields that are comparable to radar information in resolution. Thus, the complexity and variety of structures generated by an HR model must be objectively scrutinized. HR precipitation forecasts from numerical models may look quite realistic visually and may provide useful guidance for forecasters. However, the usefulness of such HR forecasts must be objectively quantified, and, as already emphasized, traditional verification scores are not well adapted for this purpose. Spatial verification techniques have been developed in recent years in order to overcome these limitations of the traditional methods. These are discussed next.

4 Spatial verification techniques

The basic principle of spatial verification methods consists of relaxing the requirement of an exact match to the observation at the fine scales. The spatially based techniques stress the usefulness of the forecasts, in analogy with visual verification ("verification by eye"). According to Ebert (2008), a useful forecast predicts an event somewhere near the observation, over a similar area, and with a similar distribution of intensities as observed.

The spatially based methods focus on the verification of gridded forecast data against an observation field on the same grid, such as a radar-based QPE. Such gridded observations have greater uncertainty at higher resolutions than at lower resolutions. Point-wise precipitation measurements, which are another source of verification data, may suffer from representativeness issues, particularly for highly variable fields such as convective precipitation. As a result, the estimated forecast error is further biased by the observational uncertainty; in other words, a formally obtained forecast error should not necessarily be considered practically meaningful.

The spatially based methods are built upon the idea of identifying weather events as "objects" or "features". Under this perspective, the forecast and observed (F/O) fields of rainfall values are not compared directly at the same locations (identical or nearby grid points); instead, the objects of interest are extracted from the F/O data and then compared with each other, yielding verification statistics. For this reason, the subclass (b1) to be discussed in Sect. 4.1 is called "object-oriented" or "feature-based".

A large number of spatial verification methods have been proposed in the literature. Comparative review studies are found e.g. in Ebert (2008), Gilleland et al. (2009), and Ahijevich et al. (2009). The Spatial Forecast Verification Methods Intercomparison Project (ICP), which stemmed from the international verification workshop held in Boulder, Colorado in 2007, is an effort to analyse and compare newly proposed methods, primarily for verifying HR forecasts. This comparison suggests to model users the methods appropriate for given types of data, forecasts, and desired forecast utility. A division of the new methods into several categories is one of the major ICP results, and it is presented next.

4.1 Categories of spatial methods

According to Gilleland et al. (2009), many spatial methods can be classified into two basic categories: (a) filtering methods and (b) displacement methods. The filtering methods apply a spatial filter to one or both data fields (or sometimes to the difference field) and then calculate verification statistics on the filtered fields. The filter is usually applied at progressively coarser scales to provide information about the scales at which a forecast has skill. The displacement methods seek the best fit of a forecast to the observations by adjusting their mutual positions. The "fitting" procedure quantifies the extent to which a forecast field needs to be manipulated spatially (displacement, rotation, scaling, etc.), as well as the associated residual errors.

The filtering methods (a) can be further classified into (a1) neighbourhood (or fuzzy) and (a2) scale-separation (or scale-decomposition) methods. The neighbourhood methods apply a smoothing filter, whilst the scale-separation techniques apply several single-band spatial filters (Fourier, wavelets, etc.) so that the performance at separate scales can be evaluated separately. The displacement methods (b) can be classified into (b1) feature-based (or object-oriented) and (b2) field-deformation methods. The primary difference between the two is that the feature-based methods first identify features (or objects) of interest (e.g., storm cells) and analyse each feature separately, whereas the field-deformation approaches analyse the entire field or a subset thereof.

Gilleland et al. (2009) present examples of verification techniques for each category, with references. Their list of individual methods (see Table 1 in Gilleland et al., 2009) includes 2 traditional and 16 spatial techniques. Not all of the methods can be unambiguously classified into one of the four categories, and some are classified into more than one.

Ahijevich et al. (2009) applied the spatial techniques to a set of artificial and perturbed forecasts with prescribed errors, and to a set of real forecasts of convective precipitation on a 4-km grid. They summarize that "each method provided different aspects of forecast quality. Compared to the subjective scores, the traditional approaches were particularly insensitive to changes in perceived forecast quality at high-precipitation thresholds. In these cases, the newer features-based, scale-separation, neighbourhood, and field deformation methods have the ability to give a credit to close forecasts of precipitation features or resemblance of overall texture to observation. In comparing model forecasts with real cases, the traditional verification scores did not agree with the subjective assessment of the forecasts." Basic aspects of the spatial verification categories are summarized in Table 5, as compiled by Ebert (2011).

4.2 The neighbourhood techniques

The neighbourhood techniques are probably the most elaborated approach, thanks to the studies by Ebert (2008, 2009). Compared to the other categories, the idea, the procedures, and the applications are simpler and more intuitive. The neighbourhood (also called "fuzzy") approaches compare values of a forecast in space-time neighbourhoods relative to a point in the observation field. In the majority of applications, spatial windows (also called "elementary areas") are employed for this purpose. It is also straightforward to extend the technique to include neighbourhoods in time (Ebert, 2008). The window size should depend on the grid spacing, the time step, and the meteorological situation; thus no single choice of window works for all forecasts and domain sizes. Fuzzy verification techniques address this question by allowing the neighbourhood size to vary. The scale that attains a desired level of forecast skill is determined by performing the comparison over incrementally larger neighbourhoods.

There are two strategies which distinguish the neighbourhood techniques from the traditional ones (Ebert, 2008). The "single observation - neighbourhood forecast" strategy matches a grid-box observation to the corresponding neighbourhood of grid boxes in the forecast. Another strategy, called "neighbourhood observation - neighbourhood forecast", also takes into account the neighbourhood surrounding the observations. The "neighbourhood observation - neighbourhood forecast" strategy takes a model-oriented viewpoint, in which observations must be "upscaled" in one way or another and then treated as representing the scales resolved by a model, usually several grid lengths. The "single observation - neighbourhood forecast" strategy represents a user-oriented viewpoint, in which it is important to verify the predicted value at a particular location of interest.

The earliest and perhaps simplest of these methods is more specifically called "upscaling", in which both the forecasts and the observations are averaged consecutively to coarser scales and then compared by traditional scores (e.g., Yates et al., 2006; Zepeda-Arce et al., 2000; Weygandt et al., 2004). A disadvantage of upscaling is, however, the loss of small-scale variability that is crucial for depicting high-impact events such as heavy local convective precipitation. Such fine-scale variability is certainly captured by an HR model in a gross sense, though details such as the precipitation locations may be displaced from those observed. Theis et al. (2005) were probably the first to express the idea of the fuzzy approach clearly, comparing the forecast fractional coverage around a neighbourhood with the observed occurrence of an event. Marsigli et al. (2006) took a more general approach by comparing the statistical moments of the distributions of observations in the neighbourhoods.

Several independently developed fuzzy techniques are reviewed by Ebert (2008), who compares them using radar-based data for an intense storm event over the United Kingdom in May 1999. This case is also examined in detail by Casati et al. (2004). The verification referred to a 3-h forecast of rain rate (mm h-1) at 5 km spatial resolution from the Nimrod radar system, which blends radar-based nowcasts with mesoscale model forecasts (Golding, 2000). A sub-domain of 256 x 256 grid boxes centred on the rain system was considered, and the Nimrod quality-controlled radar rainfall analysis provided the verification data.

A large number of techniques apply the principle of spatial verification: see Ebert (2008), Ahijevich et al. (2009), and Gilleland et al. (2009). Some of them are used solely by the proposers of the methods, but others have found more general use. Among those, we examine in detail the CRA (contiguous rain areas: Ebert and McBride, 2000), SAL (structure-amplitude-location: Wernli et al., 2008, 2009) and FSS (fractions skill score: Roberts and Lean, 2008) techniques in the next section.
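To make the fraction-based neighbourhood idea concrete before turning to the examples, the sketch below computes the Fractions Skill Score of Roberts and Lean (2008) for one threshold and one square window, using a uniform filter from scipy; it is a plain illustration of the principle, not the code used in any of the studies cited here.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fractions_skill_score(forecast, observed, threshold, neighborhood):
    """FSS (Roberts and Lean, 2008) for one threshold and one window size.

    neighborhood : side length of the square window in grid points
    """
    # binary exceedance fields
    bf = (forecast >= threshold).astype(float)
    bo = (observed >= threshold).astype(float)
    # fraction of exceedances within each neighbourhood
    pf = uniform_filter(bf, size=neighborhood, mode="constant")
    po = uniform_filter(bo, size=neighborhood, mode="constant")
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan

# Illustrative use: a forecast displaced by a few grid points scores poorly
# at grid scale (window 1) but recovers skill at larger windows.
rng = np.random.default_rng(0)
obs = (rng.random((64, 64)) > 0.95).astype(float) * 10.0
fcst = np.roll(obs, shift=5, axis=1)           # same field, displaced
for win in (1, 5, 11, 21):
    print(win, round(fractions_skill_score(fcst, obs, 1.0, win), 3))
```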

5 Examples of spatial verification

Three spatial verification techniques were chosen for demonstration in the present section. All results were obtained using single-polarization radar measurements as verification data. We first discuss CRA as developed at the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), Warsaw University (Poland). The second example is from the Hungarian Meteorological Service (HMS), applying SAL; this analysis specifically quantifies the effects of a boundary layer parameterization and its modification at HMS. The third example is an application of FSS to comparing the performance of NWP models and their options (Zacharov et al., 2013). These results were obtained at the Institute of Atmospheric Physics (IAP), Czech Academy of Sciences, in collaboration with the Czech Hydrometeorological Institute (the Czech weather service).


5.1 CRA technique and its modification

A few land surface models were tested at the ICM. The models were coupled to the mesoscale numerical prediction systems explored operationally at Warsaw University, and the precipitation forecasts obtained with the different land-surface schemes were compared with radar observations. The primary radar observations used in our study consisted of 15-minute reflectivity data at the 500 m CAPPI level collected from all radars operating over the Baltic Sea catchment area. After some basic corrections, these data were integrated into 1-h precipitation accumulations using the standard Z-R relationship. To facilitate comparisons, the estimated precipitation observations were converted to the projection and the resolution of the model. In the case of the Unified Model, the spatial resolution was 4 km, and the MOSES-II (8A) land-surface physics was used in this set-up; it included explicitly parameterized top-of-mixed-layer entrainment, non-local diffusion and gradient-adjustment terms in the mixed layer, and a formulation of the surface exchange coefficients based directly on Monin-Obukhov stability functions. Explicit moist physics and explicitly resolved convection were the main features of this version of the UM model.

The present subsection considers the concept of the CRA, as introduced by Ebert and McBride (2000), which is defined as an area of contiguous observed and/or forecast rainfall enclosed within a specified isohyet. The approach consists of four major steps:

1) Identifying separate objects,
2) Describing characteristics of interest,
3) Finding matching objects in both fields,
4) Calculating verification statistics.

Although there are some differences between the definitions and techniques presented herein and the original ones, the philosophy behind the algorithm is that of Ebert and McBride (2000). These four steps are now discussed in detail one by one.

Object definition. We adopt, as a working hypothesis, the definition of an object proposed by Ebert and McBride (2000): the contiguous rain area (CRA) is the area of contiguous observed or forecast rainfall enclosed within a specified isohyet. Fig. 2 shows two separate observed and forecast precipitation fields projected onto the same spatial grid. A general view of this picture gives the impression that the two fields are very similar, and the differences between the observed and forecast rainfall entities help to evaluate the quality of the forecast. The first step of the method is to look for distinct entities that can be associated with the merged observation and forecast fields. However, differently from the original algorithm, we treat the forecast and observed fields separately, so that our CRAs contain only observed or only forecast rain. Depending on the type of data (time span, resolution), we apply a minimal threshold for the precipitation to be considered, and we set sub-threshold values in the matrices to no-rain values.

Identification of the objects within the data. We apply an algorithm that identifies CRAs based on the spatial coordinates of the objects. Once these spatial coordinates are defined, various statistics, such as the maximum precipitation, the object area, the average precipitation, and the total rainfall, can be evaluated for every identified object over a forecast period.
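A minimal sketch of the first two steps, using connected-component labelling above a rain threshold (here via scipy.ndimage.label); the field and threshold are illustrative, and the labelling and statistics of the operational algorithm may differ in detail.

```python
import numpy as np
from scipy import ndimage

def identify_cras(rain, threshold):
    """Label contiguous rain areas (CRAs) above a threshold and return
    simple per-object statistics used later for matching."""
    mask = rain >= threshold
    labels, n_objects = ndimage.label(mask)      # connected components
    objects = []
    for k in range(1, n_objects + 1):
        rows, cols = np.nonzero(labels == k)
        values = rain[rows, cols]
        objects.append({
            "id": k,
            "area": rows.size,                              # grid points
            "total_rain": float(values.sum()),
            "max_rain": float(values.max()),
            "centre_of_mass": (rows.mean(), cols.mean()),   # matrix coords
        })
    return labels, objects

# Illustrative field with two separate rain entities, threshold 1 mm
field = np.zeros((8, 8))
field[1:3, 1:3] = 5.0
field[5:7, 4:7] = 2.0
_, cras = identify_cras(field, threshold=1.0)
for c in cras:
    print(c)
```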

Matching the forecast and observed objects. The comparison of the F/O fields begins with identifying the objects that the model intends to forecast. The most obvious measure of similarity between F/O objects is their mutual distance. Fig. 3 shows examples of observed and forecast objects and the method for estimating the distance between the centres of mass of the observed and forecast objects. There are at least three main methods for assigning a notion of distance between two objects in the plane. By treating the spatial coordinates as the rows and columns of a matrix, the centre of mass of an object area can be calculated, and the obtained position is used as a reference for calculating the distance. A drawback of this approach is that it leads to awkward results for elongated, curved shapes, for which the centre usually lies outside the object. Those problems are circumvented by using a variation of the Hausdorff distance (Venugopal et al., 2005), which measures the maximum distance from a point in one set to the nearest point in the other set. In general, a threshold has to be set in such a manner that the maximum distance between the objects under consideration remains close enough (compared to other forecast objects within range) for further inspection. The choice of threshold depends on the type of data and on the judgment of the analyst. For synoptic-scale data sets, 400 km may be a reasonable separation, whilst for local convective events 50 km is already too much. If two objects overlap, the minimum distance is zero under this definition.

Typically, there is more than one forecast object within range. In some cases, a forecast CRA may be found within range of more than one observed object. In order to make the matching unique, the approach presented by Ebert and McBride (2000) is slightly modified. Once a pair of observed and forecast objects is selected, we calculate one of the classical statistics, such as the root mean square error or a correlation coefficient. Next, we shift the forecast object over the observed object in order to minimize the error (maximize the correlation). We choose the pairing with the best score by examining the statistics obtained with respect to all the observed objects within range.
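A sketch of the shift step just described: the forecast object is translated over a small search range and the displacement minimizing the mean square error against the observed object is retained. Both objects are assumed to be given as full-grid fields that are zero outside the object; the brute-force search and the wrap-around shift are simplifications relative to the original implementation.

```python
import numpy as np

def best_shift(forecast_obj, observed_obj, max_shift=10):
    """Find the (row, col) shift of the forecast object that minimizes
    the MSE against the observed object, by brute-force search."""
    best = (0, 0)
    best_mse = np.mean((forecast_obj - observed_obj) ** 2)
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            # np.roll wraps around the domain edges; a padded shift would
            # be used in practice for objects close to the boundary
            shifted = np.roll(np.roll(forecast_obj, dr, axis=0), dc, axis=1)
            mse = np.mean((shifted - observed_obj) ** 2)
            if mse < best_mse:
                best, best_mse = (dr, dc), mse
    return best, best_mse

# Illustrative: a forecast blob displaced by (2, -3) from the observation
obs = np.zeros((30, 30)); obs[10:15, 10:15] = 4.0
fcst = np.roll(np.roll(obs, 2, axis=0), -3, axis=1)
shift, mse = best_shift(fcst, obs)
print("displacement error:", shift, "residual MSE:", mse)
```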

Verification statistics. The first step of the verification is to look for matched pairs. Once a pair is identified, we can compute a set of errors that measures the model performance. This set leads to a categorical contingency table. Typically, not all observed objects find a corresponding pair; those left out are called missed events. It may also happen that not all forecast events are matched; these are called false alarms. The matched objects are called hits. Note that the category of correct rejections does not make sense in this setting.

Fig. 4 presents statistics estimated from all forecasts produced by the Unified Model (UK Met Office) in July 2009 over Poland. Inspection of the errors counted separately for each forecast hour does not show any impact of the forecast lead time on the shift and displacement errors. The total mean square error has a diurnal cycle in the first and the second 24 hours of the forecast. Some evidence of higher errors during convection development on the second day of the forecast may be noticed.

5.1.1 Experiments

In what follows, we refer to the procedure described above (or a part of it) as the method or the algorithm.

Ensemble forecast: An ensemble forecast (EF) is generated by a number of initializations, prepared by perturbing the initial input data and the model parameters. Due to the uncertainty in the input data as well as inherent model imperfections, all ensemble members may be considered equally probable. The ensemble forecast field is then defined as the mean of the ensemble members. Suppose that an ensemble has N members and let F_n, n = 1, 2, …, N, denote their matrices of size k × m. Then the ensemble-averaged forecast is given by

E = \frac{1}{N} \sum_{n=1}^{N} F_n .    (3)

When the data are precipitation, all matrices have non-negative values. A rain/no-rain threshold R is applied to the matrices F_n. We define a rain occurrence by E[i,j] ≥ R for any i and j. This condition may be satisfied even when a majority of members shows no rain, provided the values of some members are high enough. Clearly, this method does not fully reflect the probabilistic aspect of ensemble prediction. Intuitively, if a majority of members shows that a given entry of the matrix has a value beyond the threshold, we should assign a higher probability to an actual occurrence above the threshold. Thus, we may modify the process of obtaining an ensemble forecast by first deciding how likely the occurrence is and then re-assigning a mean value to it. In the example above, we would still have an occurrence, but with a lower confidence with respect to a given confidence level.

The above consideration leads to a different approach to the stability of perturbations. Instead of all entries of the fields, we use CRAs as the criterion for the occurrence of an event (e.g., rainfall). The identified objects are considered robust against perturbations of the initial condition when they are present in most of the ensemble members. They are expected to lie in the relative vicinity of the corresponding object in the ensemble mean. Thus, objects from different ensemble members identify themselves as the same object under a perturbation. Once objects are grouped together, the probability of an event can be assigned by checking the fraction of members that predicted a given object. If a majority of the members predict a certain event, this event is considered robust. In this way we can specify likelihoods (probabilities) of specific events based on an ensemble forecast. This method may be particularly useful in the verification of rare but intense events, with a given measure of reliability.
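As a toy sketch of the two thresholding strategies discussed above (exceedance of the ensemble-mean field versus exceedance by a fraction of the members), assuming the member fields are stacked into a single array; the threshold and the member fraction are illustrative values only.

    import numpy as np

    def ensemble_rain_masks(members, R=1.0, min_fraction=0.5):
        """members: array of shape (N, k, m) of member precipitation fields.
        Returns (a) the rain mask from the ensemble-mean field E >= R, and
        (b) a mask of points where at least `min_fraction` of the members
        themselves exceed R (a crude member-fraction probability)."""
        E = members.mean(axis=0)                       # ensemble mean, Eq. (3)
        mean_mask = E >= R                             # occurrence from the mean field
        member_fraction = (members >= R).mean(axis=0)  # fraction of members above R
        fraction_mask = member_fraction >= min_fraction
        return mean_mask, fraction_mask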

Clustering: The CRA algorithm enables us to isolate and describe individual objects in the F/O fields. At the same time, many of the objects can be part of larger-scale structures, such as mesoscale organizations or synoptic-scale fronts, which are of interest in their own right. Furthermore, the large-scale dynamics may provide a pre-condition for defining a drift bias for the CRA algorithm. In order to reunite objects isolated from the F/O fields, we introduce a clustering algorithm which collects together those objects that satisfy proximity criteria (a toy sketch of such a proximity grouping is given below). In this way, we can create two sets of non-overlapping clusters of CRAs, which may be compared in terms of, e.g., total precipitation, area, or maximum precipitation.

Time evolution of CRAs: An interesting question concerns a model's performance as a function of the forecast period. Clearly, the reliability of a forecast decreases with time. However, the peak of performance is not at the beginning of the forecast period either, due to the spin-up of the model. In order to identify an optimal time lag and determine a reliability profile, we need to compare verification results from neighbouring times in a consistent manner. A direct solution to the problem would be to draw statistics with respect to time. A more sophisticated method requires, for example, tracking of objects in time. When objects in the forecast and observed fields are examined in time, continuity in the evolution of objects would be seen as an object appears, develops, and disappears. This means that when two objects are matched together at a certain time, their immediate future states (for the next few snapshots) should be comparable as well. The CRA algorithm, however, tries to minimize errors for every snapshot separately. A procedure is still to be developed which would assess model performance with respect to the evolution of CRAs or clusters in time.
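The proximity grouping mentioned above can be sketched as a greedy single-linkage clustering of object centres of mass; the distance threshold and the use of centres of mass (rather than minimum inter-object distances) are simplifying assumptions for the example.

    import numpy as np

    def cluster_by_proximity(centres, max_dist_km=100.0):
        """Greedy single-linkage grouping of CRA objects whose centres of mass
        (array of shape (n, 2), coordinates in km) lie closer than
        `max_dist_km`. Returns an integer cluster label for each object."""
        n = len(centres)
        labels = -np.ones(n, dtype=int)
        current = 0
        for i in range(n):
            if labels[i] >= 0:
                continue
            labels[i] = current
            stack = [i]
            while stack:                    # flood-fill over the proximity graph
                j = stack.pop()
                d = np.linalg.norm(centres - centres[j], axis=1)
                for k in np.where((d < max_dist_km) & (labels < 0))[0]:
                    labels[k] = current
                    stack.append(k)
            current += 1
        return labels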

Another important aspect is the possibility of extending the Ebert-McBride error minimization procedure to the space-time domain. The current algorithm employs a rigid shift in order to find the minimal error. However, by extending the verification domain to both space and time, a more flexible procedure would become possible by adding a time dimension to the errors. Under this generalization, an object can be shifted not only in space but also in time.

Correction of drift: Among the outputs of the CRA algorithm is a vector field showing the mutual dislocations of the paired objects. Since the pairing algorithm does not take into account any additional conditions concerning the general behaviour of the forecast, this dislocation field should not display any particular tendency. The atmospheric circulation, however, tends to display a local persistence. A forecast failure of a synoptic-scale circulation results in a systematic dislocation of the CRAs. A way of introducing such a tendency in the pairing phase is to impose a constraint on the search region, so that an observed object looks for a matching forecast object in a preferred sector. We can introduce two methods for obtaining the bias for this algorithm. The first method utilizes the tracking mentioned above. Let F and O be matched objects at time t. A dislocation vector v is obtained by minimizing a function Error(F + v, O), where Error is a measure of discrepancy of our choice, e.g., the MSE or a correlation coefficient. We use a tracking algorithm to find O′, the observed object at the next time. A dislocation vector is then calculated for O and O′, i.e., minimization of Error(O + v, O′) yields a local tendency in the observed field. Likewise, we calculate a local tendency of the forecast field. When vectors are calculated for all the objects admitting at least one future state, we can compare the evolution of the resulting vector fields.

5.2 The SAL technique

The SAL technique (Wernli et al., 2008, 2009) provides a three-component feature-based quality measure for QPF, in which the three components quantify the forecast of a feature in terms of amplitude (A), location (L), and structure (S). The method is object based in the sense that precipitation objects are identified within a verification domain in order to determine S and L for continuous precipitation areas exceeding a fixed or statistically-defined precipitation threshold. However, SAL does not require one-to-one matching between the objects, which are identified separately for the observed and forecast fields.

The A component represents the normalized difference between the domain-averaged QPF and the domain-averaged observations. A positive value of A indicates an overestimation of the predicted total precipitation; a negative value indicates an underestimation. The value of A lies in the range [−2, 2], and 0 corresponds to a perfect forecast of the domain-averaged precipitation. The L component combines information about the predicted precipitation mass centre with the error in the weighted-average distance between the centres of mass of the precipitation objects. It consists of two parts: L = L1 + L2. L1 measures the normalized distance between the mass centres of the modelled and observed precipitation fields; its values range over [0, 1]. The value L1 = 0 indicates that the mass centres of the predicted and observed precipitation fields are identical. L2 measures the averaged distance between the mass centres of the total precipitation fields and the individual precipitation objects. The value of L2 is in the range [0, 1]. As a whole, L spans between 0 and 2. The S component compares the volumes of the normalized precipitation objects. It provides information about the size and shape of the precipitation objects. The range of S is [−2, 2]; a positive value occurs when the precipitation objects are too large or too flat, and a negative value when the objects are too small or too peaked.
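As an illustration of the A and L1 components only (the S and L2 components require object identification and are omitted here), a minimal Python sketch is given below; the grid spacing and the normalization by the domain diagonal are assumptions for the example, not values taken from the operational setup.

    import numpy as np

    def sal_a_and_l1(forecast, observed, dx_km=2.0):
        """A and L1 components of SAL on a rectangular grid (sketch only).
        forecast, observed: 2-D precipitation fields of equal shape."""
        f_mean, o_mean = forecast.mean(), observed.mean()
        A = (f_mean - o_mean) / (0.5 * (f_mean + o_mean))   # normalized difference, in [-2, 2]

        def centre_of_mass(field):
            iy, ix = np.indices(field.shape)
            w = field.sum()
            return np.array([(iy * field).sum() / w, (ix * field).sum() / w])

        d = np.linalg.norm(centre_of_mass(forecast) - centre_of_mass(observed)) * dx_km
        d_max = np.hypot(*forecast.shape) * dx_km           # largest distance within the domain
        L1 = d / d_max                                      # normalized distance, in [0, 1]
        return A, L1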


A perfect QPF is characterized by zero values for all three SAL components. The SAL provides information about systematic differences in the performance of two NWP models. It has been applied in numerous verification studies, as found in Wernli et al. (2009), for example.

5.2.1 Application of SAL at the Hungarian Meteorological Service

At the Hungarian Meteorological Service (HMS), the SAL technique has been applied since the end of 2010 (the original R code was developed by the Royal Meteorological Institute of Belgium). It is used both for the evaluation of model developments and for the routine verification of operational NWP models.

For the SAL verification, an operational radar product of HMS for accumulated rainfall was used. This product was a composite radar image produced from the reflectivity fields of the three radars operated by HMS. First, column-maximum reflectivity fields (from 10 elevation angles) were produced for each of the three radars; the composite image was then calculated by taking, for each pixel, the maximal value of the three radar column-maximum fields. The domain of the composite radar image covered a slightly larger area than Hungary, with a spatial resolution of 2 km x 2 km. A composite column-maximum reflectivity image was produced every 5 minutes. In making the accumulated rainfall product, a spatio-temporal interpolation method (Li et al., 1995) was applied to the composite images to produce interpolated images at 1-minute intervals. With the use of 1-minute images, unrealistic rainfall accumulations from fast-propagating convective cells could be avoided. From the interpolated column-maximum composite reflectivity fields, the precipitation intensity was calculated with the Marshall-Palmer relationship. The radar rainfall product was corrected with surface rain gauge measurements only for the 12-hourly and 24-hourly accumulation periods. When evaluating NWP forecasts with the SAL technique, a three-hourly accumulation period was chosen. This radar product was not gauge-corrected, however, because the correction of three-hourly radar amounts was still under development. The following example shows the usefulness of SAL in the development of the AROME non-hydrostatic model, specifically for the evaluation of the impact of different boundary layer parameterizations on the resolved deep convection.
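As an aside, converting reflectivity to rain rate with the Marshall-Palmer relationship (Z = 200 R^1.6; Marshall and Palmer, 1948) can be sketched as follows; the coefficients shown are the standard textbook values and are not necessarily those used operationally at HMS.

    def marshall_palmer_rain_rate(dbz, a=200.0, b=1.6):
        """Convert radar reflectivity (dBZ) to rain rate (mm/h) with the
        standard Marshall-Palmer relation Z = a * R**b (a = 200, b = 1.6)."""
        z = 10.0 ** (dbz / 10.0)       # linear reflectivity (mm^6 m^-3)
        return (z / a) ** (1.0 / b)    # rain rate in mm/h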

Two different model configurations were compared. For the first configuration (referred to as the no_EDKF run), boundary layer turbulence and shallow convection were parameterized separately; the CBR scheme (Cuxart et al., 2000) was adopted for the turbulence parameterization, whilst the Kain-Fritsch scheme (Kain and Fritsch, 1990) was used for shallow convection. For the second configuration (referred to as the EDKF run), the Eddy-Diffusivity Mass-Flux approach, as originally proposed by Soares et al. (2004) and modified by Pergaud et al. (2009), was used, which describes boundary layer turbulence and shallow convection together. For the evaluation of the two AROME versions, a one-month period from 17 July 2010 to 17 August 2010 was taken.

Figure 5 shows precipitation forecasts with the two versions of the AROME model and the radar observation, together with the identified SAL objects (objects were defined using a dynamic threshold equal to 1/15 of the precipitation maximum over the domain) for a selected day. The no_EDKF run seriously overestimates the number of convective cells, whilst the cell number for the EDKF run is much closer to reality. For both model runs, the size of the cells was smaller than observed and the maximum intensity within a cell was overestimated (i.e., objects were too "peaked" in the model). The reason for the difference between the two AROME versions was that the EDKF scheme enhanced the boundary-layer mixing during unstable conditions compared to the no_EDKF version, owing to the non-local (mass-flux) transport added in the EDKF scheme. This enhanced mixing prevented the pile-up of warm air close to the surface and hampered the generation of small convective cells in the early afternoon hours.

The three components of the SAL verification can be visualized with the so-called SAL plot (Fig. 6). Figure 6 shows that the SAL plots for the two different AROME runs are very similar. The domain-averaged precipitation (A component) is forecast very well, whilst the size of the precipitation objects (S component) is underestimated (as also seen in Fig. 5).

It is interesting to compare the SAL results of AROME with those of hydrostatic models as well. From the SAL plots of the IFS model of ECMWF and of the operational ALADIN model run at HMS (Fig. 6), it is seen that the hydrostatic models, with coarser resolution than AROME, tend to overestimate the size of the precipitation objects (positive S values), while the simulated domain-averaged precipitation is similar to that of the AROME model. Importantly, however, the precipitation pattern is quite different between the two AROME runs, although this is not reflected in the SAL plots. The reason is that the two versions mainly differ in the number of precipitation objects (Fig. 7). This characteristic is not captured by the three SAL components, because the components are based on averages over the objects. The diurnal cycle of convection and precipitation is a key issue in operations. The domain-averaged diurnal cycle of precipitation can be compared between different model runs and the radar measurements by taking the A component of SAL as a quantitative measure (Fig. 8). As shown in previous studies (e.g. Bechtold et al., 2004; Brockhaus et al., 2008), the hydrostatic models with a parameterized deep convection (ECMWF/IFS, ALADIN, ALARO in Fig. 8) tend to initiate convection too early, whilst the non-hydrostatic models (the two versions of AROME in Fig. 8) are quite accurate with the timing of convective precipitation.

Unfortunately, the classical SAL method is not suitable for the verification of extreme values in the simulated precipitation, e.g., the maximum precipitation intensity within an object. New verification scores were therefore introduced at HMS in order to complement the SAL information in this respect:

• averaged intensity of the three strongest objects

• averaged maximum intensity of all objects

• averaged maximum intensity of the three strongest objects

In order to compare the performance of two different NWP models (or two experiments), these statistics are plotted as a function of lead time, with a moving average applied in time (a minimal sketch of how such object-intensity scores can be computed is given at the end of this subsection). As an example, verification scores of two experiments with AROME are shown in Fig. 9. This experiment tested the sensitivity of the AROME forecast to the initial surface state of the model over a one-month summer period. In the first experiment (green line in Fig. 9), the initial surface state was interpolated from the ALADIN model, whilst in the second experiment (red line), a dedicated surface data assimilation cycle was run for AROME. For both experiments, the same upper-air analysis (3DVAR) was used. Figure 9 shows that the forecast with the AROME surface assimilation predicts the averaged maximum intensity of all convective cells well, although the intensity of the strongest cells is overestimated. In this respect, the experiment using the ALADIN surface performs better. This example highlights one peculiar feature of this object-based precipitation verification method: different scores (averaged and maximum cell intensity in this case) can favour different experiments, making it difficult to single out the best prediction. The vector of scores rather helps us to identify certain strengths and weaknesses of each forecast. The importance of each score component depends on the purpose of the forecast. For instance, the performance of the forecast cell-averaged precipitation would be more important for hydrological applications (i.e., when coupling the NWP model to a hydrological model), whereas the performance of the cell-maxima prediction would be more relevant for aviation meteorology.
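A hypothetical sketch of the complementary intensity scores listed above, assuming the precipitation objects have already been identified and are supplied as lists of pixel values (the object-identification step itself is not shown; ranking the "strongest" objects by their mean intensity is an assumption made for the example).

    import numpy as np

    def object_intensity_scores(objects):
        """objects: list of 1-D arrays, each holding the precipitation values
        of one identified object. Returns the complementary SAL-type scores."""
        means = np.array([obj.mean() for obj in objects])
        maxima = np.array([obj.max() for obj in objects])
        strongest = np.argsort(means)[-3:]   # three strongest objects, ranked by mean intensity
        return {
            "mean_intensity_top3": means[strongest].mean(),
            "mean_max_intensity_all": maxima.mean(),
            "mean_max_intensity_top3": maxima[strongest].mean(),
        }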

5.3 Fractions skill score – QPF from different models

Use of the fractions skill score (FSS) belongs to the (a1) category of neighbourhood (or fuzzy) verification techniques and applies the strategy of 'neighbourhood observation - neighbourhood forecast'. The FSS compares the fractional coverage of events (occurrences of rainfall values exceeding a certain threshold) over a window surrounding the observation and the forecast (see Roberts, 2005; Roberts and Lean, 2008; Mittermaier and Roberts, 2009). When only a spatial window is considered, we call the window the elementary area (EA). According to Ebert (2008), the FSS is defined by:

FSS = 1 - \frac{\frac{1}{N}\sum_{N}\left(P_F - P_O\right)^2}{\frac{1}{N}\left[\sum_{N} P_F^2 + \sum_{N} P_O^2\right]} ,    (4)

where P_X (for X = F, O, where F denotes the forecast and O the observation) is the fraction of the elementary area (EA) covered by rainfall exceeding a given threshold, and N is the number of grid points in the verification domain. The expression in the numerator is a version of the fractions Brier score (FBS), in which fractions are compared. The denominator gives the worst possible FBS, in which there is no overlap of non-zero fractions.

The FSS spans [0, 1], with 0 for a complete forecast mismatch and 1 for a perfect forecast. The FSS is zero if the forecast does not exceed the threshold but the observation does, or if threshold-exceeding values are forecast but not observed. As the size of the EA used to evaluate the fractions becomes larger, the score asymptotes to a value that depends on the ratio between the forecast and observed frequencies of an event, i.e., the closer the asymptotic value is to 1, the smaller the forecast bias. The score is sensitive to rare events and small rain areas.
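A minimal sketch of evaluating Eq. (4) on a regular grid, using a square neighbourhood of n x n grid boxes as the elementary area (a uniform-filter convolution computes the fractions); the window size and the threshold are example values, not those of any particular study.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def fss(forecast, observed, threshold=1.0, window=9):
        """Fractions skill score, Eq. (4): fractions are computed over a
        square window of `window` x `window` grid boxes (the elementary area)."""
        p_f = uniform_filter((forecast >= threshold).astype(float), size=window)
        p_o = uniform_filter((observed >= threshold).astype(float), size=window)
        fbs = np.mean((p_f - p_o) ** 2)                    # fractions Brier score
        fbs_worst = np.mean(p_f ** 2) + np.mean(p_o ** 2)  # no-overlap reference
        return 1.0 - fbs / fbs_worst if fbs_worst > 0 else np.nan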

The FSS depends on the EA size and on the precipitation threshold. Roberts and Lean (2008) show that when the EA size is increased, the FSS increases until it reaches an asymptotic value of the fractions skill score (AFSS). If there is no bias, it asymptotes to AFSS = 1. If there is a bias, then the AFSS is determined by the conventional frequency bias, where f0 is the fraction of observed points exceeding the threshold over the domain and fM is the corresponding model-forecast fraction. Two reference FSS values are considered in Roberts and Lean (2008). The first assumes FSSrandom = f0, which is obtained from a random forecast with a mean fractional coverage f0. The second assumes FSSuniform = 0.5 + f0/2, which is obtained at the grid scale from a forecast with a fraction equal to f0 at every point. Whereas the random forecast has low skill unless f0 is large, the uniform forecast is always reasonably skilful. The scale at which the FSS reaches FSSuniform represents the smallest scale over which the forecast output contains useful information. The largest scale over which output should be presented becomes a compromise between user requirements, cost effectiveness, and forecast skill.


In this manner, we can compare the performance of different models by taking the scales for which the FSS exceeds FSSuniform. An example is shown in Fig. 10, which presents an FSS verification of forecasts of heavy local convective rainfalls that caused flash floods over the Czech Republic in 2009 (Zacharov et al., 2013). In that study, a traditional verification, the SAL technique, and the FSS verification were all considered. The events occurring between 22 June and 5 July 2009 were analysed and simulated by COSMO (as adapted at IAP to cover the Czech Republic) and by ALADIN. The non-hydrostatic model COSMO was run without a cumulus parameterization at a horizontal resolution of 2.8 km (C3 in Fig. 10) and with a cumulus parameterization at a horizontal resolution of 7 km (C7 in Fig. 10). The operational hydrostatic model ALADIN in its 2009 configuration was run with 9 km resolution (A9 in Fig. 10). At present, a 4.7 km resolution is used, and the corresponding results were obtained by re-running the 2009 flash flood period at the Czech Hydrometeorological Institute (A5 in Fig. 10). The quantitative precipitation forecasts of 3 h rainfalls were verified against gauge-adjusted radar data from the Czech radar network CZRAD (Novak, 2007). We used the operational CHMI product MERGE (Salek, 2010), which provides the gauge-adjusted 3 h rainfalls on the radar pixels covering the Czech territory with a resolution of 1 km x 1 km.

The analysed episode with heavy convective precipitation resulted in flash floods in several parts of the Czech Republic in 2009. The beginning of the period was marked by a front crossing the Czech Republic. The larger-scale precipitation associated with this front was generally forecast well. However, the precipitation was difficult to forecast over the final part of the period because of chaotic thermally driven convection. At present, a second verification of the ALADIN forecasts is under way, which tests the QPFs for the 2009 period obtained after a modification of the operational model. Preliminary results indicate a significant improvement, especially on the days with smaller-scale convection without any synoptic forcing.

Figure 10 plots the fraction of the 56 forecasts which give FSS values larger than FSSuniform; in other words, it gives the relative number of useful forecasts (FSS > FSSuniform). The plots show how the forecast performance changes with the precipitation threshold. The C3 results are the least accurate for the threshold of 1 mm/3h, while the remaining models yield nearly the same forecast performance. The behaviour of the C3 curve for a wider EA corresponds to an underprediction of the lower precipitation rates during the run. With an increasing threshold, the C3 begins to improve its relative performance, and at the threshold of 10 mm/3h the C3 achieves the best performance in terms of FSS.

The simultaneous use of a traditional verification, the SAL technique, and the FSS verification in the study by Zacharov et al. (2013) and in the subsequent analyses showed that the use of several verification techniques was worthwhile. Each technique evaluated the precipitation forecast in a different way, and thus different information about the forecasts could be obtained. This was especially true for the HR QPF, where the double penalty problem occurred extensively.

6 Use of dual polarization radars in microphysics verification

There are several techniques available for the retrieval of hydrometeor categories from polarimetric radar (PR) measurements. They include decision tree methods based on physical rules and threshold values of the PR characteristics; among them, the application of fuzzy logic principles (Mendel, 1995; Vivekanandan et al., 1999) has been recognised as the most suitable for hydrometeor classification from PR variables. The tree methods (classification techniques) use threshold values in the phase space of several PR characteristics. For instance, one of the first studies based on threshold values examined the microphysical structure of a supercell storm that occurred near Munich on 30 June 1990 (Höller et al., 1994). The PR variables ZDR and LDR, together with the height of the melting layer, were used for an empirical interpretation in terms of nine microphysical categories. The verification against observations focused mainly on hail occurrence, and the results were consistent with the storm dynamics and proved useful for hail detection.

Straka et al. (2000) provided an extended review of the ranges of PR variables that can be expected for several microphysical categories (hail, graupel, rain, wet hail mixture, snow crystals, and aggregates). Temperature was also included in the classification in order to avoid obvious unphysical situations that cannot be excluded otherwise. For example, ice crystals would not be expected above 15°C, and rain would not be expected below -30°C.

Vivekanandan et al. (1999) also reported many classification studies based on thresholds, or hard boundaries, of polarimetric parameters. Based on this review, they proposed a fuzzy approach as more suitable for retrieving information about hydrometeor categories from a set of PR variables. It is known that the use of hard boundaries can lead to misclassification because of the overlap between PR values for various precipitation types. Fuzzy boundaries between polarimetric observables are best treated within fuzzy logic (Mendel, 1995), which enables a smooth transition in the polarimetric observable boundaries between precipitation types.

The use of neural networks (NN; e.g. Haykin, 1994) has also been considered. Although powerful, the NN approach needs a training set of considerable size for verification, which is difficult to attain (e.g. Straka et al., 2000). For example, a NN-based method was applied to large data sets of radar and surface observations by Vulpiani et al. (2009): point-wise estimates of hourly rainfall accumulations and instantaneous rainfall rates, obtained by NN using parametric polarimetric rainfall relations, were compared with dense surface gauge observations. Liu and Chandrasekar (2000) proposed a NN system in combination with a fuzzy logic classification. The performance of a fuzzy classification depends critically on the shape of the so-called membership functions, which enter the "fuzzification" component and convert measured values into fuzzy sets with different membership degrees. In their study, a hybrid neuro-fuzzy system was proposed, in which a training algorithm for the NN was also used to determine the parameters of the fuzzy logic.

The state of the art of using PR measurements for retrieving microphysical structures is reviewed by Chandrasekar et al. (2013). The review documents large progress in hydrometeor classification over the last decade, thanks to the fuzzy logic introduced into precipitation classification systems. The implementation procedures have expanded from point-wise classification to areal analysis by using texture information. At present, integrated weather radar classification systems include three aspects: data quality, echo classification, and hydrometeor identification. The review concluded that "the hydrometeor classification topic presents exciting future research opportunities, and is likely to remain active for a long time."

In their study, Straka et al. (2000) summarized the scientific and operational reasons for deducing hydrometeor types from PR data. The verification of microphysical parameterizations in NWP models, and the QPF verification, were among those reasons. At present, many studies focus on the verification of the polarimetric identification of microphysical categories against various direct measurements. The use of PR variables for quantitative verification of the HR NWP models is still limited; the majority of verification studies are limited to qualitative comparisons of retrieved and modelled microphysics.
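To make the fuzzy-logic idea concrete, a highly simplified sketch with trapezoidal membership functions over two polarimetric variables (reflectivity Z and differential reflectivity ZDR) is given below; the class boundaries are invented for illustration only and are not taken from any of the cited classifiers.

    def trapezoid(x, a, b, c, d):
        """Trapezoidal membership function: 0 outside [a, d], 1 on [b, c]."""
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)

    # Illustrative (invented) membership boundaries for two hydrometeor classes.
    CLASSES = {
        "rain": {"Z": (20, 30, 55, 60), "ZDR": (0.5, 1.0, 4.0, 5.0)},
        "hail": {"Z": (50, 55, 75, 80), "ZDR": (-2.0, -1.0, 0.5, 1.0)},
    }

    def classify(z_dbz, zdr_db):
        """Return the class with the highest aggregated membership degree."""
        scores = {}
        for name, mf in CLASSES.items():
            scores[name] = min(trapezoid(z_dbz, *mf["Z"]),
                               trapezoid(zdr_db, *mf["ZDR"]))  # fuzzy AND = min
        return max(scores, key=scores.get), scores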

Several studies used data collected during Intensive Observing Periods (IOPs) of the MAP to evaluate cloud microphysical structures during heavy precipitation events. The non-hydrostatic Meso-NH model was run for several MAP events, and the simulated microphysical structure was compared with cloud microphysical retrievals from the NCAR S-band dual-polarized (S-Pol) radar data (Pujol et al., 2005; Lascaux et al., 2006; Pujol et al., 2011). The simulations made use of a bulk microphysics scheme to predict the time evolution of the six water species (vapour, cloud droplets, raindrops, pristine ice, snow/aggregates, and frozen drops/graupel) included in the Meso-NH ICE3 scheme. The scheme was later extended to account for hail (the Meso-NH ICE4 scheme). The identification of hydrometeor classes from the polarimetric variables used the NCAR algorithm (Vivekanandan et al., 1999) together with the determination of the 0°C level from radiosoundings. The time evolution of the retrieved microphysical classes was compared with the simulated microphysics. The detailed microphysical analysis focused on the processes leading to the development of heavy precipitation in the Lago Maggiore region.

In the study by Jung et al. (2012), observed polarimetric variables were compared with polarimetric signatures simulated by the NWP model ARPS in order to assess the ability of single-moment and double-moment microphysics parameterizations to reproduce the observed polarimetric signatures. The study analysed a tornadic thunderstorm that occurred in May 2004 in central Oklahoma. An ensemble Kalman filter technique was used for the assimilation of data from one single-polarized radar, whilst observations from another single-polarized radar and a dual-polarized radar were used for verification. Both microphysics parameterizations reproduced the observed reflectivity fields well. A comparison of the simulated and observed polarimetric signatures showed a better agreement for the double-moment microphysics parameterization.

A comprehensive verification study using satellite and radar data was performed by Pfeifer et al. (2010). A quantitative verification was applied to precipitation forecasts produced by the COSMO-DE model of the German weather service, with a horizontal resolution of 2.8 km. Two cases with heavy precipitation were analysed and compared with observed data under a model-to-observation approach using forward operators (cf. Sect. 2.5). In addition to simulated satellite data, the polarimetric radar forward operator SynPolRad (Pfeifer et al., 2008) was adopted. The verification also applied traditional skill scores such as the root mean square error, the probability of detection, the frequency bias, the false alarm ratio, and the Heidke skill score. Furthermore, the fractions skill score (Roberts and Lean, 2008) was included.

7 Summary and conclusions

Radar-based verification data and techniques for QPF verification have been critically reviewed. In order to verify the HR QPF performance, we need to identify a suitable verification technique together with verification data that correspond to the forecast variables and the forecast horizontal resolution. The horizontal resolution of radar data, which is of the order of 1 km, corresponds to the horizontal resolution of present operational NWP models. This is the main reason why radar-based QPE has become a relevant tool for QPF verification. A single-polarization radar provides QPE values based on a Z-R relation. A suitable merging of the raw radar rainfall estimates with ground precipitation measurements offers a verification data set with sufficient spatial and temporal resolution. Radar-based QPE is difficult to obtain over mountainous terrain and where the gauge density is low. Over the last decade, dual polarization radar has grown into an operational technology in the meteorological services of many countries. Polarimetric characteristics provide multidimensional information. When properly evaluated, polarimetric variables can not only improve the rain rate estimate but also inform about the structure of cloud microphysical categories in time and space.

Different verification techniques evaluate the QPF in different manners, providing different information about the forecast performance. Traditional verifications are based on gridpoint information. The uncertainty inherent in the QPF, as well as in the verification data, makes strict gridpoint verification inappropriate. In contrast, the spatial verification techniques correspond well to the horizontal resolution of both the HR QPF and the radar-based QPE, and they take the QPF uncertainty into account.

It is important to examine all aspects of a forecast in order to reveal its strengths and weaknesses, e.g., the spatial distribution of high rainfalls. The use of spatial methods is particularly worthwhile for HR forecasts, which may have difficulty in predicting the location of high-precipitation areas. The simultaneous use of multiple verification techniques is recommended for modellers. In summary, we expect the following developments in the coming years:

● More studies comparing polarimetric and ground-based rainfall values, and use of the polarimetric QPE in HR QPF verification.

● Increasing application of spatial techniques in modeller-oriented verifications. The spatial techniques are able to take the QPF uncertainty into account and to reflect various aspects of the forecast performance.

● Use of spatial verifications in regional ensemble forecasts, especially for quantitatively diagnosing the forecast uncertainties.

● Increasing application of quantitative verification in microphysical studies. For this purpose, the polarimetric parameters are the primary verification data, which need to be inverted into information on cloud microphysical structures. The opposite approach, comparing model-generated polarimetric information with the measurements, is also plausible.

The importance of understanding the physical uncertainties in a model should be emphasized. In the present review, we have shown how the contribution of boundary-layer turbulence processes to a forecast can be inferred from a SAL-based analysis. We emphasize the need for developing more physically-based verification methods in order to make such assessments more direct. Such a development is especially crucial for fully exploiting the multidimensional information obtained from polarimetric radar in the future.

In the longer term, the need for probabilistic quantification of the forecast should be emphasized, as already suggested at several places in this review (cf. Sects. 3 and 5). We especially refer to Jaynes (2003) for the basics of probability as an objective measure of uncertainty. From the point of view of fundamental probability theory, the goal of model verification would be to reduce the model uncertainties by objectively examining the model errors. In order to make such a procedure useful and effective, forecast errors and model uncertainties must be linked together in a direct and quantitative manner. Unfortunately, many of the statistical methods found in the general literature are not satisfactory for this purpose. The Bayesian principle (op. cit.) is rather an exception that can provide such a direct link: from a given forecast error, the uncertainty associated with a particular parameter of a parameterization, for example, can be objectively and quantitatively estimated by invoking the Bayes theorem (a toy example is sketched below). The principle also tells us that the ensemble, sample space, randomization, etc., as typically invoked in statistical methods, are not indispensable ingredients for uncertainty estimates, although they may be useful. Such objective probability-based verification procedures are critically required, especially in the convection parameterization problem, for instance for identifying the closure hypothesis to adopt (Yano et al., 2013). Though such a systematic procedure is still to be developed, it is our conviction that the spatial verification techniques reviewed here are going to be a basis for such systematic model verification algorithms.
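Purely as a toy illustration of this Bayesian link (not a method proposed in the reviewed literature), the sketch below updates a Gaussian prior on a single parameterization parameter from a forecast error assumed to depend linearly on that parameter; the linear sensitivity and the error variances are invented for the example.

    def bayes_update_parameter(prior_mean, prior_var, forecast_error,
                               sensitivity, error_var):
        """Gaussian Bayes update of one parameterization parameter theta,
        assuming forecast_error ~ N(sensitivity * theta, error_var).
        Returns the posterior mean and variance of theta."""
        # posterior precision = prior precision + information from the error
        post_var = 1.0 / (1.0 / prior_var + sensitivity**2 / error_var)
        post_mean = post_var * (prior_mean / prior_var
                                + sensitivity * forecast_error / error_var)
        return post_mean, post_var

    # Example with invented numbers: prior theta ~ N(0, 1), observed error 0.8,
    # sensitivity 0.5, error variance 0.2
    print(bayes_update_parameter(0.0, 1.0, 0.8, 0.5, 0.2))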

Acknowledgements

The present work was performed within the framework of the COST Action ES 0905, as a part of the activities of WG4 Physics and Observations. The authors thank all ES 0905 participants who commented on the text.

References

Ahijevych, D., Gilleland, E., Brown, B. G. and Ebert, E. E.: Application of spatial verification methods to idealized and NWP gridded precipitation forecasts. Wea. Forecasting, 24, 1485-1497, 2009.

Anagnostou, M. N., Kalogiros J., Marzano, F. S., Anagnostou, E. N., Montopoli, M. and Piccioti, E.: Performance Evaluation of a New Dual-Polarization Microphysical Algorithm Based on Long-Term X-Band Radar and Disdrometer Observations. J. Hydrometeor., 14, 560-576, 2013.

Atger, F.: Verification of intense precipitation forecasts from single models and ensemble prediction systems. Nonlin. Proc. Geophys., 8, 401-417, 2001.

Bechtold, P., Chaboureau, J. P., Beljaars, A., Betts, A. K., Köhler, M., Miller, M. and Redelsperger, J. L.: The simulation of the diurnal cycle of convective precipitation over land in a global model. Quart. J. Roy. Meteor. Soc., 130, 3119–3137, 2004.

Briggs, W. M. and Levine, R. A.: Wavelets and field forecast verification. Mon. Wea. Rev., 125, 1329–1341, 1997.

Bringi, V. N. and Chandrasekar, V.: Polarimetric Doppler weather radar. Principles and applications. Cambridge University Press, 2001.

Brockhaus, P., Lüthi, D. and Schär, C.: Aspects of the diurnal cycle in a regional climate model. Meteorologische Zeitschrift, 17 (4), 433-443, 2008.

Casati, B., Ross, G. and Stephenson, D. B.: A new intensity-scale approach for the verification of spatial precipitation forecast. Meteorol. Applications, 11, 141-154, 2004.

Casati, B. and Wilson, L. J.: A New Spatial-Scale Decomposition of the Brier Score: Application to the Verification of Lightning Probability Forecasts. Mon. Wea. Rev., 135, 3052-3069, 2007.

Casati, B., Wilson, L. J., Stephenson, D. B., Nurmi, P., Ghelli, A., Pocernich, M., Damtrath, U., Ebert, E. E., Brown, B. G. and Mason, S.: Review. Forecast verification: current status and future directions. Meteorol. Applications, 15, 3-18, 2008.


Cifelli, R. and Chandrasekar, V.: Dual-Polarization Radar Rainfall Estimation, in Rainfall: State of the Science. eds F. Y. Testik and M. Gebremichael, Amer. Geophys. Union, 105-125, 2010.

Cifelli, R., Chandrasekar, V., Lim, S., Kennedy, P. C., Wang, Y. and Rutledge, S. A.: A New Dual-Polarization Radar Rainfall Algorithm: Application in Colorado Precipitation Events. J. Atmos. Oceanic Technol., 28, 352–364, 2011.

Collier, C. G.: Applications of Weather Radar Systems. A guide to uses of radar data in meteorology and hydrology. 2nd Edition, John Wiley & Sons, Chichester, 1996.

Collier, C. G., Hawnt, R. and Powell, J.: Real time adjustment of radar data for water management systems using a PDF technique. The City RainNet Project. in: Proceedings of the 6th European Conf. on Radar in Meteorology and Hydrology, Advances in Radar Applications, Sibiu, Romania, 6-10 September 2010, 1-6, 2010.

Cuxart, J., Bougeault, P. and Redelsperger, J. L.: A turbulence scheme allowing for mesoscale and large-eddy simulations. Quart. J. Roy. Meteor. Soc., 126, 1–30, 2000.

Damrath, U., Doms, G., Frühwald, D., Heise, E., Richter, B. and Steppeler, J.: Operational quantitative precipitation forecasting at the German Weather Service. J. Hydrol., 239, 260–285, 2000.

Davis, C., Brown, B. and Bullock, R.: Object-based verification of precipitation forecasts. Part I: methodology and application to Mesoscale Rain Areas. Mon. Wea. Rev., 134, 1772-1784, 2006a.

Davis, C., Brown, B. and Bullock, R.: Object-based verification of precipitation forecasts. Part II: application to convective rain systems. Mon. Wea. Rev., 134, 1785-1795, 2006b.

Davis, C. A., Brown, B. G., Bullock, R. and Halley-Gotway, J.: The Method for Object-Based Diagnostic Evaluation (MODE) Applied to Numerical Forecasts from the 2005 NSSL/SPC Spring Program. Wea. Forecasting, 24, 1252–1267, 2009.

Davis, C. and Carr, F.: Summary of the 1998 Workshop on Mesoscale Model Verification. Bulletin of the American Meteorological Society, 81, 809–819, 2000.

Doviak, R. J. and Zrnic, D. S.: Doppler radar and weather observations. Academic Press, 1984; Second edition, Dover Publications Inc., ISBN 0-486-45060-0, 2006.

Ebert, E. E.: Fuzzy verification of high-resolution gridded forecasts: a review and proposed framework. Meteorol. Applications 15, 51-64, 2008.

Ebert, E. E.: Neighborhood verification: A strategy for rewarding close forecasts. Wea. Forecasting, 24, 1498-1510, 2009.

Ebert, E. E.: WWRP/WGNE Joint Working Group on Forecast Verification Research. Available at: https://www.wmo.int/pages/prog/arep/wwrp/new/documents/Ebert.ppt, last access: 20 August 2013, 2011.

Ebert, E. E. and McBride, J. L.: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202, 2000.

Ebert, E., Wilson, L., Weigel, A., Mittermaier, M., Nurmi, P., Gill, P., Goeber, M., Joslyn, S., Brown, B., Fowler, T. and Watkins, A.: Progress and challenges in forecast verification. Meteorol. Applications, 20, 130-139, 2013.


Gabella, M., Joss, J., Perona, G. and Galli, G.: Accuracy of rainfall estimates by two radars in the same Alpine environment using gage adjustment. J. Geophys. Res. 106-D6, 5139-5150, 2001.

Giangrande, S. E.: Investigation of polarimetric measurements of rainfall at close and distant ranges. Ph.D. thesis, The University of Oklahoma, ProQuest, UMI Dissertations Publishing, 3291249, 247 pp., 2007.

Gilleland, E., Ahijevych, D., Brown, B. G., Casati, B. and Ebert, E. E.: Intercomparison of Spatial Forecast Verification Methods. Wea. Forecasting, 24, 1416-1430, 2009.

Gilleland, E., Lindstrom, J. and Lindgren, F.: Analyzing the Image Warp Forecast Verification Method on Precipitation Fields from the ICP. Wea. Forecasting 25, 1249-1262, 2010.

Gjertsen, U., Salek, M. and Michelson, D. B.: Gauge-adjustment of radar-based precipitation estimates. COST 717: Use of radar observation in hydrological and NWP models. EUR 21363EN, ISBN 92-898-0000-3, 2004.

Haase, G. and Crewell, S.: Simulation of radar reflectivities using a mesoscale weather forecast model. Water Resources Research, 36, 2221–2231, 2000.

Hagen, M. and Yuter, S. A.: Relations between radar reflectivity, liquid-water content, and rainfall rate during the MAP SOP. Quart. J. Roy. Meteor. Soc., 129, 477–493, 2003.

Haykin, S.: Neural Networks: A Comprehensive Foundation. Macmillan College Publishing Company, New York, 1994.

Hoffman, R. N., Liu, Z., Louis, J. F. and Grassotti, C.: Distortion representation of forecast errors. Mon. Wea. Rev., 123, 2758–2770, 1995.

Höller, H., Hagen, M., Meischner, P. F., Bringi, V. N. and Hubbert, J.: Life Cycle and Precipitation Formation in a Hybrid-Type Hailstorm Revealed by Polarimetric and Doppler Radar Measurements. J. Atmos. Sci., 51, 2500–2522, 1994.

Illingworth, A.: Improved Precipitation Rates and Data Quality by Using Polarimetric Measurements. in: Weather Radar, Principles and Advanced Applications, Meischner, P. (Ed.), Springer-Verlag, 130-166, 2004.

Illingworth, A. J. and Blackman, T. M.: The need to represent raindrop size spectra as normalized gamma distributions for the interpretation of polarization radar observations. J. Appl. Meteor., 41, 286–297, 2002.

Jolliffe, I. T., Stephenson, D. B.: Forecast Verification: A Practitioner’s Guide in Atmospheric Science. John Wiley and Sons: Chichester, UK; 2003.

Jaynes, E. T.: Probability Theory, The Logic of Science, Cambridge University Press, Cambridge, UK, 2003.

Jung, Y., Xue, M., Tong, M.: Ensemble Kalman Filter Analyses of the 29–30 May 2004 Oklahoma Tornadic Thunderstorm Using One- and Two-Moment Bulk Microphysics Schemes, with Verification against Polarimetric Radar Data. Mon. Wea. Rev., 140, 1457-1475, 2012.

Kain, J. S. and Fritsch, J. M.: A One-Dimensional Entraining/Detraining Plume Model and Its Application in Convective Parameterization. J. Atmos. Sci. 47, 2784–2802, 1990.


Keil, C. and Craig, G. C.: A displacement-based error measure applied in a regional ensemble forecasting system. Mon. Wea. Rev., 135, 3248–3259, 2007.

Keil, C. and Craig, G. C.: A Displacement and Amplitude Score Employing an Optical Flow Technique. Wea. Forecasting, 24, 1297–1308, 2009.

Kracmar, J., Joss, J., Novak, P., Havranek, P. and Salek, M.: First steps towards quantitative usage of data from Czech weather radar network. in: Proceedings of the Final Seminar of COST-75: "Advanced Weather Radar Systems", Locarno, Switzerland, 23-27 March 1998, European Commission, Luxembourg, 91-101, 1999.

Lascaux, F., Richard, E. and Pinty, J. P.: Numerical simulation of three different MAP IOPs and the associated microphysical processes. Quart. J. Roy. Meteor. Soc., 132, 1907-1926, 2006.

Li, L., W. Schmid, J. Joss: Nowcasting of Motion and Growth of Precipitation with Radar over a Complex Orography. J. Appl. Meteor., 34, 1286–1300, 1995.

Liu, H. and Chandrasekar, V.: Classification of Hydrometeors Based on Polarimetric Radar Measurements: Development of Fuzzy Logic and Neuro-Fuzzy Systems, and In Situ Verification. J. Atmos. Oceanic Technol., 17, 140-164, 2000.

Marshall, J. S. and Palmer, W. M.: The distribution of raindrops with size. J. Meteor., 5, 165-166, 1948.

Marsigli, C., Montani, A. and Paccagnella, T.: A spatial verification method applied to the evaluation of high-resolution ensemble forecasts. Meteorol. Applications, 15, 125–143, 2008.

Marzban, C. and Sandgathe, S.: Cluster analysis for verification of precipitation fields. Wea. Forecasting, 21, 824–838, 2006.

Mass, C. F., Ovens, D., Westrick, K. and Colle, B. A.: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteorol. Soc., 83, 407–430, 2002.

McBride, J. L. and Ebert, E. E.: Verification of quantitative precipitation forecasts from operational numerical weather prediction models over Australia. Wea. Forecasting, 15, 103-121, 2000.

Meischner, P. (Ed.): Weather radar. Principles and advanced applications. Springer-Verlag, ISBN 3-540-000328-2, 2004.

Mendel, J.: Fuzzy logic systems for engineering: A tutorial. Proc. IEEE, 83, 345–377, 1995.

Morin, E. and Gabella, M.: Radar-based quantitative precipitation estimation over Mediterranean and dry climate regimes. J. Geophys. Res., 112, D20108, doi:10.1029/2006JD008206, 2007.

Novak, P.: The Czech Hydrometeorological Institute's severe storm nowcasting system. Atmos. Res., 83, 450–457, 2007.

Pergaud, J., Masson, V., Malardel, S. and Couvreux, F.: A Parameterization of Dry Thermals and Shallow Cumuli for Mesoscale Numerical Weather Prediction. Boundary-Layer Meteorol., 132, 83-106, 2009.

Pfeifer, M., Craig, G. C., Hagen, M. and Keil, C.: A Polarimetric Radar Forward Operator for Model Evaluation. Journal of Applied Meteorology and Climatology, 47, 3202-3220, 2008.

Pfeifer, M., Yen, W., Baldauf, M., Craig, G., Crewell, S., Fischer, J., Hagen, M., Hühnerbein, A., Mech, M., Reinhardt, T., Schroeder, M. and Seifert, A.: Validating precipitation forecasts using remote sensor synergy: A case study approach. Meteorologische Zeitschrift, 19, 6, 601-617, 2010.

Pujol, O., Georgis, J. F., Chong, M. and Roux, F.: Dynamics and microphysics of orographic precipitation during MAP IOP3. Quart. J. Roy. Meteor. Soc., 131, 2795-2819, 2005.

Pujol, O., Lascaux, F. and Georgis, J. F.: Kinematics and microphysics of MAP-IOP3 event from radar observations and Meso-NH simulation. Atmos. Res., 101, 124-142, 2011.

Rezacova, D. and Sokol, Z.: A diagnostic study of a summer convective precipitation event in the Czech Republic using a non-hydrostatic NWP model. Atmos. Res., 67, 559-572, 2003.

Rezacova, D., Sokol, Z. and Pesice, P.: A radar-based verification of precipitation forecast for local convective storms. Atmos. Res., 83, 211-224, 2007.

Rezacova, D., Zacharov, P. and Sokol, Z.: Uncertainty in the area-related QPF for heavy convective precipitation. Atmos. Res., 93, 238-246, 2009.

Roberts, N. M.: Assessing the spatial and temporal variation in the skill of precipitation forecasts from an NWP model. Meteorol. Applications, 15, 163–169, 2008.

Roberts, N. M. and Lean, H. W.: Scale-selective verification of rainfall accumulations from high resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97, 2008.

Roebber, P. J.: Visualizing multiple measures of forecast quality. Wea. Forecasting, 24, 601–608, 2009.

Salek, M., Novak, P. and Seo, D.: Operational application of combined radar and raingauges precipitation estimation at the CHMI. in: Proceedings of the European Conference on Radar in Meteorology and Hydrology (ERAD), Visby, Sweden, 6-10 September 2004, Vol. 2 of ERAD Publication Series, 16–20, 2004.

Salek, M.: Operational application of precipitation estimate by radar and raingauges using local bias correction and regression kriging. in: Proceedings of the 6th European Conf. on Radar in Meteorology and Hydrology, Advances in Radar Applications, Sibiu, Romania, 6-10 September 2010, 39-43, 2010.

Schaeffer, J. T.: The Critical Success Index as an Indicator of Warning Skill. Wea. Forecasting, 5, 570-575, 1990.

Seo, D. J. and Breidenbach, J. P.: Real-Time correction of spatially nonuniform bias in radar rainfall data using rain gauge measurements. J. Hydrometeorol., 3, 93–111, 2001.

Sevruk, B.: Niederschlag als Wasserkreislaufelement: Theorie und Praxis der Niederschlagsmessung. Zurich, ISBN 80-969343-7-6, 2004.

Soares, P. M. M., Miranda, P. M. A., Siebesma, A. P. and Teixeira, J.: An eddy-diffusivity/mass-flux parameterization for dry and shallow cumulus convection. Quart. J. Roy. Meteor. Soc., 130, 3365-3383, 2004.

Sokol, Z.: Utilization of Regression Models for Rainfall Estimates Using Radar-Derived Data and Rain Gauge Data. J. Hydrology, 278, 144-152, 2003.

Stanski, H. R., Wilson, L. J. and Burrows, W. R.: Survey of common verification methods in meteorology. World Weather Watch Technical Report No. 8, WMO/TD No. 358. WMO: Geneva, 114, 1989.


Stephenson, D. B., Kumar, R. R., Doblas-Reyes, F. J., Royer, J.-F., Chauvin, F. and Pezzulli, S.: Extreme daily rainfall events and their impact on ensemble forecasts of the Indian monsoon. Mon. Wea. Rev., 127, 1954-1966, 1999.

Straka, J. M., Zrnic, D. S. and Ryzhkov, A. V.: Bulk hydrometeor classification and quantification using polarimetric radar data: Synthesis of relations. J. Appl. Meteor., 39, 1341–1372, 2000.

Strangeways, I.: Precipitation: Theory, Measurement and Distribution. Cambridge University Press, 2007.

Testud, J., Le Bouar, E., Obligis, E. and Ali-Mehenni, M.: The rain profiling algorithm applied to polarimetric weather radar. J. Atmos. Oceanic Technol., 17, 332–356, 2000.

Theis, S. E., Hense, A. and Damrath, U.: Probabilistic precipitation forecasts from a deterministic model: a pragmatic approach. Meteorol. Applications, 12, 257-268, 2005.

Ulbrich, C. W.: Natural variations in the analytical form of the raindrop size distribution. J. Climate Appl. Meteor., 22, 1764-1775, 1983.

Venugopal, V., Basu, S. and Foufoula-Georgiou, E.: A new metric for comparing precipitation patterns with an application to ensemble forecasts. J. Geophys. Res., 110, D08111, 1-11, doi:10.1029/2004JD005395, 2005.

Villarini, G. and Krajewski, W. F.: Review of the Different Sources of Uncertainty in Single Polarization Radar-Based Estimates of Rainfall. Surv. Geophys. 31, 107–129, 2010.

Vivekanandan, J., Brooks, M., Politovich, M. K. and Zhang, G.: Retrieval of atmospheric liquid and ice characteristics using dual-wavelength radar observations. IEEE Trans. Geosci. Remote Sens., 37, 2325–2333, 1999.

Vivekanandan, J., Zrnic, D. S., Ellis, S. M., Oye, R., Ryzhkov, A. V. and Straka, J.: Cloud microphysics retrieval using S-band dual-polarization radar measurements. Bull. Amer. Meteor. Soc., 80, 381–388, 1999.

Vulpiani, G., Giangrande, S. and Marzano, F. S.: Rainfall Estimation from Polarimetric S-Band Radar Measurements: Validation of a Neural Network Approach. J. Appl. Meteor. Climatol., 48, 2022-2036, 2009.

Wernli, H., Paulat, M., Hagen, M. and Frei, Ch.: SAL - A novel quality measure for the verification of quantitative precipitation forecasts. Mon. Wea. Rev., 136, 4470-4487, 2008.

Wernli, H., Hofmann, Ch. and Zimmer, M.: Spatial Forecast Verification Methods Intercomparison Project: Application of the SAL Technique. Wea. Forecasting, 24, 1472-1484, 2009.

Wilks, D. S.: Statistical Methods in Atmospheric Science, 2nd edition., International Geophysics Series 91, Academic Press, U.S.A., 2006.

Wilson, C.: Review of current methods and tools for verification of numerical forecasts of precipitation. Available at: http://www.smhi.se/hfa_coord/cost717/doc/WDF_02_200109_1.pdf, last access: 15 August 2013, 2001.

Wood, S., Jones, D., and Moore, R.: Accuracy of rainfall measurement for scales of hydrological interest. Hydrology and Earth System Sciences, 4, 531–543, 2000.


WWRP 2009-1: Recommendations for the verification and intercomparison of QPFs and PQPFs from operational NWP models. WWRP/WGNE Joint Working Group on Verification, Revision 2, October 2008, WMO/TD No. 1485, available at: http://www.wmo.int/pages/prog/arep/wwrp/new/documents/WWRP2009_1.pdf, 2009.

Yano, J.-I., Bister, M., Fuchs, Z., Gerard, L., Phillips, V., Barkidija, S. and Piriou, J.-M.: Phenomenology of convection-parameterization closure. Atmos. Chem. Phys., 13, 4111-4131, 2013.

Zacharov, P. and Rezacova, D.: Using the fractions skill score to assess the relationship between an ensemble QPF spread and skill. Atmos. Res. 94, 684-693, 2009.

Zacharov, P., Rezacova, D. and Brozkova, R.: Evaluation of the QPF quality for convective flash flood rainfalls from 2009. Atmos. Res. 131, 95-107, 2013.

Zrnic, D. S., Ryzhkov, A., Straka, J., Liu, Y. and Vivekanandan J.: Testing a procedure for automatic classification of hydrometeor types. Journal of Atmospheric and Oceanic Technology, 18, 892-913, 2001.


Table 1. Categorical contingency table: A is the number of hits (events correctly forecast), B the number of false alarms (events forecast but not observed), C the number of misses (events observed but not forecast), and D the number of correct rejections (non-events correctly forecast). Using the obtained frequencies, we can transform the table to the relative values x = X/N, where x = a, b, c, d and X = A, B, C, D, respectively. The observed frequencies of events and non-events are given by (A+C)/N and (B+D)/N, respectively. Similarly, the forecast frequencies of events and non-events are (A+B)/N and (C+D)/N.

                        observed YES (o = 1)   observed NO (o = 0)   forecast total
forecast YES (f = 1)    A                      B                     A + B
forecast NO  (f = 0)    C                      D                     C + D
observed total          A + C                  B + D                 N = A + B + C + D


Table 2. A summary of basic traditional scores (WWRP 2009 - 1 , 2009). The last column indicates recommendations by the WWRP/WGNE Joint Working Group on Verification: highly recommended (***), recommended (**) or worth a try (*).

Name Abbr. Definition (see Table 1 for A,B,C,D meaning) range best recom.

Bias, Frequency Bias: the ratio of the forecast rain frequency to the observed rain frequency

FBI (A+B)/(A+C) 0, ∞ 1 ***

Proportion (Percentage) Correct: the fraction of all correct forecasts PC (A+D)/N *100 0,100 100 ***

Probability of Detection (Hit Rate): the fraction of observed events that were correctly forecast

POD, HR A/(A+C) 0,1 1 ***

False Alarm Ratio (Rate): the fraction of forecast events that were observed to be non-events

FAR B/(A+B) = 1 – FOH 0,1 0 ***

Probability of False Detection (false alarm rate): the fraction of observed non-events that were forecast to be events

POFD B/(B+D) 0,1 0 **

Threat Score (Critical Success Index): the fraction of all events forecast and/or observed that were correctly forecast

TS, CSI A/(A+B+C) 0,1 1 **

Equitable Threat Score (Skill Corrected CSI, Gilbert Skill Score): the fraction of all events forecast and/or observed that were correctly forecast accounting for the hits that would occur purely due to random choice E; E= [(A+C) x (A+B)]/N

ETS, CSI_SC (A-E)/(A+B+C-E) -1/3, 1 1 ***

Hanssen and Kuipers score (True Skill Statistic, Pierce skill score): measures the ability of the forecast system to separate the observed YES cases from the NO cases

HK, TSS A/(A+C) - B/(B+D) = POD - POFD -1, 1 1 **


Heidke (Total) Skill Score (HSS): measures the increase in proportion correct for the forecast system relative to that of random chance; E* = E + (B+D)(C+D)/N.
  HSS = (A+D-E*)/(N-E*); range: -∞ to 1; best value: 1; recommendation: **.

Odds Ratio (OR): the ratio of the odds (see Table 3) of making a hit to the odds of making a false alarm; takes the prior probability into account.
  OR = AD/BC; recommendation: **.

Odds Ratio Skill Score (ORSS): the transformation of the odds ratio to the range [-1, +1].
  ORSS = (AD-BC)/(AD+BC); range: -1 to 1; recommendation: **.
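
For reference, the categorical scores of Table 2 reduce to a few lines of code once the four counts are available. The following sketch is illustrative only (the function name is not taken from any library, and the marginal totals are assumed to be non-zero).

def categorical_scores(A, B, C, D):
    """Traditional categorical scores of Table 2 computed from the
    contingency-table counts; a minimal illustrative sketch."""
    N = A + B + C + D
    E = (A + C) * (A + B) / N                 # random hits for ETS
    E_star = E + (B + D) * (C + D) / N        # random correct forecasts for HSS
    scores = {
        "FBI":  (A + B) / (A + C),
        "PC":   100.0 * (A + D) / N,
        "POD":  A / (A + C),
        "FAR":  B / (A + B),
        "POFD": B / (B + D),
        "TS":   A / (A + B + C),
        "ETS":  (A - E) / (A + B + C - E),
        "HK":   A / (A + C) - B / (B + D),
        "HSS":  (A + D - E_star) / (N - E_star),
    }
    if B > 0 and C > 0:                       # odds ratio undefined otherwise
        scores["OR"] = (A * D) / (B * C)
        scores["ORSS"] = (A * D - B * C) / (A * D + B * C)
    return scores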


Table 3. An additional summary of traditional scores according to Wilson (2001).

(See Table 1 for the meaning of A, B, C and D.)

Scores stratified by observation:
  Frequency of Misses (FOM): FOM = C/(A+C) = 1 - POD; range: 0 to 1; best value: 0.
  Probability of a Null Event (PON): PON = D/(B+D) = 1 - POFD; range: 0 to 1; best value: 1.

Scores stratified by forecast:
  Frequency of Hits, Success Ratio, Post Agreement (FOH, SR): FOH = A/(A+B); range: 0 to 1; best value: 1.
  Detection Failure Ratio (DFR): DFR = C/(C+D); range: 0 to 1; best value: 0.
  Frequency of Correct Null Forecasts (FOCN): FOCN = D/(C+D) = 1 - DFR; range: 0 to 1; best value: 1.

Odds:
  Odds: the ratio of the probability of an event to the probability of its complement, p/(1 - p); range: 0 to ∞.
  Odds of a hit: A/C = POD/(1 - POD).
  Odds of a false alarm: B/D = POFD/(1 - POFD).

Confidence limits (approximate sampling variances):
  Hanssen and Kuipers score: s²(HK) = [N² - 4(A+B)(C+D) HK²] / [4N(A+B)(C+D)].
  Odds ratio: s²(log OR) = 1/A + 1/B + 1/C + 1/D.
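
Using the standard variance of the logarithm of the odds ratio given above, an approximate confidence interval for OR follows directly. The sketch below is illustrative (the function name and the 95% normal quantile z = 1.96 are assumptions, and all four counts must be non-zero).

import math

def odds_ratio_ci(A, B, C, D, z=1.96):
    """Odds ratio with an approximate confidence interval based on
    s²(log OR) = 1/A + 1/B + 1/C + 1/D (Table 3)."""
    OR = (A * D) / (B * C)
    s = math.sqrt(1.0 / A + 1.0 / B + 1.0 / C + 1.0 / D)
    lower = math.exp(math.log(OR) - z * s)
    upper = math.exp(math.log(OR) + z * s)
    return OR, (lower, upper)

# Example: OR, (lo, hi) = odds_ratio_ci(120, 30, 25, 400)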


Table 4. A summary of basic traditional scores for forecasting the rain amount (WWRP 2009-1, 2009). The recommendation given for each score is that of the WWRP/WGNE Joint Working Group on Verification: highly recommended (***), recommended (**) or worth a try (*).

+) Any of the accuracy measures can be used to construct a skill score that measures the fractional improvement of the forecast system over a reference forecast (Wilks, 2006; WWRP 2009-1, 2009). The most frequently used scores are the MAE and the MSE. The reference forecast can be either climatology or persistence for 24 h accumulations, but persistence is suggested as the standard for short-range forecasts and shorter accumulation periods.

Mean observed (ō): recommendation: ***.

Sample standard deviation (s): the square root of the sample variance; provides a variability measure in the same units as the quantity being characterized. Recommendation: ***.

Conditional median: the "typical" rain amount. Since the most common rain amount will normally be zero, the conditional median should be drawn from the wet samples of the distribution. It is more resistant to outliers than the mean. Recommendation: ***.

Interquartile range (IQR): the difference between the 75th and the 25th percentile of the distribution of rain amounts; reflects the sample variability. It is more resistant to outliers than the standard deviation. Like the conditional median, the IQR should be drawn from the wet samples. Recommendation: **.

Mean error (ME): the average difference between the forecast and observed values. Recommendation: ***.

Mean absolute error (MAE): the average magnitude of the error. Recommendation: **.

Mean square error (MSE): the average squared error magnitude; often used in the construction of skill scores. Larger errors carry more weight. Recommendation: **.

Root mean square error (RMSE): measures the average error magnitude but gives greater weight to the larger errors. It is useful to decompose the RMSE into components representing differences in the mean and differences in the pattern or variability. Recommendation: ***.


Root mean square factor (RMSF): the exponent of the root mean square error of the logarithm of the data; gives a scale to the multiplicative error. Recommendation: **.

(Product moment) correlation coefficient (r): measures the degree of linear association between the forecast and observed values, independent of absolute or conditional bias. It is highly sensitive to large errors and benefits from a square-root transformation of the rain amounts. Recommendation: ***.

Spearman rank correlation coefficient (r_s): measures the monotonic association between the forecasts and observations, based on their ranks (i.e., the positions of the values when arranged in ascending order). r_s is more resistant to outliers than r. Recommendation: **.

MAE skill score (MAE_SS) +): MAE_SS = 1 - MAE_forecast / MAE_reference. Recommendation: **.

MSE skill score (MSE_SS) +): MSE_SS = 1 - MSE_forecast / MSE_reference. Recommendation: **.

Linear error in probability space (LEPS): measures the error in probability space as opposed to measurement space. Recommendation: **.
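
The continuous measures and the associated skill scores of Table 4 can also be written compactly; the sketch below (illustrative names, NumPy assumed) computes ME, MAE, MSE, RMSE and the MAE/MSE skill scores against a reference forecast such as persistence.

import numpy as np

def continuous_scores(forecast, observed, reference):
    """Continuous accuracy measures of Table 4 and the corresponding
    skill scores against a reference forecast (e.g. persistence)."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    r = np.asarray(reference, dtype=float)
    me   = np.mean(f - o)                      # mean error (bias)
    mae  = np.mean(np.abs(f - o))              # mean absolute error
    mse  = np.mean((f - o) ** 2)               # mean square error
    rmse = np.sqrt(mse)                        # root mean square error
    mae_ref = np.mean(np.abs(r - o))           # reference MAE
    mse_ref = np.mean((r - o) ** 2)            # reference MSE
    mae_ss = 1.0 - mae / mae_ref               # MAE skill score
    mse_ss = 1.0 - mse / mse_ref               # MSE skill score
    return {"ME": me, "MAE": mae, "MSE": mse, "RMSE": rmse,
            "MAE_SS": mae_ss, "MSE_SS": mse_ss}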

Table 5. Some basic aspects of spatial categories for new precipitation verification methods. Adapted from Ebert (2011).

Attribute                         Traditional   Feature-based   Neighborhood   Scale        Field deformation
Performance at different scales   indirectly    indirectly      YES            YES          NO
Location errors                   NO            YES             indirectly     indirectly   YES
Intensity errors                  YES           YES             YES            YES          YES
Structure errors                  NO            YES             NO             NO           YES
Hits etc.                         YES           YES             YES            indirectly   YES


Figure 1. Verification of 6 h rainfall by POD, FAR, BIAS, and CSI. The rainfall was accumulated over the period 16:00-22:00 UTC (10-16 h forecast lead time) for five convective events with heavy local rainfall (see the legend for the correspondence between symbols and dates). (a) Grid-point precipitation exceeding the threshold values indicated on the curves, for 30 May 2005. (b) Dependence on the elementary square area of the analysis, with the side length (number of grid points) indicated on the curves, for 15 May 2002. The COSMO model with a horizontal resolution of 2.8 km was adapted to cover the territory of the Czech Republic. Adapted from Rezacova et al. (2009).

Figure 2. An example of observed (left panel) and forecasted (right panel) precipitation fields from July 2009.


Figure 3. Technique for matching observed (left panel) and forecasted (right panel) objects.

Figure 4. Shift and displacement mean square errors (left panel) and total mean square error (right panel).


Figure 5. First row: three-hourly accumulated precipitation from the AROME "no_EDKF" run (left), the AROME "EDKF" run (middle) and raw radar (right) between 09 and 12 UTC on 22 July 2010 (+12 h forecasts). Second row: the defined SAL objects (for better visibility, objects are distinguished by colours).


Figure 6. SAL plots for ALADIN at 8 km resolution (upper left), ECMWF/IFS at 16 km resolution (upper right), the "no_EDKF" AROME run (lower left) and the "EDKF" AROME run (lower right) for the one-month period. Each forecast is represented by a mark whose colour shows the magnitude of the L component (see the colour scale in each panel). The grey areas indicate the 25-75% percentile range and the dashed lines depict the median values of the S and A components. The contingency table in the lower-right corner of each panel gives the number of cases (3 h intervals) in which the threshold of 0.1 mm/3h is exceeded at at least one grid point in the model (MY) and in the observations (OY), or is not exceeded (MN, ON). Only the OY-MY pairs are shown in the SAL plots.
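
The S, A and L components shown in Fig. 6 follow the standard SAL definition (Wernli et al., 2008). As a minimal sketch of the two ingredients that do not require object identification, the amplitude component A and the first location component L1 can be computed as follows; function and variable names are illustrative, and the S component and L2 need the SAL object definition and are omitted here.

import numpy as np

def sal_amplitude_and_l1(model, radar):
    """Amplitude (A) and first location component (L1) of SAL for two
    precipitation fields on the same grid; illustrative sketch only."""
    m = np.asarray(model, dtype=float)
    r = np.asarray(radar, dtype=float)
    # A: normalised difference of the domain-averaged precipitation,
    # bounded between -2 and +2
    A = (m.mean() - r.mean()) / (0.5 * (m.mean() + r.mean()))
    # L1: distance between the centres of mass of the two fields,
    # scaled by the largest distance d within the domain
    ny, nx = m.shape
    y, x = np.mgrid[0:ny, 0:nx]
    cm = np.array([np.average(y, weights=m), np.average(x, weights=m)])
    cr = np.array([np.average(y, weights=r), np.average(x, weights=r)])
    d = np.hypot(ny - 1, nx - 1)
    L1 = np.linalg.norm(cm - cr) / d
    return A, L1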


Figure 7. The number of precipitation objects identified in the "no_EDKF" AROME run (left) and the "EDKF" AROME run (right) (horizontal axis), plotted against the number identified in the radar measurements (vertical axis). The same one-month period as in Fig. 6 is investigated, with 3-hourly precipitation accumulations.

Figure 8. Composite diurnal cycle of domain-averaged precipitation for the different models and radar over the one-month period. Since all model runs start at 00 UTC, the forecast lead time corresponds to the time of day. Black: radar measurements; red: AROME "EDKF" version; green: AROME "no_EDKF" version; light blue: ECMWF/IFS; purple: ALADIN (operational at HMS at 8 km resolution); grey: ALARO (ALADIN model with different physics, run experimentally at HMS).


Figure 9. Verification scores for extreme precipitation values based on the SAL object definition. Both model runs start at 00 UTC, so the lead time corresponds to the time of day. One-month period of simulations with two different versions of the AROME model. Red: AROME with its own surface data assimilation cycle; green: initial surface state interpolated from the ALADIN model. Three-hourly radar observations are marked in black. Left: averaged maximum intensity of all objects; right: averaged intensity of the three strongest objects.


Figure 10. The cumulative fraction of the 56 forecasts with FSS > FSSuniform (vertical axis) for precipitation thresholds of 1 mm/3h (upper plot), 5 mm/3h (middle plot), and 10 mm/3h (lower plot). The horizontal axis indicates the side length of the square elementary area in kilometres. Shown are the results for four NWP model configurations: COSMO-CZ at 2.8 km horizontal resolution (C3, red), COSMO-CZ at 7 km (C7, black), ALADIN at 4.7 km (A5, green), and ALADIN at 9 km (A9, blue). Adapted from Zacharov et al. (2013).
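
For completeness, the FSS and the FSSuniform reference value used in Fig. 10 can be computed for a single threshold and neighbourhood size as in the following minimal sketch. The names are illustrative, SciPy's uniform filter is used for the moving-window fractions, and the boundary treatment may differ from the implementation of Zacharov et al. (2013).

import numpy as np
from scipy.ndimage import uniform_filter

def fss(forecast, observed, threshold, n):
    """Fractions Skill Score (Roberts and Lean, 2008) for one rain
    threshold and one square neighbourhood of n x n grid points."""
    bf = (np.asarray(forecast) >= threshold).astype(float)
    bo = (np.asarray(observed) >= threshold).astype(float)
    # neighbourhood fractions: moving-window means of the binary fields
    pf = uniform_filter(bf, size=n, mode="constant")
    po = uniform_filter(bo, size=n, mode="constant")
    fbs = np.mean((pf - po) ** 2)                   # fractions Brier score
    fbs_worst = np.mean(pf ** 2) + np.mean(po ** 2) # no-overlap reference
    fss_value = 1.0 - fbs / fbs_worst
    f0 = bo.mean()                                  # observed base rate
    fss_uniform = 0.5 + f0 / 2.0                    # "useful skill" target
    return fss_value, fss_uniform

A forecast is counted in Fig. 10 whenever fss_value exceeds fss_uniform for the given threshold and elementary-area size.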