benchmark dataset processing p. Štěpánek, p. zahradníček czech hydrometeorological institute...

53
Benchmark dataset processing P. Štěpánek, P. Zahradníček Hydrometeorological Institute (CHMI), Regional Office Brno, Czech R COST-ESO601 meeting, Tarragona, 9-11 March 2009 E-mail: [email protected]

Upload: valerie-patterson

Post on 30-Dec-2015

215 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Benchmark dataset processing

P. Štěpánek, P. Zahradníček

Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic,

COST-ESO601 meeting, Tarragona, 9-11 March 2009

E-mail: [email protected]

Page 2: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

OutlineOutline

Outliers (daily data)Detection (monthly data) + correction

(daily data)

(description of methodology)Benchmark dataset (monthly data)

Page 3: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Processing before any data analysisProcessing before any data analysis

Software

AnClim,

ProClimDB

Page 4: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Data Data QQuality uality CControl ontrol FFinding inding OOutliersutliers

Two main approaches: Using limits derived from interquartile

ranges (time series)

comparing values to values of neighbouring stations (spatial analysis)

-4.0

-2.0

0.0

2.0

4.0

6.0

8.0

10.0

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000

-4.0

-2.0

0.0

2.0

4.0

6.0

8.0

10.0

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000

-4.0

-2.0

0.0

2.0

4.0

6.0

8.0

10.0

1950 1955 1960 1965 1970 1975 1980 1985 1990 1995 2000

Page 5: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Example of outputs for outliers assessmentExample of outputs for outliers assessment

Altitudes

and distances of neighbours

List of neighbours

Neighbour stations valuesExpected value

Suspicious values

Page 6: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Quality controlQuality control

Run for period 1961-2007, daily data (measured values in observation hours)

All stations (200 climatological stations, 800 precipitation stations) All meteorological elements (T, TMA, TMI, TPM, SRA, SCE,

SNO, E, RV, H, F) – parameters set individually

Historical records will follow now

Page 7: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Air temperature, Air temperature, number of outliers 1961-2007, number of outliers 1961-2007, from from 33..431431..000000 station-days station-days

0

200

400

600

800

1000

1200

T_07:00 T_14:00 T_21:00 T_AVG TMA TMI TPM

T – air temperature at obs. hour, TMA – daily maximum temp., TMI – daily min. temp., TPM – daily ground minimum temp.

Page 8: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Air temperature, Air temperature, number of outliers 1961-2007, number of outliers 1961-2007, from from 33..431431..000000 station-days station-days

Temperature

0

50

100

150

200

250

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

07:00 14:00 21:00 AVG

Air temperature at obs. hour, AVG – daily average temp.

Page 9: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Air temperature, Air temperature, number of outliers 1961-2007, number of outliers 1961-2007, from from 33..431431..000000 station-days station-days

0

50

100

150

200

250

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

TMA TMI TPM

v

Max, min temperature, ground minimum

TMA – daily maximum temp., TMI – daily min. temp., TPM – daily ground minimum temp.

Page 10: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Air temperature, Air temperature, number of outliers 1961-2007, number of outliers 1961-2007,

Number of outliers per one station (all observation hours, AVG)

0.000

0.020

0.040

0.060

0.080

0.100

0.120

1961 1964 1967 1970 1973 1976 1979 1982 1985 1988 1991 1994 1997 2000 2003 2006

temperature

Page 11: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Spatial distribution of precipitation stationsSpatial distribution of precipitation stations

period 1961-2007 600 stations mean minimum distance: 7.5 km

Page 12: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Problematic detections Problematic detections (heavy rainfall)(heavy rainfall)

Page 13: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Problematic detections Problematic detections (heavy rainfall), Radar information(heavy rainfall), Radar information

Page 14: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Precipitation, Precipitation, number of outliers 1961-2007, from number of outliers 1961-2007, from 1313..724724..000000 station-daysstation-days

.

0

100

200

300

400

500

600

700

800

900

1000

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

precipitation

Page 15: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Precipitation, Precipitation, number of outliers 1961-2007, number of outliers 1961-2007,

Number of outliers per one station

0.000

0.050

0.100

0.150

0.200

0.250

0.300

0.350

0.400

0.450

1961 1964 1967 1970 1973 1976 1979 1982 1985 1988 1991 1994 1997 2000 2003 2006

precipitation

Page 16: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

ConclusionsConclusions

Only combination of several methods for outliers detection leads to satisfying results (“real” outliers detection, supressing fault detection -> Emsemble approach)

Parameters (settings) has to be found individually for each meteorological element, maybe also region (terrain complexity) and part of a year (noticeable annual cycle in number of outliers)

Similar to homogenization of time series, it is important to use measured value (e.g. from observation hours) - outliers are masked in daily average (and even more in monthly or annual ones)

Errors found in all elements and investigated countries (AT, CZ, SK, HU)

Page 17: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Experiences with homogenization in the Czech Experiences with homogenization in the Czech RepublicRepublic

Page 18: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

HomogenizationHomogenization Change of measuring conditions

inhomogeneities

Page 19: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

DetectingDetecting Inhomogeneities Inhomogeneities by SNHT by SNHT (p=0.05, 950 series)(p=0.05, 950 series)

0

20

40

60

80

100

120

.1 .2 .3 .4 .5 .6 .7 .8 .9 1.0

Amount of change in level /°C

Inh

om

og

en

eit

ies

de

tec

ted

/ %

>2

2

1

0

Error/years

Page 20: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

most of metadata incomplete

we depend upon statistical tests results

uncertainty in test results - right inhomogeneity detection is problematic (for smaller amount of change)

Assessing Homogeneity - Assessing Homogeneity - ProblemsProblems

Page 21: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Proposed solutionProposed solution

most of metadata incomplete uncertainty in test results - the right inhomogeneity

detection is problematic

„Ensemble“ approach - processing of big amount of test

results for each individual series

To get as many test results for each candidate series as possible

Page 22: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

D a ta P ro ce ss ing

In te rq ua rtile R an ge C o m p ar ing to N e igh b ou rs

A le xa n de rsso n te st B iva r ia te Te st t- te st M a nn -W h itn ey -P e tt it

fro m C o r re la tio ns fro m D is tan ces

Filling M iss. Values

Adjusting Data

Hom. Assessm ent

Reference Series

Hom ogeneity T esting

Com bining Near Stations

Q uality Control - Outliers

Monthly, Seasonal and Annual Averages

Several Iterations

Probability

Days, Months, seasons, year

How to increase number of test resultsHow to increase number of test results

Page 23: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

for monthly, daily data (each month individually)

weighted/unweighted mean from neighbouring stations criterions used for stations selection (or combination of it):

– best correlated / nearest neighbours (correlations – from the first differenced series)

– limit correlation, limit distance– limit difference in altitudes

neighbouring stations series should be standardized to test series AVG and / or STD

(temperature - elevation, precipitation - variance)

- missing data are not so big problem then

Creating RCreating Reference eference SSerieseries

Page 24: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Relative homogeneity testingRelative homogeneity testing

Available tests:– Alexandersson SNHT– Bivariate test of Maronna and Yohai– Mann – Whitney – Pettit test– t-test– Easterling and Peterson test– Vincent method– …

20 year parts of the daily series (40 for monthly series with 10 years overlap),

in SNHT splitting into subperiods in position of detected significant changepoint

(30-40 years per one inhomogeneity)

Page 25: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Homogeneity assessmentHomogeneity assessment

Homogeneity assessmentHomogeneity assessment Quality control Homogenization Data Analysis

Test Ref I II III IV V VI VII VIII IX X XI XII Win Spr Sum Aut Year

A avg 1927 1929 1927 1927 1927 1928 1927 1926 1926 1926 1926 1926 1927 1927 1927 1926 1927A 1930A corr 1927 1927 1927 1927 1927 1928 1927 1926 1926 1926 1926 1926 1927 1927 1927 1926 1927A 1939 1938 1939 1940 1922 1937 1937 1935A dist 1927 1928 1927 1927 1927 1928 1927 1926 1926 1926 1926 1926 1927 1927 1927 1926 1927A 1930 1940 1918B avg 1927 1928 1927 1927 1927 1928 1927 1926 1926 1926 1926 1926 1927 1927 1927 1926 1927B 1922B corr 1927 1927 1927 1927 1927 1928 1927 1926 1926 1926 1926 1926 1927 1927 1927 1926 1927B 1936 1938 1939 1944 1922 1935 1937 1937 1935B 1937B dist 1927 1928 1927 1927 1927 1928 1927 1926 1926 1926 1926 1926 1927 1927 1927 1926 1927B 1930 1940 1931 1913 1918V corr 1927 1926V 1937 1922 1935V 1937V dist 1927 1927 1927V 1918

Output example: Station Čáslav, 3rd segment, 1911-1950, n=40

Page 26: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Homogeneity assessmentHomogeneity assessment

Begin End LengthInHomogen

eityNumber

% detected inhom

% possible inhom

EndMissin

g

1911 1950 40 140 100 120

1927 60 43 511926 37 26 32

1928 9 6 8 4

1937 7 5 61922 4 3 31935 4 3 31918 3 2 31930 3 2 31939 3 2 31940 3 2 3 21938 2 1 21913 1 1 1 3 31929 1 1 11931 1 1 11936 1 1 11944 1 1 1

1926 1927 2 97 69 831926 1931 6 111 79 951935 1940 6 20 14 17

1911 1920 10 4 3 31921 1930 10 114 81 97

1931 1940 10 21 15 181941 1950 10 1 1 1

Homogeneity assessmentHomogeneity assessment, , Output II example:

Summed numbers of detections for individual years

Page 27: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Homogeneity assessmentHomogeneity assessment

Homogeneity assessmentHomogeneity assessment

ID ELEMYEAR_INHOMBEGINEND YEAR_COUNTY_POSSIBL YEAR_ENDMISSVALSX_BEGIN_DAX_END_DATEX_BEGINX_ENDLATITUDELONGITUDEALTITUDEB_FULLNAMEREMARKC_OBSERVERC_IDx B1BOJK01 x 1985 41 14.24 12 23.3.1984 31.3.2003 # # Bojkovicechange

B1BOJK01 x 1985 41 14.24 12 23.3.1984 31.12.9999 # # obs Vladimˇr Maz lekB1BOJK01B1BYSH01 x 1978 37 12.85

? B1BYSH01 x 1979 33 11.46? B1BYSH01 x 1980 43 14.93? B1HLHO01 x 1965 31 10.76 4 1

B1HOLE01 x 1976 33 11.46B1KROM01 x 1977 1978 31 10.76

x B1RADE01 x 1994 44 15.28 2 1.1.1994 31.12.9999 # # RadýjovchangeB1RADE01 x 1994 44 15.28 2 1.1.1994 31.12.9999 # # obs Josef Pˇ§aB1RADE01

x B1RYCH01 x 1973 49 17.01 1.5.1973 28.2.1991 # # VyÜkov, Rychtß°ov, Ŕ.157changeB1RYCH01 x 1973 49 17.01 1.9.1972 28.2.1991 # # obs Marie Hor kov B1RYCH01

xx? B1STRZ01 x 1987 53 18.40B1STRZ01 x 1988 30 10.42B1UHBR01 x 1983 31 10.76 18.2.1984 31.1.1999 # # Uherskř Brod, MoŔidla 354changeB1UHBR01 x 1983 31 10.76 18.2.1984 12.5.1993 # # obs Josef KudelaB1UHBR01

x B1UHBR01 x 1984 77 26.74 18.2.1984 31.1.1999 # # Uherskř Brod, MoŔidla 354changeB1UHBR01 x 1984 77 26.74 18.2.1984 12.5.1993 # # obs Josef KudelaB1UHBR01B1VELI01 x 1978 31 10.76

? B1VELI01 x 1977 1978 44 15.28? B1VKLO01 x 1984 29 10.07x B1VYSK01 x 1999 32 11.11 -1 1.4.1998 31.12.9999 # # VyÜkov, Dukelskß 12change

B1VYSK01 x 1999 32 11.11 -1 1.4.1998 31.12.9999 # # obs VojtŘch Sur kB1VYSK01B2BOSK01_rx 1968 33 11.46B2BREC01 x 1968 35 12.15B2BRUM01 x 1989 51 17.71 1.2.1989 31.3.1994 # # BrumovchangeB2BRUM01 x 1989 51 17.71 1.2.1989 31.3.1994 # # obs Marta Paýˇzkov B2BRUM01

-1.0

-0.8

-0.6

-0.4

-0.2

0.0

0.2

0.4

0.6

0.8

1911 1915 1919 1923 1927 1931 1935 1939 1943 1947

combining several outputs (sums of detections in individual years, metadata, graphs of differences/ratios, …)

Page 28: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Adjusting monthly dataAdjusting monthly data using reference series based on correlations adjustment: from differences/ratios 20 years before and after a

change, monhtly

smoothing monthly adjustments (low-pass filter for adjacent values)

I II III IV V VI VII VIII IX X XI XII

Page 29: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Example:

Adjusting values - evaluation

Page 30: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Iterative homogeneity testingIterative homogeneity testing

several iteration of testing and results evaluation– several iterations of homogeneity testing and

series adjusting (3 iterations should be sufficient)

– question of homogeneity of reference series is thus solved:

• possible inhomogeneities should be eliminated by using averages of several neighbouring stations

• if this is not true: in next iteration neighbours should be already homogenized

Page 31: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Filling missing valuesFilling missing values

Before homogenization: influence on right inhomogeneity detection

After homogenization: more precise - data are not influenced by possible shifts in the series

Dependence of tested series on reference series

Page 32: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

#

#

Prague

Brno

HomogenizationHomogenization of the series of the series in the Czech Republicin the Czech Republic

Page 33: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Correlations between tested and reference series Air temperature

Boxplots:

- Median

- Upper and lower quartiles

(for 200 testes series)

1 - 07h2 - 14h3 - 21h4 - AVG

0.80

0.82

0.84

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

4

0.80

0.82

0.84

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

1

0.80

0.82

0.84

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

3

0.80

0.82

0.84

0.86

0.88

0.90

0.92

0.94

0.96

0.98

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

2

Page 34: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Correlations between tested and reference series Precipitation, snow depth, new snow

Boxplots:

- Median

- Upper and lower quartiles

(for 800 testes series)

1 - 07h precip.2 - 14h snow depth3 - 21h new snow

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

1

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

3

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

2

Page 35: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Correlations between tested and reference series Wind speed

Boxplots:

- Median

- Upper and lower quartiles

(for 200 testes series)

1 - 07h2 - 14h3 - 21h4 - AVG

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

4

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

1

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

3

0.00

0.10

0.20

0.30

0.40

0.50

0.60

0.70

0.80

0.90

1.00

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

2

Page 36: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Number of significant inhomogeneities (0.05) detected by used tests

(A, B tests, c and d reference series, alltogether)

0

100

200

300

400

500

600

700

800

900

1000

I II III IV V VI VII VIII IX X XI XII

T_07

T_14

T_21

T_AVG

Air temperature

Homogeneity testing resultsHomogeneity testing results Air temperatureAir temperature

Page 37: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Amount of adjustments, averages of absolute values, T_AVG

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

I II III IV V VI VII VIII IX X XI XII

°C

T_07 T_14 T_21 T_AVG

Air temperature

Homogeneity testing resultsHomogeneity testing resultsAir temperatureAir temperature

Page 38: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Homogeneity testing resultsHomogeneity testing resultsPrecipitationPrecipitation

4 tests, 4 reference series, 12 months + 4 seasons and year Number of detected inhomogeneities (significant)

0

1000

2000

3000

4000

5000

6000

I II III IV V VI VII VIII IX X XI XIIMonth

Nu

mb

er

of

de

tec

tio

ns

Page 39: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Amount of change (ratios – standardized to be >1.0), precipitation(reference series calculation based on correlations)

Boxplots:

- Median

- Upper and lower quartiles

(for 589 testes series)

-0.005

0.000

0.005

0.010

0.015

0.020

0.025

I II III IV V VI VII VIII IX X XI XII

Co

rrel

atio

n in

crea

se

1.000

1.050

1.100

1.150

1.200

1.250

1.300

I II III IV V VI VII VIII IX X XI XII

Am

ou

nt o

f ch

an

ge

(st

an

da

rdiz

ed

)

Correlation improvement

Page 40: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

HomogenizationHomogenization Final rFinal remarks, recommendationsemarks, recommendations 1/2 1/2

data quality control before homogenization is of very importance (if it is not part of it)

Using series of observation hours (complementarily to daily

AVG) is highly recommended (different manifestation of breaks)

be aware of annual cycle of inhomogeneities, adjustments, …

to know behavior of spatial correlations (of element being processed) to be able to create reference series of sufficient quality …

Page 41: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

HomogenizationHomogenization Final rFinal remarks, recommendations 2/2emarks, recommendations 2/2

Because of Noise in the time series it makes sense: - „Ensemble“ approach to homogenization (combining

information from different statistical tests, time frames, overlapping periods, reference series, meteorological elements, …)

- more information for inhomogeneities assessment – higher quality of homogenization in case metadata are incomplete

Page 42: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Software used for data processingSoftware used for data processing

LoadData - application for downloading data from central database (e.g. Oracle)

ProClimDB software for processing whole dataset (finding outliers, combining series, creating reference series, preparing data for homogeneity testing, extreme value analysis, RCM outputs validation, correction, …)

AnClim software for homogeneity testing

http://www.http://www.cclimahom.limahom.eueu

Page 43: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

AnClim softwareAnClim software

AnClim softwareAnClim software

Page 44: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

ProcDataProcData software software

ProProClimDBClimDB software software

Page 45: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Testing of benchmark datasetTesting of benchmark dataset

Fully automated (just for one click), but it should not work like this in reality:

detection phase can be fully automated, but desicion about breaks to be corrected should be man-made (comparison with metadata, plots of differences – ratios, …)

Page 46: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Fully automatic detection phase in Fully automatic detection phase in ProProClimDBClimDB software softwareselecting neighbours for reference seriesselecting neighbours for reference series

Page 47: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Fully automatic detection phase in Fully automatic detection phase in ProProClimDBClimDB software softwarereference series calculation and launching AnClimreference series calculation and launching AnClim

Page 48: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Automation in AnClim, tests can be selected in advanceAutomation in AnClim, tests can be selected in advance

Page 49: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Test results processed back in ProClimDBTest results processed back in ProClimDB

Page 50: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

Non-automated desicion about breaks before correctionNon-automated desicion about breaks before correction

Page 51: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601
Page 52: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601
Page 53: Benchmark dataset processing P. Štěpánek, P. Zahradníček Czech Hydrometeorological Institute (CHMI), Regional Office Brno, Czech Republic, COST-ESO601

http://www.http://www.cclimahom.limahom.eueu