Benchmark database
based on surrogate climate records
Victor Venema
Meteorological Institute, Bonn
Goals of COST-HOME working group 1
– Literature survey
– Benchmark dataset
  – Known inhomogeneities
  – Test the homogenisation algorithms (HA)
Benchmark dataset
1) Real (inhomogeneous) climate records
– Most realistic case
– Investigate if various HA find the same breaks
– Good meta-data
2) Synthetic data
– For example, Gaussian white noise
– Insert known inhomogeneities
– Test performance (see the sketch below)

3) Surrogate data
– Empirical distribution and correlations
– Insert known inhomogeneities
– Compare to synthetic data: test of assumptions
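A minimal sketch of how a synthetic test case of this kind could be built, assuming Gaussian white noise and a single step-type break; all names are illustrative and not taken from the COST-HOME code:

```python
import numpy as np

def synthetic_station(n_months=1200, noise_std=1.0, seed=0):
    """Homogeneous synthetic record: Gaussian white noise."""
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, noise_std, n_months)

def insert_break(series, position, size):
    """Insert a known step-type inhomogeneity: shift every value
    before `position` by `size` (treating the most recent segment
    as the reference is one common convention)."""
    out = series.copy()
    out[:position] += size
    return out

truth = synthetic_station()                          # homogeneous "truth"
test = insert_break(truth, position=600, size=0.5)   # known 0.5 °C break
```

A homogenisation algorithm is then run on `test` and judged against `truth`, where the break position and size are known exactly.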
Creation of the benchmark – Outline of the talk
1) Start with homogeneous data
2) Multiple surrogate and synthetic realisations
3) Mask surrogate records
4) Add global trend
5) Insert inhomogeneities in station time series
6) Published on the web
7) Homogenize by COST participants and third parties
8) Analyse the results and publish
1) Start with homogeneous data
– Monthly mean temperature and precipitation (France)
– Later also daily data
– Later maybe other variables

The input data are homogeneous, without missing data, and detrended.
20 to 30 years is enough for good statistics; longer surrogates are based on multiple copies:
– Larger-scale correlations are small
– Distribution well defined with 30 a of data

Generated networks are 50, 100 and 200 a long.
2) Multiple surrogate realisations
The surrogate realisations reproduce:
– Temporal correlations
– Station cross-correlations
– Empirical distribution function

– Annual cycle removed beforehand, added back at the end
– Number of stations between 5 and 20
– Cross-correlation varies as much as possible

(Plots below: temporal structure of the surrogates and their cross-correlations.)
One station – with annual cycle
[Figure: monthly series with annual cycle, panels "Measurement" and "Surrogate"; full period 1900–2000 and a 1900–1910 zoom]
One station – anomalies
[Figure: anomaly series, panels "Measurement" and "Surrogate"; full period 1900–2000 and a 1900–1910 zoom]
Multiple stations – 10 year zoom
[Figure: anomaly series of several stations, 1900–1910; panels "Measurement" and "Surrogate"]
Multiple stations – 10 year zoom
[Figure: 1900–1910 zoom; panels "Measurement – low cross correlation", "Surrogate", "Measurement – high cross correlation", "Surrogate"]
IAAFT algorithm smoothes jumps
[Figure: "Bounded Cascade time series" and "Surrogate of Bounded Cascade"; x-axis: time or space, y-axis: LWP or LWC]
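For reference, a compact sketch of the IAAFT idea (iterative amplitude adjusted Fourier transform), the algorithm behind these surrogates; this is the generic textbook scheme, not the exact implementation used for the benchmark:

```python
import numpy as np

def iaaft(series, n_iter=100, seed=0):
    """One IAAFT surrogate: random data sharing the amplitude spectrum
    and the empirical distribution of `series`."""
    rng = np.random.default_rng(seed)
    sorted_values = np.sort(series)                    # target distribution
    target_amplitudes = np.abs(np.fft.rfft(series))    # target spectrum
    surrogate = rng.permutation(series)                # random start
    for _ in range(n_iter):
        # Impose the target amplitude spectrum, keeping current phases.
        phases = np.angle(np.fft.rfft(surrogate))
        surrogate = np.fft.irfft(target_amplitudes * np.exp(1j * phases),
                                 n=len(series))
        # Impose the target distribution by rank ordering.
        ranks = np.argsort(np.argsort(surrogate))
        surrogate = sorted_values[ranks]
    return surrogate
```

Because each iteration redistributes values over the whole series, sharp jumps in the original tend to come out smoothed, which is the effect the figure above illustrates.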
3) Mask surrogate records
– Beginning of the records is jagged (rough)
– Linear increase in the number of stations; the last station starts after 25 % of the full period
– At the end of the record all stations are measuring
– Tests the influence of the jagged edge on detection and correction
– But the trend is also increasing in time (i.e. a different effect)! Is this a problem?
3) Mask surrogate records
[Figure: data availability mask of the 20 stations, 1900–2000; the number of stations increases linearly at the start]
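A sketch of how such a mask could be generated, under the assumption that start years grow linearly from the beginning of the period to 25 % of it (illustrative names, not the benchmark's actual code):

```python
import numpy as np

def station_mask(n_stations=20, n_years=100):
    """Boolean availability mask (stations x years): True where a
    station reports. Start years increase linearly, so the first
    station covers the whole period and the last one starts after
    25 % of it; every station measures until the end of the record."""
    mask = np.zeros((n_stations, n_years), dtype=bool)
    starts = np.linspace(0, 0.25 * n_years, n_stations).astype(int)
    for station, start in enumerate(starts):
        mask[station, start:] = True
    return mask
```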
4) Add global trend
– NASA GISS Surface Temperature Analysis (GISTEMP) by J. Hansen
– Global mean surface temperature
– The last year of any surrogate network is 1999
[Figure: "Trend" – GISTEMP global mean surface temperature, 1890–1990]
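A sketch of this step, assuming the GISTEMP global mean series has already been put on the network's time axis (variable names are hypothetical):

```python
import numpy as np

def add_global_trend(network, global_mean):
    """Add the same global mean temperature signal to every station.
    `network`: (stations x time) homogeneous surrogates; `global_mean`:
    GISTEMP global mean on the same time axis, aligned so the last
    year of the network is 1999."""
    return network + global_mean[np.newaxis, :]
```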
5) Insert inhomogeneities in stations
Random breaks (implemented; see the sketch below)
– Frequency of breaks: 1/20 a, 1/40 a
– Size constants for temperature: 0.25, 0.5, 1.0 °C
– Size factors for rain: 0.8, 0.9, 1.1, 1.2

Simultaneous breaks
– Frequency of breaks: 1/50 a
– In 10 to 50 % of the network
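A minimal sketch of inserting such random breaks, assuming additive breaks for temperature and multiplicative ones for rain, and the convention that a break displaces the part of the series before it (all names illustrative):

```python
import numpy as np

def insert_random_breaks(series, freq_per_year=1/20,
                         sizes=(0.25, 0.5, 1.0),
                         multiplicative=False, seed=0):
    """Insert random step-type breaks into one monthly station series.
    Temperature: add/subtract a size constant (°C); rain: multiply by
    a size factor (pass sizes=(0.8, 0.9, 1.1, 1.2), multiplicative=True)."""
    rng = np.random.default_rng(seed)
    n_years = len(series) // 12
    out = series.copy()
    n_breaks = rng.poisson(freq_per_year * n_years)
    for pos in rng.integers(0, len(series), n_breaks):
        if multiplicative:
            out[:pos] *= rng.choice(sizes)     # rain: factor
        else:
            out[:pos] += rng.choice(sizes) * rng.choice([-1, 1])  # °C
    return out
```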
5) Insert inhomogeneities in stations
Outliers
– Frequency: 1 – 3 %
– Size: the 99th and 99.9th percentiles

Local trends (only temperature; see the sketch below)
– Linear increase or decrease in one station
– Duration: 30, 60 a
– Maximum size: 0.2 to 1.5 °C
– Frequency: once in 10 % of the stations
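Sketches of these two inhomogeneity types, with deliberately simple conventions; for instance, whether a local trend persists after its ramp is a design choice, and the one taken here is an assumption:

```python
import numpy as np

def insert_outliers(series, frequency=0.02, percentile=99.0, seed=0):
    """Turn a random fraction of months into outliers whose magnitude
    is a high percentile of the series' own anomalies."""
    rng = np.random.default_rng(seed)
    out = series.copy()
    n_out = int(frequency * len(series))
    positions = rng.choice(len(series), n_out, replace=False)
    size = np.percentile(np.abs(series - series.mean()), percentile)
    out[positions] += rng.choice([-1.0, 1.0], n_out) * size
    return out

def insert_local_trend(series, start, duration, max_size):
    """Add a linear drift to one station: grows from 0 to max_size (°C)
    over `duration` months, then persists (assumption); a negative
    max_size gives a decreasing trend."""
    out = series.copy()
    ramp = np.linspace(0.0, max_size, duration)
    out[start:start + duration] += ramp[:max(0, len(out) - start)]
    out[start + duration:] += max_size
    return out
```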
6) Published on the web
Inhomogeneous data will be published on the COST-HOME homepage
Everyone is welcome to download and homogenize the data
7) Homogenize by participants
Return homogenised data
– Should be in the COST-HOME file format (next slide)

Return break detections, typed as:
– BREAK
– OUTLI
– BEGTR
– ENDTR

Multiple breaks at one date are possible.
7) Homogenize by participants
COST-HOME file format:
http://www.meteo.uni-bonn.de/venema/themes/homogenisation/costhome_fileformat.pdf
– For the benchmark & the COST homogenisation software
– Regular ASCII matrix (columns)
– One data and one quality-flag file per station
– Yearly, daily, subdaily data: columns for the time, one column for the data
– Monthly data: year column, 12 columns for data
– Filename: variable, resolution, quality, station
– ASCII network-file with station names
– ASCII break-file with dates and station names
COST-HOME file format – monthly data
COST-HOME file format – network file
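Based on the format description above (a regular ASCII matrix with a year column and 12 monthly columns, one data file per station), a writer could look roughly like this; column widths and the filename convention live in the linked PDF, so treat the details here as assumptions:

```python
import numpy as np

def write_monthly_file(path, first_year, monthly_values):
    """Write one station's monthly data as a regular ASCII matrix:
    a year column followed by 12 monthly columns (layout per the
    COST-HOME file-format PDF; formatting details assumed)."""
    values = np.asarray(monthly_values).reshape(-1, 12)
    with open(path, "w") as f:
        for i, row in enumerate(values):
            months = " ".join(f"{v:7.2f}" for v in row)
            f.write(f"{first_year + i:4d} {months}\n")
```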
8) Analyse the results
Detailed analysis will be performed in the working groups:
– Detection
– Correction
– Daily data homogenisation

Synthetic and surrogate data:
– RMS error
– Number of breaks detected (as a function of size)
– Application: reduction of the scatter in the trends

Performance difference between synthetic (Gaussian white noise) and surrogate data
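The first of these metrics is straightforward; a sketch, assuming the returned and true homogeneous series are aligned arrays:

```python
import numpy as np

def rms_error(homogenised, truth):
    """Root-mean-square difference between a participant's homogenised
    series and the true homogeneous series."""
    diff = np.asarray(homogenised) - np.asarray(truth)
    return np.sqrt(np.mean(diff ** 2))
```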
Work in progress
– Monthly precipitation
– Implement some inhomogeneity types
– Daily data: other inhomogeneities
– Synthetic data (Gaussian white noise)
– More input data!

Agree on the details of the benchmark – next meeting?
Set a deadline for the availability of the benchmark and a deadline for the return of the homogenised data.
Questions
– Ideas for a better benchmark (for example, other inhomogeneities, constants)
– Types of inhomogeneities for daily data
– Automatic processing: on the order of 100 networks