2013 workshop on quantitative evaluation of downscaled data

43
Keith Dixon 1 , Katharine Hayhoe 2 , John Lanzante 1 , Anne Stoner 2 , Aparna Radhakrishnan 3 , V. Balaji 4 , Carlos Gaitán “Perfect Model” Experiments: Testing Stationarity in Statistical Downscaling (NCPP Protocol 2) IS PAST PERFORMANCE AN INDICATION OF FUTURE RESULTS? 1 2 3 DRC Inc. 4 Princeton Univ. 2013 Workshop on Quantitative Evaluation of Downscaled Data

Upload: tareq

Post on 23-Feb-2016

39 views

Category:

Documents


0 download

DESCRIPTION

2013 Workshop on Quantitative Evaluation of Downscaled Data . “ Perfect Model” Experiments: Testing Stationarity in Statistical Downscaling (NCPP Protocol 2 ) is past performance an indication of future results ?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Keith Dixon1, Katharine Hayhoe2, John Lanzante1, Anne Stoner2, Aparna Radhakrishnan3, V. Balaji4, Carlos Gaitán5

“Perfect Model” Experiments: Testing Stationarity in Statistical

Downscaling (NCPP Protocol 2)

IS PAST PERFORMANCE AN INDICATION OF FUTURE RESULTS?

1 2 3 DRC Inc. 4 Princeton Univ. 5 Univ. of Oklahoma

2013 Workshop on QuantitativeEvaluation of Downscaled Data

Page 2: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Goals of this presentation1. Define the ‘stationarity assumption’ inherent to

statistical downscaling of future climate projections.2. Present our ‘perfect model’ (aka ‘big brother’) approach

to quantitatively assess the extent to which the stationarity assumption holds.

3. Illustrate with a few examples the kind of results one can generate using this evaluation framework (Anne Stoner will show more detail next…)

4. Introduce options to extend and supplement the method (setting the hurdle at different heights).

5. Invite statistical downscalers to consider testing their methods within the perfect model framework.

6. Garner feedback from workshop participants.

Page 3: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

This project does not produce any data files that one would use in real world applications.

DATA INFO KNOW-LEDGE

WISDOM

The aim at GFDL is to gain knowledge about aspects of commonly used SD methods (& GCMS), and to communicate that knowledge so that better informed decisions can be made.

Page 4: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

REAL WORLD APPLICATION:

? ? ?

Start with 3 types of data sets … lacking observations of the futureCompute transform functions in training step (1 example below) Produce downscaled refinements of GCM (historical and/or future)Skill: Compare Obs to SD historical output (e.g., cross validation)Cannot evaluate future skill -- left assuming transform functionsapply equally well to past & future -- “The Stationarity Assumption”

TARGET

PREDICTORS

Page 5: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

“PERFECT MODEL” EXPERIMENTAL DESIGN:Start with 4 types of data sets – Hi-res GCM output as proxy for obs& coarsened version of Hi-res GCM output as proxy for usual GCM

? ? ?

Proxy for observations in real-world applicationProxy for GCM output in real-world application

Page 6: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Daily time resolution~25km grid spacing

194 x 114 grid22k pts

Page 7: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

~200km grid spacing64:1

Page 8: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

~64 of the smaller25km resolution grid cells fit withinthe coarser cell

Page 9: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data
Page 10: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Daily time resolution~25km grid spacing

194 x 114 grid

Page 11: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data
Page 12: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Follow the same interpolation/regriddingsequence to produce coarsened data sets for the future climate projections

Page 13: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

“PERFECT MODEL” EXPERIMENTAL DESIGN:Compute transform functions in training step (1 example below) Produce downscaled output for historical and future time periods Can directly evaluate skill both for the historical period and the future using the Hi Res GCM output as “truth” – Test Stationarity

QuantitativeTests of Stationarity: The extent to which

SKILL computed for the FUTURE CLIMATE PROJECTIONS is diminished relative to the

SKILL computed for the HISTORICAL PERIOD

Page 14: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

GFDL-HiRAM-C360 experimentsAll data used in the perfect model testswere derived from GFDL-HiRAM-C360 (C360) model simulations conducted following CMIP5 high resolution time slice experiment protocols.

We consider a pair of 30-year long “historical”experiments (1979-2008), labeled H2 and H2here…

For the period 2086-2095, we make useof a pair of 3 member ensembles -- a total of six 10-year long experiments, identified as the “C” and “E” ensemblesto the right (C warms more than E)...

Page 15: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Same GCM used to generate 2 sets of future projections (3 members each)1979-2008 model climatology

2086-2095 projection “C” (3x10yr)

2086-2095 projection “E” (3x10yr)

"E" Regional Warming [K]

"C" Regional Warming [K]

0 1 2 3 4 5 6 7 8

4.1

6.2

5.0

7.2

Land All pts

Page 16: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Q: How High a Hurdle does this Perfect Model approach present?

A: Varies geographically,by variable of interest,time period of interest, etc.

Page 17: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Coarsened data isslightly cooler withSlightly smaller std dev.… not too challenging

August Daily Max Temperaturesfor a point in Oklahoma

(2.5C histogram bins)

Page 18: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

approx. +7Cmean warming

August Daily Max Temperaturesfor a point in Oklahoma

(2.5C histogram bins)

Page 19: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

August Daily Max Temperaturesfor a point just NE of San Fancisco

Coarsened data ismore ‘maritime’ &HiRes target is more ‘continental’.

Presents more of achallenge during thehistorical period than the Oklahoma example.

(2.5C histogram bins)

Page 20: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

August Daily Max Temperaturesfor a point just NE of San Fancisco

(2.5C histogram bins)

Page 21: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Note that in several locations there is a tendency for ARRM cross-validated downscaled output to have too low standard deviations just offshore and too high std. dev. over coastal land

Map produced by NCPP Evaluation Environment ‘Machinery’

Page 22: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

NEXT: A sampling of results…

Intended to be illustrative…not exhaustive, nor systematic.

Anne Stoner (TTU) will show more.

Page 23: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Area Mean Time Mean Absolute Downscaling ErrorsΣ | (Downscaled Estimate – HiRes GCM) |

(NumDays)

1979-2008 "C" 2086-2095 "E" 2086-20950.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

0.63

0.900.760.82

1.14

0.98

All Pts Land

+40% +20%

Start with a summary: ARRM downscaling errors are larger for daily max temp at end of 21stC than for 1979-2008

Page 24: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Mean Absolute downscaling Error (MAE) during 1979-2008Σ | (Downscaled Estimate – HiRes GCM) |

(60 * 365)

1979-2008

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6All PtsLand

Geographic Variations: Downscaling MAE for 1979-2008

Page 25: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Mean absolute downscaling error during 2086-2095“E” Projections (3 member ensemble)

"E" 2086-2095

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6All PtsLand

Geographic Variations: MAE pattern for “E” projections (+5,4C)

Page 26: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Mean absolute downscaling error during 2086-2095“C” Projections (3 member ensemble)

"C" 2086-2095

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6All PtsLand

Geographic Variations: MAE pattern for “C” projections (+7,6C)

Page 27: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Mean Climate Change Signal Difference (ARRM minus Target) “C” Projection

Geographic Variations: Bias pattern for “C” projections (+7,6C)

Page 28: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

J F M A M J J A S O N D

Red = “C” Blue=“E”

J F M A M J J A S O N DEntire US48 Domain

Looking at how well the stationarity assumption holds in different seasons

Where and when ratio=1.0, the stationarity assumption fully holds (i.e., no degradation in mean absolute downscaling error during 2085-2095 vs. the 1979-2008 period using in training.)

4

3

2

1

tasmax

Page 29: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

…comparing near-coastal land vs. interior

Looking at how well the stationarity assumption holds in different seasons

Page 30: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

J F M A M J J A S O N DBlue= coastal SC-CSCBlack = full US48+ domainGreen = interior SC-CSC

Looking at how well the stationarity assumption holds in different seasons

“C” ensemble

4

3

2

1

Page 31: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

“C” ensembletasmax

A clear intra-month MAE trend in some but not all months

Looking at how well the stationarity assumption holds in different seasons

Page 32: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

“sawtooth” has lower values in cooler part of month, higher in warmer part of month

“C” ensembletasmax

Page 33: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Options for extending this ‘perfect model-based’ exploration of statistical downscaling stationarity

More SDMethods

More ClimateVariables & Indices

Different HiResClimate Models

Different Emissions Scenarios & Times

Use of SyntheticTime Series

Alter GCM-baseddata in known waysto ‘Raise the Bar’

Mix & Match GCMs’Target & Predictors

?? MoreIdeas ??

Page 34: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

GFDL-HiRAM-C360 experimentsData sets used in the perfect model tests are beingmade available to the community. This enables others to use our exp design to test the stationarity assumption for their own SD method.Also, we invite SD developers to consider collaborating with us, so that their techniques &perspectives may be more fully incorporated into this evolving perfect model-based research & assessment system.For more info, visit www.gfdl.noaa.gov/esd_eval &www.earthsystemcog.org/projects/gfdl-perfectmodel

Page 35: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

In other words, by accessing these types of files, folks can generate their own SD results and test how well the stationarity assumption holds for their favorite SD method.

from us

from us

Page 36: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

In other words, by accessing these types of files, folks can generate their own SD results and test how well the stationarity assumption holds for their favorite SD method.

from us

SD method

of choice

from us

Page 37: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Obviously, one does not need to perform tests using all 22,000+ grid points. Below we indicate the locations of

a set of 16 points we’ve used for some development work.(color areas are 3x3 with grid point of interest in the center)

Page 38: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Anne Stoner and Katharine Hayhoe with more ‘perfect model’ analyses…

Page 39: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Keith Dixon1, Katharine Hayhoe2, John Lanzante1, Anne Stoner2, Aparna Radhakrishnan3, V. Balaji4, Carlos Gaitán5

“Perfect Model” Experiments: Testing Stationarity in Statistical

Downscaling (NCPP Protocol 2)

IS PAST PERFORMANCE AN INDICATION OF FUTURE RESULTS?

1 2 3 DRC Inc. 4 Princeton Univ. 5 Univ. of Oklahoma

2013 Workshop on QuantitativeEvaluation of Downscaled Data

Page 40: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

“PERFECT MODEL” EXPERIMENTAL DESIGN:Compute transform functions in training step (1 example below) Produce downscaled output for historical and future time periods Can directly evaluate skill both for the historical period and the future using the Hi Res GCM output as “truth” – Test Stationarity

Page 41: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Options for extending this ‘perfect model-based’ exploration of statistical downscaling stationarity

More SDMethods

More ClimateVariables & Indices

Different HiResClimate Models

Different Emissions Scenarios & Times

Use of SyntheticTime Series

Alter GCM-baseddata in known waysto ‘Raise the Bar’

Mix & Match GCMs’Target & Predictors

?? MoreIdeas ??

Page 42: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

This project does not produce any data files that one would use in real world applications.

DATA INFO KNOW-LEDGE WISDOM

The aim at GFDL is to gain knowledge about aspects of commonly used SD methods (& GCMS), and to communicate that knowledge so that better informed decisions can be made.

Odds-n-ends from day 1* Not all SD codes run on a cell phone* Not All GCMs have 200km atmos grid (50km)* GCMs are developed primarily as research tools – not prediction tools

Page 43: 2013 Workshop on  Quantitative Evaluation of  Downscaled Data

Goals of this presentation1. Define the ‘stationarity assumption’ inherent to

statistical downscaling of future climate projections.2. Present our ‘perfect model’ (aka ‘big brother’) approach

to quantitatively assess the extent to which the stationarity assumption holds.

3. Illustrate with a few examples the kind of results one can generate using this evaluation framework (Anne Stoner will show more detail next…)

4. Introduce options to extend and supplement the method (setting the hurdle at different heights).

5. Invite statistical downscalers to consider testing their methods within the perfect model framework.

6. Garner feedback from workshop participants.