understanding climate change:   a data driven approach

30
Understanding Climate Change: A Data Driven Approach Michael Steinbach University of Minnesota [email protected] www.cs.umn.edu/~steinbac Research funded by NSF, NASA, Planetary Skin Institute, ISET-NOAA, MN Futures program US-China Collaborations in Computer Science and Sustainability @University of Minnesota

Upload: delta

Post on 25-Feb-2016

30 views

Category:

Documents


1 download

DESCRIPTION

Understanding Climate Change:   A Data Driven Approach. Michael Steinbach University of Minnesota [email protected] www.cs.umn.edu/~steinbac Research funded by NSF, NASA, Planetary Skin Institute, ISET-NOAA, MN Futures program. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Understanding Climate Change:   A Data Driven Approach

Understanding Climate Change: A Data Driven Approach

Michael Steinbach

University of Minnesota

[email protected]/~steinbac

Research funded by NSF, NASA, Planetary Skin Institute, ISET-NOAA, MN Futures program

US-China Collaborations in Computer Science and Sustainability@University of Minnesota

Page 2: Understanding Climate Change:   A Data Driven Approach

Vipin Kumar, UM Auroop Ganguly, UTK/ORNL Nagiza Samatova, NCSU Arindam Banerjee, UM

Fred Semazzi, NCSU Joe Knight, UM Shashi Shekhar, UM Peter Snyder, UM Jon Foley, UM

Alok Choudhary, NW Abdollah Homiafar, NCA&T Michael Steinbach, UM Singdhansu Chatterjee, UM

NSF Expeditions ProjectUnderstanding Climate Change: A Data-driven Approach

Page 3: Understanding Climate Change:   A Data Driven Approach

Climate Change: The defining issue of our era

• The planet is warming• Multiple lines of evidence• Credible link to human GHG

(green house gas) emissions

• Consequences can be dire• Extreme weather events,

regional climate and ecosystem shifts, abrupt climate change, stress on key resources and critical infrastructures

• There is an urgency to act• Adaptation: “Manage the

unavoidable”• Mitigation: “Avoid the

unmanageable” • The societal cost of both

action and inaction is largeKey outstanding science challenge: Actionable predictive insights to credibly inform policy

Russia Burns, Moscow ChokesNATIONAL GEOGRAPHIC, 2010 The vanishing of the

Arctic ice capecology.com, 2008

US-China Collaborations in Computer Science and Sustainability 3@University of Minnesota

Page 4: Understanding Climate Change:   A Data Driven Approach

Data-Driven Knowledge Discovery in Climate Science

• From data-poor to data-rich transformation– Sensor Observations: Remote sensors like satellites and

weather radars as well as in-situ sensors and sensor networks like weather station and radiation measurements

– Model Simulations: IPCC climate or earth system models as well as regional models of climate and hydrology, along with observed data based model reconstructions

"The world of science has changed ... data-intensive science [is] so different that it is worth distinguishing [it] … as a new, fourth paradigm for scientific exploration." - Jim Gray

Data guided processes can complement hypothesis guided data analysis to develop predictive insights for use by climate scientists, policy makers and community at large.

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 4

Page 5: Understanding Climate Change:   A Data Driven Approach

Data Mining Challenges

• Spatio-temporal nature of data– spatial and temporal autocorrelation.– Multi-scale/Multi-resolution nature

• Scalability– Size of Earth Science data sets can be very large

For example, for each time instance,• 2.5°x 2.5°:10K locations for the globe • 250m x 250m: ~10 billion• 50m x 50m : ~250 billion

• High-dimensionality• Noise and missing values• Long-range spatial dependence• Long memory temporal processes• Nonlinear processes, Non-Stationarity• Fusing multiple sources of data

SST

Precipitation

NPP

Pressure

SST

Precipitation

NPP

Pressure

Longitude

Latitude

Timegrid cell zone

...

US-China Collaborations in Computer Science and Sustainability 5@University of Minnesota

Page 6: Understanding Climate Change:   A Data Driven Approach

High Performance ComputingEnable efficient large-scale spatio-temporal analytics on

future generation exascale HPC platforms with complex memory hierarchies

Large scale, spatio-temporal, unstructured, dynamic

kernels , features, dependencies

Relationship MiningEnable discovery of complex dependence structures such as non linear associations or long range spatial dependencies

Nonlinear, spatio-temporal, multi-variate, persistence, long memory

Predictive ModelingEnable predictive modeling of typical and extreme behavior from multivariate spatio-temporal data

Nonlinear, spatio-temporal, multivariate

Complex NetworksEnable studying of collective behavior of interacting eco-climate systems

Nonlinear, space-time lag, geographical, multi-scale

relationships Community structure- function-dynamics

Enabling large-scale data-driven science for complex, multivariate, spatio-temporal, non-linear, and dynamic systems: • Fusion plasma

• Combustion • Astrophysics • ….

Our NSF Expedition project is an end-to-end demonstration of this major paradigm for future knowledge discovery process.

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 6

NSF Expedition: Understanding Climate Change – A Data-Driven Approach

Page 7: Understanding Climate Change:   A Data Driven Approach

Illustrative Applications of Spatial-Temporal Data Mining

• Monitoring of global forest cover

• Understanding the structure of the climate system finding climate indices and dipoles

US-China Collaborations in Computer Science and Sustainability 7@University of Minnesota

Page 8: Understanding Climate Change:   A Data Driven Approach

Monitoring Global Vegetation Cover: Motivation

Forestry• Identify degradation in forest cover due to logging,

conversions to cropland or plantations and natural disasters like fires.

• Applications: UN REDD+ , national monitoring, reporting and verification systems, etc.

Agriculture• Identify changes related to farmland, e.g. conversion to

biofuels, changes in cropping patterns and changes in productivity.

• Applications: estimating regional food risks and ecological impact of agricultural practices.

Urbanization• Identify scale, extent, timing and location of

urbanization.• Applications: policy planning, understanding impact on

microclimate, water consumption, etc.US-China Collaborations in Computer Science and Sustainability 8@University of Minnesota

Page 9: Understanding Climate Change:   A Data Driven Approach

Detecting Changes Using Vegetation Time Series

• Daily Remote Sensing observations are available from MODIS aboard AQUA and TERRA satellites.• High temporal frequency (daily for multi-

spectral data and bi-weekly for the Vegetation index products like EVI, FPAR)

• Time series based approaches can be used for• Detection of a greater variety of changes.• Identifying when the change occurred• Characterization of the type of change

e.g., abrupt versus gradual• Near-real time change identification

• Challenges • Poor data quality and high variability• Coarse spatial resolution of observations (250 m) • Massive data sets: 10 billion locations for the

globe

EVI time series for a location

EVI shows density of plant growth on the globe.

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 9

Noisy EVI time series with quality indicators

Page 10: Understanding Climate Change:   A Data Driven Approach

Novel Time Series Change Detection Techniques

Existing Time series change detection algorithms do not address unique characteristics of eco-system data like noise, missing values, outliers, high degree of variability (across regions, vegetation types, and time).Segmentation based approaches– Divide time series into homogenous segments.– Boundary of segments become the change points.– Useful for detection land cover conversions like

forest to cropland, etc. Prediction based approaches – Build a prediction model for the location using

previous observations.– Use the deviation of subsequent observations from

the predicted value by the model to identify changes/disturbances.

– Useful for detecting deviations from the normal vegetation model.

EVI time series for a 250m by 250m of land in Iowa, USA that changed from fallow land to agriculture land.

FPAR time series for a forest fire location in California, USA.

• S. Boriah, V. Kumar, M. Steinbach, et al., Land cover change detection: a case study, KDD 2008.• V. Mithal, S. Boriah, A. Garg, M. Steinbach, V. Kumar et al., Monitoring global forest cover using data mining. ACM Transactions on

Intelligent Systems and Technology, 2011

US-China Collaborations in Computer Science and Sustainability 10@University of Minnesota

Page 11: Understanding Climate Change:   A Data Driven Approach

Monitoring of Global Forest Cover

• Automated Land change Evaluation, Reporting and Tracking System (ALERT)

• Planetary Information System for assessment of ecosystem disturbances, such as

• Forest fires, droughts, floods, logging/deforestation, conversion to agriculture

• This system will help• Quantify the carbon impact of these changes• Understand the relationship to global

climate variability and human activity

• Provide ubiquitous web-based access to changes occurring across the globe, creating public awareness

US-China Collaborations in Computer Science and Sustainability 11@University of Minnesota

Page 12: Understanding Climate Change:   A Data Driven Approach

Case Study 1:

Monitoring Global Forest Cover

US-China Collaborations in Computer Science and Sustainability 12@University of Minnesota

Page 13: Understanding Climate Change:   A Data Driven Approach

Deforestation in the Amazon Rainforest

US-China Collaborations in Computer Science and Sustainability 13@University of Minnesota

Brazil Accounts for almost 50% of all humid tropical forest clearing, nearly 4 times that of the next highest country, which accounts for 12.8% of the total

Page 14: Understanding Climate Change:   A Data Driven Approach

Amazon Deforestation Animation 2001-2009

US-China Collaborations in Computer Science and Sustainability 14@University of Minnesota

Page 15: Understanding Climate Change:   A Data Driven Approach

Detecting other land cover changes

Damage to vegetation by hurricane Katrina

Flooding along Ob River, Russia

Shrinking of Lake Chad, Nigeria

Farm abandonment in Zimbabwe during political conflict between 2004 and 2008.

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 15

Page 16: Understanding Climate Change:   A Data Driven Approach

ALERT Platform

US-China Collaborations in Computer Science and Sustainability 16@University of Minnesota

Page 17: Understanding Climate Change:   A Data Driven Approach

The Need for Technology to Monitor Forests

“The [Peru] government needs to spend more than $100m a year on high-resolution satellite pictures of its billions of trees. But … a computing facility developed by the Planetary Skin Institute (PSI) … might help cut that budget.”

“ALERTS, which was launched at Cancún, uses … data-mining algorithms developed at the University of Minnesota and a lot of computing power … to spot places where land use has changed.”

- The Economist 12/16/2010

US-China Collaborations in Computer Science and Sustainability 17@University of Minnesota

Page 18: Understanding Climate Change:   A Data Driven Approach

Monitoring Forest Cover Change: Challenges Ahead

• Designing robust change detection algorithms• Characterization of land cover changes• Multi-resolution analysis (250m vs 1km vs 4km)

– Different kinds of changes are visible at different scales

• Multivariate analysis– Detecting some types of changes (e.g. crop

rotations) will require additional variables.• Data quality improvement

– Preprocessing of data using spatio-temporal noise removal and smoothing techniques can increase performance of change detection.

• Incremental update and real-time detection• Spatial event identification• Spatial-Temporal Querying• Applications in variety of domains:

– Climate, agriculture, energy– Economics, health care, network traffic@University of Minnesota US-China Collaborations in Computer Science and Sustainability 18

Page 19: Understanding Climate Change:   A Data Driven Approach

Illustrative Applications of Spatial-Temporal Data Mining

• Monitoring of global forest cover

• Understanding the structure of the climate system finding climate indices and dipoles

US-China Collaborations in Computer Science and Sustainability 19@University of Minnesota

Page 20: Understanding Climate Change:   A Data Driven Approach

Understanding Climate Change – Physics based Approach

Projection of temperature increase under different Special Report on Emissions Scenarios (SRES) by 24 different GCM configurations from 16 research centers used in the Intergovernmental Panel on Climate Change (IPCC) 4th Assessment Report.

A1B: “integrated world” balance of fuelsA2: “divided world” local fuelsB1: “integrated world” environmentally consciousIPCC (2007)

US-China Collaborations in Computer Science and Sustainability 20@University of Minnesota

General Circulation Models: Mathematical models with physical equations based on fluid dynamics

Figure Courtesy: ORNL

Ano

mal

ies

from

188

0-19

19 (K

)

Parameterization and non-linearity of differential equations are sources for uncertainty!

Cell

CloudsLand

Ocean

Page 21: Understanding Climate Change:   A Data Driven Approach

Understanding Climate Change – Physics based Approach

“The sad truth of climate science is that the most crucial information is the least reliable” (Nature, 2010)

Physics-based models are essential but not adequate– Relatively reliable predictions at global scale for ancillary

variables such as temperature– Least reliable predictions for variables that are crucial for impact

assessment such as regional precipitation

Regional hydrology exhibits large variations among major IPCC model projections

Disagreement between IPCC models

US-China Collaborations in Computer Science and Sustainability 21@University of Minnesota

General Circulation Models: Mathematical models with physical equations based on fluid dynamics

Figure Courtesy: ORNL

Ano

mal

ies

from

188

0-19

19 (K

)

Parameterization and non-linearity of differential equations are sources for uncertainty!

Cell

CloudsLand

Ocean

Page 22: Understanding Climate Change:   A Data Driven Approach

Example Driving Use CasesImpact of Global Warming on Hurricane

FrequencyRegime Shift in Sahel

1930s Dust Bowl Discovering Climate Teleconnections

Find non-linear

relationships

Validate with hindcasts

Build hurricane models

Sahel zone

Regime shift can occur without any advanced warning and may be triggered by isolated events such as storms, drought

Onset of major 30-year drought over the Sahel region in 1969

Affected almost two-thirds of the U.S. Centered over the agriculturally productive Great Plains

Drought initiated by anomalous tropical SSTs (Teleconnections)

-0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

longitude

latit

ude

Correlation Between ANOM 1+2 and Land Temp (>0.2)

-180 -150 -120 -90 -60 -30 0 30 60 90 120 150 180

90

60

30

0

-30

-60

-90

El Nino Events

Nino 1+2 Index

US-China Collaborations in Computer Science and Sustainability 22@University of Minnesota

Page 23: Understanding Climate Change:   A Data Driven Approach

• A climate index is a time series of temperature or pressure– Similar to business or economic

indices– Based on Sea Surface Temperature

(SST) or Sea Level Pressure (SLP)

• Climate indices are important because– They distill climate variability at a regional

or global scale into a single time series. – They are a way to capture teleconnections, i.e., climate phenomena

occurring in one location that can affect the climate at a faraway location– They are well-accepted by Earth scientists.– They are related to well-known climate phenomena such as El Niño.

Climate Indices: Connecting the Ocean/Atmosphere and the Land

Dow Jones Index (from Yahoo)

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 23

Page 24: Understanding Climate Change:   A Data Driven Approach

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

longitude

latit

ude

-180 -150 -120 -90 -60 -30 0 30 60 90 120 150 180

90

60

30

0

-30

-60

-90 -0.8

-0.6

-0.4

-0.2

0

0.2

0.4

0.6

0.8

longitude

latit

ude

Correlation Between ANOM 1+2 and Land Temp (>0.2)

-180 -150 -120 -90 -60 -30 0 30 60 90 120 150 180

90

60

30

0

-30

-60

-90

Correlation Between Nino 1+2 and Land Temperature (>0.2)

A Temperature Based Climate Index: NINO1+2

El Nino Events

Nino 1+2 Index

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 24

Page 25: Understanding Climate Change:   A Data Driven Approach

Pressure Based Climate Indices: Dipoles

Dipoles represent a class of teleconnections characterized by anomalies of opposite polarity at two locations at the same time.

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 25

Page 26: Understanding Climate Change:   A Data Driven Approach

Importance of Climate Indices and Dipoles

Crucial for understanding the climate system, especially for weather and climate forecast simulations within the context of global climate change.

Correlation of Land temperature anomalies with NAO

Correlation of Land temperature anomalies with SOI

SOI dominates tropical climate with floodings over East Asia and Australia, and droughts over America. Also has influence on global climate.

NAO influences sea level pressure (SLP) over most of the Northern Hemisphere. Strong positive NAO phase (strong Islandic Low and strong Azores High) are associated with above-average temperatures in the eastern US.

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 26

Page 27: Understanding Climate Change:   A Data Driven Approach

Detection of Global Dipole Structure

Most known dipoles discovered Location based definition possible for some known indices that are defined

using EOF analysis. New dipoles may represent previously unknown phenomenon.

Dipoles found using NCEP (National Centers for Environmental Prediction) Reanalysis Data For Pressure

Without detrending With detrending

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 27

Page 28: Understanding Climate Change:   A Data Driven Approach

Dipole Structure and Climate Models

• We can apply our algorithms to either the observed data or to data from various climate models

• This provides a way to analyze and compare climate models – For example, does the climate model capture known

dipoles when used to produce hindcast data?– What does the model predict about dipole structure

for the future under different scenarios?– How do the different models compare to one another?

@University of Minnesota US-China Collaborations in Computer Science and Sustainability 28

Page 29: Understanding Climate Change:   A Data Driven Approach

Summary

Data driven discovery methods hold great promise for advancement in the mining of climate and eco-system data.

Scale and nature of the data offer numerous challenges and opportunities for research in mining large datasets.

"The world of science has changed ... data-intensive science [is] so different that it is worth distinguishing [it] … as a new, fourth paradigm for scientific exploration." - Jim Gray

US-China Collaborations in Computer Science and Sustainability 29@University of Minnesota

Page 30: Understanding Climate Change:   A Data Driven Approach

Team Members and Collaborators

Vipin Kumar, Shyam Boriah, Rohit Gupta, Gang Fang, Gowtham Atluri, Varun Mithal, Ashish Garg, Vanja Paunic, Sanjoy Dey, Jaya Kawale, Marc Dunham, Divya Alla, Ivan Brugere, Vikrant Krishna, Yashu Chamber, Xi Chen, James Faghmous, Arjun Kumar, Stefan Liess

Sudipto Banerjee, Chris Potter, Fred Semazzi, Nagiza Samatova, Steve Klooster, Auroop Ganguly, Pang-Ning Tan, Joe Knight, Arindam Banerjee, Peter Snyder

Project websiteClimate and Eco-system: www.cs.umn.edu/~kumar/nasa-umn

• @University of Minnesota US-China Collaborations in Computer Science and Sustainability 30