Post on 07-Aug-2020

Introduction to Geospatial Data and Analytics on CyberGIS

Yan Liu, Anand Padmanabhan, Shaowen Wang

CyberGIS Center for Advanced Digital and Spatial Studies

CyberInfrastructure and Geospatial Information Laboratory (CIGI)

Department of Geography and Geographic Information Sciences

National Center for Supercomputing Applications

University of Illinois at Urbana-Champaign

April 2014

Outline

• Introduction to CyberGIS

o Cyberinfrastructure and Geographic Information Systems (GIS)

o CyberGIS

o CyberGIS & XSEDE

• CyberGIS Data and Analytics

o Computational challenges

o CyberGIS applications

• CyberGIS Gateway

o Open service API (hands-on)

o Gateway applications (hands-on)

CyberInfrastructure

• “The whole is more than the sum of its parts.”

o By Aristotle in the Metaphysics

Borromean rings, after Daniel E. Atkins

Image source:

http://www.phy.ornl.gov/theory/dean/RIATG/web_pages/structure_one_pager.html

Geographic Information Systems

• “Geographic Information Systems (GIS) are simultaneously the telescope, the microscope, the computer, and the Xerox machine of regional analysis and synthesis of spatial data.” (Ronald F. Abler 1988)

Cyber + GIS > Cyber | GIS

Cyber

GIS

CyberGIS Center Website

http://cybergis.illinois.edu

CyberGIS Center Vision

Computing- and Data-Intensive Applications and Sciences

Geospatial Sciences and Technologies

CyberGIS

Advanced Cyberinfrastructure

Discovery & Innovation

Advanced Digital Technologies

People

• Director

• Executive committee

o Danny Powell, National Center for Supercomputing Applications

o Stephen Marshak, Director, School of Earth, Society and the Environment

o Sara McLafferty, Head, Department of Geography & Geographic Information Science

o Allen Renear, Interim Dean, Graduate School of Library and Information Science

o Brian Ross, Interim Dean, College of Liberal Arts and Sciences

o Peter Schiffer, Vice Chancellor for Research

o William Shilts, Prairie Research Institute

o Up to two additional faculty or staff user members can be added to the EC by majority vote of the Executive Committee.

• Program coordinator

• Technical coordinator

• Project manager

• Education, training and outreach coordinator

• Faculty and researcher affiliates – 70-100

• Multiple research programmers

• Multiple postdoctoral researchers

Focal Themes and Related Fields

• Sciences and technologies of CyberGIS (e.g., advanced cyberinfrastructure, computer science, computational and data science, geography and geographic information science, library and information science, mathematics, and statistics)

• Research and engineering applications of CyberGIS for enabling creative work, discovery, and innovation (e.g., agriculture, applied health sciences, atmospheric sciences, business, civil and environmental engineering, geography and geographic information science, geology, history, political science, sociology, urban and regional planning, and veterinary medicine)

• Human and societal dimensions of CyberGIS (e.g., business, communication, geography and geographic information science, industrial and enterprise systems engineering, and psychology)

• CyberGIS education, outreach, and training (all of the above fields)

NSF SI2-SSI: CyberGIS Project

$4.43 million, Year: 2010-2015

Principal Investigator

– Shaowen Wang

Project Staff

– ASU: Wenwen Li and Rob Pahle

– ORNL: Ranga Raju Vatsavai

– SDSC: Choonhan Youn

– UIUC: Yan Liu and Anand Padmanabhan

– Graduate and undergraduate students

Industrial Partner: Esri

– Steve Kopp and Dawn Wright

Co-Principal Investigators

– Luc Anselin

– Budhendra Bhaduri

– Timothy Nyerges

– Nancy Wilkins-Diehr

Senior Personnel

– Michael Goodchild

– Sergio Rey

– Xuan Shi

– Marc Snir

– E. Lynn Usery

Project Manager

– Anand Padmanabhan

Chair of the Science Advisory Committee

– Michael Goodchild

CyberGIS Communities

• Science Communities

• Advanced cyberinfrastructure

• Climate change impact assessment

• Emergency management

• Geographic information science

• Geography and spatial sciences

• Geosciences

• Social sciences

• Etc.

• User Communities

• Biologists

• Geographers

• Geoscientists

• Social scientists

• General public

• Broad GIS users

• Etc.

CyberGIS on XSEDE

• Geographic Information Science gateway

o 2007 – present

o Science gateways program on XSEDE

• Allocations

o Annual allocations awarded by XSEDE

o 8M computing hours for the academic year 2013 - 2014

Discoveries

Questions

Predictions

Grand Challenges?

Big Spatial Data

Big Spatial Simulation

Image created by Eric Shook

Complex Spatial Decision Making

Collaborative Knowledge Discovery

CyberGIS for What and Whom?

CyberGIS Gateway

CyberGIS Toolkit

www.opensciencegrid.org

www.xsede.org

http://lakjeewa.blogspot.com/2011/09/what-is-cloud-computing.html

Integrated Digital and Spatial Sciences

CyberGIS Gateway

CyberGIS Toolkit

Space-Time Integration & Synthesis

GISolve Middleware

http://blogs.esri.com/esri/arcgis/2013/10/01/what-is-cybergis/

 | Big Spatial Data | Big Spatial Simulation | Complex Spatial Decision Making | Collaborative Knowledge Discovery

CyberGIS Gateway | Yes / Maybe | Yes / Maybe | Yes / Maybe | Yes / Maybe

CyberGIS Toolkit | Yes / Maybe | Yes / Maybe | Yes / Maybe | Yes / Maybe

GISolve Middleware | Yes / Maybe | Yes / Maybe | Yes / Maybe | Yes / Maybe

Education

• Curriculum and pedagogy

• Partnerships

• Open ecosystems

Center Services

• CyberGIS Commons

o Short courses

o One-on-one training

• CyberGIS Helpdesk

o Proposal development

o Technical consulting

• CyberGIS Infrastructure

o Hardware

o Software

People

CyberGIS Data and Analytics

• Computational Challenges

o Computational intensity

o Research

o Applications

• Scalable CyberGIS Data and Analytics

o Data

o Scalable computing

• Performance

• Scalability

Heterogeneous

• Syntactic

• Semantic

Dynamic

• Spatial and temporal

• E.g. social media

Massive

• Produced by individuals

• Accessible to individuals

Large-scale

• Global coverage

Fine granularity

• Individual-level

• High-resolution

Distributed access

• Interoperability

• Privacy

• Security

Theory + Experiment + Computation + Big Data

Digital Environments

Parallel

o Used to be regarded as a way of speeding up GIS functions and spatial analysis

o Now becoming a required foundation for GIS and spatial analysis

• Multi- and many-core

• GPU (graphics processing unit)

Heterogeneous architecture

Mobile

Distributed

o Service-oriented

o Clouds

Computational Intensity Question

• What is the nature of computational intensity of geographic analysis?

o Why is spatial special?

• Comparable to

o “What is the nature of computational complexity of an algorithm?”

Spatial Computational Principles/Theories

Spatial

• Distribution

• Dependence

• Integration

• Representation

• Uncertainty

• Etc.

Computational

• Complexity vs. intensity

• Uncertainty vs. validity

• Performance vs. reliability

• Etc.

SCALE

Big Geospatial Data Processing and Analysis

• 1/3 arcsec National Elevation Dataset

• Resolution: 10m

• Size: 0.5TB

• Data operations

• Downloading

• Clipping

• Reprojection

• Transformation

• Visualization

• DEM-based analysis

• TauDEM

• 3DEP

• Multiple PBs

• NED: http://ned.usgs.gov; 3DEP: http://nationalmap.gov/3DEP/

• TauDEM: http://hydrology.usu.edu/taudem

Is Greenland bigger than the US?

Scalable Reprojection

• pRasterBlaster (Behzad et al., 2012)

o A high-performance map reprojection software developed by the US Geological Survey (USGS)

o Computational performance improvement

• Collaboration between USGS and CIGI@UIUC

• Parallelism

o Spatial data domain decomposition

• Computational bottleneck

o Load balancing

• Programming techniques

• Randomized workload distribution

• Results

o 12GB raster projection: ~200 seconds on the Trestles supercomputer @ SDSC, 1024 processor cores
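The combination of spatial domain decomposition and randomized workload distribution named above can be sketched as follows. This is a minimal illustration, not pRasterBlaster's actual code: the function name `randomized_row_blocks`, the row-block decomposition, and the round-robin dealing are all assumptions made for the example.

```python
import random


def randomized_row_blocks(n_rows, block_rows, n_workers, seed=0):
    """Split raster rows into blocks and deal them to workers in random order."""
    # Domain decomposition: cut the row range [0, n_rows) into contiguous blocks.
    blocks = [(start, min(start + block_rows, n_rows))
              for start in range(0, n_rows, block_rows)]
    # Randomization: shuffle so that expensive blocks are unlikely to
    # cluster on a single worker.
    rng = random.Random(seed)
    rng.shuffle(blocks)
    # Deal the shuffled blocks round-robin so block counts differ by at most one.
    assignment = [[] for _ in range(n_workers)]
    for i, block in enumerate(blocks):
        assignment[i % n_workers].append(block)
    return assignment


assignment = randomized_row_blocks(n_rows=1000, block_rows=64, n_workers=4)
for worker, blocks in enumerate(assignment):
    print(worker, blocks)
```

Shuffling trades spatial locality for a more even expected cost per worker, which matters when the cost of reprojecting a block varies across the raster.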

Scalability Issue

Profiling

Processor index

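Programming for Scalability

N rows on P processor cores

When P is small | When P is big

int load[P];

for (int i = 0; i < P; i++) load[i] = N / P;  // base share for cores i = 0, 1, 2, …, P-1

load[P-1] += N % P;  // the last core takes the entire remainder

Programming for Scalability

N rows on P processor cores

When P is small | When P is big

int load[P];

for (int i = 0; i < P; i++) load[i] = N / P;  // base share for cores i = 0, 1, 2, …, P-1

for (int j = 0; j < N % P; j++) load[j] += 1;  // one extra row each for cores j = 0, 1, 2, …, (N % P)-1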
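To see why the second scheme scales better, the two distributions can be compared directly. The sketch below is illustrative only; the function names and the row count of 10,003 are assumptions chosen for the example, not values from the slides.

```python
def load_remainder_on_last(n_rows, n_cores):
    """First scheme: every core gets N // P rows, the last core adds N % P."""
    load = [n_rows // n_cores] * n_cores
    load[-1] += n_rows % n_cores
    return load


def load_remainder_spread(n_rows, n_cores):
    """Second scheme: the first N % P cores each take one extra row."""
    base, extra = divmod(n_rows, n_cores)
    return [base + 1 if i < extra else base for i in range(n_cores)]


n_rows = 10_003
for n_cores in (8, 1024):
    on_last = load_remainder_on_last(n_rows, n_cores)
    spread = load_remainder_spread(n_rows, n_cores)
    # The slowest core determines the run time, so compare the maximum load.
    print(n_cores, max(on_last), max(spread))
```

With 8 cores the maximum load is 1253 rows versus 1251, a negligible difference. With 1024 cores the base share is only 9 rows, so the last core ends up with 796 rows under the first scheme while no core exceeds 10 rows under the second.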

Input/Output (I/O) Bottleneck

• TauDEM

o Terrain analysis using Digital Elevation Models (TauDEM)

• Scalability Improvement (Fan et al., submitted)

o Before: max DEM size – 6GB. Not scalable to the number of processors

o After: 36GB+

Programming for I/O Performance

Before

On each process:

  foreach input DEM file

    read spatial metadata

  determine input data blocks

  read data

After

On root process:

  foreach input DEM file

    read spatial metadata

  broadcast metadata to all other processes

On each process:

  determine input data blocks

  read data
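The effect of the change on the shared file system can be illustrated by counting metadata reads. The sketch below simulates both strategies in a single Python process; `CountingStore`, `run_before`, and `run_after` are hypothetical stand-ins, and the "broadcast" is modeled by sharing one list rather than by real message passing.

```python
class CountingStore:
    """Stand-in for a shared file system that counts metadata reads."""

    def __init__(self, metadata_by_file):
        self._metadata = metadata_by_file
        self.reads = 0

    def read_metadata(self, name):
        self.reads += 1
        return self._metadata[name]


def run_before(store, files, n_processes):
    # Every process reads the spatial metadata of every input file itself.
    return [[store.read_metadata(f) for f in files] for _ in range(n_processes)]


def run_after(store, files, n_processes):
    # Only the root reads the metadata; the result is then "broadcast"
    # (modeled here by handing every process the same list).
    metadata = [store.read_metadata(f) for f in files]
    return [metadata for _ in range(n_processes)]


files = {f"tile_{i}.tif": {"rows": 100, "cols": 100} for i in range(20)}
before, after = CountingStore(files), CountingStore(files)
views_before = run_before(before, list(files), n_processes=64)
views_after = run_after(after, list(files), n_processes=64)
print(before.reads, after.reads)
```

In this model the first strategy issues one read per process per file (64 × 20 = 1280), while the second issues one read per file (20) regardless of the process count.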

Redistricting Problem

• Redistricting

o Partitioning a group of indivisible geographic units into a smaller number of political districts

• Redistricting as a combinatorial optimization problem

o Objectives and constraints

• Contiguity, competitiveness, equal population, preservation of communities of interest and local political subdivisions, minority districts

o Computational complexity

• Number of possible solutions

o Stirling number of the second kind: S(n, k)

o Example: S(55, 6) ≈ 8.7 × 10^39

• Computationally intractable

o NP-hard
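The size of the solution space can be checked with the standard recurrence S(n, k) = k·S(n−1, k) + S(n−1, k−1). The helper name `stirling2` below is an assumption for illustration.

```python
def stirling2(n, k):
    """Stirling number of the second kind: ways to split n units into k non-empty groups."""
    # row[j] holds S(i, j) for the current i; start from i = 0, where S(0, 0) = 1.
    row = [1] + [0] * k
    for _ in range(n):
        new = [0] * (k + 1)
        for j in range(1, k + 1):
            # A new unit either joins one of j existing groups or opens a new group.
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]


print(stirling2(4, 2))
print(f"{stirling2(55, 6):.2e}")
```

The second value is on the order of 8.7 × 10^39, in agreement with the example above. Note that S(n, k) counts all partitions; requiring contiguity reduces the count, but the problem remains intractable for exhaustive search.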

Computationally Intractable!

1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 … … 2^63

Genetic Algorithm (GA) Approach

• Principles

o Evolutionary process

• “survival of the fittest”

• Iterative algorithm

o Solution population: a diverse set of initial solutions

o GA operators

• Selection, crossover, mutation, replacement

o Stopping criteria

• Solution quality

• Time or the number of iterations

• Spatial GA operators

o Solution generation

o Crossover

o Mutation
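The iterative structure listed above (solution population, selection, crossover, mutation, replacement, stopping criteria) can be sketched on a toy problem. This is not the authors' algorithm: it uses plain non-spatial operators, ignores contiguity entirely, and optimizes only population balance. All names and parameter values are assumptions.

```python
import random


def population_gap(solution, populations, k):
    """Fitness to minimize: largest minus smallest district population."""
    totals = [0] * k
    for unit, district in enumerate(solution):
        totals[district] += populations[unit]
    return max(totals) - min(totals)


def run_ga(populations, k, pop_size=30, iterations=500, seed=1):
    rng = random.Random(seed)
    n = len(populations)

    def fitness(solution):
        return population_gap(solution, populations, k)

    # Solution population: a diverse set of random initial solutions.
    pool = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    initial_best = min(map(fitness, pool))
    for _ in range(iterations):  # stopping criterion: number of iterations
        # Selection: two binary tournaments.
        parents = [min(rng.sample(pool, 2), key=fitness) for _ in range(2)]
        # Crossover: one-point, non-spatial.
        cut = rng.randrange(1, n)
        child = parents[0][:cut] + parents[1][cut:]
        # Mutation: reassign one random unit to a random district.
        child[rng.randrange(n)] = rng.randrange(k)
        # Replacement: the child replaces the worst member if it is better.
        worst = max(range(pop_size), key=lambda i: fitness(pool[i]))
        if fitness(child) < fitness(pool[worst]):
            pool[worst] = child
        if min(map(fitness, pool)) == 0:  # stopping criterion: solution quality
            break
    best = min(pool, key=fitness)
    return best, fitness(best), initial_best


data_rng = random.Random(0)
populations = [data_rng.randint(50, 150) for _ in range(40)]
best, best_gap, initial_gap = run_ga(populations, k=4)
print(initial_gap, best_gap)
```

Because only the worst member is ever replaced, and only by a strictly better child, the best fitness in the pool never gets worse from one iteration to the next.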

Solution Generation

• Randomization-based district formation

• Contiguity and hole-free

Neighborhood graph Growth of seeds

(a) Seeding (b) District expansion (c) Initial solution
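The three stages (seeding, district expansion, initial solution) can be sketched on a small grid of units. The sketch is a simplified assumption of how such a procedure might look: it uses rook adjacency on a regular grid and guarantees contiguity by construction, but it does not enforce the hole-free property.

```python
import random


def grid_neighbors(rows, cols):
    """Rook-adjacency neighborhood graph for a rows x cols grid of units."""
    graph = {}
    for r in range(rows):
        for c in range(cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            graph[(r, c)] = [(i, j) for i, j in candidates
                             if 0 <= i < rows and 0 <= j < cols]
    return graph


def grow_districts(graph, k, seed=0):
    rng = random.Random(seed)
    # (a) Seeding: k distinct random units start the districts.
    assignment = {unit: d for d, unit in enumerate(rng.sample(sorted(graph), k))}
    # (b) District expansion: annex one unassigned neighboring unit at a time.
    while len(assignment) < len(graph):
        frontier = sorted({(assignment[u], v) for u in assignment
                           for v in graph[u] if v not in assignment})
        district, unit = rng.choice(frontier)
        assignment[unit] = district
    # (c) Initial solution: every unit now carries a district label.
    return assignment


def is_contiguous(graph, assignment, district):
    """Check that the units of one district form a connected subgraph."""
    members = {u for u, d in assignment.items() if d == district}
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        for v in graph[stack.pop()]:
            if v in members and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == members


graph = grid_neighbors(6, 6)
solution = grow_districts(graph, k=4)
print(all(is_contiguous(graph, solution, d) for d in range(4)))
```

Each unit is annexed only through an edge from an already assigned unit of the same district, so every district stays connected throughout the expansion.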

Crossover

• Binary crossover

• Spatial crossover

o Overlay of two redistricting solutions splits

o Formulation of a finer level (split-level) redistricting problem

o Solution conversion

(a) Selection of parent solutions (b) Overlapping (c) Solution conversion

Mutation

• Exchange of multiple units on district boundary and beyond

Pick units: units selected for changing district assignment

High Performance Solution:

Parallel Genetic Algorithm (Liu & Wang, 2014)

Figure 2. Asynchronous PGA framework

CyberGIS Gateway

• Hands-on: Open Service API

• Hands-on: Use Viewshed Analysis for Planning

Hands-on: Open Service API

• Open Service API

o REST Web services

o Document:

• https://wiki.cigi.illinois.edu/display/DOC/GISolve+Open+Service+API+User+Guide

o Code:

• https://svn2.cigi.uiuc.edu:8443/open-service-api/trunk

• Usage

o Develop your own Web or desktop applications that leverage CyberGIS capabilities

o Results sharing

o Embed analysis results as map layers on Web

Tutorial Server

• tutorial.cigi.uiuc.edu

• Accounts: train1 – train49

• Access:

o ssh client

• Mac/linux: ssh username@tutorial.cigi.uiuc.edu

o scp

Hands-on: Viewshed Analysis

• Locating fire lookout towers

o Study area: Silver Plume City, Colorado

o Requirements

• The visible area (viewshed) from the lookout towers should be as large as possible to cover the service area, which is identical to the bounding box of the given DEM (https://dl.dropbox.com/u/21798649/competition/problem1/sp_b10m.tif)

• The number of towers to be built should be four or fewer, and as small as possible (for budget and maintenance cost reasons)
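Before using the Gateway's viewshed tool, it may help to see the idea in miniature. The sketch below is a simplified line-of-sight viewshed on a tiny synthetic DEM, not the Gateway's implementation: the sampling scheme, the 10-unit tower height, and all function names are assumptions.

```python
def visible(dem, observer, target, tower_height=10.0):
    """True if no terrain cell rises above the sight line from observer to target."""
    (r0, c0), (r1, c1) = observer, target
    steps = max(abs(r1 - r0), abs(c1 - c0))
    if steps == 0:
        return True
    eye = dem[r0][c0] + tower_height
    for s in range(1, steps):
        t = s / steps
        # Nearest grid cell to the sample point on the straight line.
        r, c = round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0))
        # Elevation of the sight line at this sample point.
        sight = eye + t * (dem[r1][c1] - eye)
        if dem[r][c] > sight:
            return False
    return True


def viewshed(dem, observer, tower_height=10.0):
    """Set of cells visible from a tower placed on the observer cell."""
    return {(r, c) for r in range(len(dem)) for c in range(len(dem[0]))
            if visible(dem, observer, (r, c), tower_height)}


def coverage(dem, towers, tower_height=10.0):
    """Fraction of DEM cells seen by at least one tower."""
    seen = set()
    for tower in towers:
        seen |= viewshed(dem, tower, tower_height)
    return len(seen) / (len(dem) * len(dem[0]))


# Synthetic 5 x 5 DEM: flat terrain with a 100-unit ridge along the middle column.
dem = [[0.0, 0.0, 100.0, 0.0, 0.0] for _ in range(5)]
west, east = (2, 0), (2, 4)
print(coverage(dem, [west]), coverage(dem, [west, east]))
```

A tower on the west side sees its own side and the ridge but nothing behind it, so a second tower on the east side is needed for full coverage. The exercise asks for the same trade-off between coverage and tower count on the real DEM.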

Acknowledgments

Federal Agencies

Department of Energy’s Office of Science

National Science Foundation

– BCS-0846655

– EAR-1239603

– OCI-1047916

– IIS-1354329

– PHY-0621704

– PHY-1148698

– TeraGrid/XSEDE SES070004

US Geological Survey (USGS)

Industry

Environmental Systems Research Institute (Esri)

Acknowledgments – U of I

College of Liberal Arts and Sciences

Department of Geography and Geographic Information Science

Graduate School of Library and Information Science

National Center for Supercomputing Applications

Office of the Vice Chancellor for Research

Prairie Research Institute

School of Earth, Society, and Environment

Acknowledgements - CyberGIS Center

Thanks!

• Comments / Questions?

o Science Gateway questions

• help@xsede.org

• Survey

o http://bit.ly/CSUXSEDE
