India-US Indoflux Workshop, July 12-16, 2006, Chennai, India

Cyberinfrastructure for Environmental Observing Systems

Chaitan Baru

Director, Science R&D

San Diego Supercomputer Center

Components of Cyberinfrastructure (Web Services)-enabled science & engineering

• Collaboration Services
• Knowledge management institutions for collection building and curation of data, information, literature, digital objects
• High-performance computing for modeling, simulation, data processing/mining
• Individual & Group Interfaces & Visualization
• Physical World
• Humans
• Facilities for activation, manipulation and construction
• Instruments for observation and characterization
• Global Connectivity

A broad, systemic, strategic conceptualization

Implies global (international) system for collaboration

Source: Dan Atkins, http://www.communitytechnology.org/nsf_ci_report/

“CYBERINFRASTRUCTURE” What do we mean?

• Technologies to bring remote resources together
• Social aspect: bringing multidisciplinary groups together
• e-science

Environmental Observing Systems

• A major area of emphasis for NSF, and other agencies (e.g. GEOSS)
  – LTER, www.lternet.org
  – NEON, www.neoninc.org
  – ORION, www.orionprogram.org
  – Waters Network
    • CUAHSI HIS, www.cuahsi.org/his
    • CLEANER, cleaner.ncsa.uiuc.edu
  – EarthScope, www.earthscope.org
  – And many other efforts… (NEES, www.neesinc.org), etc.
• Number of established efforts at other federal and state agencies
  – USGS, EPA, DOE, …

Example of a Cyberinfrastructure Project

GEON: Geosciences Network (www.geongrid.org)

• Funded by NSF IT Research program (~$11.5M)
• Multi-institution collaboration between IT and Earth Science researchers
• GEON Cyberinfrastructure provides:
  – Authenticated access to data and Web services
  – Registration of data sets and tools, with metadata
  – Search for data, tools, and services, using ontologies
  – Scientific workflow environment
  – Data and map integration capability
  – Scientific data visualization and GIS mapping
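The registration and ontology-based search capabilities listed for GEON can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Resource` and `Registry` classes, the tiny `ONTOLOGY` dictionary, and the resource names are hypothetical and do not reflect GEON's actual software or vocabularies. The idea shown is only that a search term is expanded to its narrower terms before matching against registered metadata.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A registered data set or tool with keyword metadata (hypothetical)."""
    name: str
    kind: str                      # "dataset" or "tool"
    keywords: set = field(default_factory=set)


# Hypothetical mini-ontology: each term maps to its narrower terms.
ONTOLOGY = {
    "rock": {"igneous rock", "sedimentary rock"},
    "igneous rock": {"basalt", "granite"},
    "sedimentary rock": {"sandstone"},
}


def expand(term, ontology=ONTOLOGY):
    """Return the term plus every narrower term reachable from it."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, ()))
    return seen


class Registry:
    """Holds registered resources and answers ontology-aware searches."""

    def __init__(self):
        self._resources = []

    def register(self, resource):
        self._resources.append(resource)

    def search(self, term):
        """Find resources tagged with the term or any narrower term."""
        wanted = expand(term)
        return [r.name for r in self._resources if r.keywords & wanted]


# Made-up example resources.
registry = Registry()
registry.register(Resource("basalt_samples", "dataset", {"basalt"}))
registry.register(Resource("gravity_grid", "dataset", {"gravity"}))
registry.register(Resource("sandstone_porosity_tool", "tool", {"sandstone"}))

print(registry.search("rock"))
```

A search for "rock" matches resources tagged only with "basalt" or "sandstone", which a plain keyword match would miss.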

Key Informatics Areas

• Portals
  – Authenticated, role-based access to cyber resources: data, tools, models, model outputs, collaboration spaces, …
• Data Integration
  – Search, discover and integrate data from heterogeneous information sources (“mediation” and “semantic integration”)
• Modeling and simulation environments based on “scientific workflow” software
  – Users can “program” and steer computations at a higher level of programming abstraction
  – Share models (not only data), and support generation and sharing of provenance information
• Geospatial information and Geographic Information Systems (GIS)
  – Spatial statistics, spatiotemporal data mining
• Visualization of 2D, 2.5D, 3D, 4D data, and multidimensional information spaces
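To make the "scientific workflow" and provenance points concrete, here is a minimal sketch of a workflow that chains named steps and records, for each step, its parameters and a hash of its input and output. It is not how Kepler or any other workflow system is implemented; the `Workflow` class, the `fingerprint` helper, and the two example steps are hypothetical names chosen for illustration.

```python
import hashlib
import json


def fingerprint(value):
    """Short, stable hash of a JSON-serializable value."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


class Workflow:
    """Chains named steps and records what each step consumed and produced."""

    def __init__(self):
        self.steps = []
        self.provenance = []

    def add_step(self, name, func, **params):
        self.steps.append((name, func, params))
        return self  # returning self allows chained add_step calls

    def run(self, data):
        self.provenance = []
        for name, func, params in self.steps:
            result = func(data, **params)
            self.provenance.append({
                "step": name,
                "params": params,
                "input": fingerprint(data),
                "output": fingerprint(result),
            })
            data = result
        return data


# Hypothetical steps: drop missing readings, then rescale.
def drop_missing(values):
    return [v for v in values if v is not None]


def scale(values, factor):
    return [v * factor for v in values]


wf = Workflow().add_step("drop_missing", drop_missing).add_step("scale", scale, factor=2)
result = wf.run([1, None, 3])
print(result)
for record in wf.provenance:
    print(record["step"], record["input"], "->", record["output"])
```

Because each record's output hash equals the next record's input hash, the provenance list can be shared alongside the model to show how a result was derived.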

GEON: International Component

• India
  – Collaboration with University of Hyderabad
    • Profs. K.V. Subbarao & Arun Agarwal
    • Deploying a GEON Node at UofHyd and an India-based portal
  – Conducted GEON Cyberinfrastructure Workshop, Oct. 2005
  – Recently announced as a Knowledge Networked R&D Center by Indo-US Science and Technology Forum
    • Will partner with institutions like NGRI, INCOIS, Wadia Institute of Himalayan Geology (WIHG), Birbal Sahni Institute of Paleo-Botany (BSIPB)
• China
  – Collaboration with Chinese Academy of Sciences, Beijing
    • Dr. Yaolin Shi, Director, Chinese Geodynamics Lab, Dr. Baopin Yan, Dir, CNIC
  – GEON Cyberinfrastructure Workshop, July 20-23, 2006, Beijing.
  – Deploy GEON node & a Linux cluster for developing parallel geodynamics codes
• Japan
  – Collaboration with AIST, Tokyo
    • Dr. Satoshi Sekiguchi
  – Initiating a GEOGrid in Japan. Inauguration in early October, 2006
  – Will make various remote sensing data available via GEON.

LiDAR Data Processing

• Current implementation
  – 32 IBM P690 1.7GHz processors, 128GB, 8TB SAN
  – ~2TB point cloud data, ~6B rows in database
  – ~20TB orthophotos
• Migrating to…
  – 16-way Linux cluster, 64-bit Intel processors to support…
  – Central warehouse and replicas for failover and load balancing, and
  – On-demand access & analysis of data

[Figure: LiDAR processing stages. Labels: Survey; Process & Classify; Interpolate/Grid; Point Cloud (x, y, zn, …); Analyze/Interpret]

R. Haugerud, U.S.G.S; D. Harding, NASA

Courtesy: Chris Crosby & Prof. Ramon Arrowsmith, Arizona State

Meeting in August with USGS EROS Data Center to make continental-scale datasets open to NEON, GEON, and hazards user communities
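The Interpolate/Grid stage of the LiDAR slide turns an irregular point cloud into a regular elevation grid. The slide does not say which interpolation method is used, so the sketch below substitutes the simplest option, local binning: each (x, y, z) point is assigned to a square cell and the z values in each cell are averaged. The function name `grid_point_cloud`, the cell size, and the sample points are all made up for illustration.

```python
from collections import defaultdict


def grid_point_cloud(points, cell_size):
    """Bin (x, y, z) points into square cells and average z per cell.

    Returns a dict mapping (column, row) cell indices to mean elevation.
    Cells that receive no points are simply absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, z in points:
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell] += z
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Made-up sample returns (x, y, z) in metres.
points = [
    (0.2, 0.3, 10.0),
    (0.8, 0.9, 12.0),
    (1.5, 0.1, 20.0),
    (1.9, 1.7, 30.0),
]
dem = grid_point_cloud(points, cell_size=1.0)
print(dem)
```

With billions of points, the same binning would normally be pushed into the database or run in parallel per tile; the in-memory loop here only shows the logic.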

NEON Infrastructure Overview

NEON Sensornet-level Cyberdashboard

[Figure: map of the 20 NEON domains. Legend: Northeast; Mid Atlantic; Southeast; Atlantic Neotropical; Great Lakes; Prairie Peninsula; Appalachians / Cumberland Plateau; Ozarks Complex; Northern Plains; Central Plains; Southern Plains; Northern Rockies; Southern Rockies / Colorado Plateau; Desert Southwest; Great Basin; Pacific Northwest; Pacific Southwest; Tundra; Taiga; Pacific Neotropical]

Sensornet Software Stack

• Sensors
• Real-time Distributed Instrument Control
• Admin and Control
• Analysis and Visualization
• Data Management
• Services
• Workflows
• Portal

Example technologies shown alongside the stack:

• GridSphere (Portal)
• Kepler, Custom portlets
• Pub/Sub (Apache AXIS WS)
• Data Turbine, EmStar, Antelope, SRB, Surge, ESS2
• TinyDB, ESS2

Courtesy: Tony Fountain, Neil Cotofana, SDSC

Definition of standard interfaces
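One way to read "standard interfaces" together with the Pub/Sub service layer is sketched below: every sensor driver implements the same small interface, and higher layers subscribe to topics rather than talking to devices directly. This is an in-process stand-in written for illustration, not Apache AXIS or any of the systems named on the slide; the `Sensor`, `SoilMoistureSensor`, and `Broker` classes, the topic name, and the dryness threshold are all hypothetical.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class Sensor(ABC):
    """Hypothetical standard interface every sensor driver implements."""

    @abstractmethod
    def read(self):
        """Return one reading as a dict with 'topic' and 'value' keys."""


class SoilMoistureSensor(Sensor):
    """Fake driver that replays a fixed sequence of values."""

    def __init__(self, values):
        self._values = iter(values)

    def read(self):
        return {"topic": "soil/moisture", "value": next(self._values)}


class Broker:
    """In-process stand-in for a publish/subscribe service."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, reading):
        for callback in self._subscribers[reading["topic"]]:
            callback(reading)


broker = Broker()
archive, alerts = [], []

# Data management layer: keep every reading.
broker.subscribe("soil/moisture", archive.append)
# Analysis layer: flag dry soil (the 0.1 threshold is illustrative).
broker.subscribe(
    "soil/moisture",
    lambda r: alerts.append(r) if r["value"] < 0.1 else None,
)

sensor = SoilMoistureSensor([0.30, 0.05])
for _ in range(2):
    broker.publish(sensor.read())

print(len(archive), len(alerts))
```

The design point is decoupling: a new sensor type only has to implement `read`, and a new consumer only has to subscribe to a topic.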

E.g. NEON Site SensorNet Logical Infrastructure

[Figure: logical infrastructure diagram. Components as labeled:]

• Users: User (web), User (BioPDA)
• Portal Server: Portal, Workflows
• Analysis / Vis. Server: Workflows, Services, Analysis / Vis.
• Data Replication / Archiving Server: Services, Data Mgmt.
• Sensor MicroNet (e.g. Soil)
  – Well-endowed Node: Services, Admin / Ctl., Data Mgmt., RTD Instr. Ctl.
  – MicroNet Node: RTD Instr. Ctl., Sensor, Sensor, …
  – MicroNet Node: RTD Instr. Ctl., Sensor, Sensor, …
• Sensor MicroNet (e.g. Climate)
  – Well-endowed Node: Services, Admin / Ctl., Data Mgmt., RTD Instr. Ctl.
  – MicroNet Node: RTD Instr. Ctl., Sensor, Sensor, …
  – MicroNet Node: RTD Instr. Ctl., Sensor, Sensor, …

Courtesy: Tony Fountain, Neil Cotofana, SDSC

Opportunities

– Exploit national high-speed networking, e.g. Garuda, to ensure easy and efficient access to online (cyber) resources
– Leverage GEON cyberinfrastructure in collaboration with UofHyd/UCSD
– Keep in step with NEON cyberinfrastructure
  • Provide well-defined interfaces to metadata, data, and instruments
– Create a true collaboration between scientists and computer scientists and IT researchers
– Promote e-science
  • Engage the next generation of scientists
– Lobby for NSF-India office (similar to NSF Beijing)

Thanks!

Data, Informatics and Cyberinfrastructure

[Figure: layered diagram. Labels as shown:]

• Storage hardware
• Networked Storage (SAN)
• Grid Storage
• Filesystems, Database Systems
• Data Mining, Simulation Modeling, Analysis, Data Fusion
• Knowledge-Based Integration, Advanced Query Processing
• Visualization
• Applications: Medical informatics, Biosciences, Ecoinformatics, …
• Side labels: High speed networking; sensornets; instruments; HPC; integration

Questions posed alongside the layers:

• How do we configure computer architectures to optimally support data-oriented computing?
• How do we collect, access and organize data?
• How do we obtain usable information from data?
• How do we detect trends and relationships in data?
• How do we represent data, information and knowledge to the user?
• How do we combine data, knowledge and information management with simulation and modeling?

NEON Visualization and Forecasting Facility

NEON Data Center

[Figure: data center diagram. Labels as shown:]

• SAN
• Processing nodes
• Archival Storage
• Geographically remote archival copy
• Raw data, derived products
• Compute Cluster w/ disk
• Access to TeraGrid
• Internet 2
• Visualization Displays (“Synthesis Center”)
• NEON Data Replica
• Data from NEON District-level PoP’s

NEON “Cyberdashboard” (Portal)

– NEON Portal provides
  • Authenticated access to NEON sensor data, metadata, and derived products; Web services; sensor command and control
  • Access to NEON forecasts and models via scientific workflow environments
  • Data and map integration capability
  • Visualization and GIS mapping

A Proposed Open Systems Sensor Network Software Stack

• Sensors
• Real-time Distributed Instrument Control
• Admin and Control
• Analysis and Visualization
• Data Management
• Services
• Workflows
• Portal

Courtesy: Tony Fountain, Neil Cotofana, SDSC