India-US Indoflux Workshop, July 12-16, 2006, Chennai, India
Cyberinfrastructure for Environmental Observing Systems
Chaitan Baru
Director, Science R&D
San Diego Supercomputer Center
Components of Cyberinfrastructure (Web Services)-enabled science & engineering
Collaboration Services
Knowledge management institutions for collection building and curation of data, information, literature, digital objects
High-performance computing for modeling, simulation, data processing/mining
Individual & Group Interfaces & Visualization
Physical World
Humans
Facilities for activation, manipulation and construction
Instruments for observation and characterization
Global Connectivity
A broad, systemic, strategic conceptualization
Implies global (international) system for collaboration
Source: Dan Atkins, http://www.communitytechnology.org/nsf_ci_report/
“CYBERINFRASTRUCTURE”: What do we mean?
• Technologies to bring remote resources together
• Social aspect: bringing multidisciplinary groups together
• e-science
Environmental Observing Systems
• A major area of emphasis for NSF and other agencies (e.g. GEOSS)
  – LTER, www.lternet.org
  – NEON, www.neoninc.org
  – ORION, www.orionprogram.org
  – Waters Network
    • CUAHSI HIS, www.cuahsi.org/his
    • CLEANER, cleaner.ncsa.uiuc.edu
  – EarthScope, www.earthscope.org
  – And many other efforts (NEES, www.neesinc.org), etc.
• A number of established efforts at other federal and state agencies
  – USGS, EPA, DOE, …
Example of a Cyberinfrastructure Project
GEON: Geosciences Network (www.geongrid.org)
• Funded by NSF IT Research program (~$11.5M)
• Multi-institution collaboration between IT and Earth Science researchers
• GEON Cyberinfrastructure provides:
  – Authenticated access to data and Web services
  – Registration of data sets and tools, with metadata
  – Search for data, tools, and services, using ontologies
  – Scientific workflow environment
  – Data and map integration capability
  – Scientific data visualization and GIS mapping
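The registration and ontology-based search capabilities listed above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Registry` class, the tiny `ONTOLOGY` term map, and the dataset names are invented for illustration and are not GEON's actual API or vocabulary.

```python
from dataclasses import dataclass, field

# Hypothetical mini-ontology: each broader term maps to narrower terms.
ONTOLOGY = {
    "igneous rock": {"basalt", "granite"},
    "sedimentary rock": {"sandstone", "shale"},
}


@dataclass
class Registry:
    """Toy catalog: dataset name -> set of metadata keywords."""
    entries: dict = field(default_factory=dict)

    def register(self, name, keywords):
        # Store keywords in lowercase so searches are case-insensitive.
        self.entries[name] = {k.lower() for k in keywords}

    def search(self, term):
        # Expand the query with narrower terms from the ontology.
        term = term.lower()
        wanted = {term} | ONTOLOGY.get(term, set())
        return sorted(n for n, kws in self.entries.items() if kws & wanted)


registry = Registry()
registry.register("deccan_traps_samples", ["basalt", "geochemistry"])
registry.register("himalaya_gravity", ["gravity", "geophysics"])
print(registry.search("igneous rock"))
```

The point of the ontology lookup is that a search for a broad concept also finds data sets tagged only with a narrower term.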
Key Informatics Areas
• Portals
  – Authenticated, role-based access to cyber resources: data, tools, models, model outputs, collaboration spaces, …
• Data Integration
  – Search, discover and integrate data from heterogeneous information sources (“mediation” and “semantic integration”)
• Modeling and simulation environments based on “scientific workflow” software
  – Users can “program” and steer computations at a higher level of programming abstraction
  – Share models (not only data), and support generation and sharing of provenance information
• Geospatial information and Geographic Information Systems (GIS)
  – Spatial statistics, spatiotemporal data mining
• Visualization of 2D, 2.5D, 3D, 4D data, and multidimensional information spaces
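The idea of a scientific workflow that generates provenance information can be sketched as follows. This is only an illustration under simple assumptions: `run_workflow` and the two steps are hypothetical, and real workflow systems such as Kepler record far richer provenance.

```python
import time


def run_workflow(steps, data):
    """Run named steps in order and record a provenance entry for each."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append({
            "step": name,
            "input": repr(data),
            "output": repr(result),
            "timestamp": time.time(),
        })
        data = result  # the output of one step feeds the next
    return data, provenance


# Hypothetical steps: drop missing readings, then average the rest.
steps = [
    ("drop_missing", lambda xs: [x for x in xs if x is not None]),
    ("mean", lambda xs: sum(xs) / len(xs)),
]
result, provenance = run_workflow(steps, [2.0, None, 4.0])
print(result)
for record in provenance:
    print(record["step"], record["input"], "->", record["output"])
```

Because each record keeps the step name with its input and output, the final value can be traced back through every transformation that produced it.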
GEON: International Component
• India
  – Collaboration with University of Hyderabad
    • Profs. K.V. Subbarao & Arun Agarwal
    • Deploying a GEON Node at UofHyd and an India-based portal
  – Conducted GEON Cyberinfrastructure Workshop, Oct. 2005
  – Recently announced as a Knowledge Networked R&D Center by Indo-US Science and Technology Forum
    • Will partner with institutions like NGRI, INCOIS, Wadia Institute of Himalayan Geology (WIHG), Birbal Sahni Institute of Paleo-Botany (BSIPB)
• China
  – Collaboration with Chinese Academy of Sciences, Beijing
    • Dr. Yaolin Shi, Director, Chinese Geodynamics Lab, Dr. Baopin Yan, Dir, CNIC
  – GEON Cyberinfrastructure Workshop, July 20-23, 2006, Beijing
  – Deploy GEON node & a Linux cluster for developing parallel geodynamics codes
• Japan
  – Collaboration with AIST, Tokyo
    • Dr. Satoshi Sekiguchi
  – Initiating a GEOGrid in Japan. Inauguration in early October, 2006
  – Will make various remote sensing data available via GEON
LiDAR Data Processing
• Current implementation
  – 32 IBM P690 1.7GHz processors, 128GB, 8TB SAN
  – ~2TB point cloud data, ~6B rows in database
  – ~20TB orthophotos
• Migrating to…
  – 16-way Linux cluster, 64-bit Intel processors to support…
  – Central warehouse and replicas for failover and load balancing, and
  – On-demand access & analysis of data
[Figure labels: Survey; Process & Classify; Point Cloud (x, y, zn, …); Interpolate/Grid; Analyze/Interpret. Credits: R. Haugerud, U.S.G.S.; D. Harding, NASA]
Courtesy: Chris Crosby & Prof. Ramon Arrowsmith, Arizona State
Meeting in August with USGS EROS Data Center to make continental-scale datasets open to NEON, GEON, and hazards user communities
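The Interpolate/Grid step of the LiDAR pipeline turns an irregular point cloud into a regular raster. The sketch below uses the simplest possible approach, averaging the z values that fall in each square cell; this is a swapped-in illustration, not the gridding method used by GEON, and the function name, sample points, and cell size are all assumptions.

```python
from collections import defaultdict


def grid_points(points, cell_size):
    """Bin (x, y, z) points into square cells and return mean z per cell."""
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [sum of z, point count]
    for x, y, z in points:
        key = (int(x // cell_size), int(y // cell_size))
        sums[key][0] += z
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical sample points (x, y, z) in meters.
points = [(0.2, 0.3, 10.0), (0.8, 0.1, 12.0), (1.5, 0.4, 20.0)]
dem = grid_points(points, cell_size=1.0)
print(dem)
```

Cells with no points are simply absent from the result; a production pipeline would typically fill them by interpolating from neighboring cells.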
NEON Infrastructure Overview
NEON Sensornet-level Cyberdashboard
[Figure: map of the 20 NEON domains]
1. Northeast
2. Mid Atlantic
3. Southeast
4. Atlantic Neotropical
5. Great Lakes
6. Prairie Peninsula
7. Appalachians / Cumberland Plateau
8. Ozarks Complex
9. Northern Plains
10. Central Plains
11. Southern Plains
12. Northern Rockies
13. Southern Rockies / Colorado Plateau
14. Desert Southwest
15. Great Basin
16. Pacific Northwest
17. Pacific Southwest
18. Tundra
19. Taiga
20. Pacific Neotropical
Sensornet Software Stack
Sensors
Real-time Distributed Instrument Control
Admin and Control
Analysis and Visualization
Data Management
Services
Workflows
Portal GridSphere
Kepler, Custom portlets
Pub/Sub (Apache AXIS WS)
Data Turbine, EmStar, Antelope, SRB, Surge, ESS2
TinyDB, ESS2
Courtesy: Tony Fountain, Neil Cotofana, SDSC
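The Pub/Sub entry in the stack refers to the publish/subscribe pattern, in which data producers and consumers are decoupled through named topics. A minimal in-process sketch follows; the `Broker` class and the topic names are hypothetical, and a real deployment would use networked Web services rather than direct callbacks.

```python
from collections import defaultdict


class Broker:
    """In-process publish/subscribe broker keyed by topic name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every callback registered for this topic
        # and report how many subscribers received it.
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


broker = Broker()
received = []
broker.subscribe("soil/temperature", received.append)
broker.publish("soil/temperature", {"sensor": "s1", "value": 18.5})
broker.publish("climate/wind", {"sensor": "w1", "value": 3.2})  # no subscribers
print(received)
```

The sensor publishing a reading does not need to know which portals, archives, or analysis tools consume it, which is what makes the layers of the stack replaceable.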
Definition of standard interfaces
E.g. NEON Site SensorNet Logical Infrastructure
• Users: web, BioPDA
• Portal Server: Portal, Workflows
• Analysis / Vis. Server: Workflows, Services, Analysis / Vis.
• Data Replication / Archiving Server: Services, Data Mgmt.
• Sensor MicroNet (e.g. Soil)
  – Well-endowed Node: Services, Admin / Ctl., Data Mgmt., RTD Instr. Ctl.
  – MicroNet Nodes (RTD Instr. Ctl.), each with multiple Sensors
• Sensor MicroNet (e.g. Climate)
  – Well-endowed Node: Services, Admin / Ctl., Data Mgmt., RTD Instr. Ctl.
  – MicroNet Nodes (RTD Instr. Ctl.), each with multiple Sensors
• …
Courtesy: Tony Fountain, Neil Cotofana, SDSC
Opportunities
– Exploit national high-speed networking, e.g. Garuda, to ensure easy and efficient access to online (cyber) resources
– Leverage GEON cyberinfrastructure in collaboration with UofHyd/UCSD
– Keep in step with NEON cyberinfrastructure
  • Provide well-defined interfaces to metadata, data, and instruments
– Create a true collaboration between scientists, computer scientists, and IT researchers
– Promote e-science
  • Engage the next generation of scientists
– Lobby for NSF-India office (similar to NSF Beijing)
Thanks!
Data, Informatics and Cyberinfrastructure
Storage hardware
Networked Storage (SAN)
Grid Storage
Filesystems, Database Systems
Data Mining, Simulation Modeling, Analysis, Data Fusion
Applications: Medical informatics, Biosciences, Ecoinformatics, …
Knowledge-Based Integration Advanced Query Processing
Visualization
High speed networking
sensornets
instruments
HPC
integration
How do we configure computer architectures to optimally support data-oriented computing?
How do we collect, access and organize data?
How do we obtain usable information from data?
How do we detect trends and relationships in data?
How do we represent data, information and knowledge to the user?
How do we combine data, knowledge and information management with simulation and modeling?
NEON Visualization and Forecasting Facility
NEON Data Center
SAN
Processing nodes
Archival Storage
Geographically Remote archival copy
Raw data, Derived products
Compute Cluster w/ disk
Access to TeraGrid
Internet 2
Visualization Displays (“Synthesis Center”)
NEON Data Replica
Data from NEON District-level PoPs
NEON “Cyberdashboard” (Portal)
– NEON Portal provides
  • Authenticated access to NEON sensor data, metadata, and derived products; Web services; sensor command and control
  • Access to NEON forecasts and models via scientific workflow environments
  • Data and map integration capability
  • Visualization and GIS mapping
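Authenticated portal access that also covers sensor command and control implies that different users are allowed different actions. A minimal role-based check might look like the sketch below; the role names, action names, and `is_allowed` function are hypothetical and do not describe the NEON portal's actual access model.

```python
# Hypothetical roles and the portal actions each one may perform.
ROLE_PERMISSIONS = {
    "guest": {"view_metadata"},
    "scientist": {"view_metadata", "download_data", "run_workflow"},
    "operator": {"view_metadata", "download_data", "control_sensor"},
}


def is_allowed(role, action):
    """Return True if the given role may perform the action."""
    # Unknown roles get an empty permission set, so they are denied.
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("scientist", "run_workflow"))
print(is_allowed("guest", "control_sensor"))
```

Keeping the permissions in one table means that granting sensor control to a new role is a data change rather than a code change.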
A Proposed Open Systems Sensor Network Software Stack
Sensors
Real-time Distributed Instrument Control
Admin and Control
Analysis and Visualization
Data Management
Services
Workflows
Portal
Courtesy: Tony Fountain, Neil Cotofana, SDSC