We have data, now what?

Slide 1/19

We have data, now what?

Carol Song, Senior Research Scientist

Rosen Center for Advanced Computing, Purdue University

WGISS-26, September 23, 2008

Slide 2/19

Understanding and Utilizing Data

• An integrated system for real-time NEXRAD II radar data delivery and 3D visualization, with multi-layer user interfaces to reach a wide audience
– Collaboration among computer scientists and earth/atmospheric scientists
– Team: V. Sundaram, L. Zhao, C.X. Song, B. Benes, P. Kristof, R. Veeramacheneni, M. Huber

• Demand-driven subscription system for real-time satellite data delivery
– Purdue Terrestrial Observatory
– Team: R. Kalyanam, L. Zhao, L. Biehl, C.X. Song

• Providing data through services!

Work supported by: National Science Foundation

Slide 3/19

Next-Generation Radar (NEXRAD) Level II Data: Weather Surveillance Radar (WSR-88D)

• The data provide three attributes at very fine temporal and spatial resolution: reflectivity, Doppler radial velocity, and spectrum width

• These attributes are vital to understanding, monitoring, and predicting severe weather conditions

• There are 135 radar stations in the US

• Data are streamed continuously and received in near real time

Doppler radar tower in Connecticut and the pulsed Doppler radar inside

Acknowledgment: Figures were downloaded from www.CCSU.edu and www.answers.com.

Slide 4/19

NEXRAD II Data Generation

• 3D structure in radar data
– Continuous rotation over 360° in azimuth
– Elevation angle increases by 1° to 3° after each complete sweep
• Continuous NEXRAD Level II radar data stream
– Data files vary in size from a few MB to tens of MB each, depending on the weather conditions
– Data compressed with a modified bzip2
– Temporal resolution is 4-5 minutes in severe weather vs. 9-10 minutes in calm weather

Structure of Doppler radar data (reflectivity)
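Each gate in a volume scan is identified by range, azimuth, and elevation. Before sweeps can be turned into 3D volumes, those gates have to be placed in Cartesian space. One common way to do this uses the 4/3 effective-earth-radius beam-propagation model, a standard approximation in radar meteorology. The slides do not say which model the Purdue system used, so what follows is only a sketch under that assumption. The function and constant names are illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
K_E = 4.0 / 3.0  # assumed effective-earth-radius factor (standard refraction)


def gate_to_cartesian(range_m, azimuth_deg, elevation_deg):
    """Return (x east, y north, z height above the radar) in metres.

    Assumes the 4/3 effective-earth-radius model:
      z = sqrt(r^2 + R^2 + 2 r R sin(theta)) - R
      s = R * asin(r cos(theta) / (R + z))   # distance along the surface
    where R = K_E * earth radius and azimuth is clockwise from north.
    """
    re = K_E * EARTH_RADIUS_M
    theta = math.radians(elevation_deg)
    z = math.sqrt(range_m ** 2 + re ** 2 + 2.0 * range_m * re * math.sin(theta)) - re
    s = re * math.asin(range_m * math.cos(theta) / (re + z))
    phi = math.radians(azimuth_deg)
    return s * math.sin(phi), s * math.cos(phi), z
```

At 100 km range on the lowest (0°) tilt, this model places the beam roughly 590 m above the radar because of earth curvature. This is why each sweep traces a cone-like surface rather than a flat disk.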

Slide 5/19

NEXRAD II Data Distribution

• The National Climatic Data Center (NCDC) houses the data and provides a central clearinghouse of archived Level II data as a resource to the research, teaching, and technology development communities.

• Distributed through four top-tier distributors
• Purdue makes it available on the NSF TeraGrid
• Opportunity!

– The near real-time availability of high-resolution radar data provides an exciting opportunity for meteorologists if the data can be accessed and visualized in 3D in a timely manner.

– Super-resolution data becoming available as we speak

Slide 6/19

Technical Challenges

• Large data volume and real-time streaming (50 MB/s) present major computational and data management challenges.

• Super-resolution data: even larger volumes
– Super-resolution data increase the azimuth resolution from 1° to 0.5°
– Reflectivity range resolution improves from 1 km to 0.25 km, and the Doppler data range extends from 230 km to 300 km for split cuts (generally scans at 1.5° elevation or lower)
– The amount of data collected and transmitted during a volume scan will increase by a factor of approximately 2.3

• Lack of scalability: analyzing data over a long period or a large geographic region requires heavy computation

• Lack of interactive 3D visualization
– Despite the availability of 3D information in the new-generation data, the data are most commonly visualized as 2D images, simple 3D point clouds, or isosurfaces

• Access method: download via FTP/HTTP only; no programmatic access

• Data format: compressed with a modified bzip2 that popular libraries (e.g., RSL) do not support
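The slides describe the compression only as a "modified bzip2". The sketch below assumes the layout used by publicly documented Archive II files and handled by open-source readers: an uncompressed 24-byte volume header, followed by blocks that are each an independent bzip2 stream preceded by a 4-byte big-endian signed size word. It also assumes that a negative size word marks the last block. Treat this as a hypothetical illustration of the format, not the decompressor used in the Purdue pipeline.

```python
import bz2
import struct

VOLUME_HEADER_BYTES = 24  # assumed size of the uncompressed volume header


def decompress_archive2(raw: bytes) -> bytes:
    """Undo block-wise bzip2 compression of a Level II file (assumed layout).

    Returns the volume header followed by all decompressed blocks.
    """
    out = [raw[:VOLUME_HEADER_BYTES]]
    pos = VOLUME_HEADER_BYTES
    while pos + 4 <= len(raw):
        (size,) = struct.unpack(">i", raw[pos:pos + 4])
        size = abs(size)  # assumed: a negative size flags the final block
        pos += 4
        # Each block is a complete bzip2 stream, so it decompresses on its own.
        out.append(bz2.decompress(raw[pos:pos + size]))
        pos += size
    return b"".join(out)
```

Because every block is a self-contained bzip2 stream, a standard decompressor applied to the whole file stops early or fails. Splitting on the size words first is what lets ordinary libraries read the result.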

Slide 7/19

NEXRAD data products

• Online data
– Original streamed data from the NWS (compressed), searchable from a map and downloadable, covering the most recent months
– Special-event data (severe weather events)

• Data services
– Uncompressed data
– Variable values (e.g., reflectivity, radial velocity)
– Pre-generated 3D volumes

• Access methods
– Data portal
– THREDDS, OPeNDAP
– Third-party viewers (e.g., IDV, Java NEXRAD viewer)
– Programming interfaces (C++ library API)
– New: near real-time, interactive 3D visualization
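OPeNDAP is listed as an access method, so a client can request a subset of a variable rather than downloading a whole file. The sketch below builds a DAP2 constraint-expression URL of the form `var[start:stride:stop]`, where `stop` is inclusive. The dataset URL and the variable name `Reflectivity`, with a (sweep, radial, gate) dimension order, are hypothetical and are not the actual layout of the Purdue THREDDS service.

```python
def opendap_subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a DAP2 request URL for a hyperslab of one variable.

    slices: one (start, stride, stop) tuple per dimension; DAP2 treats
    stop as inclusive. response: "ascii" for text, "dods" for binary.
    """
    parts = []
    for start, stride, stop in slices:
        if stride < 1 or start < 0 or stop < start:
            raise ValueError(f"invalid slice {(start, stride, stop)}")
        parts.append(f"[{start}:{stride}:{stop}]")
    return f"{dataset_url}.{response}?{variable}{''.join(parts)}"


# Hypothetical example: first sweep, all 360 radials, every 4th gate.
url = opendap_subset_url(
    "http://example.org/opendap/KIND_1200.nc",
    "Reflectivity",
    [(0, 1, 0), (0, 1, 359), (0, 4, 459)],
)
```

In practice, a DAP-enabled client library (for example, netCDF-C built with DAP support) can open the dataset URL directly and generate such subset requests itself.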

Slide 8/19

An End-to-End Integrated System

• Three important components:
– Data management
• Download the required files from SRB (Storage Resource Broker) and uncompress them with the modified bzip2
– Data processing
• Read the radar files using RSL
• Process the data from multiple sites
• Convert them into renderable 3D volumes
– Visualization/data rendering
• Import the volumetric data from disk
• Create 3D textures and slices and apply texture-based volume-rendering techniques
• Use transfer functions to render the data on the GPU
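Texture-based volume rendering maps every sample through a transfer function and then composites the samples along the viewing direction. On the GPU this happens per fragment as the textured slices are blended. The CPU sketch below shows the same two ideas for a single ray: a piecewise-linear transfer function from reflectivity (dBZ) to RGBA, and front-to-back alpha compositing with early ray termination. The control points are made-up values for illustration, not the color map used by the system.

```python
from bisect import bisect_right

# Hypothetical control points: (reflectivity in dBZ, (r, g, b, alpha)).
CONTROL_POINTS = [
    (5.0, (0.0, 0.0, 1.0, 0.0)),
    (20.0, (0.0, 1.0, 0.0, 0.05)),
    (35.0, (1.0, 1.0, 0.0, 0.2)),
    (50.0, (1.0, 0.0, 0.0, 0.6)),
    (65.0, (1.0, 0.0, 1.0, 0.9)),
]


def transfer(dbz, points=CONTROL_POINTS):
    """Map one reflectivity value to RGBA by linear interpolation (clamped)."""
    keys = [p[0] for p in points]
    if dbz <= keys[0]:
        return points[0][1]
    if dbz >= keys[-1]:
        return points[-1][1]
    i = bisect_right(keys, dbz)  # keys[i-1] <= dbz < keys[i]
    (x0, c0), (x1, c1) = points[i - 1], points[i]
    t = (dbz - x0) / (x1 - x0)
    return tuple(a + (b - a) * t for a, b in zip(c0, c1))


def composite(samples, transfer_fn=transfer, cutoff=0.99):
    """Front-to-back compositing of dBZ samples ordered from the eye outward."""
    r = g = b = a = 0.0
    for dbz in samples:
        sr, sg, sb, sa = transfer_fn(dbz)
        w = (1.0 - a) * sa  # contribution not yet blocked by nearer samples
        r += w * sr
        g += w * sg
        b += w * sb
        a += w
        if a >= cutoff:  # early ray termination: the ray is nearly opaque
            break
    return (r, g, b, a)
```

Making weak echoes nearly transparent (alpha 0 at 5 dBZ) and strong cores nearly opaque is what lets storm structure inside the volume remain visible. Editing these control points is exactly the kind of color-mapping control that expert users need.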

Slide 10/19

Scaling Using the TeraGrid

• How to scale? Key observations:
– Spatial parallelism: between stations
– Temporal parallelism: volumes generated for different intervals are independent
– Data access can be parallel as well

• Two types of computation tasks
– Processing: per station, per interval
– Merging: combines the 3D volumes from all sites to create the full 3D volume for each interval

• Granularity of parallelization
– Depends on the processing power available
– Either fine-grained (per site per interval) or coarse-grained (per site)
– Condor DAGMan is used to orchestrate the jobs

Figure: workflow on the TeraGrid, in which a main job launches processing jobs for Site 1 through Site N, followed by a merge job.
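The fan-out/merge structure above maps naturally onto a Condor DAGMan input file. The sketch below generates one using the real DAGMan keywords `JOB`, `VARS`, and `PARENT ... CHILD`. The submit files `process.sub` and `merge.sub`, the `site`/`interval` macro names, and the station and interval labels are hypothetical; the slides do not show the actual DAG.

```python
def build_dag(sites, intervals, fine_grained=True):
    """Return the text of a DAGMan input file for processing plus merging.

    fine_grained=True : one processing job per (site, interval)
    fine_grained=False: one processing job per site covering all intervals
    Each interval gets a merge job that waits on the processing jobs it needs.
    """
    lines = []
    parents_of = {t: [] for t in intervals}  # interval -> processing job names
    for s in sites:
        if fine_grained:
            for t in intervals:
                name = f"proc_{s}_{t}"
                lines.append(f"JOB {name} process.sub")
                lines.append(f'VARS {name} site="{s}" interval="{t}"')
                parents_of[t].append(name)
        else:
            name = f"proc_{s}"
            lines.append(f"JOB {name} process.sub")
            lines.append(f'VARS {name} site="{s}" interval="all"')
            for t in intervals:
                parents_of[t].append(name)
    for t in intervals:
        merge = f"merge_{t}"
        lines.append(f"JOB {merge} merge.sub")
        lines.append(f'VARS {merge} interval="{t}"')
        lines.append(f"PARENT {' '.join(parents_of[t])} CHILD {merge}")
    return "\n".join(lines) + "\n"
```

With fine granularity, each merge waits only on its own interval's jobs, so early intervals can finish while later ones are still being processed. Coarse granularity submits fewer, longer jobs, and every merge must then wait for all per-site jobs to finish.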

Slide 11/19

Example

Images rendered at different timestamps from a 24-hour dataset of a supercell storm scanned on March 12, 2006, in the Midwestern United States.

Slide 12/19

Hurricane Ike Remnant

• Hurricane Ike: data from 4 stations (3 in IL and 1 in IN) between 10 a.m. and noon on Sept. 14, 2008

Slide 13/19

A Service Architecture

Slide 14/19

Services through multiple interfaces

• Expert use mode
– Needs to see details (large data, lots of processing); highly interactive, with the ability to manipulate color mapping and other settings
– With accelerated graphics hardware

• Learning/casual use mode
– Simple interface, no learning curve
– Does not require a high degree of detail

• Remote access mode
– Through a web browser
– No special hardware
– Needs interactivity

• Application developers
– Need API or web service interfaces to integrate with their applications

Slide 15/19

Workload distribution & Scalability

• Web 2.0 gadget for the masses
– Data preprocessed, rendered, and composed into an animation on the server; the animation (or sequence of images) is sent over the web

• Desktop client for maximum interactivity and performance
– Data preprocessed offline and 3D data volumes cached on the server
– 3D graphics rendered on the user's computer (GPU-enabled)

• Web browser access for interactivity, but slower display
– Data preprocessed offline; 3D volumes cached and rendered into 3D graphics on the server
– Images sent over the network
– User accesses the interactive application through a VNC-based Java applet
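Both the desktop and browser modes depend on 3D volumes cached on the server, and the summary credits partial volume caching with improving response time. The eviction policy is not described. The sketch below assumes a simple least-recently-used (LRU) cache keyed by hypothetical (station, interval) pairs, with a caller-supplied `build` function standing in for the TeraGrid preprocessing step.

```python
from collections import OrderedDict


class VolumeCache:
    """LRU cache of preprocessed 3D volumes keyed by (station, interval)."""

    def __init__(self, capacity, build):
        self.capacity = capacity
        self.build = build  # hypothetical: build(station, interval) -> volume
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, station, interval):
        key = (station, interval)
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        volume = self.build(station, interval)
        self._data[key] = volume
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
        return volume
```

Since many users tend to look at the latest intervals of the same active storms, a recency-based policy keeps those volumes in memory. Older intervals, by contrast, get rebuilt only on demand.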

Slide 16/19

Reach out to the masses

A LiveRadar3D Google gadget displaying a 3D visualization of radar data, continuously updated with streaming data

Slide 17/19

The Fully Interactive 3D Visualization Client

Slide 18/19

3D Visualization of all stations

Slide 19/19

Summary

• Remote 3D visualization services delivered through multiple interfaces

• Application interface to the data services for third-party integration

• An architecture that scales to different use scenarios

• Parallel data preprocessing using TeraGrid Condor resources, and partial volume caching, which improve the response time and scalability of the system

Continuing effort
• User feedback
• Scale: support multiple users simultaneously
• Hierarchical 3D volume structure to support multi-scale investigation

Slide 20/19

Thank you!

Publications and URLs are available. Feel free to contact Carol.

Slide 21/19

PRESTIGE: Purdue Real-Time Satellite Information Gateway

• User requirements
– Receive continuous data updates
– Real-time or near-real-time access
– Custom-tailored data configurations

• Current systems
– Impossible to generate the complete range of data products
– Requests have to be routed through the support staff
– Manual process that is time-consuming and error-prone

Slide 22/19

Range of MODIS Data Products

• Level 1A (MOD01)
• Vegetation Index (MOD13)
• Geolocation (MOD03)
• Aerosol (MOD04)
• Water Vapor (MOD05)
• Clouds (MOD06)
• Atmospheric Profiles (MOD07)
• Reflectance (MOD09)
• Snow (MOD10)
• Fire Detection (MOD14)
• Ocean Color (MOD18)
• Sea Surface Temperature (MOD28)
• Sea Ice (MOD29)
• Cloud Mask (MOD35)
• Also multiday composites of the above

Note that each data product may contain anywhere from a few to many variables.

Slide 23/19

System Design

• User-driven publish/subscribe model

– Dynamic data generation

– User specifies, controls, and receives custom-tailored data

– Continuous data updates in near-real-time

– Multiple ways to access the data

Slide 24/19

Slide 25/19

Satellite Data Subscription

Slide 26/19

Data Subscription

• Web portal-based user interface
– Option selection from choice lists
– Options include satellite, coverage area, data product, projection type, and data format
– Ability to select a date range for subscription validity
– User-driven expansion of product choices
– Individual user-based subscriptions

• User-initiated data production
– Data products generated only when at least one user is subscribed to the product
– Data production automatically turned off when no active subscription exists
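The rule that a product is generated only while someone is subscribed to it can be sketched as a filter over the subscription table. The `Subscription` schema and the `products_to_generate` helper are hypothetical. Their fields mirror the options listed on the slide: satellite, coverage area, data product, projection, format, and a validity date range.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Subscription:
    """One user's subscription (hypothetical schema based on the portal options)."""
    user: str
    satellite: str
    area: str
    product: str
    projection: str
    data_format: str
    start: date  # first day the subscription is valid
    end: date    # last day the subscription is valid (inclusive)

    def active_on(self, day: date) -> bool:
        return self.start <= day <= self.end

    def config(self) -> tuple:
        """The product configuration this subscription asks for."""
        return (self.satellite, self.area, self.product,
                self.projection, self.data_format)


def products_to_generate(subscriptions, day):
    """Configurations with at least one active subscriber on `day`.

    Any configuration not in the returned set has no active subscriber,
    so its production is switched off.
    """
    return {s.config() for s in subscriptions if s.active_on(day)}
```

Collapsing identical configurations into a set means two users asking for the same product trigger a single production run, while each still keeps an individual subscription.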

Slide 27/19

Data Notification

• Push-based notifications
– Near real-time delivery of new-data notifications by email
– Implemented by automatically invoking a web service from the processing cluster when new data is available
– Subscription database used to query active subscriptions

• Data delivery mechanism
– Data copied via scp from the processing cluster to webserver-accessible storage space
– Thumbnails generated for images to provide a quick-look feature
– Link to the webserver data location provided in the notification email
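In the notification path, the processing cluster reports a new file and active subscribers for that product receive an email with a link. That path could look roughly like the sketch below. The function name, the dictionary layout of subscriptions, the sender address, and the URL scheme are all hypothetical. The messages are only built here, not sent; sending would go through an SMTP server, for example with Python's `smtplib`.

```python
from datetime import date
from email.message import EmailMessage


def notify(new_file, config, subscriptions, today: date, base_url,
           sender="noreply@example.org"):
    """Build one email per active subscription whose config matches.

    subscriptions: iterable of dicts with hypothetical keys
    "email", "config", "start", "end" (end date inclusive).
    """
    link = f"{base_url.rstrip('/')}/{new_file}"
    messages = []
    for sub in subscriptions:
        if sub["config"] != config:
            continue  # subscriber asked for a different product configuration
        if not (sub["start"] <= today <= sub["end"]):
            continue  # subscription not valid today
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = sub["email"]
        msg["Subject"] = f"New data available: {new_file}"
        msg.set_content(f"New data matching your subscription is ready:\n{link}\n")
        messages.append(msg)
    return messages
```

Sending only a link rather than attaching the data keeps emails small. Users download the (possibly large) product from the webserver when they need it.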