Efficient Visualization and Analysis of Very Large Climate Data
Hank Childs, Lawrence Berkeley National Laboratory
December 8, 2011
Lawrence Livermore National Laboratory
Problem Statement
• Climate data is getting increasingly large (massive data!), both spatially and temporally
• Climate scientists want to process this data in lots of ways:
– Analyze simulation recently run on their supercomputer
– Download remote data to nearby supercomputer for analysis
– Download remote data to desktop computer for analysis
– Send analysis routines to remote machine location, results are returned.
VisIt is an open source, richly featured, turn-key application for large data.
• Used by:
– Visualization experts
– Simulation code developers
– Simulation code consumers
• Popular:
– R&D 100 award in 2005
– Used on many of the Top500
– >>>100K downloads
• Developed by:
– NNSA, SciDAC, NEAMS, NSF, and more
[Figure: 217 pin reactor cooling simulation, 1 billion grid points / time slice, run on ¼ of Argonne BG/P. Image credit: Paul Fischer, ANL]
VisIt employs a parallelized client-server architecture.
• Client-server observations:
– Good for remote visualization
– Leverages available resources
– Scales well
– No need to move data
[Figure: client-server diagram. Labels: remote machine (parallel vis resources, user data); localhost (Linux, Windows, Mac; graphics hardware)]
VisIt recently demonstrated good performance at unprecedented scale.
• Weak scaling study: ~62.5M cells/core

Machine    Model      Problem Size   #cores
Purple     IBM P5     0.5T           8K
Ranger     Sun        1T             16K
Juno       X86_64     1T             16K
JaguarPF   Cray XT5   2T             32K
Dawn       BG/P       4T             64K
Franklin   Cray XT4   1T, 2T         16K, 32K
[Figure: two trillion cell data set, rendered in VisIt by David Pugmire on the ORNL Jaguar machine]
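The "~62.5M cells/core" figure can be checked against the weak scaling table with a little arithmetic. The sketch below assumes that "K" means 1,000 cores and "T" means 10^12 cells, and that the two Franklin runs pair 16K cores with 1T cells and 32K cores with 2T cells; the slide does not spell out any of these.

```python
# Weak scaling check: problem size grows with core count, so the
# number of cells per core should stay constant across runs.
# Assumptions (not stated on the slide): K = 1,000 cores, T = 10**12 cells,
# and Franklin's two runs pair as (16K, 1T) and (32K, 2T).
RUNS = [
    ("Purple", 8_000, 500_000_000_000),
    ("Ranger", 16_000, 1_000_000_000_000),
    ("Juno", 16_000, 1_000_000_000_000),
    ("JaguarPF", 32_000, 2_000_000_000_000),
    ("Dawn", 64_000, 4_000_000_000_000),
    ("Franklin", 16_000, 1_000_000_000_000),
    ("Franklin", 32_000, 2_000_000_000_000),
]


def cells_per_core(cores, cells):
    """Return the per-core share of the problem."""
    return cells / cores


for machine, cores, cells in RUNS:
    share = cells_per_core(cores, cells) / 1e6
    print(f"{machine:9s} {cores:6d} cores  {share:.1f}M cells/core")
```

Under those assumptions every run works out to the same per-core load, which is what makes this a weak scaling study rather than a strong scaling one.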
VisIt’s data processing techniques go beyond scalability at massive concurrency:
we are leveraging a suite of techniques developed over the last decade by VACET,
the NNSA, and more.
VisIt is used to look at simulated and experimental data from many areas.
Fusion, Sanderson, UUtah
Particle accelerators, Ruebel, LBNL
Astrophysics, Childs
Nuclear Reactors, Childs
Problem Statement
• Climate data is getting increasingly massive, both spatially and temporally
• Climate scientists want to process this data in lots of ways:
– Analyze simulation recently run on their supercomputer
– Download remote data to nearby supercomputer for analysis
– Download remote data to desktop computer for analysis
– Send analysis routines to remote machine location, results are returned.
Mission accomplished?
General-purpose tools vs application-specific tools
General-purpose tools:
• Are developed by large teams, leading to:
– Robustness
– Efficient algorithms
– Rich set of features
• But:
– They aren’t streamlined for a given application area (lots of button clicks)
– They don’t have application-specific methods
Application-specific tools:
• Are made specifically to solve your problem:
– Streamlined user interface
– Application-specific analysis
• But, they have smaller user and developer communities, so they have:
– Less robustness
– Less efficient algorithms
– A smaller set of features
So which do users want?
Amazing developments over the last decade…
• Very useful packages now available that make quick tool development possible:
– Python, R, VTK, Qt
• But this doesn’t solve the large data issue.
– … but tools are now available that do that as well: VisIt & ParaView
• Great idea: put all these products together into one tool (VisTrails).
• Users get:
– The robustness, richness, and efficiency of large development efforts
– Streamlined user interface & climate-specific analysis (via CDAT)
This tool is UV-CDAT.
UV-CDAT
• UV-CDAT = Ultra-scale Visualization Climate Data Analysis Tools
• Goal: robust tool, capable of doing powerful analysis on large climate data sets.
• Collaboration between two DOE BER teams
• One tool that incorporates many packages…
– … including “VisIt”
UV-CDAT Developers: Dean Williams (PI), Jim Ahrens, Andy Bauer, Dave Bader, Berk Geveci, Timo Bremer, Claudio Silva, David Partyka, Charles Doutriaux, Robert Drach, Emanuele Marques, Feiyi Wang, Jerry Potter, Galen Shipman, John Patchett, Phil Jones, Thomas Maxwell, and many more
Integrated UV-CDAT GUI: Project, Plot, and Variables View
[Figure: parallelized visualization data flow network. A parallel simulation code produces pieces of data (P0 through P9) on disk; Processors 0, 1, and 2 each run a Read, Process, Render pipeline on a share of the pieces.]
Big data visualization tools use “pure parallelism” to process data.
This is a good approach for high resolution meshes
… but climate data is often different
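The "pure parallelism" data flow described above can be sketched in a few lines. This is an illustration only, not VisIt code: the `read`, `process`, and `render` functions are hypothetical stand-ins, threads stand in for the parallel processes, and the contiguous piece-to-processor assignment is an assumption, since the slide does not say how pieces are distributed.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_pieces(num_pieces, num_procs):
    """Split piece IDs into contiguous, nearly equal blocks (assumed policy)."""
    base, extra = divmod(num_pieces, num_procs)
    assignment, start = [], 0
    for rank in range(num_procs):
        count = base + (1 if rank < extra else 0)
        assignment.append(list(range(start, start + count)))
        start += count
    return assignment


def read(piece_id):
    """Hypothetical reader: fabricate four values for one piece of the mesh."""
    return [piece_id * 10 + i for i in range(4)]


def process(values):
    """Hypothetical filter: keep only the even values."""
    return [v for v in values if v % 2 == 0]


def render(values):
    """Hypothetical renderer: reduce a piece to a count of surviving values."""
    return len(values)


def run_pipeline(piece_ids):
    """Each processor runs the same Read -> Process -> Render chain on its pieces."""
    return [render(process(read(p))) for p in piece_ids]


def pure_parallel(num_pieces=10, num_procs=3):
    assignment = assign_pieces(num_pieces, num_procs)
    with ThreadPoolExecutor(max_workers=num_procs) as pool:
        results = list(pool.map(run_pipeline, assignment))
    return assignment, results


if __name__ == "__main__":
    pieces, outputs = pure_parallel()
    for rank, (ids, out) in enumerate(zip(pieces, outputs)):
        print(f"Processor {rank}: pieces {ids} -> {out}")
```

The key property is that every processor executes an identical pipeline on a disjoint subset of the pieces, so the work scales with the number of pieces. That only helps when a single time slice is large enough to be split into many pieces.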
Parallelizing Processing Over Time Slices: Improving Performance For Climate Data
Objective
VisIt’s parallel processing techniques were designed for single time slices of very high resolution meshes. We must adapt this approach for the lower resolution and high temporal frequency characteristic of climate data.
Progress & Results
• We have modified VisIt’s underlying infrastructure to have a “parallelize over time” processing mode.
• We implemented a simple algorithm (“maximum value over time”) as a proof of concept
Impact
Climate scientists with access to parallel resources for their data will be able to process their data significantly faster through parallelization. This software investment will enable other algorithms developed by the project to also be accelerated through parallelization.
[Figure: left, VisIt parallelizing over a high resolution spatial mesh, with one mesh divided among P0, P1, and P2; right, VisIt parallelizing temporally over a low resolution mesh, with time slices T=0 through T=8 divided among P0, P1, and P2]
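A "maximum value over time" computation parallelized over time slices might look like the sketch below. It is not VisIt's implementation: `load_time_slice` is a hypothetical reader that returns synthetic data, threads stand in for parallel processes, and the round-robin assignment of time slices to workers is an assumption. It also assumes at least one time slice.

```python
from concurrent.futures import ThreadPoolExecutor


def load_time_slice(t, num_cells=6):
    """Hypothetical reader: synthetic per-cell values for time slice t."""
    return [(t * 7 + c * 3) % 11 for c in range(num_cells)]


def elementwise_max(a, b):
    """Per-cell maximum of two slices defined on the same mesh."""
    return [max(x, y) for x, y in zip(a, b)]


def local_max_over_time(time_indices):
    """One worker: reduce its own time slices to a single per-cell maximum."""
    result = None
    for t in time_indices:
        current = load_time_slice(t)
        result = current if result is None else elementwise_max(result, current)
    return result


def max_over_time(num_slices, num_workers):
    """Parallelize over time: each worker gets whole time slices, not mesh pieces."""
    # Round-robin assignment (an assumption); drop workers that got no slices.
    chunks = [list(range(r, num_slices, num_workers)) for r in range(num_workers)]
    chunks = [c for c in chunks if c]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(local_max_over_time, chunks))
    # Final reduction: combine the per-worker maxima into the global answer.
    result = partials[0]
    for partial in partials[1:]:
        result = elementwise_max(result, partial)
    return result


if __name__ == "__main__":
    print(max_over_time(num_slices=9, num_workers=3))
```

Because the maximum is associative and commutative, each worker can reduce its own slices independently and only one small per-cell array per worker has to be combined at the end, which is why this kind of operation is a natural proof of concept for temporal parallelism.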
Concurrency   Average Time   Speedup
1             480.0s         -
2             223.0s         2.1X
4             120.0s         4.0X
8             62.0s          7.8X
16            29.0s          16.5X
32            15.5s          31.0X
64            7.8s           61.5X
128           4.2s           114X
Scaling on 2130 time slices of NetCDF climate data (source: Wehner)
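The speedup column above follows from the average times, and the same numbers also give parallel efficiency (speedup divided by concurrency). The sketch below recomputes both, assuming the one-core run is the baseline; recomputed values can differ slightly from the slide's rounded figures.

```python
# (concurrency, average time in seconds), copied from the scaling table.
TIMINGS = [
    (1, 480.0),
    (2, 223.0),
    (4, 120.0),
    (8, 62.0),
    (16, 29.0),
    (32, 15.5),
    (64, 7.8),
    (128, 4.2),
]


def speedup(t_base, t_n):
    """Speedup relative to the baseline run."""
    return t_base / t_n


def efficiency(t_base, t_n, n):
    """Parallel efficiency: fraction of ideal linear speedup achieved."""
    return t_base / (t_n * n)


def scaling_table(timings):
    """Return (concurrency, time, speedup, efficiency) rows; first row is the baseline."""
    t_base = timings[0][1]
    return [
        (n, t, speedup(t_base, t), efficiency(t_base, t, n))
        for n, t in timings
    ]


if __name__ == "__main__":
    print("cores    time  speedup  efficiency")
    for n, t, s, e in scaling_table(TIMINGS):
        print(f"{n:5d} {t:6.1f}s {s:7.1f}X {e:10.1%}")
```

Efficiency makes the trend easier to read than raw speedup: values near 100% indicate the time-slice decomposition is keeping all workers busy, and any drop at high concurrency shows where overheads start to matter.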
This is just one aspect of what makes climate data unique … there are many more.
Success story: spatial extreme value analysis
• We were able to parallelize the spatial extreme value analysis:
– Parallelization was both spatial and temporal
– Parallelization was done with VisIt infrastructure, but analysis was done with R (via VisIt-R linkage)
• Ran on CCSM3.0, 100 year, daily precipitation data
• Near perfect scaling on 36,500 time slices
Summary
• Visualization and analysis of massive data is possible
– Other communities have built tools for this purpose
– These tools are effective for local or remote data, modest to massive parallelization (& serial too!), and interactive or batch processing
• Our effort focuses on deploying VisIt and R as part of UV-CDAT.
– VisIt is excellent with large data and has been adapted to work with climate data.
– We also are collaborating on cutting edge analysis techniques with climate scientists
Example climate visualizations with VisIt