how to visualize your gpu-accelerated simulation...
TRANSCRIPT
RANGE OF ANALYSIS AND VIZ TASKS
Analysis: Focus quantitative
Visualization: Focus qualitative
Monitoring, Steering
TRADITIONAL HPC WORKFLOW
[Diagram: Workstation, Viz Cluster, Supercomputer and File System, linked by the steps Setup; Dump, Checkpointing; Visualization, Analysis; Analysis, Visualization]
TRADITIONAL WORKFLOW: CHALLENGES
Lack of interactivity prevents “intuition”
I/O becomes main simulation bottleneck
Viz resources need to scale with simulation
High-end viz neglected due to workflow complexity
OUTLINE
Visualization applications
CUDA/OpenGL interop
Remote viz
Parallel viz
In-Situ viz
High-level overview. Some parts platform dependent. Check with your sysadmin.
NON-REPRESENTATIVE VIZ TOOLS SURVEY OF 25 HPC SITES
Surveyed sites:
NERSC, LLNL-OCF, LLNL-SCF, LANL, ORNL-CCS, DOD-ORC, AFRL-DSCR, AFRL, ARL, ERDC, NAVY, MHPCC, ORS, CCAC, NASA-NAS, NASA-NCCS, TACC, CHPC, RZG, HLRN, Julich, CSCS, CSC, Hector, Curie
VISIT
Scalar, vector and tensor field data features
— Plots: contour, curve, mesh, pseudo-color, volume,..
— Operators: slice, iso-surface, threshold, binning,..
Quantitative and qualitative analysis/vis
— Derived fields, dimension reduction, line-outs
— Pick & query
Scalable architecture
Open source
http://wci.llnl.gov/codes/visit/
Cross-platform
— Linux/Unix, OSX, Windows
Wide range of data formats — .vtk, .netcdf, .hdf5,..
Extensible — Plugin architecture
Embeddable
Python scriptable
VISIT
VISIT’S SCALABLE ARCHITECTURE
Client-server architecture
Server MPI parallel
Distributed filtering
(multi-)GPU accelerated, parallel rendering*
* requires X server on each node
PARAVIEW
Scalar, vector and tensor field data features
— Plots: contour, curve, mesh, pseudocolor, volume,..
— Operators: slice, iso-surface, threshold, binning,..
Quantitative and qualitative analysis/vis
— Derived fields, dimension reduction, line-outs
— Pick & query
Scalable architecture
Developed by Kitware, open source
http://www.paraview.org
PARAVIEW’S SCALABLE ARCHITECTURE
Client-server-server architecture
Server MPI parallel
Distributed filtering
GPU accelerated, parallel rendering*
* requires X server on each node
SOME OTHER TOOLS
Wide range of visualization tools
Often emerged from specialized application domain
— Tecplot, EnSight: structural analysis, CFD
— IndeX: seismic data processing & visualization
— IDL: image processing
Early adopters of visual programming
— AVS/Express, OpenDX
FURTHER READING
Paraview Tutorial:
http://www.paraview.org/Wiki/The_ParaView_Tutorial
VisIt Manuals/Tutorials:
http://wci.llnl.gov/codes/visit/manuals.html
VISUALIZATION TOOLKIT
Focus on visualization, not (only) rendering
Provides more complex operations on data (“filtering”)
Introduces visualization pipeline
At the core of many high-level viz tools
http://www.vtk.org
VTK PIPELINE
Source: Raw data, shapes
Filter: Transform raw data
Mapper: Map data to geometry
Renderer: Render the geometry
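The four stages can be illustrated without VTK itself. The following is a toy sketch in plain C, not VTK code: every function name is made up for illustration, the "geometry" is just one text glyph per sample, and the "framebuffer" is a string. It only shows how data flows from source to renderer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_POINTS 64

/* Source: produce raw data, here the samples f(i) = i * i. */
static void source(float *data, size_t n) {
    for (size_t i = 0; i < n; ++i) data[i] = (float)(i * i);
}

/* Filter: transform the data, here by keeping samples >= threshold.
   Returns the number of samples kept. */
static size_t threshold_filter(const float *in, size_t n, float threshold,
                               float *out) {
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i)
        if (in[i] >= threshold) out[kept++] = in[i];
    return kept;
}

/* Mapper: turn data into something drawable, here one glyph per sample,
   chosen by where the value falls between lo and hi. */
static void mapper(const float *in, size_t n, float lo, float hi,
                   char *glyphs) {
    static const char ramp[] = ".:*#";
    for (size_t i = 0; i < n; ++i) {
        float t = (hi > lo) ? (in[i] - lo) / (hi - lo) : 0.0f;
        int level = (int)(t * 3.0f + 0.5f); /* round to nearest of 4 levels */
        if (level < 0) level = 0;
        if (level > 3) level = 3;
        glyphs[i] = ramp[level];
    }
}

/* Renderer: copy the glyphs into a NUL-terminated text "framebuffer". */
static void renderer(const char *glyphs, size_t n, char *framebuffer) {
    memcpy(framebuffer, glyphs, n);
    framebuffer[n] = '\0';
}

/* Run all four stages. framebuffer must hold at least n + 1 chars.
   Returns the number of samples that survived the filter. */
size_t run_pipeline(size_t n, float threshold, char *framebuffer) {
    float raw[MAX_POINTS], filtered[MAX_POINTS];
    char glyphs[MAX_POINTS];
    assert(n <= MAX_POINTS);

    source(raw, n);
    size_t kept = threshold_filter(raw, n, threshold, filtered);

    float lo = 0.0f, hi = 0.0f;
    if (kept > 0) lo = hi = filtered[0];
    for (size_t i = 1; i < kept; ++i) {
        if (filtered[i] < lo) lo = filtered[i];
        if (filtered[i] > hi) hi = filtered[i];
    }

    mapper(filtered, kept, lo, hi, glyphs);
    renderer(glyphs, kept, framebuffer);
    return kept;
}
```

Calling run_pipeline(6, 4.0f, fb) keeps the four samples 4, 9, 16 and 25 and leaves the string ".:*#" in fb. In VTK each stage is an object and the stages are connected once, then re-executed on demand; the sketch hard-wires the order instead.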
OPENGL: API FOR GPU ACCELERATED RENDERING
• Primitives: points, lines, polygons
• Properties: colors, lighting, textures, ..
• View: camera position and perspective
• Shaders: Rendering to screen/framebuffer
• C-style functions, enums
See e.g. “What Every CUDA Programmer Should Know About OpenGL”
(http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf)
A SIMPLE OPENGL EXAMPLE
glColor3f(1.0f,0,0); // State-based API (sticky attributes)
glBegin(GL_QUADS); // Drawing
glVertex3f(-1.0f, -1.0f, 0.0f); // The bottom left corner
glVertex3f(-1.0f, 1.0f, 0.0f); // The top left corner
glVertex3f(1.0f, 1.0f, 0.0f); // The top right corner
glVertex3f(1.0f, -1.0f, 0.0f); // The bottom right corner
glEnd();
glFlush(); // Render to screen
How would the same look in CUDA? (renderQuad and flushToScreen are made-up kernels)
float vert[] = {-1.0f, -1.0f, ..};
float* d_vert;
cudaMalloc(&d_vert, n);
cudaMemcpy(d_vert, vert, n, cudaMemcpyHostToDevice);
renderQuad<<<N/128, 128>>>(d_vert);
flushToScreen<<<..>>>();
CUDA-OPENGL INTEROP: MAPPING MEMORY
• OpenGL: Opaque data buffer object
• Vertex Buffer Object (VBO)
• User has very limited control
• CUDA: C-style memory management
• User has full control
• CUDA-OpenGL Interop:
Map/Unmap OpenGL buffers into CUDA memory space
RENDERING FROM A VBO WITH CUDA INTEROP
init():
Create VBO
Initialize VBO
Register VBO with CUDA
display():
Map VBO to CUDA
Populate
Unmap VBO from CUDA
Render
Display
MAIN OPENGL-CUDA INTEROP ROUTINES
Register VBO for CUDA
cudaGraphicsGLRegisterBuffer(cuda_vbo, *vbo, flags);
Mapping of VBO to CUDA
cudaGraphicsMapResources(1, cuda_vbo, 0);
cudaGraphicsResourceGetMappedPointer(&d_ptr, size, *cuda_vbo);
Unmapping VBO from CUDA
cudaGraphicsUnmapResources(1, cuda_vbo, 0);
CAN ALL GPUS SUPPORT OPENGL?
GeForce : standard feature set including OpenGL 4.3/4.4
Quadro : + certain highly accelerated features (e.g. CAD)
Tesla: If GPU is in “All on” operation mode*
nvidia-smi --query-gpu=gom.current --format=csv
nvidia-smi -q
*requires “m” or “X” class devices
Graphics capabilities disabled
Graphics capabilities enabled
OPENGL SUMMARY
+ Relatively simple for basic viz
+ Existing/fixed/simple rendering pipeline
- Low level
- Triangles, points, rather than isosurfaces, height-fields
- (currently) depends on Xserver for context creation
- What about remote, parallel etc?
MORE OPENGL INFORMATION AT GTC
S4455 - Multi-GPU Rendering, 03/24, 14:30, 210A
S4825 - Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's latest Developer Tools Suite, 03/24, 14:30, 210E
S4379 - OpenGL 4.4 Scene Rendering Techniques, 3/25, 13:00, 210C
S4810 - NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web, 3/25, 13:30, LL21C
S4610 - OpenGL: 2014 and Beyond, 3/25, 14:00, 210C
S4385 - Order Independent Transparency in OpenGL, 3/25, 210C
OPENGL CONTEXT
State of an OpenGL instance
— Incl. viewable surface
— Interface to windowing system
Context creation: platform specific
— Not part of OpenGL
— Handled by Xserver in Linux/Unix-like systems
GLX: Interaction X<-> OpenGL
LOCAL RENDERING
[Diagram: Application using libGL (OpenGL, GLX) and Xlib (X11 events and commands); 2D/3D X Server; Driver; GPU, monitor attached]
X-FORWARDING: THE SIMPLEST FORM OF “REMOTE” RENDERING
[Diagram: Application using libGL (OpenGL/GLX) and Xlib (X11 events, X11 commands); Network; 2D/3D X Server; Driver; GPU, monitor attached]
On remote system: export DISPLAY=59.151.136.110:0.0
X-FORWARDING
+ Simple!
+ ssh -X
+ No need to run X on remote machine
- Lots of data crosses the network
- All rendering performed on the local Xserver
- Not useful for visualization of remote CUDA Interop apps
(X-server doesn’t “see” remote GPU)
SERVER-SIDE RENDERING + REMOTE VIZ APPLICATION
[Diagram: server side with Application using libGL (OpenGL, GLX) and Xlib (X11 events, X11 cmds), 2D/3D X Server, Driver, GPU; Network; Client with Driver and GPU, monitor attached]
SERVER-SIDE RENDERING + SCRAPING
[Diagram: server side with Application using libGL (OpenGL, GLX) and Xlib (X11 events, X11 cmds), 2D/3D X Server, Driver, GPU, and a Scraper sending Images over the Network; Client with Driver and GPU, monitor attached]
SERVER-SIDE RENDERING + SCRAPING
+ Full GPU acceleration
+ No X server on client
Question: when to scrape?
— Xserver not informed about direct rendering
— Intercept glXSwapBuffers()
- Not multi-user
=> Occasionally used for remote desktop tools
GLX FORKING: OUT-OF-PROCESS
[Diagram: server side with Application using libGL (OpenGL/GLX) and Xlib (X11 events, X11 cmds), Proxy X Server, 3D X Server (GLX, Images), Driver, GPU; Network; Client App with Driver and GPU, monitor attached]
GLX FORKING: OUT-OF-PROCESS
+ Full GPU acceleration
+ No X server on client
+ Multi-User
- All traffic through Proxy X server
=> Occasionally used for remote desktop tools
GLX FORKING WITH INTERPOSER LIBRARY
[Diagram: server side with Application using libGL, VirtualGL and Xlib, 3D X Server (GLX, OpenGL, Images), Driver, GPU; Network carrying X11 events, X11 commands and the compressed VGL transport; client side with VirtualGL client, 2D X Server, Driver and GPU, monitor attached]
GLX FORKING WITH INTERPOSER LIBRARY
[Diagram: server side with Application using libGL, VirtualGL and Xlib, 3D X Server (GLX, OpenGL, Images), Proxy X Server (X11 events, X11 cmds, Images), Driver, GPU; Network; Client App with Driver and GPU, monitor attached]
VIRTUAL GL + TURBOVNC
+ Compressed image transport
+ Transparent to the application
+ Fully GPU accelerated OpenGL
+ Client with or without Xserver
- Requires Xserver to access GPU
http://www.virtualgl.org
HOW TO SET UP VIRTUAL GL + TURBOVNC
Requires Xserver running on server
— Root privileges for Xserver
Requires installation of VirtualGL, TurboVNC on server
Start VirtualGL-accelerated VNC server
vncserver :3
TurboVNC viewer on client
— Linux, Windows, Javascript
CONNECTING TO REMOTE VNC SERVER VIA A GATEWAY
Establish tunnel on client
ssh -L 3333:node01.cluster.net:5903 login.cluster.net
Connect client to localhost:3333
[Diagram: client (port 3333), gateway login.cluster.net, node01.cluster.net (port 5903)]
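The port in the tunnel command follows from the VNC convention that display :N listens on TCP port 5900 + N, so a server started with vncserver :3 is reached on port 5903. A minimal sketch in C that derives the port and assembles the ssh command; the function names are made up for illustration, and the host names and the local port 3333 are simply the ones used on the slide.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* VNC convention: display :N listens on TCP port 5900 + N. */
int vnc_port(int display) {
    assert(display >= 0);
    return 5900 + display;
}

/* Write "ssh -L <local_port>:<node>:<vnc port> <gateway>" into buf.
   Returns what snprintf returns: the length of the full command
   (which exceeds buf_size - 1 if buf was too small). */
int build_tunnel_command(char *buf, size_t buf_size, int local_port,
                         const char *node, int display,
                         const char *gateway) {
    return snprintf(buf, buf_size, "ssh -L %d:%s:%d %s",
                    local_port, node, vnc_port(display), gateway);
}
```

With local port 3333, node node01.cluster.net, display 3 and gateway login.cluster.net this reproduces the command above. The local port is a free choice on the client; only the remote port is fixed by the display number.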
LAUNCHING REMOTE CUDA/OPENGL APPLICATIONS VIA VGLRUN
vglrun :3 glxgears
export DISPLAY=:3
vglrun simpleGL
REMOTE VIZ IN PARAVIEW/VISIT
Approach 1: VirtualGL and VNC (ParaView client & server under VirtualGL)
export DISPLAY=:3
vglrun paraview
Approach 2: Local Client
On remote server: pvserver
On local workstation: paraview
REMOTE VISUALIZATION SUMMARY
Multiple approaches to remote rendering
— X forwarding, remote viz app, scraping, interposer process/library
Currently requires X server to generate context
— EGL will fix this, but requires application changes
PARALLEL VISUALIZATION
Parallelism at multiple levels
— Filtering
— Rendering
- Both supported by VisIt & Paraview
- Heavy lifting already done!
- Challenge: Setup in parallel environment
- Both tools provide support for most common cases
- Both VisIt & Paraview MPI parallel -> need custom build
BASIC STEPS TO PARALLEL VISUALIZATION
Launch visualization server processes
— Most likely through queuing system
Connect client to head node
Tunnel from workstation
Setting up virtualgl on remote node
DOMAIN DECOMPOSITION OF DATA
- Visualization algorithms can work on decomposed data
- May require ghost cells
- May lead to load imbalance
No ghost cells required
Ghost cells required
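Why some operations need ghost cells can be seen on a 1D example. The sketch below is an assumption-laden toy, not code from VisIt or ParaView: it applies a 3-point average to an array once globally and once on two subdomains, where each subdomain copies one neighbouring cell per side (its ghost cells) before it computes. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CELLS 64

/* Reference: 3-point average over the whole array; end points are copied. */
void smooth_global(const float *in, float *out, size_t n) {
    assert(n >= 2);
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}

/* Smooth the cells [begin, end) owned by one subdomain.
   The local buffer holds the owned cells plus one ghost cell per side,
   copied from the neighbouring subdomain (or the domain boundary). */
static void smooth_subdomain(const float *in, float *out,
                             size_t begin, size_t end) {
    float local[MAX_CELLS + 2];
    size_t count = end - begin;
    /* "Ghost exchange": here a plain copy, in MPI a message. */
    memcpy(local, in + begin - 1, (count + 2) * sizeof(float));
    for (size_t i = 0; i < count; ++i)
        out[begin + i] = (local[i] + local[i + 1] + local[i + 2]) / 3.0f;
}

/* Same result as smooth_global, computed on two independent subdomains. */
void smooth_decomposed(const float *in, float *out, size_t n) {
    assert(n >= 2 && n <= MAX_CELLS);
    size_t mid = n / 2 < 1 ? 1 : n / 2;    /* first cell of subdomain 1 */
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    smooth_subdomain(in, out, 1, mid);     /* subdomain 0 owns [1, mid) */
    smooth_subdomain(in, out, mid, n - 1); /* subdomain 1 owns [mid, n-1) */
}
```

Without the ghost copies, the cells next to the cut at mid would lack one neighbour and the two results would differ there. A purely per-cell operation such as a threshold would need no ghost cells at all.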
PARALLEL COMPOSITING
- Distributed geometry
- Render with depth information
- Composition using IceT
http://icet.sandia.gov
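The core of depth compositing fits in a few lines. This is a minimal sketch of the idea only, not the IceT API: each process renders its share of the geometry into a full-size image with a depth value per pixel, and images are merged by keeping the nearer fragment. The Fragment type and function names are assumptions made for the example, and the simple rule holds for opaque geometry only.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

typedef struct {
    unsigned char r, g, b; /* color */
    float depth;           /* distance from the camera; smaller is nearer */
} Fragment;

/* Fill an image with a black background at "infinite" depth. */
void clear_image(Fragment *image, size_t num_pixels) {
    for (size_t i = 0; i < num_pixels; ++i) {
        Fragment background = {0, 0, 0, FLT_MAX};
        image[i] = background;
    }
}

/* Merge src into dst: per pixel, keep the fragment nearer to the camera. */
void depth_composite(Fragment *dst, const Fragment *src, size_t num_pixels) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < num_pixels; ++i)
        if (src[i].depth < dst[i].depth) dst[i] = src[i];
}
```

Because taking the nearer fragment is associative and commutative, partial images can be merged pairwise in any order, which is what allows tree-like or swap-based compositing schedules across many nodes.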
PARALLEL VISUALIZATION
Particularly important for large datasets
Visualization time often determined by filtering
Rendering important if
— Highly complex visualization
— Complex visualization effects
— Low-power CPU
Transparent support in Paraview, VisIt
BENEFIT OF IN-SITU VISUALIZATION
Pipeline simulation cycle
— Visualization/analysis integral part of simulation
Immediate feedback
Reduce pressure on file system
In some (future) cases: only way to analyze/visualize data
INTERACTIVE VIZ APPLICATION WITH LIBSIM
User implements callbacks for VisIt
— meta-data, mesh, variables and domains
GetData: VisIt requests data for visualization
— Work directly on simulation data
— Transfer data from GPU
Command server: Interaction VisIt front-end <-> simulation
— “Steering”
http://www.visitusers.org/index.php?title=VisIt-tutorial-in-situ
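The callback pattern itself can be sketched in plain C. The code below does not use libsim: the VizCallbacks table, the toy Simulation struct and all function names are hypothetical and only mimic the roles described above, namely a data callback that hands out simulation memory and a command callback used for steering. For GPU-resident data, the data callback would first have to copy from the device, which the sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback table; illustrative names, not the libsim API. */
typedef struct {
    size_t (*get_num_cells)(void *sim);
    const float *(*get_variable)(void *sim, const char *name);
    void (*on_command)(void *sim, const char *command);
} VizCallbacks;

/* Toy simulation state. */
typedef struct {
    float temperature[4];
    int paused;
} Simulation;

static size_t sim_num_cells(void *sim) {
    (void)sim;
    return 4;
}

/* Hand out a pointer to live simulation data: no copy, no file. */
static const float *sim_variable(void *sim, const char *name) {
    Simulation *s = sim;
    return strcmp(name, "temperature") == 0 ? s->temperature : NULL;
}

/* Steering: the front end can change the simulation state. */
static void sim_command(void *sim, const char *command) {
    Simulation *s = sim;
    if (strcmp(command, "pause") == 0) s->paused = 1;
    else if (strcmp(command, "resume") == 0) s->paused = 0;
}

/* The simulation registers its callbacks once at start-up. */
VizCallbacks make_callbacks(void) {
    VizCallbacks cb = {sim_num_cells, sim_variable, sim_command};
    return cb;
}

/* "Visualization side": request a variable and reduce it to its maximum. */
float viz_max(const VizCallbacks *cb, void *sim, const char *name) {
    const float *data = cb->get_variable(sim, name);
    size_t n = cb->get_num_cells(sim);
    assert(data != NULL && n > 0);
    float max = data[0];
    for (size_t i = 1; i < n; ++i)
        if (data[i] > max) max = data[i];
    return max;
}
```

The visualization side never sees the Simulation type. It only calls through the table, which is what lets a generic tool attach to an arbitrary simulation code.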
VISUALIZATION WITH GPU ACCELERATED FILTER AND HARDWARE RENDERING
[Diagram: Simulation, Filtering and Rendering placed on CPU and GPU]
SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT SCALE (2011-2016)
Management, Analysis, Visualization
— In-situ analysis, indexing/compression
— I/O-, Viz frameworks
— Supporting application teams
SDAV tools deployment: Paraview, VisIt, IceT, ..
http://www.sdav-scidac.org/
SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT SCALE (2011-2016)
PISTON (LANL): Data Parallel Visualization Operators
— Isosurface, cut, threshold
— Built on top of Thrust
— Support of most Paraview operators
— Incorporation into VTK
Dax (Sandia), DIY (Argonne), EAVL (ORNL)
S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL
S4620 - Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale
VIZ TALKS AT GTC2014
S4571 - Applications of GPU Computing to Mission Design and Satellite Operations at NASA's Goddard Space Flight Center
Abel Brown (Principal Systems Engineer, A.I. Solutions)
S4632 - Exploring the Earth in 3D: Multiple GPUs for Accelerating Inverse Imaging
S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL
S4620 - Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale
S4410 - Visualization and Analysis of Petascale Molecular Simulations with VMD
S4745 - Now You See It: Unmasking Nuclear and Radiological Threats Around the World
S4203 - Gesture-Based Interactive Visualization of Large-Scale Data using GPU and Latest Web Technologies
S4516 - Scientific Data Visualization on GPU-Enabled, Hybrid HPC Systems
S4599 - An Adventure in Porting: Adding GPU-Acceleration to Open-Source 3D Elastic Wave Modeling
S4778 - Interactive Processing and Visualization of Geospatial Imagery
S4140 - Live, Interactive, In-Situ, In-GPU Visualization of Plasma Simulations Running on GPU Supercomputers
S4400 - Petascale Molecular Ray Tracing: Accelerating VMD/Tachyon with OptiX
S4811 - Extreme Machine Learning with GPUs
SUMMARY
Different methods of visualization at different levels
— OpenGL, VTK, VisIt & Paraview
Remote visualization concepts
— X forwarding, Viz app, scraping, GLX forking
Parallel rendering and compositing
— Handled transparently in key tools
In-situ visualization concepts
— Expose simulation variables to visualization tool, interactive viz
Future directions
— GPU accelerated filtering and rendering
Thanks to Gilles Fourestey (CSCS), Nina Suvanphim (Cray), Jean Favre (CSCS), Adam DeConinck (NVIDIA), Robert Crovella (NVIDIA), Dale Southard (NVIDIA), Hank Childs (U Oregon), Jeremy Meredith (ORNL), Ian Williams (NVIDIA), Steve Parker (NVIDIA), Kitware, ParaView, VisIt and VirtualGL developers for their support!
ABSTRACT (FOR REFERENCE ONLY)
Learn how to take advantage of GPUs to visualize results of your GPU-accelerated simulation! This session will cover a broad range of visualization and analysis techniques allowing you to investigate your data on the fly. Starting with some basic CUDA/OpenGL interoperability, we will introduce more sophisticated data models allowing you to take advantage of widely used tools like ParaView and VisIt to visualize your GPU resident data. Questions like parallel compositing, remote visualization and application steering will be addressed in order to allow you to take full advantage of the GPUs installed in your supercomputing system.