
HOW TO VISUALIZE YOUR GPU-ACCELERATED SIMULATION RESULTS

Peter Messmer, NVIDIA

RANGE OF ANALYSIS AND VIZ TASKS

Analysis: quantitative focus

Visualization: qualitative focus

Monitoring, steering

TRADITIONAL HPC WORKFLOW

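[Diagram: workstation, viz cluster and supercomputer sharing one file system]

Workstation: setup; analysis, visualization

Supercomputer: dump, checkpointing to the file system

Viz cluster: visualization, analysis of the data on the file system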

TRADITIONAL WORKFLOW: CHALLENGES

[Diagram: the same workflow, annotated with its challenges]

Lack of interactivity prevents “intuition”

I/O becomes the main simulation bottleneck

Viz resources need to scale with the simulation

High-end viz neglected due to workflow complexity

OUTLINE

Visualization applications

CUDA/OpenGL interop

Remote viz

Parallel viz

In-Situ viz

High-level overview; some parts are platform dependent, so check with your sysadmin.

VISUALIZATION APPLICATIONS

NON-REPRESENTATIVE VIZ TOOLS SURVEY OF 25 HPC SITES

Surveyed sites:

NERSC, LLNL-OCF, LLNL-SCF, LANL, ORNL-CCS, DOD-ORC, AFRL-DSCR, AFRL, ARL, ERDC, NAVY, MHPCC, ORS, CCAC, NASA-NAS, NASA-NCCS, TACC, CHPC, RZG, HLRN, Julich, CSCS, CSC, Hector, Curie

VISIT

Scalar, vector and tensor field data features

— Plots: contour, curve, mesh, pseudocolor, volume, ...

— Operators: slice, iso-surface, threshold, binning, ...

Quantitative and qualitative analysis/vis

— Derived fields, dimension reduction, line-outs

— Pick & query

Scalable architecture

Open source


https://wci.llnl.gov/codes/visit/

Cross-platform

— Linux/Unix, OSX, Windows

Wide range of data formats — .vtk, .netcdf, .hdf5, ...

Extensible — Plugin architecture

Embeddable

Python scriptable


VISIT’S SCALABLE ARCHITECTURE

Client-server architecture

Server MPI parallel

Distributed filtering

(Multi-)GPU accelerated, parallel rendering*

* requires X server on each node

PARAVIEW

Scalar, vector and tensor field data features

— Plots: contour, curve, mesh, pseudocolor, volume, ...

— Operators: slice, iso-surface, threshold, binning, ...

Quantitative and qualitative analysis/vis

— Derived fields, dimension reduction, line-outs

— Pick & query

Scalable architecture

Developed by Kitware, open source

http://www.paraview.org


PARAVIEW’S SCALABLE ARCHITECTURE

Client-server-server architecture

Server MPI parallel

Distributed filtering

GPU accelerated, parallel rendering*

* requires X server on each node

SOME OTHER TOOLS

Wide range of visualization tools

Often emerged from specialized application domain

— Tecplot, EnSight: structural analysis, CFD

— IndeX: seismic data processing & visualization

— IDL: image processing

Early adopters of visual programming

— AVS/Express, OpenDX

FURTHER READING

Paraview Tutorial:

http://www.paraview.org/Wiki/The_ParaView_Tutorial

VisIt Manuals/Tutorials:

http://wci.llnl.gov/codes/visit/manuals.html

VTK - THE VISUALIZATION TOOLKIT

VISUALIZATION TOOLKIT

Focus on visualization, not (only) rendering

Provides more complex operations on data (“filtering”)

Introduces visualization pipeline

At the core of many high-level viz tools

http://www.vtk.org

VTK PIPELINE

Source -> Filter -> Mapper -> Renderer

Source: raw data, shapes

Filter: transform raw data

Mapper: map data to geometry

Renderer: render the geometry (through OpenGL)
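To make the Source -> Filter -> Mapper chain concrete, here is a minimal C sketch of the same data flow. It is not the VTK API (VTK is a C++ library with classes such as vtkContourFilter, vtkPolyDataMapper and vtkRenderer); the function names below are hypothetical and only show how each stage consumes the previous stage's output.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative pipeline stages (hypothetical names, not the VTK API). */
typedef struct { float r, g, b; } Color;

/* Source: sample the scalar field f(x) = x * x at n points on [0, 1]. */
void source_sample(float *field, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; ++i) {
        float x = (float)i / (float)(n - 1);
        field[i] = x * x;
    }
}

/* Filter: keep values >= threshold; returns how many were written to out. */
size_t filter_threshold(const float *in, size_t n, float threshold, float *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; ++i)
        if (in[i] >= threshold)
            out[kept++] = in[i];
    return kept;
}

/* Mapper: turn a scalar into a color on a blue-to-red ramp over [lo, hi]. */
Color mapper_blue_to_red(float value, float lo, float hi)
{
    assert(hi > lo);
    float t = (value - lo) / (hi - lo);
    if (t < 0.0f) t = 0.0f;   /* clamp below the range */
    if (t > 1.0f) t = 1.0f;   /* clamp above the range */
    Color c = { t, 0.0f, 1.0f - t };
    return c;
}

/* Renderer: in VTK this stage hands geometry and colors to OpenGL;
 * it is omitted here because it needs a graphics context. */
```

Each stage only depends on the output of the one before it, which is what lets tools like ParaView and VisIt swap filters or run them in parallel on pieces of the data.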

OPENGL

From the CUDA perspective

OPENGL: API FOR GPU ACCELERATED RENDERING

• Primitives: points, lines, polygons

• Properties: colors, lighting, textures, ..

• View: camera position and perspective

• Shaders: programs run on the GPU that transform vertices and compute pixel colors for the screen/framebuffer

• C-style functions, enums

See e.g. “What Every CUDA Programmer Should Know About OpenGL”

(http://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf)

A SIMPLE OPENGL EXAMPLE

glColor3f(1.0f, 0.0f, 0.0f);        // state-based API: sticky attribute (red)
glBegin(GL_QUADS);                  // drawing
glVertex3f(-1.0f, -1.0f, 0.0f);     // the bottom left corner
glVertex3f(-1.0f,  1.0f, 0.0f);     // the top left corner
glVertex3f( 1.0f,  1.0f, 0.0f);     // the top right corner
glVertex3f( 1.0f, -1.0f, 0.0f);     // the bottom right corner
glEnd();
glFlush();                          // render to screen


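A naive CUDA equivalent? (renderQuad and flushToScreen are hypothetical; CUDA itself has no call that draws to the screen)

float vert[] = {-1.0f, -1.0f, ..};
float* d_vert;
cudaMalloc(&d_vert, sizeof(vert));
cudaMemcpy(d_vert, vert, sizeof(vert), cudaMemcpyHostToDevice);
renderQuad<<<(N + 127) / 128, 128>>>(d_vert);
flushToScreen<<<..>>>();

?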
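Whatever a kernel writes, its launch configuration has to cover all N elements: in `<<<blocks, threads>>>` the first value is the number of blocks and the second the number of threads per block (at most 1024 on current GPUs). A minimal C sketch of the usual round-up, plus a host-side emulation of the bounds check that the last, partially filled block needs; the helper names are hypothetical.

```c
#include <assert.h>

/* Blocks needed so that blocks * threads_per_block >= n (round up). */
unsigned int grid_size(unsigned int n, unsigned int threads_per_block)
{
    assert(threads_per_block > 0 && threads_per_block <= 1024);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Host-side emulation of the guard every thread in the kernel needs:
 *   int i = blockIdx.x * blockDim.x + threadIdx.x;  if (i < n) { ... }
 * Returns how many threads of the given block do real work. */
unsigned int active_threads_in_block(unsigned int n, unsigned int threads_per_block,
                                     unsigned int block)
{
    unsigned int first = block * threads_per_block;
    if (first >= n) return 0;
    unsigned int left = n - first;
    return left < threads_per_block ? left : threads_per_block;
}
```

For N = 1000 and 128 threads per block this gives 8 blocks; the last block has only 104 threads with work, so without the `i < n` guard the kernel would write past the end of the buffer.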

CUDA-OPENGL INTEROP: MAPPING MEMORY

• OpenGL: Opaque data buffer object

• Vertex Buffer Object (VBO)

• User has very limited control

• CUDA: C-style memory management

• User has full control

• CUDA-OpenGL Interop: map/unmap OpenGL buffers into CUDA memory space

RENDERING FROM A VBO

Create VBO -> Initialize VBO -> Populate -> Render -> Display

(one-time setup in init(), drawing in display())

RENDERING FROM A VBO WITH CUDA INTEROP

init(): create VBO -> initialize VBO -> register VBO with CUDA

display(): map VBO to CUDA -> populate -> unmap VBO from CUDA -> render -> display

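MAIN OPENGL-CUDA INTEROP ROUTINES

Register VBO for CUDA (once, after the VBO is created; flags e.g. cudaGraphicsMapFlagsWriteDiscard if CUDA only writes to it)

cudaGraphicsGLRegisterBuffer(cuda_vbo, *vbo, flags);  // cuda_vbo: struct cudaGraphicsResource**, vbo: GLuint*

Mapping of VBO to CUDA (each frame, before a kernel writes to it)

cudaGraphicsMapResources(1, cuda_vbo, 0);
cudaGraphicsResourceGetMappedPointer((void**)&d_ptr, &size, *cuda_vbo);  // size: size_t, receives the buffer size in bytes

Unmapping VBO from CUDA (before OpenGL renders from it)

cudaGraphicsUnmapResources(1, cuda_vbo, 0);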
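Between cudaGraphicsMapResources and cudaGraphicsUnmapResources, a kernel writes vertex data through the mapped pointer. The function below is a CPU reference in plain C (so it runs without a GPU or an OpenGL context) of what such a "populate" kernel could compute. The layout of four floats (x, y, z, w) per vertex, row-major, is an assumption, similar to the float4 vertices of the CUDA simpleGL sample, which draws a sine-wave surface; a paraboloid is used here to keep the arithmetic exact. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* CPU reference for the "populate" step: fills a width x height grid of
 * vertices, 4 floats (x, y, z, w) each, row-major. The height is a
 * paraboloid scaled by 'scale'. A CUDA kernel would compute the same value
 * with one thread per vertex, writing through the mapped VBO pointer. */
void fill_height_field(float *pos, unsigned int width, unsigned int height, float scale)
{
    assert(width >= 2 && height >= 2);
    for (unsigned int y = 0; y < height; ++y) {
        for (unsigned int x = 0; x < width; ++x) {
            /* Normalized grid coordinates in [-1, 1]. */
            float u = (float)x / (float)(width - 1) * 2.0f - 1.0f;
            float v = (float)y / (float)(height - 1) * 2.0f - 1.0f;
            float h = scale * (1.0f - u * u - v * v);
            float *p = pos + 4 * ((size_t)y * width + x);
            p[0] = u;     /* x */
            p[1] = h;     /* y holds the height */
            p[2] = v;     /* z */
            p[3] = 1.0f;  /* w: homogeneous coordinate */
        }
    }
}
```

Because each vertex depends only on its own (x, y) index, the loop body maps directly onto one CUDA thread per vertex; OpenGL then draws the same buffer after it has been unmapped.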

CAN ALL GPUS SUPPORT OPENGL?

GeForce: standard feature set including OpenGL 4.3/4.4

Quadro: + certain highly accelerated features (e.g. CAD)

Tesla: if the GPU is in “All On” operation mode*

nvidia-smi --query-gpu=gom.current --format=csv

nvidia-smi -q

*requires “m” or “X” class devices (e.g. K20m, K20X)

[Screenshots: nvidia-smi output with graphics capabilities disabled vs. enabled]

OPENGL SUMMARY

+ Relatively simple for basic viz

+ Existing/fixed/simple rendering pipeline

- Low level

- Works with triangles and points, not with isosurfaces or height fields

- (currently) depends on Xserver for context creation

- What about remote, parallel etc?

MORE OPENGL INFORMATION AT GTC

S4455 - Multi-GPU Rendering, 03/24, 14:30, 210A

S4825 - Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's latest Developer Tools Suite, 03/24, 14:30, 210E

S4379 - OpenGL 4.4 Scene Rendering Techniques, 3/25, 13:00, 210C

S4810 - NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web, 3/25, 13:30, LL21C

S4610 - OpenGL: 2014 and Beyond, 3/25, 14:00, 210C

S4385 - Order Independent Transparency in OpenGL, 3/25, 210C

REMOTE VISUALIZATION

OPENGL CONTEXT

State of an OpenGL instance

— Incl. viewable surface

— Interface to windowing system

Context creation: platform specific

— Not part of OpenGL

— Handled by Xserver in Linux/Unix-like systems

GLX: interaction X <-> OpenGL

LOCAL RENDERING

[Diagram: the application calls libGL (OpenGL) and Xlib (X11 commands and events); through GLX and the driver, the local 2D/3D X server renders on a GPU with a monitor attached.]

X-FORWARDING: THE SIMPLEST FORM OF “REMOTE” RENDERING

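[Diagram: the application (libGL, Xlib) runs on the remote system; OpenGL/GLX requests and X11 commands cross the network to the 2D/3D X server and driver on the local workstation, and X11 events flow back.]

On remote system: export DISPLAY=59.151.136.110:0.0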
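The DISPLAY value tells Xlib where the X server is: `host:display.screen`, where an empty host means the local machine, and an X server for display N accepts TCP connections on port 6000 + N. With `ssh -X`, ssh sets DISPLAY itself (typically to something like localhost:10.0) and tunnels that port. A minimal C sketch that parses the form used on the slide; IPv6 and `unix/` host prefixes are deliberately not handled, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char host[256];  /* empty string: local display */
    int display;
    int screen;
} XDisplayName;

/* Parse "host:display[.screen]"; returns 0 on success, -1 if malformed. */
int parse_display(const char *s, XDisplayName *out)
{
    const char *colon = strrchr(s, ':');
    if (colon == NULL) return -1;
    size_t host_len = (size_t)(colon - s);
    if (host_len >= sizeof out->host) return -1;
    memcpy(out->host, s, host_len);
    out->host[host_len] = '\0';

    char *end;
    long d = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || d < 0) return -1;
    out->display = (int)d;
    out->screen = 0;  /* default screen when ".screen" is omitted */
    if (*end == '.') {
        const char *scr = end + 1;
        long sc = strtol(scr, &end, 10);
        if (end == scr || sc < 0) return -1;
        out->screen = (int)sc;
    }
    return *end == '\0' ? 0 : -1;
}

/* An X server for display N listens on TCP port 6000 + N. */
int x11_tcp_port(int display) { return 6000 + display; }
```

For the slide's `59.151.136.110:0.0`, the remote application connects to port 6000 on the workstation, which is why every X11 and GLX request travels over the network.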

X-FORWARDING

+ Simple!

+ ssh -X

+ No need to run X on remote machine

- Lots of data crosses the network

- All rendering performed on the local Xserver

- Not useful for visualization of remote CUDA interop apps (the Xserver doesn’t “see” the remote GPU)

SERVER-SIDE RENDERING + REMOTE VIZ APPLICATION

[Diagram: on the server, the application (libGL, Xlib) renders through GLX on a 2D/3D X server, driver and GPU; a separate client connects to the application over the network.]

SERVER-SIDE RENDERING + SCRAPING

[Diagram: the application renders through GLX on the server's 2D/3D X server and GPU; a scraper reads back the rendered images and sends them over the network to the client.]

SERVER-SIDE RENDERING + SCRAPING

+ Full GPU acceleration

+ No X server on client

Question: when to scrape?

— With direct rendering, the Xserver is not informed when a frame is complete

— Solution: intercept glXSwapBuffers()

- Not multi-user

=> Occasionally used for remote desktop tools

GLX FORKING: OUT-OF-PROCESS

[Diagram: the application (libGL, Xlib) sends all X11 and GLX traffic to a proxy X server; the proxy forwards OpenGL/GLX to a 3D X server (driver, GPU), reads back the rendered images and sends them over the network to the client app.]

GLX FORKING: OUT-OF-PROCESS

+ Full GPU acceleration

+ No X server on client

+ Multi-User

- All traffic through Proxy X server

=> Occasionally used for remote desktop tools

GLX FORKING WITH INTERPOSER LIBRARY

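[Diagram: the VirtualGL interposer library sits between the application and libGL/Xlib; OpenGL/GLX goes to the 3D X server (driver, GPU) on the server, X11 commands and events go to the 2D X server on the client, and rendered images travel to the VirtualGL client via the compressed VGL transport.]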

GLX FORKING WITH INTERPOSER LIBRARY

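[Diagram: variant with a proxy X server on the server side; VirtualGL renders on the 3D X server (driver, GPU) and hands the images to the proxy X server, which handles X11 commands and events and sends the images over the network to the client app.]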

VIRTUAL GL + TURBOVNC

+ Compressed image transport

+ Transparent to the application

+ Fully GPU accelerated OpenGL

+ Client with or without Xserver

- Requires Xserver to access GPU

http://www.virtualgl.org
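Why compressed image transport matters: streaming raw frames is expensive. A small C sketch of the arithmetic, assuming 24-bit RGB frames; the 10:1 compression ratio in the usage example is an illustrative assumption, since real ratios depend on image content and codec quality settings.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed to stream uncompressed frames. */
uint64_t raw_stream_bytes_per_sec(uint32_t width, uint32_t height,
                                  uint32_t bytes_per_pixel, uint32_t fps)
{
    return (uint64_t)width * height * bytes_per_pixel * fps;
}

/* The same stream after compression with the given ratio (10 means 10:1). */
uint64_t compressed_bytes_per_sec(uint64_t raw, uint32_t ratio)
{
    assert(ratio > 0);
    return raw / ratio;
}
```

A 1920x1080 RGB stream at 30 fps needs raw_stream_bytes_per_sec(1920, 1080, 3, 30) = 186,624,000 bytes/s, about 1.5 Gbit/s, which already exceeds a gigabit link; at an assumed 10:1 ratio it drops to about 18.7 MB/s.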

HOW TO SET UP VIRTUAL GL + TURBOVNC

Requires Xserver running on server

— Root privileges for Xserver

Requires installation of VirtualGL, TurboVNC on server

Start VirtualGL-accelerated VNC server

vncserver :3

TurboVNC viewer on client

— Linux, Windows, Javascript

CONNECTING CLIENT TO REMOTE SERVER

CONNECTING TO REMOTE VNC SERVER VIA A GATEWAY

Establish tunnel on client

ssh -L 3333:node01.cluster.net:5903 login.cluster.net

Connect client to localhost:3333

[Diagram: client (port 3333) -> gateway login.cluster.net -> node01.cluster.net (port 5903)]
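The port numbers follow from the VNC display: a VNC server on display :N listens on TCP port 5900 + N, so `vncserver :3` is reached on port 5903, and the tunnel maps a free local port onto it through the login node. A minimal C sketch that assembles the same command; the helper names are hypothetical, and the host names are the slide's examples.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A VNC server on display :N listens on TCP port 5900 + N. */
int vnc_port(int display) { return 5900 + display; }

/* Write the ssh local-forward command into buf; returns snprintf's result. */
int tunnel_command(char *buf, size_t len, int local_port,
                   const char *compute_node, int vnc_display, const char *gateway)
{
    assert(strlen(compute_node) > 0 && strlen(gateway) > 0);
    return snprintf(buf, len, "ssh -L %d:%s:%d %s",
                    local_port, compute_node, vnc_port(vnc_display), gateway);
}
```

Usage: `tunnel_command(cmd, sizeof cmd, 3333, "node01.cluster.net", 3, "login.cluster.net")` produces the command shown above; the VNC viewer then connects to localhost:3333.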

LAUNCHING REMOTE CUDA/OPENGL APPLICATIONS VIA VGLRUN

DISPLAY=:3 vglrun glxgears

export DISPLAY=:3

vglrun simpleGL

REMOTE VIZ IN PARAVIEW/VISIT

Approach 1: VirtualGL and VNC

export DISPLAY=:3

vglrun paraview

Approach 2: Local Client

— On remote server:

pvserver

— On local workstation:

paraview

Paraview Client & Server under VirtualGL

REMOTE VISUALIZATION SUMMARY

Multiple approaches to remote rendering

— X forwarding, remote viz app, scraping, interposer process/library

Currently requires X server to generate context

— EGL will fix this, but requires application changes

Parallel Viz

PARALLEL VISUALIZATION


Parallelism at multiple levels

— Filtering

— Rendering

- Both supported by VisIt & Paraview

- Heavy lifting already done!

- Challenge: Setup in parallel environment

- Both tools provide support for most common cases

- Both VisIt & Paraview are MPI parallel -> need a custom build for your system's MPI

BASIC STEPS TO PARALLEL VISUALIZATION

Launch visualization server processes

— Most likely through the queuing system

Connect client to the head node

Tunnel from the workstation

Set up VirtualGL on the remote node

DOMAIN DECOMPOSITION OF DATA

- Visualization algorithms can work on decomposed data

- May require ghost cells

- May lead to load imbalance

[Figure panels: no ghost cells required vs. ghost cells required]
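A minimal 1D sketch of block decomposition with ghost cells, assuming n cells split over nranks processes with the remainder spread over the first ranks. Filters that need neighbor values at subdomain boundaries (a gradient, for example) read the ghost layer; per-cell operations such as a threshold do not need it. Names are hypothetical; real tools handle this in 2D/3D and exchange ghost values via MPI.

```c
#include <assert.h>

typedef struct {
    int begin, end;              /* owned cells [begin, end) */
    int ghost_begin, ghost_end;  /* owned plus ghost cells, clamped to [0, n) */
} Subdomain;

Subdomain decompose_1d(int n, int nranks, int rank, int ghost)
{
    assert(n >= 0 && nranks > 0 && rank >= 0 && rank < nranks && ghost >= 0);
    int base = n / nranks, rem = n % nranks;
    Subdomain s;
    /* The first 'rem' ranks own one extra cell each. */
    s.begin = rank * base + (rank < rem ? rank : rem);
    s.end = s.begin + base + (rank < rem ? 1 : 0);
    /* Extend by the ghost width, but not past the global boundaries. */
    s.ghost_begin = s.begin - ghost < 0 ? 0 : s.begin - ghost;
    s.ghost_end = s.end + ghost > n ? n : s.end + ghost;
    return s;
}
```

With 10 cells on 3 ranks the owned ranges are [0,4), [4,7) and [7,10); uneven splits like this are one source of the load imbalance mentioned above.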

PARALLEL COMPOSITING

- Distributed geometry

- Render with depth information

- Composition using IceT

http://icet.sandia.gov
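The core operation of depth compositing is simple: each rank renders its own geometry into a color and depth buffer, and for every pixel the fragment closest to the camera wins. A minimal C sketch for two opaque partial images; transparency would need ordered blending instead. IceT applies this operation across many ranks with parallel strategies such as binary swap or radix-k; this sketch shows only the per-pixel merge, with hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned char r, g, b, a; } RGBA;

/* Merge partial image B into A: keep the fragment closer to the camera.
 * Smaller depth means closer, as with the GL_LESS depth test. */
void composite_depth(RGBA *color_a, float *depth_a,
                     const RGBA *color_b, const float *depth_b, size_t npixels)
{
    for (size_t i = 0; i < npixels; ++i) {
        if (depth_b[i] < depth_a[i]) {
            color_a[i] = color_b[i];
            depth_a[i] = depth_b[i];
        }
    }
}
```

Because the merge is associative for opaque geometry, the partial images can be combined pairwise in any tree order, which is what makes scalable compositing possible.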


Particularly important for large datasets

Visualization time often determined by filtering

Rendering important if

— Highly complex visualization

— Complex visualization effects

— Low-power CPU

Transparent support in Paraview, VisIt

IN-SITU VISUALIZATION & STEERING

BENEFIT OF IN-SITU VISUALIZATION

Pipeline simulation cycle

— Visualization/analysis integral part of simulation

Immediate feedback

Reduce pressure on file system

In some (future) cases: only way to analyze/visualize data
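One way in-situ analysis relieves the file system: instead of writing the full field every few steps, the simulation reduces it in memory to a few numbers. A minimal C sketch, assuming a single float field available on the host; for GPU-resident data the same reduction would run on the device (e.g. with Thrust). Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float min, max; double mean; } FieldSummary;

/* In-situ reduction of a field to min, max and mean. */
FieldSummary summarize(const float *field, size_t n)
{
    assert(n > 0);
    FieldSummary s = { field[0], field[0], 0.0 };
    double sum = 0.0;  /* accumulate in double to limit rounding error */
    for (size_t i = 0; i < n; ++i) {
        if (field[i] < s.min) s.min = field[i];
        if (field[i] > s.max) s.max = field[i];
        sum += field[i];
    }
    s.mean = sum / (double)n;
    return s;
}

/* Bytes a full dump of an nx * ny * nz float field would write. */
unsigned long long dump_bytes(unsigned long long nx, unsigned long long ny,
                              unsigned long long nz)
{
    return nx * ny * nz * sizeof(float);
}
```

Called every k-th step from the time loop (`if (step % k == 0) { FieldSummary s = summarize(field, n); ... }`), it produces a few bytes, whereas a full dump of a 1024^3 float field writes dump_bytes(1024, 1024, 1024) = 4 GiB.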

LIBSIM IN VISIT: VISIT SERVER AS A LIBRARY

CONNECTING TO A RUNNING APPLICATION

BUILD VISUALIZATION PIPELINE FOR RUNNING APPLICATION

BASIC INTERACTION/STEERING WITH RUNNING APPLICATION

INTERACTIVE VIZ APPLICATION WITH LIBSIM

User implements callbacks for VisIt

— meta-data, mesh, variables and domains

GetData: VisIt requests data for visualization

— Work directly on simulation data

— Transfer data from GPU

Command server: Interaction VisIt front-end <-> simulation

— “Steering”

http://www.visitusers.org/index.php?title=VisIt-tutorial-in-situ
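The callback idea behind Libsim can be shown without VisIt: the simulation registers functions, and the visualization side calls them when a plot needs data, receiving pointers to the simulation's own arrays instead of copies. The C sketch below is a hypothetical interface, NOT the Libsim API; in Libsim the registration happens through functions such as VisItSetGetMetaData, VisItSetGetMesh and VisItSetGetVariable, with data passed through VisIt handles (see the tutorial linked above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: return a pointer to the named variable. */
typedef const float *(*GetVariableFn)(const char *name, size_t *n, void *user);

typedef struct {
    GetVariableFn get_variable;
    void *user;  /* opaque pointer back to the simulation state */
} VizCallbacks;

/* Simulation state. In a GPU code the host copy would first be refreshed
 * from the device (e.g. with cudaMemcpy) before being handed out. */
typedef struct {
    float pressure[4];
    float temperature[4];
} SimState;

/* Simulation side: expose variables without copying them. */
const float *sim_get_variable(const char *name, size_t *n, void *user)
{
    SimState *s = (SimState *)user;
    *n = 4;
    if (strcmp(name, "pressure") == 0) return s->pressure;
    if (strcmp(name, "temperature") == 0) return s->temperature;
    *n = 0;
    return NULL;  /* unknown variable */
}

/* Visualization side: request data when a plot needs it. */
float viz_max_of(const VizCallbacks *cb, const char *name)
{
    size_t n = 0;
    const float *data = cb->get_variable(name, &n, cb->user);
    assert(data != NULL && n > 0);
    float m = data[0];
    for (size_t i = 1; i < n; ++i)
        if (data[i] > m) m = data[i];
    return m;
}
```

The visualization code never needs to know how the simulation stores its data; it only sees what the callbacks return, which is what lets VisIt attach to a running application.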

The Future

THE FUTURE

VISUALIZATION DATA FLOW

[Diagram: Simulation, Filtering and Rendering stages split across CPU and GPU]

VISUALIZATION WITH GPU ACCELERATED FILTER

[Diagram: as above, with part of the filtering moved to the GPU]

VISUALIZATION WITH GPU ACCELERATED FILTER AND HARDWARE RENDERING

[Diagram: filtering partly on the GPU, rendering on the GPU]

FULLY GPU ACCELERATED VISUALIZATION

[Diagram: Simulation, Filtering and Rendering all on the GPU]

SDAV – SCIDAC-3 INSTITUTE ON VIZ/ANALYSIS AT SCALE (2011-2016)

Management, Analysis, Visualization

— In-situ analysis, indexing/compression

— I/O and viz frameworks

— Supporting application teams

SDAV tools deployment: Paraview, VisIt, IceT, ..

http://www.sdav-scidac.org/


PISTON (LANL): Data Parallel Visualization Operators

— Isosurface, cut, threshold

— Built on top of Thrust

— Support of most Paraview operators

— Incorporation into VTK
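PISTON builds its operators from Thrust's data-parallel primitives. As an illustration of that style (not PISTON's code), a threshold can be written as three data-parallel steps: flag each element, take an exclusive scan of the flags to get output positions, then scatter. On the GPU each step is one primitive (thrust::exclusive_scan, or thrust::copy_if for the whole compaction); the C sketch below runs the same steps serially, with hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Data-parallel threshold as flag -> exclusive scan -> scatter.
 * flags and offsets are caller-provided scratch arrays of length n.
 * Returns the number of values written to out. */
size_t threshold_compact(const float *in, size_t n, float lo, float hi,
                         size_t *flags, size_t *offsets, float *out)
{
    assert(lo <= hi);
    /* 1. map: independent per element, 1 if the value passes */
    for (size_t i = 0; i < n; ++i)
        flags[i] = (in[i] >= lo && in[i] <= hi) ? 1 : 0;
    /* 2. exclusive scan: output slot of each passing element */
    size_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        offsets[i] = running;
        running += flags[i];
    }
    /* 3. scatter: independent per element again */
    for (size_t i = 0; i < n; ++i)
        if (flags[i])
            out[offsets[i]] = in[i];
    return running;
}
```

Steps 1 and 3 have no dependencies between elements and the scan has well-known parallel algorithms, which is why operators expressed this way port across GPUs and multi-core CPUs.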

DAX (Sandia), DIY (Argonne), EAVL (ORNL)

S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL

S4620 - Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale

VIZ TALKS AT GTC2014

S4571 - Applications of GPU Computing to Mission Design and Satellite Operations at NASA's Goddard Space Flight Center (Abel Brown, Principal Systems Engineer, A.I. Solutions)

S4632 - Exploring the Earth in 3D: Multiple GPUs for Accelerating Inverse Imaging

S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL

S4620 - Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale

S4410 - Visualization and Analysis of Petascale Molecular Simulations with VMD

S4745 - Now You See It: Unmasking Nuclear and Radiological Threats Around the World

S4203 - Gesture-Based Interactive Visualization of Large-Scale Data using GPU and Latest Web Technologies

S4516 - Scientific Data Visualization on GPU-Enabled, Hybrid HPC Systems

S4599 - An Adventure in Porting: Adding GPU-Acceleration to Open-Source 3D Elastic Wave Modeling

S4778 - Interactive Processing and Visualization of Geospatial Imagery

S4140 - Live, Interactive, In-Situ, In-GPU Visualization of Plasma Simulations Running on GPU Supercomputers

S4400 - Petascale Molecular Ray Tracing: Accelerating VMD/Tachyon with OptiX

S4811 - Extreme Machine Learning with GPUs

SUMMARY

Different methods of visualization at different levels

— OpenGL, VTK, VisIt & Paraview

Remote visualization concepts

— X forwarding, Viz app, scraping, GLX forking

Parallel rendering and compositing

— Handled transparently in key tools

In-situ visualization concepts

— Expose simulation variables to visualization tool, interactive viz

Future directions

— GPU accelerated filtering and rendering

Thanks to Gilles Fourestey (CSCS), Nina Suvanphim (Cray), Jean Favre (CSCS), Adam DeConinck (NVIDIA), Robert Crovella (NVIDIA), Dale Southard (NVIDIA), Hank Childs (U Oregon), Jeremy Meredith (ORNL), Ian Williams (NVIDIA), Steve Parker (NVIDIA), Kitware, Paraview, VisIt and VirtualGL developers for their support!

ENJOY THE CONFERENCE AND SEE YOU AT GTC 2015!

ABSTRACT (FOR REFERENCE ONLY)

Learn how to take advantage of GPUs to visualize results of your GPU-accelerated simulation! This session will cover a broad range of visualization and analysis techniques allowing you to investigate your data on the fly. Starting with some basic CUDA/OpenGL interoperability, we will introduce more sophisticated data models allowing you to take advantage of widely used tools like ParaView and VisIt to visualize your GPU resident data. Questions like parallel compositing, remote visualization and application steering will be addressed in order to allow you to take full advantage of the GPUs installed in your supercomputing system.
