Seminar by Fabio Marton, 4 October 2012

www.crs4.it/vic/

Massive Model Rendering

Fabio Marton

CRS4

Visual Computing

F. Marton – CRS4/Visual Computing, October 2012

Goal: interactive inspection of massive models on PC platforms…

Massive datasets rendered on a commodity PC


Application domains / data sources

• Many important application domains

• Today’s models exceed

– O(10^8-10^10) samples

– O(10^9-10^11) bytes


Local terrain models: 2.5D – Flat – Dense regular sampling

Planetary terrain models: 2.5D – Spherical – Dense regular sampling

• Varying

– Dimensionality

– Topology

– Sampling distribution

Laser scanned models: 3D – Moderately simple topology – low depth complexity – dense

CAD models: 3D – complex topology – high depth complexity – structured – ‘ugly’ mesh

Natural objects / Simulation results: 3D – complex topology + high depth complexity + unstructured/high frequency details


The (minimal) challenge: real-time rendering of massive static models

• Explore very large models at interactive rates

– Update screen at “interactive rates” as viewpoint changes

[Diagram: Storage (Giga/Tera bytes) → I/O over limited bandwidth (network/disk/RAM/CPU/PCIe/GPU/…) → Projection + Visibility + Shading, driven by view parameters → Screen (megapixels per frame at 10-100 fps)]


A real-time data filtering problem!

• Models of unbounded complexity on limited computers

– Need for output-sensitive techniques (O(N), not O(K))

• We assume less data on screen (N) than in model (K → ∞)

– Need for memory-efficient techniques (maximize cache hits!)

– Need for parallel techniques (maximize CPU/GPU core usage)

[Diagram: Storage, O(K = unbounded) bytes (triangles, points, …) → I/O → small working set → Screen, O(N = 1M-100M) pixels at 10-100 Hz; driven by view parameters]


Output-sensitive techniques

• At preprocessing time: build MR hierarchy

– Data prefiltering!

– Visibility + simplification

– Not output sensitive

COARSE


• At run-time: selective view-dependent refinement from out-of-core data

– Must be output sensitive

– Access to prefiltered data under real-time constraints

– Visibility + LOD

FINE


[Figure: run-time front through the MR hierarchy; node labels: Occluded / Out-of-view, Inaccurate, Accurate]


Our contributions: GPU-friendly output-sensitive techniques

• Chunk-based multiresolution structures

– Combine space partitioning + level of detail

– Same structure used for visibility and detail culling

• Seamless combination of chunks

– Dependencies ensure consistency at the level of chunks

• Complex rendering primitives

– GPU programming features

– Curvilinear patches, view-dependent voxels, …

• Chunk-based external memory management

– Compression/decompression, block transfers, caching

[Diagram: Off-line: partitioning and simplification → multiresolution structure (data + dependency) → Network / Bus → On-line: cache → adaptive rendering → GPU]



*-BDAM – Local and Global Terrain Models. Gobbetti/Marton (CRS4), Cignoni/Ganovelli/Ponchio/Scopigno (CNR). EG 2003, IEEE Viz 2003, EG 2005

Adaptive TetraPuzzles – Dense meshes. Gobbetti/Marton (CRS4), Cignoni/Ganovelli/Ponchio/Scopigno (CNR). SIGGRAPH 2004

Layered Point Clouds – Dense clouds. Gobbetti/Marton (CRS4). SPBG 2004 / Computers & Graphics 2004

Far Voxels – General. Gobbetti/Marton (CRS4). SIGGRAPH 2005

MOVR – Volumetric models. Gobbetti/Marton/Iglesias Guitian (CRS4). CGI 2008

Blockmaps – Hybrid volumetric city model. Gobbetti/Marton (CRS4), Cignoni/Ganovelli/Di Benedetto/Scopigno (CNR). EG 2007








[Diagram: the techniques above arranged by rendering approach (RASTERIZATION / RAYCASTING) and by framework (MESH-BASED / MESH-LESS), with Specialize / Generalize labels]

Chunked Multi-Triangulations. Gobbetti/Marton (CRS4), Cignoni/Ganovelli/Ponchio/Scopigno (CNR). IEEE Viz 2005

View-dependent volumetric model. In progress


Real-time adaptive meshes

• The problem: efficiently create view-dependent meshes

• Constraints:

– must approximate original surface with controlled screen-space error

– must preserve continuity (conforming meshes)

– must handle meshes of varying topology

– must be efficiently rendered


Chunked Multi Triangulations: The Multi Triangulation Framework

• Theoretical basis

– MT multiresolution framework (Puppo 1996)

• Our contribution

– GPU friendly implementation based on surface chunks with boundary constraints

– Optimized implicit specializations (TetraPuzzles/V-Partitions)

– Parallel out-of-core pre-processing and out-of-core run-time

Cignoni, Ganovelli, Gobbetti, Marton, Ponchio, and Scopigno. Batched Multi Triangulation. In Proc. IEEE Visualization, pages 207-214, October 2005.



• Consider a sequence of local modifications over a given description D

– Each modification replaces a portion of the domain with a different conforming portion (simplified)

– f_1: the floor (the portion being replaced)

– g_1: the new fragment

D' = (D \ f) ∪ g

D_{i+1} = D_i ⊕ g_{i+1}



• Dependencies between modifications can be arranged in a DAG

– By adding a sink to the DAG, each fragment can be associated with an arc leaving a node


Chunked Multi Triangulations: MT Cuts

• A cut of the DAG defines a new representation

– Just paste all the fragments above the cut

– Collect all the fragment floors of cut arcs and you get a new conforming mesh

D* = D_0 ⊕ g_1 ⊕ g_4 = f_{0∞} ∪ f_{02} ∪ f_{03} ∪ f_{13} ∪ f_{1∞} ∪ f_{4∞}
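The replace-and-paste operations on these slides can be sketched in a few lines. This is a toy model, not the authors' data structure: a description is a plain set of face labels, and the names `Modification`, `apply_modification`, `extract`, `D0` and `MODS` are all hypothetical. A selection of modifications is treated as a valid cut only if it contains every dependency of its members.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Modification:
    """One local update: replace `floor` (a set of faces) with `fragment`."""
    name: str
    floor: frozenset
    fragment: frozenset
    depends_on: tuple = ()  # names of modifications that must be applied first


def apply_modification(description, mod):
    """D' = (D \\ f) ∪ g. The floor must be present in the current description."""
    if not mod.floor <= description:
        raise ValueError(f"{mod.name}: floor not present in description")
    return (description - mod.floor) | mod.fragment


def extract(d0, mods, selected):
    """Apply a dependency-closed subset of modifications (a cut of the DAG)."""
    by_name = {m.name: m for m in mods}
    for name in selected:
        missing = [d for d in by_name[name].depends_on if d not in selected]
        if missing:
            raise ValueError(f"{name} needs {missing}: not a valid cut")
    description, done, pending = frozenset(d0), set(), set(selected)
    while pending:
        # Apply every modification whose dependencies are already in place.
        ready = sorted(n for n in pending if set(by_name[n].depends_on) <= done)
        if not ready:
            raise ValueError("cyclic dependencies")
        for name in ready:
            description = apply_modification(description, by_name[name])
            done.add(name)
            pending.discard(name)
    return description


# Toy data (assumed for illustration): g2 can only be pasted on top of g1.
D0 = frozenset({"a", "b", "c", "d"})
MODS = [
    Modification("g1", frozenset({"a", "b"}), frozenset({"e"})),
    Modification("g2", frozenset({"e", "c"}), frozenset({"f"}), ("g1",)),
]
print(sorted(extract(D0, MODS, {"g1"})))
print(sorted(extract(D0, MODS, {"g1", "g2"})))
```

Selecting `g2` without `g1` is rejected, which is the consistency guarantee the dependency DAG provides.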


Chunked Multi Triangulations: GPU Friendly MT

• Chunked MT assumes fragments are triangle patches with proper boundary constraints

– DAG << original mesh (patches composed of thousands of triangles)

– Structure memory + traversal overhead amortized over thousands of triangles

– Per-patch optimizations

• Chunked MT assumes regions provide good hierarchical space-partitioning

– Compact (close-to-spherical)

– Used for computing fast projected error upper bounds

– Used for visibility queries
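A compact, close-to-spherical region makes the projected error cheap to bound. This is a minimal sketch assuming a symmetric perspective projection and the common approximation "projected size = error × scale / distance", which ignores off-axis stretching. The function name and parameters are illustrative and not taken from the talk.

```python
import math


def projected_error_bound(model_error, center, radius, eye, fov_y_deg, viewport_height):
    """Approximate bound, in pixels, on the projected size of `model_error`
    for any point inside the bounding sphere (center, radius)."""
    # The closest any point of the region can be to the eye.
    dist = math.dist(center, eye) - radius
    if dist <= 0.0:
        return math.inf  # eye inside the sphere: always refine
    # Pixels covered by a unit-length segment at unit distance.
    scale = viewport_height / (2.0 * math.tan(math.radians(fov_y_deg) / 2.0))
    return model_error * scale / dist


# A region with 0.1 units of model-space error, 10 units away, radius 2.
print(projected_error_bound(0.1, (0, 0, 10), 2.0, (0, 0, 0), 90.0, 1000))
```

Because only one distance and one division are needed per region, the test stays negligible next to the cost of drawing a patch of thousands of triangles.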


Chunked Multi Triangulations: GPU Friendly MT

• Construction

– Start with hires triangle soup

– Partition model using a hierarchical space partitioning scheme

– Construct non-leaf cells by bottom-up recombination and simplification of lower level cells

– Assign model space errors to cells

• Rendering

– Refine conformal hierarchy, render selected precomputed cells

– Project errors to screen

– Dual queue
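The rendering loop can be illustrated with a deliberately simplified sketch: a single priority queue refining a tree top-down. It is not the dual-queue incremental scheme named on the slide, and it ignores DAG dependencies between cells. `Cell`, `screen_error`, `select_cells` and `ROOT` are hypothetical names, and `pixels_per_unit` stands in for the projection scale.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str
    error: float      # model-space error assigned at construction time
    center: tuple
    radius: float
    children: list = field(default_factory=list)


def screen_error(cell, eye, pixels_per_unit):
    """Project the model-space error to the screen (approximate)."""
    dist = max(math.dist(cell.center, eye) - cell.radius, 1e-9)
    return cell.error * pixels_per_unit / dist


def select_cells(root, eye, pixels_per_unit, tolerance, max_cells):
    """Always split the cell with the largest projected error first."""
    tie = itertools.count()  # tie-breaker so the heap never compares Cell objects
    heap = [(-screen_error(root, eye, pixels_per_unit), next(tie), root)]
    selected = []
    while heap:
        neg_err, _, cell = heapq.heappop(heap)
        over_budget = len(heap) + len(selected) + len(cell.children) > max_cells
        if -neg_err <= tolerance or not cell.children or over_budget:
            selected.append(cell)  # accurate enough, a leaf, or out of budget
            continue
        for child in cell.children:
            err = screen_error(child, eye, pixels_per_unit)
            heapq.heappush(heap, (-err, next(tie), child))
    return selected


# Toy hierarchy (assumed): one coarse cell with two finer children.
ROOT = Cell("root", 1.0, (0, 0, 0), 1.0, [
    Cell("near", 0.1, (0, 0, 1), 0.5),
    Cell("far", 0.1, (0, 0, -1), 0.5),
])
print([c.name for c in select_cells(ROOT, (0, 0, 5), 100.0, 5.0, 10)])
```

The `max_cells` budget plays the role of a per-frame rendering budget; an incremental scheme would instead update the previous frame's selection.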




Chunked Multi Triangulations: DAG problems

• Not all MTs are good MTs!

– The topology of dependencies may lower the adaptivity of the multiresolution structure

• Cascading dependencies are BAD!!!

– The geometry of DAG regions may cause problems in view-dependent rendering

• Compact regions

• Proposed solutions:

– SIGGRAPH 2004: Efficient constrained technique (TetraPuzzles)

– IEEE Viz 2005: General construction technique (V-Partition)

– … see also QVDR, IEEE Viz 2004 and other related work…


Adaptive TetraPuzzles: Overview

• Construction

– Start with hires triangle soup

– Partition model using a conformal hierarchy of tetrahedra

– Construct non-leaf cells by bottom-up recombination and simplification of lower level cells

• Rendering

– Refine conformal hierarchy, render selected precomputed cells

[Figure: view-dependent mesh refinement]


Adaptive TetraPuzzles: Results

Michelangelo’s St. Matthew

Source: Digital Michelangelo Project

Data: 374M triangles

Intel Xeon 2.4GHz 1GB

GeForce FX 5800U AGP8X


Advantages of mesh-based multiresolution models

• First GPU bound methods for very large meshes

– Adaptive conforming meshes

• Reduced overdraw

– Extensive optimization

• Stripification, cache coherence, compression, …

– State of the art performance

• GPU bound, >4Mtri/frame at >30 fps on modern GPUs

• Extremely high quality for large dense models with “well behaved” surface


Limitations of mesh-based multiresolution models

• Visibility and multiresolution solved as separate problems

– Error measured on boundary surfaces

– LOD construction based on local surface coarsening/simplification operations

– LOD construction unaware of visibility (view-independent approximations)

• Hard to apply to models with high detail and complex topology and high depth complexity!


Overcoming limitations of local mesh refinement techniques

• Tight integration of visibility and LOD construction

– Multi-scale modeling of appearance rather than geometry

– Volume-based rather than surface-based











Far Voxels: Handling Huge Complex 3D models

• General purpose technique that targets many model kinds

• Underlying ideas

– Multi-scale modeling of appearance rather than geometry

– Volume-based rather than surface-based

– Tight integration of visibility and LOD construction

– GPU accelerated (programmability + batching)


Far Voxels: The Far Voxel Concept

• Assumption: opaque surfaces, non-participating medium

• Goal is to represent the appearance of complex far geometry

– Near geometry can be represented at full resolution

• Idea is to discretize a model into many small volumes located in the neighborhood of surfaces

– Each one approximates how a small subvolume of the model reflects the incoming light

=> View-dependent cubical voxel



• A far voxel returns color attenuation given

– View direction

– Light direction

• Rendered using a customized vertex shader executed on the GPU

Shader = f (view direction, light direction)


Far Voxels: Construction overview

Far Voxels: Construction overview – Inner nodes

• Sample a model subvolume to build a grid of far voxels

• Voxels are far

– Project to worst case θ_max

– Viewed not closer than d_min

• Raycasting samples original model and identifies visible voxels

[Figure: section of the 3D grid of far voxels, with d_min and θ_max marked]



Far Voxels: Construction overview – Object Space Occlusion

• Environment occlusion

• Cull interior part of grid of far voxels

• Culls 40% of the high depth complexity Boeing 777 model

• Worst case θ_max = 0.5 deg (~10 pixel tolerance for 1024x1024 viewport using 50deg FOV)

• Minimize artifacts due to leaking of occluded parts of different colors
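The "~10 pixel" figure quoted for θ_max = 0.5 deg can be checked with a short calculation. The sketch below assumes a symmetric perspective projection and an angle measured near the image centre. The function names are illustrative, and the linear version is only the quick proportional estimate.

```python
import math


def pixels_for_angle(theta_deg, fov_deg, viewport_pixels):
    """Pixels subtended by a small angle near the image centre (perspective)."""
    focal = (viewport_pixels / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    return 2.0 * focal * math.tan(math.radians(theta_deg) / 2.0)


def pixels_for_angle_linear(theta_deg, fov_deg, viewport_pixels):
    """Proportional estimate: the angle's share of the field of view."""
    return viewport_pixels * theta_deg / fov_deg


print(pixels_for_angle(0.5, 50.0, 1024))         # a little under 10 pixels
print(pixels_for_angle_linear(0.5, 50.0, 1024))  # 1024 * 0.5 / 50
```

Both estimates land close to 10 pixels, consistent with the tolerance stated on the slide.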




Far Voxels: Construction overview – Far Voxel

• Consider voxel subvolume

• Samples gathered from unoccluded directions

– Sample: (BRDF, n) = f(view direction)

• Compress shading information by fitting samples to a compact analytical representation


Far Voxels: Construction overview – Far Voxel Shaders

• Build all the K different far voxel representations

– K = flat, smooth, …

– Principal component analysis

• Evaluate each representation error

– Compare real values (samples) with the voxel approximations from the sample direction

• Choose approximation with lowest error

[Figure: flat proxy: 2 components; smooth proxy: 6 components; others…]

Err(k) =
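The select-by-lowest-error step can be sketched as follows. The slide's own Err(k) formula did not survive extraction, so this sketch assumes a mean squared difference between each sample and the candidate's prediction for that sample's direction. The two candidate models, the sample data and all names here are hypothetical stand-ins, not the flat and smooth shaders of the talk.

```python
def mean_squared_error(model, samples):
    """ASSUMED error metric: average squared difference over all samples."""
    return sum((model(direction) - value) ** 2 for direction, value in samples) / len(samples)


def choose_representation(candidates, samples):
    """Evaluate every candidate on the samples and keep the most accurate one."""
    errors = {name: mean_squared_error(model, samples) for name, model in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors


# Toy samples: (view direction, observed intensity) pairs.
SAMPLES = [((0.0, 0.0, 1.0), 1.0), ((0.0, 1.0, 0.0), 0.0), ((0.0, 0.6, 0.8), 0.8)]

# Two hypothetical, already-fitted candidates.
CANDIDATES = {
    "constant": lambda d: 0.5,
    "lambert_like": lambda d: max(0.0, d[2]),
}

best, errors = choose_representation(CANDIDATES, SAMPLES)
print(best, errors)
```

Any per-voxel error measure could be swapped in for `mean_squared_error` without changing the selection logic.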


Far Voxels: Rendering

• Hierarchical traversal with coherent culling

– Stop when out-of-view, occluded (GPU feedback), or accurate enough

• Leaf node: Triangle rendering

– Draw the precomputed triangle strip

• Inner node: Voxel rendering

– For each far voxel type

• Enable its shader

• Draw all its view dependent primitives using glDrawArrays

– Splat voxels as antialiased point primitives

– Limits

• Does not consider primitive opacity

• Rendering quality similar to one-pass point splat methods (no sorting/blending)

[Figure labels: Triangles; Far Voxels]


Far Voxels: Results

• Tested on extremely complex heterogeneous surface models

– St. Matthew, Boeing 777, Richtmyer-Meshkov isosurface, all at once

• Tested in a number of situations

– Single processor / cluster construction

– Workstation viewing, large scale display

[Figure labels: 373M triangles, 14.5 GB; 1.2G triangles, 46.6 GB]



• 1-16 Athlon 2200+ CPU, 3 x 70GB ATA 133 Disk (IDE+NFS)

• 1-20K triangles/sec

– Scales well, limited by slow disk I/O for large meshes

– Slow!! (but similar to recent adaptive tessellation methods)

• Avg. triangles per leaf 5K

• Avg. voxels per inner node 2.5K

[Figure labels: 5h18m (16 CPU), 10.6 GB; 6h51m (16 CPU), 14.9 GB; 8h06m (16 CPU), 16.1 GB; 41.6 GB]



• Xeon 2.4GHz, 70GB SCSI 320 Disk, GeForce 6800GT AGP 8x

• Window size: from video resolution to stereo projector display

– St. Matthew, Boeing, Isosurface: 640 x 480

– All at once: 640 x 480 and Stereo 2 x 1024 x 768

• Pixel tolerance: [Target 1 | Actual ~0.9 | Max ~10]

• Resident set size limited to ~200 MB

[Figure labels: 45 Fps, 51 MPrim/s; 2 x 1024 x 768: 20 Fps, 40 MPrim/s; 640 x 480: 20 Fps, 42 MPrim/s]


Far Voxels: Conclusions

• General purpose technique that targets many model kinds

– Seamless integration of

• multiresolution

• occlusion culling

• out-of-core data management

– High performance

– Scalability

• Main limitations

– Slow preprocessing

– Non-photorealistic rendering quality

Intel Xeon 2.4GHz 1GB, GeForce 6800GT AGP8X









MOVR – COVRA – Volumetric models. Gobbetti/Marton/Iglesias Guitian (CRS4). CGI 2008

www.crs4.it/vic/

Recent Advances in Massive Volume Visualization


Introduction: Goal

• Visualization of massive scalar volumes without size limitations

– A single-pass raycasting technique working out-of-core on GPU parallel architectures

• Compress data to facilitate data streaming and 4D visualizations

– Novel compression architecture and novel compression methods



Teaser

Compression-domain adaptive volume rendering based on sparse representation of voxel blocks. NVIDIA GTX 560


MOVR: A single-pass raycasting technique working out-of-core on GPU parallel architectures

The Visual Computer 2008 & 2010


Massive Volumes Visualization: Volume rendering problem

[Figure labels: Pixel; Accumulation; Order dependent; Order independent; Early ray termination; Empty space skipping]
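The three ingredients named in this figure can be shown on a single ray. This is a minimal sketch with scalar colors, assuming samples are already given in front-to-back order. Real empty-space skipping jumps over whole empty regions without sampling them, which is only hinted at here by skipping zero-opacity samples. The function name and threshold are illustrative.

```python
def composite_ray(samples, opacity_threshold=0.99):
    """Front-to-back accumulation of (color, alpha) samples along one ray.

    Returns the accumulated color, accumulated opacity, and the number of
    samples that actually contributed.
    """
    color, alpha, used = 0.0, 0.0, 0
    for sample_color, sample_alpha in samples:
        if sample_alpha == 0.0:
            continue  # empty space: nothing to accumulate
        # Order-dependent accumulation: later samples are attenuated
        # by the opacity gathered so far.
        color += (1.0 - alpha) * sample_alpha * sample_color
        alpha += (1.0 - alpha) * sample_alpha
        used += 1
        if alpha >= opacity_threshold:
            break  # early ray termination: the rest is hidden
    return color, alpha, used


print(composite_ray([(1.0, 0.0), (0.5, 1.0), (0.9, 1.0)]))
print(composite_ray([(1.0, 0.5), (0.0, 0.5)]))
```

In the first call the opaque second sample terminates the ray, so the third sample is never touched.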




Volume rendering problem

• Current interactive solutions are based on GPU architectures

– Massive parallelism

– Huge memory bandwidth

• E.g. GeForce GTX 580

– Has 192.4 GB/s of memory bandwidth

– Has 1581.1 GFLOPs

[ hardwareinsight.com ]


Massive Volumes Visualization: Related work. Moderately sized volumes

• Current high quality solutions based on GPUs implementing …

– Slice-based methods

– Ray casting techniques

• The full volume must fit in GPU memory

[ Li et al., 2003 ]

[ Krüger et al., 2003 ]


Contribution to the state-of-the-art

• Multiresolution out-of-core Volume Renderer

– Preprocessing

• Build multiresolution octree of volume bricks

– Rendering

• Adaptive CPU loading of the data from local/remote repository cooperates with separate render thread fully executed in the GPU

• Stackless traversal of an adaptive working set

• Exploitation of the visibility feedback

E. Gobbetti, F. Marton, and J. A. Iglesias Guitián. A single-pass GPU ray casting framework for interactive out-of-core rendering of massive volumetric datasets. The Visual Computer, 24, 2008.

J. A. Iglesias Guitián, E. Gobbetti, and F. Marton. View-dependent exploration of massive volumetric models on large-scale light field displays. The Visual Computer, 26, 2010.


Contribution to the state-of-the-art

• Use CPU for …

– Creation & loading

– Octree refinement

– Encode current cut using a spatial index

• Use GPU for …

– Stackless octree traversal

• Using neighbour pointers

– Rendering

• Flexible ray traversal / compositing strategies

• Improved visibility feedback

[Figures: architecture overview; neighbour pointer navigation]


Method overview

[Diagram: offline preprocessing fills the octree node database in storage. CPU (creation and maintenance): adaptive loader and octree refinement, driven by visibility feedback. GPU (rendering): if the current working set has enough accuracy, prepare to render and render the volume; otherwise keep refining.]


Visibility feedback

• Working set reduction

– Opaque: 1731 -> 1035 bricks

– Transparent: 1984 -> 1789 bricks

• Rendered on window size 1024x576



Results (2/2)

Interactive exploration of a 16bit 2GB CT volume on a consumer NVidia 8800 GTS graphics board with 640MB (2008)


Compression-Domain Volume Rendering

• 60 Time steps of the 432^3 supernova dataset


Volume Compression

• Limited bandwidth and memory =>

– LOD (MOVR)

– Compression

• Compression is fully exploited if data is maintained in compressed form through the entire pipeline

– Compression-domain volume renderers + deferred filtering

• Highly asymmetric encoding/decoding schemes

– We can afford slow offline compression and precomputation

– Fast real-time data decoding, interpolation and shading

– Spatially independent random-access to data


State-of-the-art

• CPU decompression

– Does not limit bandwidth and memory

• [Ning & Hesselink, 92] and many others...

• [Gobbetti et al. 08, Iglesias et al. 10]

• Hardware based

– E.g. S3TC [Brown], NVidia VTC [Craighead]

– Full random access

– Limited compression

• GPU decompression

– Full working set GPU decompression

• Tensor Approximation [Suter et al. 2010]

• Does not limit memory

• Limits bandwidth

– Partial working set

• Limits both memory and bandwidth


Tensor Approximation (CRS4 & UZH 2010)

• Multiresolution

• Brick Based

• Extract dominant data features

• Real Time GPU Reconstruction

– Full Working set

• Bandwidth optimization

• Memory Consumption

S. Suter, J. A. Iglesias Guitián, F. Marton, M. Agus, A. Elsener, C. Zollikofer, M. Gopi, E. Gobbetti, and R. Pajarola. Interactive Multiscale Tensor Reconstruction for Multiresolution Volume Visualization. IEEE Transactions on Visualization and Computer Graphics, vol. 17, pp. 2135–2143, 2011.



Contribution to the state-of-the-art

• COVRA: Compression-domain Output-sensitive Volume Rendering Architecture

– Novel architecture w/ parameterized cache behaviour

– Supports and extends state-of-the-art compression methods

• ☺ Efficient multisampling (HQ shading)

• ☺ No perspective limitations

• ☺ Fully adaptive multiresolution approach

• ☺ Multipass working set decompression

• ☺ High compression ratios and signal quality

J. A. Iglesias Guitián, F. Marton, and E. Gobbetti. COVRA: a Compression Domain Output-Sensitive Volume Rendering Architecture based on sparse representation of voxel blocks. In: Proceedings of EuroVis 2012.



COVRA: Overview

• Main concepts:

– Preprocessor builds multiresolution octree of compressed nodes

– Data travels in compressed format until the last stage

– Fully adaptive rendering

– Highly integrated decompression / rendering supporting high quality filtering and shading


COVRA: Subtree management (run-time)

• Three rendering steps:

1. CPU multiresolution octree adaptive refinement

2. Partitioning of the octree into a set of subtrees

• Use GPU decompressed cache size as constraint

• Front-to-back order decided in real time during the octree traversal

3. Subtree decompression, raycasting and compositing

• Decompress to temporary buffer or available GPU cache

• Raycast decompressed octree nodes

• Composite with previous results
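Step 2 can be illustrated with a greedy grouping under a cache budget. This sketch simplifies the idea considerably: it takes a flat list of nodes already sorted front to back and cuts it into consecutive batches, rather than forming true octree subtrees. The function name, node names and sizes are assumptions made for the example.

```python
def partition_front_to_back(nodes, cache_capacity):
    """Group (name, decompressed_size) nodes, given in front-to-back order,
    into consecutive batches whose total size fits the decompression cache."""
    batches, current, used = [], [], 0
    for name, size in nodes:
        if size > cache_capacity:
            raise ValueError(f"node {name} alone exceeds the cache capacity")
        if used + size > cache_capacity:
            batches.append(current)  # close the batch; it will be decompressed,
            current, used = [], 0    # raycast and composited before the next one
        current.append(name)
        used += size
    if current:
        batches.append(current)
    return batches


# Four nodes in front-to-back order, with a cache that holds 8 units.
print(partition_front_to_back([("a", 4), ("b", 4), ("c", 3), ("d", 8)], 8))
```

Keeping the batches consecutive preserves the front-to-back order, so each pass can be composited onto the previous results.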




Sparse coding of volume blocks

• Each multiresolution octree node is decomposed into blocks

• Each block, made of few^3 voxels, is compressed

[Figure: single octree node containing overlapping information; compressed block]

• Each block is represented by a sparse linear combination of a few dictionary elements

– Data specific representation

– Compression is achieved by storing indices and magnitudes
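Decoding such a block is a short weighted sum. The sketch below assumes the dictionary is a list of atoms, each a flat list with one value per voxel of the block. The tiny dictionary and the name `decode_block` are illustrative only.

```python
def decode_block(dictionary, indices, magnitudes):
    """Rebuild a block as the sum of magnitude * atom over the stored indices."""
    block = [0.0] * len(dictionary[0])
    for index, magnitude in zip(indices, magnitudes):
        for i, value in enumerate(dictionary[index]):
            block[i] += magnitude * value
    return block


# A toy dictionary of three atoms for 4-voxel blocks.
DICTIONARY = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]

# The compressed block stores only two (index, magnitude) pairs.
print(decode_block(DICTIONARY, [0, 2], [2.0, 4.0]))
```

The stored data per block is just the index and magnitude lists, which is where the compression comes from when only a few atoms are used.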




Sparse coding of volume blocks

• Generalization of vector quantization

– Combine vectors instead of choosing single ones

– Overcomes limitations due to dictionary sizes

• Generalization of data-specific bases

– Dictionary is an overcomplete basis

– Sparse projection

• Encoding in two steps

– Training: Find data specific dictionary

– Sparse coding: Find best representation of each block using linear combination of dictionary elements under sparsity constraint

• We employ ORMP via Cholesky Decomposition
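The sparse coding step can be illustrated with plain matching pursuit, the simplest greedy relative of ORMP. This is explicitly not the ORMP-via-Cholesky solver named on the slide: it does not re-orthogonalize, and it may pick the same atom more than once. It assumes unit-norm atoms, and all names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, dictionary, max_atoms):
    """Greedy sparse coding with unit-norm atoms (plain MP, not ORMP)."""
    residual = list(signal)
    indices, magnitudes = [], []
    for _ in range(max_atoms):  # sparsity constraint: at most max_atoms terms
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        if abs(scores[best]) < 1e-12:
            break  # residual already explained
        indices.append(best)
        magnitudes.append(scores[best])
        # Remove the chosen atom's contribution from the residual.
        residual = [r - scores[best] * a for r, a in zip(residual, dictionary[best])]
    return indices, magnitudes, residual


ATOMS = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(matching_pursuit([3.0, 0.0, -1.0], ATOMS, max_atoms=2))
```

The returned index and magnitude lists are what a sparse block representation stores; orthogonal variants mainly improve how quickly the residual shrinks for a given number of atoms.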




Finding an optimal dictionary

• We employ the K-SVD algorithm for dictionary training

– Algorithm for designing overcomplete dictionaries for sparse representations [Aharon et al. 06]

• But running K-SVD calculations directly on massive volumes would be unfeasible, therefore …

– … we applied the concept of coreset [Agarwal et al. 05] to smartly subsample and reweight the original training set [Feldman & Langberg 11, Feigin et al. 11]


Dictionary learning (K-SVD)

• K-SVD can be seen as a K-Means generalization

• Basic steps:

– Sparse coding of signals in X, producing Γ

– Update dictionary atoms given the sparse representations

• Optimize one atom at a time, keeping the rest fixed

• The size of E is proportional to the number of training signals

– As in [Rubinstein et al. 08] we replace the SVD computation with a simpler numerical approximation



Coreset construction

• Calculations on massive input volumes are still unfeasible, but we can …

– … reduce the amount of data used for training

– … use importance sampling

• We associate an importance to each of the original blocks, being the standard deviation of the entries in that block

– Picking C elements with probability proportional to this importance

– More important blocks should end up in our coreset



Coreset construction

• Non-uniform sampling introduces a severe bias

– Scale each selected block by a weight that depends on its associated selection probability

– Applying K-SVD to scaled coefficients will converge to a dictionary associated with the original problem

• Coreset scalability
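The sampling and reweighting steps can be sketched as follows. The weight formula is not legible in the source, so the code ASSUMES the usual importance-sampling correction for squared-error objectives, a weight of 1/sqrt(p) for a block picked with probability p. It also samples with replacement for simplicity. `build_coreset` and the toy blocks are hypothetical.

```python
import math
import random
import statistics


def build_coreset(blocks, size, rng):
    """Pick `size` blocks with probability proportional to their standard
    deviation and rescale each pick to compensate for the sampling bias."""
    importance = [statistics.pstdev(block) for block in blocks]
    total = sum(importance)
    if total == 0.0:
        raise ValueError("all blocks are constant: nothing to sample")
    probabilities = [s / total for s in importance]
    picks = rng.choices(range(len(blocks)), weights=probabilities, k=size)
    coreset = []
    for i in picks:
        # ASSUMPTION: weight = 1 / sqrt(p_i); the slide's own formula is missing.
        weight = 1.0 / math.sqrt(probabilities[i])
        coreset.append((i, [weight * v for v in blocks[i]]))
    return coreset


# A constant block (importance 0) is never selected; busy blocks dominate.
BLOCKS = [[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0], [0.0, 10.0, 0.0, 10.0]]
for index, scaled in build_coreset(BLOCKS, 5, random.Random(0)):
    print(index, scaled)
```

Training would then run on the few reweighted blocks instead of the full volume, which is what keeps dictionary learning tractable for massive datasets.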




COVRA: Results

• PSNR vs. Bits Per Sample

• Comparison against state-of-the-art GPU-based decompression methods

• Gradient mapped to RGB color

COVRA: Video

Compression-domain adaptive volume rendering based on sparse representation of voxel blocks. NVIDIA GTX 560. (2012)


Summary and Conclusions

• Improved the scalability of state-of-the-art volume rendering techniques

– MOVR: a novel single-pass GPU ray casting framework supporting a flexible ray traversal and incorporating visibility feedback for interactive exploration of large volumes without size limitations

• Improved compression and streaming of large and time-varying volumes

– COVRA: proposed a novel compression-domain architecture, supporting state-of-the-art compression methods, random access to compressed data and HQ shading

– A novel compression method for massive volumes based on sparse coding (K-SVD) and coreset training sets











THANK YOU!

Questions and Answers

Next Session: Technologies for improving real-time immersive exploration of massive (volumetric) models, presented by Marco Agus
