thinking parallel, part i
TRANSCRIPT
-
7/27/2019 Thinking Parallel, Part I
1/6
10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone
https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 1/6
NVIDIA Developer Zone
Secondary links
CUDA ZONE
Hom e > CUDA ZONE > Parallel Fora ll
Thinking Parallel, Part I: Collision Detection on the GPU
By Tero Karras, posted Nov 1 2 2 012 at 1 1 :59AM
Tags: Algorithms, Parallel Programming
This series o f posts aims to highlight some o f the main differences between co nventional
programming and parallel programming on the algorithmic level, using broad-phase collision
detec tion as an example. The first part will giv e some background, discuss two commonly used
approac hes, and introduc e the concept of divergenc e. The second part will switch gears to
hierarchical tree trav ersal in order to show how a good single-core algorithm can turn out to be a
poor cho ice in a parallel setting, and vic e v ersa. The third and final part will discuss parallel tree
construction, introduce the c oncept of occupancy , and present a recently published algorithmthat has specifically been designed with massive parallelism in mind.
Why Go Parallel?
The computing world is changing. In the past, Moore 's law meant that the performance of
integrated circuits would roughly double e very two y ears, and that you could expect any
program to automatically run faster on newer proc essors. Howev er, ev er since processor
architectures hit thePower Wallaround 2002, opportunities for improving the raw performance
of individual processor c ores have become ve ry limited. Today, Moore's law no longer meansy ou get faster cor esit me ans y ou get mo re of them. A s a result , pro grams will not get any faster
unless they can effectively utilize the ever-increasing number of cores.
Out of the curr ent consumer-level processors, GPUs represent one extreme of this dev elopment.
NVIDIA GeForce GTX 480, for example, can ex ecute 23,040 threads in parallel, and in practice
requires at least 15 ,000 threads to reach full performance. The benefit of this design point is that
individual threads are v ery lightweight, but together they can achieve extremely high instruction
throughput.
One might argue that GPUs are somewhat esoteric proc essors that are o nly interesting toscientists and performance e nthusiasts working on spec ialized applications. While this may be
true to some ex tent, the general direction towards more and more parallelism seems inevitable.
Learning to write efficient GPU programs not only helps you get a substantial performance bo ost,
Dev eloper Cent er s Tech nologies Tools
Resou rces Com m unity
Log In
https://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/blog/tags/4843https://developer.nvidia.com/blog/tags/231https://developer.nvidia.com/category/zone/cuda-zonehttps://developer.nvidia.com/blog/term/2https://developer.nvidia.com/user/loginhttps://developer.nvidia.com/https://developer.nvidia.com/technologieshttps://developer.nvidia.com/toolshttps://developer.nvidia.com/https://developer.nvidia.com/user/loginhttps://developer.nvidia.com/https://developer.nvidia.com/suggested-readinghttps://developer.nvidia.com/toolshttps://developer.nvidia.com/technologieshttps://developer.nvidia.com/http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.htmlhttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/blog/tags/231https://developer.nvidia.com/blog/tags/4843https://developer.nvidia.com/blog/term/2https://developer.nvidia.com/category/zone/cuda-zonehttps://developer.nvidia.com/https://developer.nvidia.com/ -
7/27/2019 Thinking Parallel, Part I
2/6
10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone
https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 2/6
but it also highlights some of the fundamental algor ithmic considerations that I believe will
ev entually b ecome r elevant for all types o f computing.
Many people think that parallel programming is hard, and they are partly right. Ev en though
parallel programming languages, libraries, and too ls have taken huge leaps forward in quality and
productiv ity in recent years, they still lag behind their single-core c ounterparts. This is not
surprising; those counterparts have evolved for decades while mainstream massively parallel
general-purpose co mputing has been around only a few short y ears.
More importantly , parallel programmingfee ls hard bec ause the rules are different. We
programmers have learned about algorithms, data structures, and complexity analysis, and we
have dev eloped intuition for what kinds of algorithms to co nsider in order to solv e a particular
problem. When programming for a massively parallel processor, some of this knowledge and
intuition is not only inacc urate, but it may ev en be co mpletely wrong. It's all about learning to
"think parallel".
Sort and Sweep
Collision detection is an important ingredient in almost any physics simulation, including special
effects in games and mov ies, as well as scientific research. The problem is usually div ided into
two phases, the br oad phase and the narrow phase. The broad phase is responsible for quickly
finding pairs of 3D objec ts than can potentially c ollide with each other, whereas the narrow phase
looks more c losely at each potentially c olliding pair found in the broad phase to see whether a
co llision actually oc curs. We will concentrate on the bro ad phase, for which there are a number
of well-known algorithms. The most straightforward one is sort and swee p.
The sort and sweep algorithm works by assigning an axis-aligned bounding box (AABB) for each3D objec t, and projecting the bounding boxes to a cho sen one-dimensional axis. After projection,
each object will correspond to a 1 D range on the axis, so that two ob jects can co llide only if their
ranges overlap each other.
In order to find the
overlapping ranges,
the algorithm collects
the start points (S ,
S , S ) and end points
(E , E , E ) of theranges into an array ,
and sorts them along
the axis. For each
object, it then sweeps
the list from the
objec t's start and end
points (e.g. S and E )
and identifies all
objects whose start
point lies between
them (e.g. S ). For each pair of objects found this way, the algorithm further checks their 3D
bo unding boxes for overlap, and reports the overlapping pairs as potential c ollisions.
1
2 3
1 2 3
2 2
3
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch32.html -
7/27/2019 Thinking Parallel, Part I
3/6
10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone
https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 3/6
Sort and sweep is very easy to implement on a parallel processor in three processing steps,
synchronizing the ex ecution after each step. In the first step, we launch one thread per object to
calculate its bounding box , projec t it to the chosen ax is, and write the start and end points of the
projec ted range into fixed loc ations in the output array . In the second step, we sort the array into
ascending order. Parallel sorting is an interesting topic in its own right, but the botto m line is
that we can useparallel radix so rtto y ield a linear ex ecution time with respect to the number of
objec ts (given that there is enough work to fill the GPU). In the third step, we launch one thread
per array element. If the element indicates an end point, the thread simply ex its. If the elementindicates a start point, the thread walks the array forward, performing ov erlap tests, until it
encounters the corresponding end point.
The most obv ious downside of this algorithm is that it may need to per form up to O(n ) overlap
tests in the third step, regardless of how many bo unding boxes ac tually ov erlap in 3D. Even if we
choose the pro jection axis as intelligently as we can, the worst case is still prohibitively slow as
we may need to test any giv en o bje ct against other ob jec ts that are arbitrarily far away . While
this effect hurts both the serial and parallel implementations of the sort and sweep algorithm
equally , there is another factor that we need to take into acc ount when analy zing the parallel
implementation: divergence.
Divergence
Divergence is a measure of whether nearby threads are do ing the same thing or different things.
There are two flavors: execution divergence means that the threads are executing different code
or making different contro l flow dec isions, while data divergence means that they are reading or
writing disparate lo cations in memor y. Both are bad for performance on parallel mac hines.
Moreov er, these are not just artifacts of curre nt GPU architec turesperformance of any
sufficiently parallel proce ssor will necessarily suffer from div ergence to some degree .
On traditional CPUs we hav e learned to r ely on the large data caches built right next to the
ex ecut ion pipeline. In most cases, this makes memory ac cesses prac tically free: the data we want
to acc ess is almost always already pre sent in the cache. Howev er, increasing the amount of
parallelism by multiple orders of magnitude changes the picture co mpletely . We cannot really
add a lot more on-chip memory due to power and area constraints, so we will have to serve a
much larger group of threads from a similar size cac he. This is not a problem if the threads are
doing more or less the same thing at the same time, since their co mbined working set is still likely
to stay reasonably small. But if they are doing something completely different, the working set
ex plodes, and the effective cache size from the perspectiv e of one thread may shrink to just ahandful of by tes.
The third step of the sort and sweep algorithm suffers mainly from execut ion divergence. In the
implementation described abov e, the threads that happen to land on an end point will terminate
immediately, and the re maining ones will walk the array for a v ariable number of steps. Threads
responsible for large objects will generally perform more work than those respo nsible for smaller
ones. If the scene co ntains a mixture o f different object sizes, the execution times of nearby
threads will vary wildly. I n other words, the e xec ution divergence will be high and the
performance will suffer.
Uniform Grid
Another algorithm that lends itself to paralle l implementatio n is uniform gridcollision detection.
2
http://http.developer.nvidia.com/GPUGems3/gpugems3_ch32.htmlhttp://code.google.com/p/back40computing/wiki/RadixSortinghttp://scholar.google.com/scholar?q=parallel+sorting -
7/27/2019 Thinking Parallel, Part I
4/6
10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone
https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 4/6
In its basic form, the algorithm assumes that all objects are roughly equal in size. The idea is to
construct a uniform 3D grid whose ce lls are at least the same size as the largest objec t.
In the first step, we
assign each object to
one ce ll of the grid
according to the
centro id of itsbounding box . In the
second step, we look
at the 3x3x 3
neighborhood in the
grid (highlighted).
For every object we
find in the
neighboring cells
(green chec k mark),
we check thecorresponding
bounding box for
overlap.
If all objects are indeed ro ughly the same size, they are more or less ev enly distributed, the grid
is not too large to fit in memory , and objects near each o ther in 3D happen to get assigned to
nearby threads, this simple algorithm is actually ve ry efficient. Ev ery thread ex ecutes ro ughly
the same amount of work, so the ex ecution divergence is low. Eve ry thread also ac cesses roughly
the same memory locations as the nearby ones, so the data div ergence is also low. But it took
many assumptions to get this far. While the assumptions may hold in certain special cases, like
the simulation of fluid particles in a box , the algorithm suffers from the so-called teapot-in-a-
stadium problem that is common in many real-world scenarios.
The teapot-in-a-stadium problem arises when yo u have a large scene with small details. There are
objec ts of all different sizes, and a lot of empty space. I f y ou place a uniform grid ove r the entire
scene, the cells will be too large to handle the small objec ts efficiently, and most of the storage
will get wasted on empty space. To handle suc h a case effic iently , we need a hierarc hic al
approac h. And that's where things get interesting.
Discussion
We hav e so far seen two simple algorithms that illustrate the basic s of paralle l programming quite
well. The c ommo n way to think about paralle l algor ithms is to div ide them into multiple steps, so
that each step is exec uted independently for a large number of items (objec ts, array e lements,
etc.). By v irtue of synchronization, subsequent steps are free to access the re sults of the prev ious
ones in any way they please.
We hav e also identified div ergence as one o f the most impor tant things to keep in mind when
comparing different ways to parallelize a given computation. Based on the ex amples presented so
far, it would seem that minimizing divergenc e tilts the scales in favor o f "dumb" or "brute force"
algorithmsthe uniform grid, which reaches low diver gence, has to indeed rely on many
simplify ing assumptions in order to do that.
-
7/27/2019 Thinking Parallel, Part I
5/6
10/10/13 Thinking Parallel, Part I: Coll ision Detection on the GPU | NVIDIA Developer Zone
https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 5/6
In my next po st I will focus on hierarchical tree trav ersal as a means of performing collision
detec tion, with an aim to shed more light on what it really means to optimize for low div ergence .
Abo ut the autho r: Tero Karras is a graphics researc h scientist at NV IDIA Researc h.
Parallel Forall is the NVI DIA Parallel Programming blog. If you enjoy ed this post, subsc ribe to
theParallel Forall RSS fee d!
NVIDIA Developer Programs
Get exclusiv e access to the lat est softwar e, report bugs and r eceiv e notifications for special ev ents.
Lear n m ore and Register
Recommended Reading
About Par al lel Forall
Conta ct Para llel Forall
Parallel Forall Blog
Featured Articles
Prev iousPauseNext
Tag Index
accelerometer (1) Alg orithm s (3) An droid (1 ) ANR (1 ) ARM (2) Ar ray Fire (1) Au di (1) Aut omotiv e &
Embedded (1 ) Blog (2 0) Blog (2 3) Cluster (4 ) competition (1 ) Compilation (1 ) Concur rency (2)
Copperh ead (1 ) CUDA (2 3) CUDA 4.1 (1 ) CUDA 5.5 (3 ) CUDA C (1 5) CUDA Fort ra n (1 0) CUDA Pro Tip
(1 ) CUDA Profiler (1 ) CUDA Spotligh t (1 ) CUDA Zone (81 ) CUDACasts (2) Debug (1 ) Debugg er (1 )
Debugging (3) Dev elop 4 Shield (1) dev elopment kit (1 ) DirectX (3) Eclipse (1 ) Ev ents (2) FFT (1 ) Finite
Difference (4) Floatin g Point (2 ) Gam e & Graphics Dev elopment (3 5) Gam es and Gra phics (8) GeForce
Dev eloper Stories (1 ) getting started (1) google io (1 ) GTC (2) Hardwar e (1 ) Interv iew (1 ) Kepler (1 )
Lamborghini (1 ) Librar ies (4) m emory (6) Mobile Dev elopment (2 7 ) Monte Car lo (1 ) MPI (2) Multi -GPU
(3) n ativ e_app_glue (1 ) NDK (1 ) NPP (1 ) Nsight (2) N Sight Eclipse Edition (1 ) Nsight Tegra (1 )
NSIGHT Visual Stu dio Edition (1 ) Num baPro (2) Nu m erics (1) NV IDIA Parallel Nsight (1 ) nv idia-smi
(1 ) Occupancy (1 ) OpenACC (6) OpenGL (3) OpenGL ES (1) Parallel Forall (6 9) Parallel Nsight (1 )
Parallel Progr am m ing (5) PerfHUD ES (2) Perform anc e (4) Port ability (1 ) Port ing (1 ) Pro Tip (5)
Professional Gra phics (6) Profiling (3) Program m ing Lang ua ges (1) Py thon (3 ) Robotics (1 ) Shape
Nsight Visual Studio Edition 3 .1 Final
Now...
https://developer.nvidia.com/blog/tags/4937?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/content/nsight-visual-studio-edition-3_1-final-now-available-visual-studio-2012-direct3d-111-and-cudahttps://developer.nvidia.com/content/nsight-visual-studio-edition-3_1-final-now-available-visual-studio-2012-direct3d-111-and-cudahttps://developer.nvidia.com/blog/tags/4937?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4872?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4858?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4841?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4526?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4847?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4869?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4850?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/3546?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4831?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/231?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/176?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4880?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/311?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/3356?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4621?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4846?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4944?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4626?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4681?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4953?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4903?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4938?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4949?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4776?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4701?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/296?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4866?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4853?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4756?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4871?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/3?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/676?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4836?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/431?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4862?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4870?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4859?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/181?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4201?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/421?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4948?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4977?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/1?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4676?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4851?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4946?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4741?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4811?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/3131?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/276?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4915?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4521?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4372?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4166?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4874?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/2?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4336?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/466?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4873?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/456?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4501?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4933?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4631?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/166?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4857?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4845?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4864?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4860?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4852?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4917?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4905?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4786?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/436?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4842?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4868?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4867?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/496?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4843?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/321?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttp://developer.nvidia.com/blog/feed/2http://developer.nvidia.com/content/contact-parallel-forallhttp://developer.nvidia.com/about-parallel-forallhttps://developer.nvidia.com/blog?term=2https://developer.nvidia.com/registered-developer-programshttps://developer.nvidia.com/blog/feed/2 -
7/27/2019 Thinking Parallel, Part I
6/6
10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone
https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 6/6
Sensing (1 ) Shared Memory (6) Shield (1) Stream s (2) tablet (1 ) TADP (1 ) Technologies (3) t egra (5)
Tegra A ndroid Developer Pack (1 ) Tegra A ndroid Development Pack (1 ) Tegra Dev eloper Stories (1)
Tegra Profiler (1 ) Tegra Zone (1 ) Textur es (1) Th rust (3 ) Tools (10) tools (2) Toradex (1 ) Visual Stu dio
(3 ) Windows 8 (1 ) xoom (1 ) Zone In (1 )
Developer Blogs
Parallel Forall Blog
About
Contact
Copyright 2013 NVIDIA Corporation
Legal Inf ormation
Priv acy Policy
Code of C onduct
https://developer.nvidia.com/blog/tags/4937?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/code-of-conducthttp://www.nvidia.com/object/privacy_policy.htmlhttp://www.nvidia.com/object/legal_info.htmlhttps://developer.nvidia.com/contacthttp://www.nvidia.com/page/companyinfo.htmlhttp://developer.nvidia.com/blog/feed/2https://developer.nvidia.com/blog/tags/4392?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4211?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4945?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4161?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4861?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/256?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/50?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4821?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4849?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4397?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4939?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4971?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4940?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4914?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/51?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/221?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4865?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/316?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4844?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4854?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4848?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4937?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublas