thinking parallel, part i

Upload: cmaestrofdez

Post on 14-Apr-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/27/2019 Thinking Parallel, Part I

    1/6

    10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone

    https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 1/6

    NVIDIA Developer Zone

    Secondary links

    CUDA ZONE

    Hom e > CUDA ZONE > Parallel Fora ll

    Thinking Parallel, Part I: Collision Detection on the GPU

    By Tero Karras, posted Nov 1 2 2 012 at 1 1 :59AM

    Tags: Algorithms, Parallel Programming

    This series o f posts aims to highlight some o f the main differences between co nventional

    programming and parallel programming on the algorithmic level, using broad-phase collision

    detec tion as an example. The first part will giv e some background, discuss two commonly used

    approac hes, and introduc e the concept of divergenc e. The second part will switch gears to

    hierarchical tree trav ersal in order to show how a good single-core algorithm can turn out to be a

    poor cho ice in a parallel setting, and vic e v ersa. The third and final part will discuss parallel tree

    construction, introduce the c oncept of occupancy , and present a recently published algorithmthat has specifically been designed with massive parallelism in mind.

    Why Go Parallel?

    The computing world is changing. In the past, Moore 's law meant that the performance of

    integrated circuits would roughly double e very two y ears, and that you could expect any

    program to automatically run faster on newer proc essors. Howev er, ev er since processor

    architectures hit thePower Wallaround 2002, opportunities for improving the raw performance

    of individual processor c ores have become ve ry limited. Today, Moore's law no longer meansy ou get faster cor esit me ans y ou get mo re of them. A s a result , pro grams will not get any faster

    unless they can effectively utilize the ever-increasing number of cores.

    Out of the curr ent consumer-level processors, GPUs represent one extreme of this dev elopment.

    NVIDIA GeForce GTX 480, for example, can ex ecute 23,040 threads in parallel, and in practice

    requires at least 15 ,000 threads to reach full performance. The benefit of this design point is that

    individual threads are v ery lightweight, but together they can achieve extremely high instruction

    throughput.

    One might argue that GPUs are somewhat esoteric proc essors that are o nly interesting toscientists and performance e nthusiasts working on spec ialized applications. While this may be

    true to some ex tent, the general direction towards more and more parallelism seems inevitable.

    Learning to write efficient GPU programs not only helps you get a substantial performance bo ost,

    Dev eloper Cent er s Tech nologies Tools

    Resou rces Com m unity

    Log In

    https://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/blog/tags/4843https://developer.nvidia.com/blog/tags/231https://developer.nvidia.com/category/zone/cuda-zonehttps://developer.nvidia.com/blog/term/2https://developer.nvidia.com/user/loginhttps://developer.nvidia.com/https://developer.nvidia.com/technologieshttps://developer.nvidia.com/toolshttps://developer.nvidia.com/https://developer.nvidia.com/user/loginhttps://developer.nvidia.com/https://developer.nvidia.com/suggested-readinghttps://developer.nvidia.com/toolshttps://developer.nvidia.com/technologieshttps://developer.nvidia.com/http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.htmlhttps://developer.nvidia.com/sites/default/files/akamai/cuda/images/parallel-forall/sort-and-sweep_thumnail.jpghttps://developer.nvidia.com/blog/tags/231https://developer.nvidia.com/blog/tags/4843https://developer.nvidia.com/blog/term/2https://developer.nvidia.com/category/zone/cuda-zonehttps://developer.nvidia.com/https://developer.nvidia.com/
  • 7/27/2019 Thinking Parallel, Part I

    2/6

    10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone

    https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 2/6

    but it also highlights some of the fundamental algor ithmic considerations that I believe will

    ev entually b ecome r elevant for all types o f computing.

    Many people think that parallel programming is hard, and they are partly right. Ev en though

    parallel programming languages, libraries, and too ls have taken huge leaps forward in quality and

    productiv ity in recent years, they still lag behind their single-core c ounterparts. This is not

    surprising; those counterparts have evolved for decades while mainstream massively parallel

    general-purpose co mputing has been around only a few short y ears.

    More importantly , parallel programmingfee ls hard bec ause the rules are different. We

    programmers have learned about algorithms, data structures, and complexity analysis, and we

    have dev eloped intuition for what kinds of algorithms to co nsider in order to solv e a particular

    problem. When programming for a massively parallel processor, some of this knowledge and

    intuition is not only inacc urate, but it may ev en be co mpletely wrong. It's all about learning to

    "think parallel".

    Sort and Sweep

    Collision detection is an important ingredient in almost any physics simulation, including special

    effects in games and mov ies, as well as scientific research. The problem is usually div ided into

    two phases, the br oad phase and the narrow phase. The broad phase is responsible for quickly

    finding pairs of 3D objec ts than can potentially c ollide with each other, whereas the narrow phase

    looks more c losely at each potentially c olliding pair found in the broad phase to see whether a

    co llision actually oc curs. We will concentrate on the bro ad phase, for which there are a number

    of well-known algorithms. The most straightforward one is sort and swee p.

    The sort and sweep algorithm works by assigning an axis-aligned bounding box (AABB) for each3D objec t, and projecting the bounding boxes to a cho sen one-dimensional axis. After projection,

    each object will correspond to a 1 D range on the axis, so that two ob jects can co llide only if their

    ranges overlap each other.

    In order to find the

    overlapping ranges,

    the algorithm collects

    the start points (S ,

    S , S ) and end points

    (E , E , E ) of theranges into an array ,

    and sorts them along

    the axis. For each

    object, it then sweeps

    the list from the

    objec t's start and end

    points (e.g. S and E )

    and identifies all

    objects whose start

    point lies between

    them (e.g. S ). For each pair of objects found this way, the algorithm further checks their 3D

    bo unding boxes for overlap, and reports the overlapping pairs as potential c ollisions.

    1

    2 3

    1 2 3

    2 2

    3

    http://http.developer.nvidia.com/GPUGems3/gpugems3_ch32.html
  • 7/27/2019 Thinking Parallel, Part I

    3/6

    10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone

    https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 3/6

    Sort and sweep is very easy to implement on a parallel processor in three processing steps,

    synchronizing the ex ecution after each step. In the first step, we launch one thread per object to

    calculate its bounding box , projec t it to the chosen ax is, and write the start and end points of the

    projec ted range into fixed loc ations in the output array . In the second step, we sort the array into

    ascending order. Parallel sorting is an interesting topic in its own right, but the botto m line is

    that we can useparallel radix so rtto y ield a linear ex ecution time with respect to the number of

    objec ts (given that there is enough work to fill the GPU). In the third step, we launch one thread

    per array element. If the element indicates an end point, the thread simply ex its. If the elementindicates a start point, the thread walks the array forward, performing ov erlap tests, until it

    encounters the corresponding end point.

    The most obv ious downside of this algorithm is that it may need to per form up to O(n ) overlap

    tests in the third step, regardless of how many bo unding boxes ac tually ov erlap in 3D. Even if we

    choose the pro jection axis as intelligently as we can, the worst case is still prohibitively slow as

    we may need to test any giv en o bje ct against other ob jec ts that are arbitrarily far away . While

    this effect hurts both the serial and parallel implementations of the sort and sweep algorithm

    equally , there is another factor that we need to take into acc ount when analy zing the parallel

    implementation: divergence.

    Divergence

    Divergence is a measure of whether nearby threads are do ing the same thing or different things.

    There are two flavors: execution divergence means that the threads are executing different code

    or making different contro l flow dec isions, while data divergence means that they are reading or

    writing disparate lo cations in memor y. Both are bad for performance on parallel mac hines.

    Moreov er, these are not just artifacts of curre nt GPU architec turesperformance of any

    sufficiently parallel proce ssor will necessarily suffer from div ergence to some degree .

    On traditional CPUs we hav e learned to r ely on the large data caches built right next to the

    ex ecut ion pipeline. In most cases, this makes memory ac cesses prac tically free: the data we want

    to acc ess is almost always already pre sent in the cache. Howev er, increasing the amount of

    parallelism by multiple orders of magnitude changes the picture co mpletely . We cannot really

    add a lot more on-chip memory due to power and area constraints, so we will have to serve a

    much larger group of threads from a similar size cac he. This is not a problem if the threads are

    doing more or less the same thing at the same time, since their co mbined working set is still likely

    to stay reasonably small. But if they are doing something completely different, the working set

    ex plodes, and the effective cache size from the perspectiv e of one thread may shrink to just ahandful of by tes.

    The third step of the sort and sweep algorithm suffers mainly from execut ion divergence. In the

    implementation described abov e, the threads that happen to land on an end point will terminate

    immediately, and the re maining ones will walk the array for a v ariable number of steps. Threads

    responsible for large objects will generally perform more work than those respo nsible for smaller

    ones. If the scene co ntains a mixture o f different object sizes, the execution times of nearby

    threads will vary wildly. I n other words, the e xec ution divergence will be high and the

    performance will suffer.

    Uniform Grid

    Another algorithm that lends itself to paralle l implementatio n is uniform gridcollision detection.

    2

    http://http.developer.nvidia.com/GPUGems3/gpugems3_ch32.htmlhttp://code.google.com/p/back40computing/wiki/RadixSortinghttp://scholar.google.com/scholar?q=parallel+sorting
  • 7/27/2019 Thinking Parallel, Part I

    4/6

    10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone

    https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 4/6

    In its basic form, the algorithm assumes that all objects are roughly equal in size. The idea is to

    construct a uniform 3D grid whose ce lls are at least the same size as the largest objec t.

    In the first step, we

    assign each object to

    one ce ll of the grid

    according to the

    centro id of itsbounding box . In the

    second step, we look

    at the 3x3x 3

    neighborhood in the

    grid (highlighted).

    For every object we

    find in the

    neighboring cells

    (green chec k mark),

    we check thecorresponding

    bounding box for

    overlap.

    If all objects are indeed ro ughly the same size, they are more or less ev enly distributed, the grid

    is not too large to fit in memory , and objects near each o ther in 3D happen to get assigned to

    nearby threads, this simple algorithm is actually ve ry efficient. Ev ery thread ex ecutes ro ughly

    the same amount of work, so the ex ecution divergence is low. Eve ry thread also ac cesses roughly

    the same memory locations as the nearby ones, so the data div ergence is also low. But it took

    many assumptions to get this far. While the assumptions may hold in certain special cases, like

    the simulation of fluid particles in a box , the algorithm suffers from the so-called teapot-in-a-

    stadium problem that is common in many real-world scenarios.

    The teapot-in-a-stadium problem arises when yo u have a large scene with small details. There are

    objec ts of all different sizes, and a lot of empty space. I f y ou place a uniform grid ove r the entire

    scene, the cells will be too large to handle the small objec ts efficiently, and most of the storage

    will get wasted on empty space. To handle suc h a case effic iently , we need a hierarc hic al

    approac h. And that's where things get interesting.

    Discussion

    We hav e so far seen two simple algorithms that illustrate the basic s of paralle l programming quite

    well. The c ommo n way to think about paralle l algor ithms is to div ide them into multiple steps, so

    that each step is exec uted independently for a large number of items (objec ts, array e lements,

    etc.). By v irtue of synchronization, subsequent steps are free to access the re sults of the prev ious

    ones in any way they please.

    We hav e also identified div ergence as one o f the most impor tant things to keep in mind when

    comparing different ways to parallelize a given computation. Based on the ex amples presented so

    far, it would seem that minimizing divergenc e tilts the scales in favor o f "dumb" or "brute force"

    algorithmsthe uniform grid, which reaches low diver gence, has to indeed rely on many

    simplify ing assumptions in order to do that.

  • 7/27/2019 Thinking Parallel, Part I

    5/6

    10/10/13 Thinking Parallel, Part I: Coll ision Detection on the GPU | NVIDIA Developer Zone

    https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 5/6

    In my next po st I will focus on hierarchical tree trav ersal as a means of performing collision

    detec tion, with an aim to shed more light on what it really means to optimize for low div ergence .

    Abo ut the autho r: Tero Karras is a graphics researc h scientist at NV IDIA Researc h.

    Parallel Forall is the NVI DIA Parallel Programming blog. If you enjoy ed this post, subsc ribe to

    theParallel Forall RSS fee d!

    NVIDIA Developer Programs

    Get exclusiv e access to the lat est softwar e, report bugs and r eceiv e notifications for special ev ents.

    Lear n m ore and Register

    Recommended Reading

    About Par al lel Forall

    Conta ct Para llel Forall

    Parallel Forall Blog

    Featured Articles

    Prev iousPauseNext

    Tag Index

    accelerometer (1) Alg orithm s (3) An droid (1 ) ANR (1 ) ARM (2) Ar ray Fire (1) Au di (1) Aut omotiv e &

    Embedded (1 ) Blog (2 0) Blog (2 3) Cluster (4 ) competition (1 ) Compilation (1 ) Concur rency (2)

    Copperh ead (1 ) CUDA (2 3) CUDA 4.1 (1 ) CUDA 5.5 (3 ) CUDA C (1 5) CUDA Fort ra n (1 0) CUDA Pro Tip

    (1 ) CUDA Profiler (1 ) CUDA Spotligh t (1 ) CUDA Zone (81 ) CUDACasts (2) Debug (1 ) Debugg er (1 )

    Debugging (3) Dev elop 4 Shield (1) dev elopment kit (1 ) DirectX (3) Eclipse (1 ) Ev ents (2) FFT (1 ) Finite

    Difference (4) Floatin g Point (2 ) Gam e & Graphics Dev elopment (3 5) Gam es and Gra phics (8) GeForce

    Dev eloper Stories (1 ) getting started (1) google io (1 ) GTC (2) Hardwar e (1 ) Interv iew (1 ) Kepler (1 )

    Lamborghini (1 ) Librar ies (4) m emory (6) Mobile Dev elopment (2 7 ) Monte Car lo (1 ) MPI (2) Multi -GPU

    (3) n ativ e_app_glue (1 ) NDK (1 ) NPP (1 ) Nsight (2) N Sight Eclipse Edition (1 ) Nsight Tegra (1 )

    NSIGHT Visual Stu dio Edition (1 ) Num baPro (2) Nu m erics (1) NV IDIA Parallel Nsight (1 ) nv idia-smi

    (1 ) Occupancy (1 ) OpenACC (6) OpenGL (3) OpenGL ES (1) Parallel Forall (6 9) Parallel Nsight (1 )

    Parallel Progr am m ing (5) PerfHUD ES (2) Perform anc e (4) Port ability (1 ) Port ing (1 ) Pro Tip (5)

    Professional Gra phics (6) Profiling (3) Program m ing Lang ua ges (1) Py thon (3 ) Robotics (1 ) Shape

    Nsight Visual Studio Edition 3 .1 Final

    Now...

    https://developer.nvidia.com/blog/tags/4937?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/content/nsight-visual-studio-edition-3_1-final-now-available-visual-studio-2012-direct3d-111-and-cudahttps://developer.nvidia.com/content/nsight-visual-studio-edition-3_1-final-now-available-visual-studio-2012-direct3d-111-and-cudahttps://developer.nvidia.com/blog/tags/4937?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4872?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4858?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4841?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4526?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4847?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4869?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4850?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/3546?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4831?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/231?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/176?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4880?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/311?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/3356?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4621?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4846?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4944?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4626?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4681?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4953?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4903?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4938?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4949?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4776?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4701?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/296?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4866?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4853?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4756?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4871?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/3?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/676?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4836?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/431?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4862?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4870?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4859?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/181?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4201?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/421?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4948?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4977?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/1?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4676?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4851?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4946?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4741?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4811?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/3131?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/276?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4915?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4521?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4372?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4166?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4874?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/2?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4336?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/466?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4873?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/456?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4501?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4933?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4631?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/166?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4857?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4845?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4864?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4860?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4852?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4917?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4905?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4786?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/436?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4842?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4868?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4867?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/496?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4843?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/321?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttp://developer.nvidia.com/blog/feed/2http://developer.nvidia.com/content/contact-parallel-forallhttp://developer.nvidia.com/about-parallel-forallhttps://developer.nvidia.com/blog?term=2https://developer.nvidia.com/registered-developer-programshttps://developer.nvidia.com/blog/feed/2
  • 7/27/2019 Thinking Parallel, Part I

    6/6

    10/10/13 Thinking Parallel, Part I: Collision Detection on the GPU | NVIDIA Developer Zone

    https://developer.nvidia.com/content/thinking-parallel-part-i-collision-detection-gpu 6/6

    Sensing (1 ) Shared Memory (6) Shield (1) Stream s (2) tablet (1 ) TADP (1 ) Technologies (3) t egra (5)

    Tegra A ndroid Developer Pack (1 ) Tegra A ndroid Development Pack (1 ) Tegra Dev eloper Stories (1)

    Tegra Profiler (1 ) Tegra Zone (1 ) Textur es (1) Th rust (3 ) Tools (10) tools (2) Toradex (1 ) Visual Stu dio

    (3 ) Windows 8 (1 ) xoom (1 ) Zone In (1 )

    Developer Blogs

    Parallel Forall Blog

    About

    Contact

    Copyright 2013 NVIDIA Corporation

    Legal Inf ormation

    Priv acy Policy

    Code of C onduct

    https://developer.nvidia.com/blog/tags/4937?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/code-of-conducthttp://www.nvidia.com/object/privacy_policy.htmlhttp://www.nvidia.com/object/legal_info.htmlhttps://developer.nvidia.com/contacthttp://www.nvidia.com/page/companyinfo.htmlhttp://developer.nvidia.com/blog/feed/2https://developer.nvidia.com/blog/tags/4392?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4211?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4945?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4161?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4861?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/256?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/50?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4821?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4849?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4397?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4939?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4971?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4940?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4914?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/51?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/221?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4865?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/316?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4844?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4854?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4848?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublashttps://developer.nvidia.com/blog/tags/4937?display[%24ne]=char%28119%2C104%2C115%2C83%2C81%2C76%2C105%29content%2Fincomplete-lu-and-cholesky-preconditioned-iterative-methods-using-cusparse-and-cublas