![Page 1: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/1.jpg)
SS12
GPU Programming Lab Project
Optical Flow and Super Resolution
Ross Kidson 03627521
Oliver Dunkley 03631802
![Page 2: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/2.jpg)
Introduction
Contents
○ Implementation environment, methods
○ Optical Flow
  ■ example
  ■ theory
  ■ implementation
  ■ results & performance
○ Super Resolution
  ■ theory
  ■ implementation
  ■ results
![Page 3: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/3.jpg)
Implementation
● Nsight, Ubuntu
● two code bases
● focused on benchmarking/comparing memory types
● Python scripts & OpenOffice
![Page 4: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/4.jpg)
Benchmarking methods
● GPU - NVIDIA Visual Profiler
● CPU - custom 'Benchmark' class to collect the same data

Benchmark::instance()->addEvent(Benchmark::start, "add_flow_fields");
for (unsigned int p = 0; p < nx_fine * ny_fine; p++) {
    _u1[p] += _u1lvl[p];
    _u2[p] += _u2lvl[p];
}
Benchmark::instance()->addEvent(Benchmark::end, "add_flow_fields");
...
Benchmark::instance()->doBenchmark(); // Save collected data to file
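The calls above suggest a singleton that records named start/end events and writes them out at the end. A minimal sketch of how such a class could look follows; only `instance`, `addEvent` and `doBenchmark` appear on the slide, while the `count` helper, the CSV layout and the default file name `benchmark.csv` are assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Hypothetical re-creation of the Benchmark class; not the project's code.
class Benchmark {
public:
    enum EventType { start, end };

    // Meyers singleton: one shared recorder for the whole program.
    static Benchmark* instance() {
        static Benchmark b;
        return &b;
    }

    // Record a start or end timestamp for a named code region.
    void addEvent(EventType type, const std::string& name) {
        const Clock::time_point now = Clock::now();
        if (type == start) {
            open_[name] = now;
        } else {
            const auto it = open_.find(name);
            assert(it != open_.end() && "end event without matching start");
            const std::chrono::duration<double, std::milli> ms = now - it->second;
            durations_[name].push_back(ms.count());
            open_.erase(it);
        }
    }

    // Number of completed start/end pairs for a region (assumed helper).
    std::size_t count(const std::string& name) const {
        const auto it = durations_.find(name);
        return it == durations_.end() ? 0 : it->second.size();
    }

    // Write one CSV line per measurement: region name, duration in ms.
    void doBenchmark(const std::string& path = "benchmark.csv") const {
        std::ofstream out(path);
        for (const auto& entry : durations_)
            for (double ms : entry.second)
                out << entry.first << ',' << ms << '\n';
    }

private:
    Benchmark() = default;
    using Clock = std::chrono::steady_clock;
    std::map<std::string, Clock::time_point> open_;
    std::map<std::string, std::vector<double>> durations_;
};
```

Using a monotonic clock (`std::chrono::steady_clock`) keeps the measured durations unaffected by system clock adjustments.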
![Page 5: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/5.jpg)
Optical flow
"Pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer and the scene"
Uses
● motion detection
● object segmentation
● time-to-collision
● motion compensated encoding
● stereo disparity measurement
![Page 6: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/6.jpg)
Optical flow: Example input street 1
![Page 7: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/7.jpg)
Optical flow: Example input street 2
![Page 8: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/8.jpg)
Optical flow: Example output
![Page 9: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/9.jpg)
Optical flow
![Page 10: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/10.jpg)
Optical flow: Example input Tron 1
![Page 11: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/11.jpg)
Optical flow: Example input Tron 2
![Page 12: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/12.jpg)
Optical flow: Overlaid output
![Page 13: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/13.jpg)
Optical flow: Formulation
Given two images I1 and I2, one computes a vector field u (the flow field) that matches image intensities:
Formulate this as an energy functional to minimize
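A common way to write the matching condition and a corresponding energy is given below as an assumed reference form, not necessarily the deck's exact formula. The weight λ, the image domain Ω and the component names u_1, u_2 are notation introduced for this sketch.

```latex
% Intensity matching between the two images
I_1(x) = I_2\bigl(x + u(x)\bigr), \qquad u(x) = \bigl(u_1(x),\, u_2(x)\bigr)

% Data term plus smoothness term, weighted by \lambda > 0
E(u) = \int_\Omega \bigl( I_2(x + u(x)) - I_1(x) \bigr)^2 \, dx
     + \lambda \int_\Omega \bigl( |\nabla u_1(x)|^2 + |\nabla u_2(x)|^2 \bigr) \, dx
```

The first integral penalizes intensity mismatch, the second penalizes flow fields that are not smooth.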
![Page 14: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/14.jpg)
Optical flow: Formulation
Apply a Taylor series expansion and a robust penalty term:
The minimizer can be characterized with the Euler-Lagrange equations, which are reformulated as a linear system of the form Ax = b and then solved iteratively using successive over-relaxation (SOR)
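To make the SOR step concrete, here is a minimal sketch on a small dense system. The function name `solveSor` and the dense matrix layout are assumptions for illustration; the flow solver itself works on a sparse, image-sized system.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense SOR solver for A x = b (illustrative sketch only).
// omega in (0, 2) is the relaxation factor; omega = 1 gives Gauss-Seidel.
std::vector<double> solveSor(const std::vector<std::vector<double>>& A,
                             const std::vector<double>& b,
                             double omega, int iterations) {
    const std::size_t n = b.size();
    assert(A.size() == n);
    std::vector<double> x(n, 0.0);
    for (int it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < n; ++i) {
            assert(A[i].size() == n && A[i][i] != 0.0);
            double sigma = 0.0;
            for (std::size_t j = 0; j < n; ++j)
                if (j != i) sigma += A[i][j] * x[j];  // x[j] is already updated for j < i
            const double gaussSeidel = (b[i] - sigma) / A[i][i];
            // Blend the old value with the Gauss-Seidel update.
            x[i] = (1.0 - omega) * x[i] + omega * gaussSeidel;
        }
    }
    return x;
}
```

For a symmetric positive definite matrix the iteration converges for any omega strictly between 0 and 2; values above 1 over-relax the Gauss-Seidel update.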
![Page 15: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/15.jpg)
For each pyramid level, coarse --> fine (next pyramid level loop):
  1. Resize initial u(x) to current level: resampleAreaParallel(u(x)k-1) gives u(x)k
  2. Warp the original image at this level by resized flow field u(x): backwardRegistrationBilinearFunctionTex(I2, u(x)) gives I2_warp
  3. Set du to 0: setKernel(du(x), 0)
  Outer iteration (update robustifications) loop:
    4. Update robustification terms based on current u(x) and du(x): sorflow_update_robustifications_warp_tex(u(x), du(x))
    5. Update right hand side of equation to solve (b term): sorflow_update_righthandside_tex(u(x), b)
    Inner iteration (SOR) loop:
      6. Solve for du(x): sorflow_nonlinear_warp_sor_tex(b, du(x))
    7. Update robustifications and b and re-solve for du(x)
  8. Add du to u: addKernel(u(x), du(x))
  9. Continue to next pyramid level
Output: u(x)
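The nesting of the three loops can be sketched as a control-flow skeleton. Each recorded name stands in for a kernel launch; the names come from the slide, while the function `flowSchedule` and its loop-count parameters are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Control-flow skeleton only: no image data is processed here.
std::vector<std::string> flowSchedule(int levels, int outerIters, int innerIters) {
    assert(levels > 0 && outerIters > 0 && innerIters > 0);
    std::vector<std::string> calls;
    for (int level = levels - 1; level >= 0; --level) {  // coarse --> fine
        calls.push_back("resampleAreaParallel");                      // 1. resize u
        calls.push_back("backwardRegistrationBilinearFunctionTex");   // 2. warp I2
        calls.push_back("setKernel");                                 // 3. du = 0
        for (int outer = 0; outer < outerIters; ++outer) {
            calls.push_back("sorflow_update_robustifications_warp_tex");  // 4.
            calls.push_back("sorflow_update_righthandside_tex");          // 5.
            for (int inner = 0; inner < innerIters; ++inner)
                calls.push_back("sorflow_nonlinear_warp_sor_tex");        // 6. one SOR sweep
        }
        calls.push_back("addKernel");                                 // 8. u += du
    }
    return calls;
}
```

The number of launches per level is 4 + outerIters * (2 + innerIters), so in this sketch the inner SOR kernel accounts for most launches once the iteration counts grow.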
![Page 16: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/16.jpg)
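Optical Flow Results
[Figure: grid of results for outer iterations (1, 10, 40) against inner iterations (1, 10, 40); cells labelled x145, x147, x163, x162, x141, x137, x104, x143, x156. The mapping of labels to cells is not recoverable.]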
![Page 17: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/17.jpg)
Applying GPU Techniques
● registers
● constant memory
● shared memory
● texture memory
● global memory
● kernels, blocks, pitch, warp
![Page 18: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/18.jpg)
Variables tested
Already in Texture Memory
● Image gradients Ix, Iy, It
● backwardRegistration - I2
Added to Texture Memory
● [missing in transcript]
● u(x)
Added to Shared Memory
● u(x)
● du(x)
next pyramid level loop:
  resampleAreaParallel(u(x)k-1)
  backwardRegistrationBilinearFunctionTex(I2, u(x))
  setKernel(du(x), 0)
  outer iteration (update robustifications) loop:
    sorflow_update_robustifications_warp_tex(u(x), du(x))
    sorflow_update_righthandside_tex(u(x), b)
    inner iteration (SOR) loop:
      sorflow_nonlinear_warp_sor_tex(b, du(x))
  addKernel(u(x), du(x))
Output: u(x)
![Page 19: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/19.jpg)
Optical Flow Results
![Page 20: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/20.jpg)
Optical Flow Results
![Page 21: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/21.jpg)
Optical Flow Results
![Page 22: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/22.jpg)
Optical Flow Results
![Page 23: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/23.jpg)
GPU Techniques - Small Hack 1
if (x == 0) shared_mem[0][ty] = shared_mem[tx][ty]; // one array at a time
vs
if (x == 0) {
    shared_mem1[0][ty] = shared_mem1[tx][ty]; // arrays 1 to N under a single branch
    // ...
    shared_memN[0][ty] = shared_memN[tx][ty];
}
![Page 24: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/24.jpg)
Optical Flow Results
![Page 25: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/25.jpg)
Optical Flow Results
![Page 26: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/26.jpg)
Super Resolution: Formulation
Given a number of degraded observations of an image I, each transformed by linear operations, a set of linear equations can be obtained:
![Page 27: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/27.jpg)
Super Resolution: Formulation
Energy function with regularity penalty term:
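As an assumed reference form in the style of Unger et al. (listed in the references), the observation model and energy can be written as below. The operator names W_n (warp), B (blur), D (downsampling), the weight λ and the Huber parameter ε are notation introduced for this sketch; the deck's exact formula may differ.

```latex
% Each low-resolution image I_n is a warped, blurred, downsampled copy of I
I_n = D \, B \, W_n \, I + \text{noise}, \qquad n = 1, \dots, N

% Huber-regularized smoothness term plus L1 data term
E(I) = \lambda \int_\Omega |\nabla I(x)|_{\varepsilon} \, dx
     + \sum_{n=1}^{N} \bigl\| D \, B \, W_n \, I - I_n \bigr\|_1

% Huber norm: quadratic near zero, linear for large arguments
|s|_{\varepsilon} =
\begin{cases}
  \dfrac{|s|^2}{2\varepsilon}       & \text{if } |s| \le \varepsilon \\[1ex]
  |s| - \dfrac{\varepsilon}{2}      & \text{otherwise}
\end{cases}
```

The kernel names `dualTVHuber` and `dualL1Difference` used later in the deck are consistent with a Huber-type regularizer and an L1 data term.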
![Page 28: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/28.jpg)
Super Resolution: General Method
Super-resolution image I
Lower-resolution images In
[Figure: super-resolution image I with lower-resolution images I1, I2, I3, I4, I6, I7, I8, I9]
![Page 29: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/29.jpg)
Super Resolution: General Method
1. Warp the initial guess image I to I1 (BackwardRegistration), giving Iwarped
![Page 30: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/30.jpg)
Super Resolution: General Method
2. Shrink the warped image Iwarped and apply a Gaussian blur
![Page 31: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/31.jpg)
Super Resolution: General Method
3. Subtract images: I1 (-) Iwarped gives Idiff
![Page 32: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/32.jpg)
Super Resolution: General Method
4. Resample the difference image back to the larger resolution (Idiff resampled)
![Page 33: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/33.jpg)
Super Resolution: General Method
5. Warp back to the middle image (ForwardRegistration), giving Idiff resampled warped
![Page 34: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/34.jpg)
Super Resolution: General Method
6. Repeat for all other images and add all difference images together (I2, I3, I4, ..., In diff resampled warped)
![Page 35: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/35.jpg)
Super Resolution: General Method
7. Repeat the entire process until the image converges
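Steps 1 to 7 can be sketched on a 1D signal. This is a simplified stand-in, not the project's method: the warp is a circular integer shift, the Gaussian blur is omitted, and the update is a plain gradient step on the squared residual without the regularizer. All function names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Signal = std::vector<double>;

// Backward registration stand-in: circular shift by an integer offset.
Signal warp(const Signal& u, int shift) {
    const int n = static_cast<int>(u.size());
    Signal out(u.size());
    for (int j = 0; j < n; ++j) out[j] = u[((j + shift) % n + n) % n];
    return out;
}

// Shrink by a factor of two by averaging neighbouring samples.
Signal shrink(const Signal& u) {
    Signal out(u.size() / 2);
    for (std::size_t k = 0; k < out.size(); ++k) out[k] = 0.5 * (u[2 * k] + u[2 * k + 1]);
    return out;
}

// Adjoint of shrink: spread each coarse sample over its two fine samples.
Signal expand(const Signal& r) {
    Signal out(2 * r.size());
    for (std::size_t k = 0; k < r.size(); ++k) out[2 * k] = out[2 * k + 1] = 0.5 * r[k];
    return out;
}

// Steps 1-2: simulate the low-resolution observation for one shift.
Signal simulate(const Signal& u, int shift) { return shrink(warp(u, shift)); }

Signal superResolve(const std::vector<Signal>& observations, const std::vector<int>& shifts,
                    std::size_t fineSize, double step, int iterations) {
    assert(observations.size() == shifts.size() && fineSize % 2 == 0);
    Signal u(fineSize, 0.0);                                      // initial guess
    for (int it = 0; it < iterations; ++it) {                     // step 7: repeat
        Signal sum(fineSize, 0.0);
        for (std::size_t i = 0; i < observations.size(); ++i) {
            const Signal sim = simulate(u, shifts[i]);            // steps 1-2
            assert(observations[i].size() == sim.size());
            Signal diff(sim.size());
            for (std::size_t k = 0; k < sim.size(); ++k)
                diff[k] = observations[i][k] - sim[k];            // step 3
            const Signal back = warp(expand(diff), -shifts[i]);   // steps 4-5
            for (std::size_t j = 0; j < fineSize; ++j) sum[j] += back[j];  // step 6
        }
        for (std::size_t j = 0; j < fineSize; ++j) u[j] += step * sum[j];
    }
    return u;
}
```

Warping with the negated shift is the adjoint of the forward shift, which mirrors the pairing of backward and forward registration in the steps above.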
![Page 36: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/36.jpg)
Image loop 1: Calculate diff images
For each image In:
  backwardRegistrationBilinearValueTex(uor(x), u(x))
  gaussBlurSeparateMirrorGpu() [const memory!]
  resampleAreaParallelSeparate()
  dualL1Difference()
Results in a vector of difference images at the size of the original input images
![Page 37: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/37.jpg)
Image loop 2: Warp difference images forward
For each difference image
resampleAreaParallelSeparateAdjoined()
gaussBlurSeparateMirrorGpu()
forewardRegistrationBilinearAtomic()
addKernel(diff_image, temp_accumulator)
Results in a single image containing the sum of all difference images
![Page 38: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/38.jpg)
Putting it together (for one image)
Repeat for the number of outer iterations:
● Image loop 1: Calculate difference images
● Image loop 2: Morph and sum difference images
● dualTVHuber(uor) gives xi1, xi2
● primal1N(xi1, xi2, difference_sum, u, uor) gives uor(x), u(x)
● setKernel(difference_sum, 0)
Final Result
![Page 39: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/39.jpg)
Super Resolution Output & CUDA tex2D(...) differences
[Figure panels: original, CUDA tex2D(...), manual, difference]
![Page 40: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/40.jpg)
Super Resolution GPU Performance
22x Speed-up
![Page 41: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/41.jpg)
Super Resolution Performance
![Page 42: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/42.jpg)
GPU Techniques - Small Hacks 2
const int tx_1 = tx == 0 ? tx : tx - 1;
vs
const int tx_1 = tx - 1 * (x > 0);
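The two forms can be checked against each other on the host. In this sketch both variants use the same variable `tx`, which is an assumption (the slide mixes `tx` and `x`); the function names are hypothetical.

```cpp
#include <cassert>

// Left-neighbour index clamped at the border, written with a conditional.
inline int leftNeighbourBranch(int tx) {
    return tx == 0 ? tx : tx - 1;
}

// Same result without a conditional: (tx > 0) converts to 0 or 1.
// Only equivalent to the version above for non-negative indices.
inline int leftNeighbourArithmetic(int tx) {
    assert(tx >= 0);
    return tx - 1 * (tx > 0);
}
```

Whether the arithmetic form is actually faster depends on the compiler, which may already turn the conditional into a branch-free select, so it is worth confirming with the profiler.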
![Page 43: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/43.jpg)
Debugging techniques: Compare Image class
● Automatically compare CPU to GPU images
  ○ Compares pixel values
  ○ Errors not always visible
● Outputs which failed, stats, display image differences
● Facilitates code testing after optimizations/hacks
● Problem with texture memory (low similarity thresholds)

ImageComparison* ic = ImageComparison::instance();

CPU:
ic->addImage(_I1pyramid->level[rec_depth], "CI1", rec_depth, nx_fine, ny_fine, 1, "CPU");
ic->addImage(_I2pyramid->level[rec_depth], "CI2", rec_depth, nx_fine, ny_fine, 1, "CPU");
ic->dumpData("cpu_images"); // save data to disk
....

GPU:
ic->addImage(_I1pyramid->level[rec_depth], "CI1", rec_depth, nx_fine, ny_fine, pitch, "GPU");
ic->addImage(_I2pyramid->level[rec_depth], "CI2", rec_depth, nx_fine, ny_fine, pitch, "GPU");
....
ImageComparison::instance()->compareImages("cpu_images");
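The core of such a comparison is a per-pixel check that respects the GPU row pitch. A minimal sketch follows; `CompareStats`, this free function `compareImages` and the convention that the pitch is given in elements rather than bytes are assumptions for illustration, not the project's ImageComparison API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct CompareStats {
    std::size_t mismatches;  // pixels whose difference exceeds the tolerance
    float maxAbsDiff;        // largest absolute difference seen
};

// Compare a densely packed CPU image with a pitched host copy of a GPU image.
// gpuPitch is the row stride in elements (assumed convention).
CompareStats compareImages(const std::vector<float>& cpu, const std::vector<float>& gpu,
                           std::size_t nx, std::size_t ny, std::size_t gpuPitch,
                           float tolerance) {
    assert(gpuPitch >= nx && cpu.size() >= nx * ny && gpu.size() >= gpuPitch * ny);
    CompareStats stats{0, 0.0f};
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {
            // Padding elements beyond nx in each GPU row are skipped.
            const float diff = std::fabs(cpu[y * nx + x] - gpu[y * gpuPitch + x]);
            if (diff > stats.maxAbsDiff) stats.maxAbsDiff = diff;
            if (diff > tolerance) ++stats.mismatches;
        }
    }
    return stats;
}
```

A tolerance is needed because CPU and GPU floating-point results rarely match bit for bit; texture fetches with hardware interpolation tend to need a looser one.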
![Page 44: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/44.jpg)
Difficulties
● Diving into the code
● Compiling on a local system
● Debugging segfaults (printf, __LINE__, __FILE__ helped...)
● Texture offsets
● Incorrect in/out pitches
● Forgetting to catchkernel (oops)
● Evaluating massive amounts of data
● Maintaining two code bases
● Combining benchmark data
![Page 45: Lab Project GPU Programming SS12 - TUM · GPU Programming Lab Project Optical Flow and Super Resolution Ross Kidson 03627521 ... results. Implementation nsight, ubuntu two code bases](https://reader033.vdocument.in/reader033/viewer/2022053004/5f0832ae7e708231d420d2c3/html5/thumbnails/45.jpg)
References:
Nils Papenberg, Andres Bruhn, Thomas Brox, Stephan Didas, and Joachim Weickert. 2006. Highly Accurate Optic Flow Computation with Theoretically Justified Warping. Int. J. Comput. Vision 67, 2 (April 2006), 141-158.
Markus Unger, Thomas Pock, Manuel Werlberger, and Horst Bischof. 2010. A convex approach for variational super-resolution. In Proceedings of the 32nd DAGM conference on Pattern recognition, Michael Goesele, Stefan Roth, Arjan Kuijper, Bernt Schiele, and Konrad Schindler (Eds.). Springer-Verlag, Berlin, Heidelberg, 313-322.
NVIDIA. 2012. CUDA C Best Practices Guide. DG-05603-001_v4.1 (January 2012).