These are confidential sessions—please refrain from streaming, blogging, or taking pictures
Accelerate Framework
Fast and energy-efficient computation
Session 713
Geoff Belter
Engineer, Vector and Numerics Group

Accelerate Framework: What is it?
• Easy access to a lot of functionality
• Accurate
• Fast with low energy usage
• Works on both OS X and iOS
• Optimized for all generations of hardware

Accelerate Framework: What operations are available?
• Image processing (vImage)
• Digital signal processing (vDSP)
• Transcendental math functions (vForce, vMathLib)
• Linear algebra (LAPACK, BLAS)

Accelerate Framework: Session goals
• How Accelerate helps you
• Areas of your code likely to benefit from Accelerate
• How you use Accelerate

Accelerate Framework: Why is it fast?
• SIMD instructions
  ■ SSE, AVX and NEON
• Match the micro-architecture
  ■ Instruction selection and scheduling
  ■ Software pipelining
  ■ Loop unrolling
• Multi-threaded using GCD

Tips for Successful Use of Accelerate
• Prepare your data
  ■ Contiguous
  ■ 16-byte aligned
• Understand problem size
• Do setup once/destroy at the end
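
The "prepare your data" tip can be sketched in portable C. This is a minimal illustration, not part of the session: the helper names are hypothetical, and it assumes a C11 library that provides `aligned_alloc` (on older systems `posix_memalign` plays the same role).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `count` floats as one contiguous block whose
   start address is a multiple of 16 bytes. Returns NULL on failure. */
static float *alloc_aligned_floats(size_t count)
{
    if (count > SIZE_MAX / sizeof(float) - 16u)
        return NULL;                       /* size computation would overflow */
    size_t bytes = count * sizeof(float);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (bytes + 15u) & ~(size_t)15u;
    if (rounded == 0)
        rounded = 16;
    return aligned_alloc(16, rounded);
}

/* Returns 1 when `p` sits on a 16-byte boundary, 0 otherwise. */
static int is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}
```

A buffer obtained this way is released with `free`. Checking `is_16_byte_aligned` on pointers that come from elsewhere (for example, an offset into a larger array) is a cheap way to see whether the alignment tip is being met.
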

Using Accelerate Framework: Xcode

Using Accelerate Framework: Command line

    cc -framework Accelerate main.c

vImage: Vectorized image processing library

vImage: What’s available?

Additions and Improvements
• Improved conversion support
• vImage Buffer creation utilities
• Resampling of 16-bit images
• Streamlined Core Graphics interoperability

Core Graphics Interoperability
• How do I use vImage with my CGImageRef?
• New utility functions
  ■ vImageBuffer_InitWithCGImage
  ■ vImageCreateCGImageFromBuffer

Core Graphics Interoperability: From CGImageRef to vImage_Buffer

    #include <Accelerate/Accelerate.h>

    // Create and prepare CGImageRef
    CGImageRef inImg;

    // Specify Format
    vImage_CGImageFormat format = {
        .bitsPerComponent = 8,
        .bitsPerPixel = 32,
        .colorSpace = NULL,
        .bitmapInfo = (CGBitmapInfo)kCGImageAlphaFirst,
        .version = 0,
        .decode = NULL,
        .renderingIntent = kCGRenderingIntentDefault,
    };

    // Create vImage_Buffer; the call allocates inBuffer.data, which the caller later frees
    vImage_Buffer inBuffer;
    vImage_Error err = vImageBuffer_InitWithCGImage(&inBuffer, &format, NULL, inImg, kvImageNoFlags);
    // err is kvImageNoError on success

Core Graphics Interoperability: From vImage_Buffer to CGImageRef

    #include <Accelerate/Accelerate.h>

    // The output buffer
    vImage_Buffer outBuffer;

    // Create CGImageRef; release it with CGImageRelease when done
    vImage_Error error;
    CGImageRef outImg = vImageCreateCGImageFromBuffer(&outBuffer, &format, NULL,
                                                      NULL, kvImageNoFlags, &error);

Core Graphics Interoperability: vImageConvert_AnyToAny
• Convert between vImage_CGImageFormat types
• Tips for use
  ■ Create converter once
  ■ Use many times

[Chart: Software JPEG Encode Performance on iPhone 5, in MPixel/s (axis 0 to 30, bigger is better), comparing the old method with AnyToAny for these source formats: ARGB, ARGB premul, BGRA premul, RGBA, RGBA premul, RGBA (float), RGBA (float) premul, RGBA (uint16), RGBA (uint16) premul.]

Conversion Example: Scaling with a premultiplied PlanarF image

    #include <Accelerate/Accelerate.h>

    vImage_Buffer src, dst, alpha;
    ...
    // Premultiplied data -> Non-premultiplied data, works in-place
    vImageUnpremultiplyData_PlanarF(&src, &alpha, &src, kvImageNoFlags);

    // Resize the image
    vImageScale_PlanarF(&src, &dst, NULL, kvImageNoFlags);

    // Non-premultiplied data -> Premultiplied data, works in-place
    // (the alpha plane passed here has to match the dimensions of dst)
    vImagePremultiplyData_PlanarF(&dst, &alpha, &dst, kvImageNoFlags);
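
To make the two conversion steps concrete, here is a plain C sketch of what premultiplying and unpremultiplying a planar float channel mean. These are hypothetical reference loops, not vImage's implementation; they assume contiguous planes with alpha in [0, 1], and the choice to write 0 for fully transparent pixels is an assumption of the sketch.

```c
#include <stddef.h>

/* Straight (non-premultiplied) color -> premultiplied color, in place. */
static void premultiply_plane(float *color, const float *alpha, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        color[i] *= alpha[i];
}

/* Premultiplied color -> straight color, in place. A fully transparent pixel
   has no recoverable color, so this sketch writes 0 there (an assumption). */
static void unpremultiply_plane(float *color, const float *alpha, size_t count)
{
    for (size_t i = 0; i < count; ++i)
        color[i] = (alpha[i] != 0.0f) ? color[i] / alpha[i] : 0.0f;
}
```

Both loops touch each pixel once, which fits the idea that the conversions are cheap next to a resampling step that reads several source pixels per output pixel.
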

Conversions in Context: Scaling on a premultiplied PlanarF image

Percentage of time taken:
• Unpremultiply: 1.10%
• Scale: 98.26%
• Premultiply: 0.64%

vImage vs. OpenCV

OpenCV: The competition
• Open source computer vision library
• Image processing module

Points of Comparison
• Execution time
• Energy consumed

vImage Speedup over OpenCV: iPhone 5

Speedup over OpenCV (above one is better):
• Histogram: 1.62
• Max: 23.19
• Box Convolve: 7.88
• Affine Warp (lanczos): 6.41

Energy Consumption and Battery Life
• Fast code tends to
  ■ Decrease energy consumption
  ■ Increase battery life

[Figure: Typical Energy Consumption Profile. Power versus time, with labels for idle power, instantaneous power, and energy; power levels p0, p1, p2 and times t0, t1.]

[Figure: Typical Energy Consumption Profile. Power versus time for unoptimized and optimized code, with labels for idle power and instantaneous power.]
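
The relationship behind these profiles is that energy is power integrated over time. A minimal sketch, assuming the profile is simplified to a rectangle above the idle baseline (the function name and the numbers in the usage note are hypothetical, not measurements from the session):

```c
/* Energy above the idle baseline, in joules, for a task that raises power
   draw from `idle_watts` to `active_watts` for `seconds`. Modeling the
   profile as a rectangle is a simplification. */
static double energy_above_idle(double idle_watts, double active_watts,
                                double seconds)
{
    return (active_watts - idle_watts) * seconds;
}
```

With an idle draw of 0.5 W, a hypothetical unoptimized routine at 1.0 W for 4 s costs 2.0 J above idle, while an optimized one at 1.5 W for 1 s costs 1.0 J. Higher instantaneous power can still mean less energy when the work finishes sooner.
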

vImage Energy Savings over OpenCV: iPhone 5

Times less energy than OpenCV (above one is better):
• Histogram: 0.75
• Max: 6.71
• Box Convolve: 4.05
• Affine Warp (lanczos): 6.96

“Using vImage from the Accelerate framework to dynamically pre-render my sprites. It’s the only way to make it fast. ;-)”
–Twitter User
vDSP: Vectorized digital signal processing library

vDSP: What’s available?
• Basic operations on arrays
  ■ Add, subtract, multiply, conversion, accumulation, etc.
• Discrete Fourier Transform
• Convolution and correlation

New and Improved in vDSP
• Multi-channel IIR filter
• Improved power of 2 support
  ■ Discrete Fourier Transform (DFT)
  ■ Discrete Cosine Transform (DCT)

Discrete Fourier Transform (DFT)
• Same operation, two entries based on number of points

Before:
• 256 points: FFT
• 384 points: DFT

OS X 10.9, iOS 7:
• 256 points: DFT
• 384 points: DFT

DFT Example

    #include <Accelerate/Accelerate.h>

    // Create and prepare data:
    float *Ir, *Ii, *Or, *Oi;

    // Once at start (the result is NULL if the setup could not be created):
    vDSP_DFT_Setup setup = vDSP_DFT_zop_CreateSetup(NULL, 1024, vDSP_DFT_FORWARD);
    ...
    vDSP_DFT_Execute(setup, Ir, Ii, Or, Oi);
    ...
    // Once at end:
    vDSP_DFT_DestroySetup(setup);

vDSP vs. FFTW

FFTW: Fastest Fourier Transform in the West
• One and multi-dimensional transforms
• Real and complex data
• Parallel

vDSP Speedup over FFTW: DFT on iPhone 5

[Chart: speedup (axis 0 to 3, above one is better) against number of points: 240, 256, 320, 384, 480, 512, 640, 768, 960.]

DFT In Use: AAC Enhanced Low Delay
• Used in FaceTime
• DFT one of many DSP routines
• Percentage of time spent in DFT

[Chart: Percent time spent in DFT; legend: DFT, Everything Else. FFTW: 47%, 54%. vDSP: 70%, 30%.]

Data Types in vDSP
• Single and double precision
• Real and complex
• Support for strided data access
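
Strided access means that consecutive logical elements sit a fixed number of array slots apart. The loop below is a hypothetical plain C reference for that idea, not a vDSP routine; vDSP functions such as `vDSP_vadd` take a stride per operand in a similar style.

```c
#include <stddef.h>

/* Element i of each operand lives at index i * stride.
   With stride 1 the data is contiguous. */
static void strided_add(const float *a, ptrdiff_t stride_a,
                        const float *b, ptrdiff_t stride_b,
                        float *c, ptrdiff_t stride_c, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        ptrdiff_t k = (ptrdiff_t)i;
        c[k * stride_c] = a[k * stride_a] + b[k * stride_b];
    }
}
```

For interleaved stereo samples laid out as L R L R, `strided_add(stereo, 2, stereo + 1, 2, mono, 1, frames)` sums the left and right channels into a contiguous mono buffer without copying the channels apart first.
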

“Wanna do FFT on iOS? Use the Accelerate.framework.
Highly recommended. #ios”
–Twitter User

Fast Math
Luke Chang
Engineer, Vector and Numerics Group

Math for Every Data Length
• Libm for scalar data (float)
• vMathLib for SIMD vectors (vFloat)
• vForce for array data (float [])

Libm: Standard C math library
• Standard math library in C
• Collection of transcendental functions
• Operates on scalar data
  ■ exp[f]
  ■ log[f]
  ■ sin[f]
  ■ cos[f]
  ■ pow[f]
  ■ Etc…

What’s New in Libm?
• Extensions to C11, prefixed with “__”
• Added in both iOS 7 and OS X 10.9
• __exp10[f]
• __sinpi[f], __cospi[f], __tanpi[f]
• __sincos[f], __sincospi[f]

__exp10[f]: Power of 10
• Commonly used for decibel calculations
• Faster than pow(10.0, x)
• More accurate than exp(log(10) * x)
  ■ exp(log(10) * 5.0) = 100000.0000000002

__sinpi[f], __cospi[f], __tanpi[f]: Trigonometry in Terms of PI
• cospi(x) means cos(π*x)
• Faster because argument reduction is simpler
• More accurate when working with degrees
  ■ cos(M_PI*0.5) returns 6.123233995736766e-17
  ■ __cospi(0.5) returns exactly 0.0

__sincos[f], __sincospi[f]: Sine-Cosine Pairs
• Compute sine and cosine simultaneously
• Faster because argument reduction is only done once
• Clang will call __sincos[f] when possible

C11 Features
• Some complex values can’t be specified as literals
  ■ (0.0 + INFINITY * I)
• C11 adds CMPLX macro for this purpose
  ■ CMPLX(0.0, INFINITY)
• CMPLXF and CMPLXL are also available
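
The difference between the two spellings can be sketched as follows. The function names are hypothetical, and the behavior described for the arithmetic form assumes an implementation where `I` is a complex rather than an imaginary constant, which is common but not universal.

```c
#include <complex.h>
#include <math.h>

/* Builds 0 + i*infinity component by component; no arithmetic is involved,
   so the real part stays exactly 0.0. */
static double complex imaginary_infinity(void)
{
    return CMPLX(0.0, INFINITY);
}

/* The arithmetic spelling. When I is a complex constant, INFINITY * I can
   involve INFINITY * 0.0 in the real part, which is NaN, so the result may
   not be the value the expression appears to describe. */
static double complex imaginary_infinity_by_arithmetic(void)
{
    return 0.0 + INFINITY * I;
}
```

Comparing `creal` of both results with `isnan` shows whether a given compiler is affected.
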

vMathLib: SIMD vector math library
• Collection of transcendental functions for SIMD vectors
• Operates on SIMD vectors
  ■ vexp[f]
  ■ vlog[f]
  ■ vsin[f]
  ■ vcos[f]
  ■ vpow[f]
  ■ Etc…

When to Use vMathLib? Writing your own vector algorithm
• Need transcendental functions in your vector code

vMathLib Example: Taking sine of a vector
• Using Libm

    #include <math.h>
    #include <Accelerate/Accelerate.h>  // for the vFloat type

    vFloat vx = { 1.f, 2.f, 3.f, 4.f };
    vFloat vy;
    ...
    float *px = (float *)&vx, *py = (float *)&vy;
    for (size_t i = 0; i < sizeof(vx)/sizeof(px[0]); ++i) {
        py[i] = sinf(px[i]);
    }
    ...

• Using vMathLib

    #include <Accelerate/Accelerate.h>

    vFloat vx = { 1.f, 2.f, 3.f, 4.f };
    vFloat vy;
    ...
    vy = vsinf(vx);
    ...

vForce
Vectorized math library
![Page 112: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/112.jpg)
vForce
• Collection of transcendental functions for arrays
• Operates on array data
■ vvexp[f]
■ vvlog[f]
■ vvsin[f]
■ vvcos[f]
■ vvpow[f]
■ Etc…
![Page 113: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/113.jpg)
vForce Example
• Filling a buffer with sine wave using a for loop
#include <math.h>
float buffer[length];
float indices[length];
...
for (int i = 0; i < length; i++) {
    buffer[i] = sinf(indices[i]);
}
![Page 114: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/114.jpg)
vForce Example
• Filling a buffer with sine wave using vForce
#include <Accelerate/Accelerate.h>
float buffer[length];
float indices[length];
...
// length must be an int: vvsinf takes the element count by pointer
vvsinf(buffer, indices, &length);
![Page 115: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/115.jpg)
Better Performance
Measured on iPhone 5
[Bar chart: sines computed per µs, bigger is better. vForce: 56; for loop: 24.]
![Page 117: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/117.jpg)
Less Energy
Measured on iPhone 5
[Bar chart: nJ consumed per sine, smaller is better. vForce: 14.16; for loop: 23.37.]
![Page 119: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/119.jpg)
vForce Performance
Measured on iPhone 5
[Bar chart: results computed per µs, vForce vs. for loop, for truncf, logf, expf, powf, sinf, and sincosf. Bar labels in extraction order: 10.6, 24, 9.2, 29, 22, 161, 28.8, 56, 26.6, 120, 116, 826. The extraction does not preserve which label belongs to which bar.]
![Page 121: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/121.jpg)
vForce in Detail
• Supports both float and double
• Handles edge cases correctly
• Requires minimal data alignment
• Supports in-place operation
• Improves performance even with small arrays
■ Consider using vForce when more than 16 elements
![Page 122: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/122.jpg)
Accelerate Framework
What operations are available?
• Image processing (vImage)
• Digital signal processing (vDSP)
• Transcendental math functions (vForce, vMathLib)
• Linear algebra (LAPACK, BLAS)
![Page 123: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/123.jpg)
LAPACK and BLAS
Linear Algebra PACKage and Basic Linear Algebra Subprograms
Geoff Belter
Engineer, Vector and Numerics Group
![Page 124: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/124.jpg)
LAPACK Operations
• High-level linear algebra
• Solve linear systems
• Matrix factorizations
• Eigenvalues and eigenvectors
![Page 125: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/125.jpg)
LINPACK
• How fast can you solve a system of linear equations?
• 1000 x 1000 matrices
![Page 126: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/126.jpg)
LAPACK
[Bar chart, built up over several slides: LINPACK benchmark performance in Mflops, bigger is better. Bar labels as extracted: “Brand A” 40 on an early build and 788 on later builds; Accelerate on iPhone 4S 1202; Accelerate on iPhone 5 3446.]
![Page 138: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/138.jpg)
![Page 139: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/139.jpg)
iPad with Retina Display vs. Power Mac G5
![Page 140: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/140.jpg)
LAPACK
iPad with Retina display vs. Power Mac G5
• Triumphant return
• After 10 years
• All fans blazing
![Page 141: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/141.jpg)
LAPACK
[Bar chart, built up over several slides: LINPACK benchmark performance in Mflops, bigger is better. Legend entries: Accelerate on iPhone 4S, PowerMac G5, iPad with Retina Display. Bar labels: 3643 and 3686; the extraction does not preserve which bar each belongs to.]
![Page 145: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/145.jpg)
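LAPACK Example
Solve linear system
#include <Accelerate/Accelerate.h>
// Create and prepare input and output data
// A is n-by-n and B is n-by-nrhs, both stored column-major
__CLPK_integer n, nrhs, info;
double *A, *B;
__CLPK_integer *ipiv;
// Solve A * X = B; on return B holds X and info is 0 on success
dgesv_(&n, &nrhs, A, &n, ipiv, B, &n, &info);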
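The slide leaves allocation and setup out. The sketch below is one way to wrap the call for a single right-hand side; the helper name `solve_in_place` is hypothetical. On Apple platforms it calls `dgesv_`. On other platforms it substitutes a plain Gaussian elimination with partial pivoting, which is only a stand-in so the sketch builds without Accelerate and is not the LAPACK implementation.

```c
#include <math.h>
#include <stdlib.h>
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

/* Hypothetical helper: solves A * x = b for one right-hand side.
   A is n-by-n in column-major order (A[i + j*n] is row i, column j) and
   is overwritten; b is overwritten with x. Returns 0 on success. */
int solve_in_place(int n, double *A, double *b) {
#if defined(__APPLE__)
    __CLPK_integer order = n, nrhs = 1, info = 0;
    __CLPK_integer *ipiv = malloc((size_t)n * sizeof *ipiv);
    if (ipiv == NULL) return -1;
    dgesv_(&order, &nrhs, A, &order, ipiv, b, &order, &info);
    free(ipiv);
    return (int)info;
#else
    /* Stand-in: Gaussian elimination with partial pivoting. */
    for (int k = 0; k < n; k++) {
        int p = k;                       /* row holding the largest pivot */
        for (int i = k + 1; i < n; i++)
            if (fabs(A[i + k*n]) > fabs(A[p + k*n])) p = i;
        if (A[p + k*n] == 0.0) return k + 1;   /* singular matrix */
        if (p != k) {                    /* swap rows k and p */
            for (int j = 0; j < n; j++) {
                double t = A[k + j*n]; A[k + j*n] = A[p + j*n]; A[p + j*n] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        for (int i = k + 1; i < n; i++) {      /* eliminate below the pivot */
            double m = A[i + k*n] / A[k + k*n];
            for (int j = k; j < n; j++) A[i + j*n] -= m * A[k + j*n];
            b[i] -= m * b[k];
        }
    }
    for (int i = n - 1; i >= 0; i--) {         /* back substitution */
        for (int j = i + 1; j < n; j++) b[i] -= A[i + j*n] * b[j];
        b[i] /= A[i + i*n];
    }
    return 0;
#endif
}
```

Note the column-major layout: LAPACK follows the Fortran convention, so a matrix written row by row in C has to be transposed before the call.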
![Page 148: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/148.jpg)
BLAS Operations
• Low-level linear algebra
• Vector
■ Dot product, scalar product, vector sum
• Matrix-vector
■ Matrix-vector product, outer product
• Matrix-matrix
■ Matrix multiply
![Page 149: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/149.jpg)
BLAS Example
Matrix Multiply
#include <Accelerate/Accelerate.h>
// Create and prepare data: 100 x 100 matrices in row-major order
double *A, *B, *C;
// C <-- A * B
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            100, 100, 100,   // M, N, K
            1.0, A, 100,     // alpha, A, lda
            B, 100,          // B, ldb
            0.0, C, 100);    // beta, C, ldc
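For non-square matrices the dimension and leading-dimension arguments differ, which the 100 x 100 example hides. The sketch below spells them out for row-major data; the helper name `matmul_row_major` is hypothetical. On Apple platforms it calls `cblas_dgemm`; elsewhere a triple loop stands in so the sketch builds without Accelerate.

```c
#if defined(__APPLE__)
#include <Accelerate/Accelerate.h>
#endif

/* Hypothetical helper: C = A * B for row-major matrices.
   A is m-by-k, B is k-by-n, C is m-by-n. */
void matmul_row_major(int m, int n, int k,
                      const double *A, const double *B, double *C) {
#if defined(__APPLE__)
    /* alpha = 1.0 and beta = 0.0 give C <- A * B. Each leading dimension
       is the row length of the corresponding row-major matrix. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 0.0, C, n);
#else
    /* Stand-in triple loop for platforms without Accelerate. */
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int p = 0; p < k; p++) {
                sum += A[i*k + p] * B[p*n + j];
            }
            C[i*n + j] = sum;
        }
    }
#endif
}
```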
![Page 152: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/152.jpg)
Data Types
• Single and double precision
• Real and complex
• Multiple data layouts
■ Dense, banded, triangular, etc.
■ Transpose, conjugate transpose
■ Row and column major
![Page 153: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/153.jpg)
“Playing with the Accelerate.framework today. Having a BLASt.”
–Twitter User
![Page 154: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/154.jpg)
Summary
![Page 155: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/155.jpg)
Accelerate Framework
Lots of functionality
• Image processing (vImage)
• Digital signal processing (vDSP)
• Transcendental math functions (vForce, vMathLib)
• Linear algebra (LAPACK, BLAS)
![Page 156: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/156.jpg)
Accelerate Framework
Features and benefits
• Easy access to a lot of functionality
• Accurate
• Fast with low energy usage
• Works on both OS X and iOS
• Optimized for all generations of hardware
![Page 157: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/157.jpg)
Accelerate Framework
To be successful
• Prepare your data
■ Contiguous
■ 16-byte aligned
• Understand problem size
• Do setup once/destroy at the end
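One way to satisfy the "contiguous, 16-byte aligned" advice is sketched below. It assumes a C11 library that provides `aligned_alloc`; the helper name `alloc_aligned_floats` is hypothetical, and overflow of `count * sizeof(float)` is not checked.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates count contiguous floats on a 16-byte
   boundary. Returns NULL on failure; release the block with free(). */
float *alloc_aligned_floats(size_t count) {
    size_t bytes = count * sizeof(float);
    /* Round up to a multiple of 16, as C11 aligned_alloc requires the
       size to be a multiple of the alignment. */
    bytes = (bytes + 15u) & ~(size_t)15u;
    return aligned_alloc(16, bytes);
}
```

Allocating such buffers once and reusing them across calls also fits the "do setup once/destroy at the end" point.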
![Page 158: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/158.jpg)
![Page 159: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/159.jpg)
If You Need a Feature
Request It
![Page 161: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/161.jpg)
“Discrete Cosine Transform was my feature request that made it into the Accelerate Framework. I feel so special!”
–Twitter User
![Page 162: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/162.jpg)
“Thanks Apple for making the Accelerate Framework.”
–Twitter User
![Page 163: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/163.jpg)
More Information
Paul Danbold
Core OS Technologies [email protected]
George Warner
DTS Sr. Support [email protected]
Documentation
vImage Programming Guide
http://developer.apple.com/library/mac/#documentation/Performance/Conceptual/vImage/Introduction/Introduction.html
vDSP Programming Guide
http://developer.apple.com/library/mac/#documentation/Performance/Conceptual/vDSP_Programming_Guide/Introduction/Introduction.html
Apple Developer Forums
http://devforums.apple.com
![Page 164: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/164.jpg)
Labs
Accelerate Lab, Core OS Lab A, Thursday 4:30PM
![Page 165: Accerelate Framwork](https://reader034.vdocument.in/reader034/viewer/2022042608/563dba11550346aa9aa266e8/html5/thumbnails/165.jpg)