![Page 2: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/2.jpg)
ENGINEERS AND DEVICES WORKING TOGETHER
K (Android 4.4): Dalvik + JIT compiler
L (Android 5.0): ART + AOT compiler
M (Android 6.0): ART + AOT compiler
N (Android 7.0): ART + JIT/AOT compiler
O (Android 8.0): ART + JIT/AOT compiler + vectorization
A SIMD instruction performs a single operation on multiple operands in parallel
ARM: NEON Technology (128-bit)
Intel: SSE* (128-bit) AVX* (256-bit, 512-bit)
MIPS: MSA (128-bit)
All modern general-purpose CPUs support small-scale SIMD instructions (typically between 64-bit and 512-bit)
4x32-bit operations
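To make the "4x32-bit operations" picture concrete, the sketch below models one 128-bit SIMD add as four independent 32-bit lane additions in plain Java. The names `LaneAddSketch` and `add4x32` are hypothetical and not part of the slides. Real SIMD hardware performs the four additions with a single instruction; this code only emulates the effect.

```java
final class LaneAddSketch {
    // Emulates one 4x32-bit SIMD add: each lane is added independently.
    static int[] add4x32(int[] x, int[] y) {
        if (x.length != 4 || y.length != 4) {
            throw new IllegalArgumentException("expected exactly 4 lanes");
        }
        int[] result = new int[4];
        for (int lane = 0; lane < 4; lane++) {
            // Java int addition wraps on overflow, as a packed integer add does.
            result[lane] = x[lane] + y[lane];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] r = add4x32(new int[] {1, 2, 3, 4}, new int[] {10, 20, 30, 40});
        System.out.println(java.util.Arrays.toString(r)); // prints [11, 22, 33, 44]
    }
}
```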
● Many vectorizing compilers were developed by supercomputer vendors
● Intel introduced the first vectorizing compiler for SSE in 1999
● Since the Android O release, the optimizing compiler of ART has joined the family of vectorizing compilers
www.aartbik.com
for (int i = 0; i < 256; i++) {
  a[i] = b[i] + 1;
}
->
for (int i = 0; i < 256; i += 4) {
  a[i:i+3] = b[i:i+3] + [1,1,1,1];
}
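The `a[i:i+3]` notation above is pseudocode. As a rough illustration, the transformation can be emulated in plain Java by stepping the loop four elements at a time. The names `StripMineSketch`, `scalar` and `stripMined` are hypothetical, and the scalar cleanup loop is an addition for lengths that are not a multiple of four; the slide's fixed trip count of 256 does not need one. Both methods assume `b` is at least as long as `a`.

```java
final class StripMineSketch {
    static final int LANES = 4; // assumption: 128-bit vectors holding four 32-bit ints

    // The original scalar loop, generalized to the array length.
    static void scalar(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] = b[i] + 1;
        }
    }

    // Same computation, restructured so each main-loop iteration covers LANES elements.
    static void stripMined(int[] a, int[] b) {
        int n = a.length;
        int i = 0;
        // Main loop: the inner loop stands in for one vector load, add and store.
        for (; i + LANES <= n; i += LANES) {
            for (int lane = 0; lane < LANES; lane++) {
                a[i + lane] = b[i + lane] + 1;
            }
        }
        // Cleanup loop: leftover elements are handled one at a time.
        for (; i < n; i++) {
            a[i] = b[i] + 1;
        }
    }

    public static void main(String[] args) {
        int[] b = new int[10];
        for (int i = 0; i < b.length; i++) {
            b[i] = i * 3;
        }
        int[] expected = new int[10];
        int[] actual = new int[10];
        scalar(expected, b);
        stripMined(actual, b);
        System.out.println(java.util.Arrays.equals(expected, actual)); // prints true
    }
}
```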
VectorOperation (has vector length, has packed data type)
  VectorMemOp (has alignment)
    VectorLoad, VectorStore, ….
  VectorBinOp
    VectorAdd, VectorSub, ….
A class hierarchy of general vector operations that is sufficiently powerful to represent SIMD operations common to all architectures
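A minimal Java sketch of such a hierarchy follows. It reuses the class names from the slide, but the field types, constructors and the `describe` method are assumptions made for illustration and are not ART's actual implementation. Placing the alignment attribute on `VectorMemOp` is a reading of the slide layout. `VectorSub` and `VectorStore` would follow the same pattern as `VectorAdd` and `VectorLoad`.

```java
final class VectorIrSketch {
    // Root: every vector operation has a lane count and a packed element type.
    abstract static class VectorOperation {
        final int vectorLength;  // number of lanes, e.g. 4
        final String packedType; // packed data type; a String here purely for illustration

        VectorOperation(int vectorLength, String packedType) {
            this.vectorLength = vectorLength;
            this.packedType = packedType;
        }

        abstract String name();

        String describe() {
            return name() + "<" + vectorLength + " x " + packedType + ">";
        }
    }

    // Binary operations add nothing beyond the common attributes.
    abstract static class VectorBinOp extends VectorOperation {
        VectorBinOp(int vectorLength, String packedType) {
            super(vectorLength, packedType);
        }
    }

    // Memory operations additionally carry an alignment, in bytes.
    abstract static class VectorMemOp extends VectorOperation {
        final int alignment;

        VectorMemOp(int vectorLength, String packedType, int alignment) {
            super(vectorLength, packedType);
            this.alignment = alignment;
        }

        @Override
        String describe() {
            return super.describe() + " align " + alignment;
        }
    }

    static final class VectorAdd extends VectorBinOp {
        VectorAdd(int vectorLength, String packedType) {
            super(vectorLength, packedType);
        }

        @Override
        String name() {
            return "VectorAdd";
        }
    }

    static final class VectorLoad extends VectorMemOp {
        VectorLoad(int vectorLength, String packedType, int alignment) {
            super(vectorLength, packedType, alignment);
        }

        @Override
        String name() {
            return "VectorLoad";
        }
    }

    public static void main(String[] args) {
        System.out.println(new VectorAdd(4, "int32").describe());      // VectorAdd<4 x int32>
        System.out.println(new VectorLoad(16, "int8", 16).describe()); // VectorLoad<16 x int8> align 16
    }
}
```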
for (int i = 0; i < 256; i += 4) {
  a[i:i+3] = b[i:i+3] + [1,1,1,1];
}
->
t = [1,1,1,1];
for (int i = 0; i < 256; i += 8) {
  a[i  :i+3] = b[i  :i+3] + t;
  a[i+4:i+7] = b[i+4:i+7] + t;
}
t = [1,1,1,1];
for (int i = 0; i < 256; i += 8) {
  a[i  :i+3] = b[i  :i+3] + t;
  a[i+4:i+7] = b[i+4:i+7] + t;
}
->
movi v0.4s, #0x1, lsl #0
mov w3, #0xc
mov w0, #0x0
Loop: cmp w0, #0x100 (256)
b.hs Exit
add w4, w0, #0x4 (4)
add w0, w3, w0, lsl #2
add w5, w3, w4, lsl #2
ldr q1, [x2, x0]
add v1.4s, v1.4s, v0.4s
str q1, [x1, x0]
ldr q1, [x2, x5]
add v1.4s, v1.4s, v0.4s
str q1, [x1, x5]
add w0, w4, #0x4 (4)
ldrh w16, [tr] ; suspend check
cbz w16, Loop
VecReplicateScalar(x)
ARM64: dup v0.4s, w2
x86-64: movdq xmm0, rdx
        pshufd xmm0, xmm0, 0
MIPS64: fill.w w0, a2
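The idea that one architecture-neutral vector node maps to a different instruction sequence on each target can be sketched as a simple lookup. The class `ReplicateScalarLowering`, the `Target` enum and the `lower` method are hypothetical names, and this is not ART's code generator. The instruction strings are copied from the slide as-is.

```java
final class ReplicateScalarLowering {
    enum Target { ARM64, X86_64, MIPS64 }

    // Returns the instruction sequence the slide lists for VecReplicateScalar(x) on each target.
    static java.util.List<String> lower(Target target) {
        switch (target) {
            case ARM64:
                return java.util.Arrays.asList("dup v0.4s, w2");
            case X86_64:
                // Two instructions: move the scalar into the register, then shuffle it into every lane.
                return java.util.Arrays.asList("movdq xmm0, rdx", "pshufd xmm0, xmm0, 0");
            case MIPS64:
                return java.util.Arrays.asList("fill.w w0, a2");
            default:
                throw new IllegalArgumentException("unknown target: " + target);
        }
    }

    public static void main(String[] args) {
        for (Target t : Target.values()) {
            System.out.println(t + ": " + String.join("; ", lower(t)));
        }
    }
}
```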
/**
 * Cross-fade byte arrays x1 and x2 into byte array x_out.
 */
private static void avg(byte[] x_out, byte[] x1, byte[] x2) {
  // Compute minimum length of the three byte arrays.
  int min = Math.min(x_out.length, Math.min(x1.length, x2.length));
  // Morph with rounding halving add (unsigned).
  for (int i = 0; i < min; i++) {
    x_out[i] = (byte) (((x1[i] & 0xff) + (x2[i] & 0xff) + 1) >> 1);
  }
}
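A small usage example may help show what "rounding halving add (unsigned)" means for Java's signed `byte` type. The wrapper class `AvgDemo` and the input values are made up for illustration; the `avg` method body is the one from the slide. The `& 0xff` masks reinterpret each byte as a value from 0 to 255 before averaging, and the final cast narrows the result back to a signed byte.

```java
final class AvgDemo {
    // Cross-fade byte arrays x1 and x2 into byte array x_out (method from the slide).
    static void avg(byte[] x_out, byte[] x1, byte[] x2) {
        int min = Math.min(x_out.length, Math.min(x1.length, x2.length));
        for (int i = 0; i < min; i++) {
            x_out[i] = (byte) (((x1[i] & 0xff) + (x2[i] & 0xff) + 1) >> 1);
        }
    }

    public static void main(String[] args) {
        byte[] x1 = {0, (byte) 255, 100, (byte) 200};
        byte[] x2 = {1, (byte) 255, 101, (byte) 101};
        byte[] out = new byte[4];
        avg(out, x1, x2);
        // Unsigned view of the results: (0+1+1)>>1 = 1, (255+255+1)>>1 = 255,
        // (100+101+1)>>1 = 101, (200+101+1)>>1 = 151.
        for (byte v : out) {
            System.out.println(v & 0xff); // prints 1, 255, 101, 151 on separate lines
        }
    }
}
```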
SEQUENTIAL (ARMv8 AArch64)
L: cmp w5, w0
b.hs Exit
add w4, w2, #0xc (12)
add w6, w3, #0xc (12)
ldrsb w4, [x4, x5]
ldrsb w6, [x6, x5]
and w4, w4, #0xff
and w6, w6, #0xff
add w4, w4, w6
add w6, w1, #0xc (12)
add w4, w4, #0x1 (1)
asr w4, w4, #1
strb w4, [x6, x5]
add w5, w5, #0x1 (1)
ldrh w16, [tr] ; suspend check
cbz w16, L

SIMD (ARMv8 AArch64 + NEON Technology)
L: cmp w5, w4
b.hs Exit
add w16, w2, w5
ldur q0, [x16, #12]
add w16, w3, w5
ldur q1, [x16, #12]
urhadd v0.16b, v0.16b, v1.16b
add w16, w1, w5
stur q0, [x16, #12]
add w5, w5, #0x10 (16)
ldrh w16, [tr] ; suspend check
cbz w16, L
Runs about 10x faster!
Sequential performance: ≈20fps
SIMD performance (NEON 128-bit): ≈60fps
Java code
void mul_add(int[] a, int[] b) {
  for (int i = 0; i < 512; i++) {
    a[i] += a[i] * b[i];
  }
}

Autovectorization result
L: cmp w0, #0x200
b.hs Exit
add w16, w1, #0xc
add x16, x16, x0, lsl #2
ld1 {v0.2s}, [x16]
add w16, w2, #0xc
add x16, x16, x0, lsl #2
ld1 {v1.2s}, [x16]
mul v1.2s, v0.2s, v1.2s
add v0.2s, v0.2s, v1.2s
add w16, w1, #0xc
add x16, x16, x0, lsl #2
st1 {v0.2s}, [x16]
add w0, w0, #0x2
ldrh w16, [tr]
cbz w16, L
Before
L: cmp w0, #0x200
b.hs Exit
add w16, w1, #0xc
add x16, x16, x0, lsl #2
ld1 {v0.2s}, [x16]
add w16, w2, #0xc
add x16, x16, x0, lsl #2
ld1 {v1.2s}, [x16]
mul v1.2s, v0.2s, v1.2s
add v0.2s, v0.2s, v1.2s
add w16, w1, #0xc
add x16, x16, x0, lsl #2
st1 {v0.2s}, [x16]
add w0, w0, #0x2
ldrh w16, [tr]
cbz w16, L

After (68% perf boost)
L: cmp w0, #0x200
b.hs Exit
add w16, w1, #0xc
add x16, x16, x0, lsl #2
ld1 {v0.4s}, [x16]
add w16, w2, #0xc
add x16, x16, x0, lsl #2
ld1 {v1.4s}, [x16]
mul v1.4s, v0.4s, v1.4s
add v0.4s, v0.4s, v1.4s
add w16, w1, #0xc
add x16, x16, x0, lsl #2
st1 {v0.4s}, [x16]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L
Before
L: cmp w0, #0x200
b.hs Exit
add w16, w1, #0xc
add x16, x16, x0, lsl #2
ld1 {v0.4s}, [x16]
add w16, w2, #0xc
add x16, x16, x0, lsl #2
ld1 {v1.4s}, [x16]
mul v1.4s, v0.4s, v1.4s
add v0.4s, v0.4s, v1.4s
add w16, w1, #0xc
add x16, x16, x0, lsl #2
st1 {v0.4s}, [x16]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L

After (11% perf boost)
L: cmp w0, #0x200
b.hs Exit
add w16, w1, #0xc
add x16, x16, x0, lsl #2
ld1 {v0.4s}, [x16]
add w16, w2, #0xc
add x16, x16, x0, lsl #2
ld1 {v1.4s}, [x16]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
add w16, w1, #0xc
add x16, x16, x0, lsl #2
st1 {v2.4s}, [x16]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L
Before
L: cmp w0, #0x200
b.hs Exit
add w16, w1, #0xc
add x16, x16, x0, lsl #2
ld1 {v0.4s}, [x16]
add w16, w2, #0xc
add x16, x16, x0, lsl #2
ld1 {v1.4s}, [x16]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
add w16, w1, #0xc
add x16, x16, x0, lsl #2
st1 {v2.4s}, [x16]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L

After (23% perf boost)
L: cmp w0, #0x200
b.hs Exit
add w16, w1, w0, lsl #2
ldur q0, [x16, #12]
add w16, w2, w0, lsl #2
ldur q1, [x16, #12]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
add w16, w1, w0, lsl #2
stur q2, [x16, #12]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L
Before
L: cmp w0, #0x200
b.hs Exit
add w16, w1, w0, lsl #2
ldur q0, [x16, #12]
add w16, w2, w0, lsl #2
ldur q1, [x16, #12]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
add w16, w1, w0, lsl #2
stur q2, [x16, #12]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L

After (10% perf boost)
mov w3, #0xc
L: cmp w0, #0x200
b.hs Exit
add w4, w3, w0, lsl #2
ldr q0, [x1, x4]
ldr q1, [x2, x4]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
str q2, [x1, x4]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L
Before
L: cmp w0, #0x200
b.hs Exit
add w4, w3, w0, lsl #2
ldr q0, [x1, x4]
ldr q1, [x2, x4]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
str q2, [x1, x4]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L

After (2.5% perf boost)
L: cmp w0, #0x200
b.hs Exit
add w4, w3, w0, lsl #2
ldr q0, [x1, x4]
ldr q1, [x2, x4]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
str q2, [x1, x4]
add w0, w0, #0x4
add w4, w3, w0, lsl #2
ldr q0, [x1, x4]
ldr q1, [x2, x4]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
str q2, [x1, x4]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L
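The unrolled listing repeats the vector body twice per loop-back branch and suspend check. A source-level analogue of that shape, written as a hedged Java sketch, is shown below. The names `UnrollSketch`, `vectorStep` and `mulAddUnrolled` are hypothetical; `vectorStep` stands in for one load, load, multiply-accumulate, store group, and the sketch assumes the array length is a multiple of eight, as 512 is.

```java
final class UnrollSketch {
    static final int LANES = 4; // assumption: four 32-bit ints per 128-bit vector

    // The loop from the slides, with the length taken from the array.
    static void mulAdd(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] += a[i] * b[i];
        }
    }

    // Stand-in for one vector iteration covering LANES consecutive elements.
    static void vectorStep(int[] a, int[] b, int i) {
        for (int lane = 0; lane < LANES; lane++) {
            a[i + lane] += a[i + lane] * b[i + lane];
        }
    }

    // Unrolled by two: two vector steps per trip around the loop.
    static void mulAddUnrolled(int[] a, int[] b) {
        if (a.length % (2 * LANES) != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + (2 * LANES));
        }
        for (int i = 0; i < a.length; i += 2 * LANES) {
            vectorStep(a, b, i);
            vectorStep(a, b, i + LANES);
        }
    }

    public static void main(String[] args) {
        int[] a1 = new int[512];
        int[] a2 = new int[512];
        int[] b = new int[512];
        for (int i = 0; i < 512; i++) {
            a1[i] = i;
            a2[i] = i;
            b[i] = i % 7;
        }
        mulAdd(a1, b);
        mulAddUnrolled(a2, b);
        System.out.println(java.util.Arrays.equals(a1, a2)); // prints true
    }
}
```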
Before
L: cmp w0, #0x200
b.hs Exit
add w4, w3, w0, lsl #2
ldr q0, [x1, x4]
ldr q1, [x2, x4]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
str q2, [x1, x4]
add w0, w0, #0x4
add w4, w3, w0, lsl #2
ldr q0, [x1, x4]
ldr q1, [x2, x4]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
str q2, [x1, x4]
add w0, w0, #0x4
ldrh w16, [tr]
cbz w16, L

After (12% perf boost)
L: cmp w0, #0x200
b.hs Exit
add w4, w0, #0x4
add w0, w3, w0, lsl #2
add w5, w3, w4, lsl #2
ldr q0, [x1, x0]
ldr q1, [x2, x0]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
str q2, [x1, x0]
ldr q0, [x1, x5]
ldr q1, [x2, x5]
mov v2.16b, v0.16b
mla v2.4s, v0.4s, v1.4s
str q2, [x1, x5]
add w0, w4, #0x4
ldrh w16, [tr]
cbz w16, L
for (int i = 0; i < LENGTH; i++) {
  c[i] = (byte)(a[i] + b[i]);
}

i87 Add [i80,i79]
i102 IntermediateAddressIndex [i87,i98,i3]
i99 IntermediateAddressIndex [i80,i98,i3]
d89 VecLoad [l35,i102]
d84 VecLoad [l35,i99]
d83 VecLoad [l29,i99]
d88 VecLoad [l29,i102]
d85 VecAdd [d83,d84]
d90 VecAdd [d88,d89]
d86 VecStore [l27,i99,d85]
d91 VecStore [l27,i102,d90]
i92 Add [i87,i79]
v78 Goto
Java Code
static final int LENGTH = 1024 * 256; // 256K elements, 0x40000
static byte [] a = new byte[LENGTH];
static byte [] b = new byte[LENGTH];
static byte [] c = new byte[LENGTH];

(gdb) x/64u 0xefc0b000
0xefc0b000: 0 28 192 18 0 0 0 0
0xefc0b008: 0 0 4 0 100 101 102 103
0xefc0b010: 104 105 106 107 108 109 110 111
0xefc0b018: 112 113 114 115 116 117 118 119
0xefc0b020: 120 121 122 123 124 125 126 127
0xefc0b028: 128 129 130 131 132 133 134 135
0xefc0b030: 136 137 138 139 140 141 142 143
0xefc0b038: 144 145 146 147 148 149 150 151

Object Header: the first 12 bytes (0xefc0b000 to 0xefc0b00b)
data[0]: the byte at 0xefc0b00c (value 100)
One VecLoad / VecStore
0xefc0b000: 0 28 192 18 0 0 0 0
0xefc0b008: 0 0 4 0 100 101 102 103
0xefc0b010: 104 105 106 107 108 109 110 111
0xefc0b018: 112 113 114 115 116 117 118 119
0xefc0b020: 120 121 122 123 124 125 126 127
0xefc0b028: 128 129 130 131 132 133 134 135
0xefc0b030: 136 137 138 139 140 141 142 143
0xefc0b038: 144 145 146 147 148 149 150 151
SIMD from here->
Avoid SIMD from here
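One reading of these annotations is that the array data starts 12 bytes into the object, so the first 16-byte-aligned element is not `data[0]`. Under that assumption, the number of leading elements to handle before aligned vector accesses can begin is simple arithmetic, sketched below for a byte array. The names `AlignmentPeelSketch` and `peelCount` are hypothetical, the 16-byte vector width and 12-byte data offset are taken from the listings in this deck, and this is not presented as what ART actually does.

```java
final class AlignmentPeelSketch {
    static final int VECTOR_BYTES = 16; // assumption: 128-bit NEON vectors
    static final int DATA_OFFSET = 12;  // offset of data[0], seen as #0xc / #12 in the listings

    // Number of leading byte elements before the first 16-byte-aligned element.
    static int peelCount(long arrayBase) {
        long dataStart = arrayBase + DATA_OFFSET;
        return (int) ((VECTOR_BYTES - (dataStart % VECTOR_BYTES)) % VECTOR_BYTES);
    }

    public static void main(String[] args) {
        // data[0] sits at 0xefc0b00c, so the first aligned element is data[4] at 0xefc0b010.
        System.out.println(peelCount(0xefc0b000L)); // prints 4
    }
}
```

In the dump above, that aligned address 0xefc0b010 holds the value 104, which is `data[4]`.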
![Page 48: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/48.jpg)
ENGINEERS AND DEVICES
WORKING TOGETHER
![Page 49: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/49.jpg)
ENGINEERS AND DEVICESWORKING TOGETHER
●○
●●
○○
![Page 50: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/50.jpg)
![Page 51: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/51.jpg)
Analyzable and flexible CHECKED!
Embeddable CHECKED!
Stable and reproducible CHECKED!
Recognized CHECKED!
![Page 52: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/52.jpg)
![Page 53: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/53.jpg)
![Page 54: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/54.jpg)
![Page 55: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/55.jpg)
![Page 56: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/56.jpg)
![Page 57: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/57.jpg)
LDR q1, [x16] + LDR q2, [x16, #16] -> LDP q1, q2, [x16]
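The rewrite above fuses two loads from adjacent addresses into one load-pair instruction. As an illustration only, here is a minimal sketch of the kind of check such a peephole could perform. The `Load` class and `tryPair` method are hypothetical and do not come from the ART code base. Real constraints, such as the LDP immediate range and intervening instructions that modify the base register, are deliberately left out.

```java
class LoadPairing {
    /** Simplified load: destination register, base register, byte offset, access size. */
    static final class Load {
        final String dst;
        final String base;
        final int offset;
        final int size;

        Load(String dst, String base, int offset, int size) {
            this.dst = dst;
            this.base = base;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Returns the fused LDP text, or null when the two loads cannot be paired. */
    static String tryPair(Load first, Load second) {
        boolean sameBase = first.base.equals(second.base);
        boolean sameSize = first.size == second.size;
        boolean adjacent = second.offset == first.offset + first.size;
        boolean distinctDst = !first.dst.equals(second.dst);
        if (!(sameBase && sameSize && adjacent && distinctDst)) {
            return null;
        }
        String addr = first.offset == 0
                ? "[" + first.base + "]"
                : "[" + first.base + ", #" + first.offset + "]";
        return "LDP " + first.dst + ", " + second.dst + ", " + addr;
    }

    public static void main(String[] args) {
        Load l1 = new Load("q1", "x16", 0, 16);   // LDR q1, [x16]
        Load l2 = new Load("q2", "x16", 16, 16);  // LDR q2, [x16, #16]
        System.out.println(tryPair(l1, l2));
    }
}
```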
![Page 59: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/59.jpg)
![Page 60: WORKING TOGETHER - Amazon Web Servicesconnect.linaro.org.s3.amazonaws.com/sfo17/Presentations... · 2017-10-09 · ENGINEERS AND DEVICES WORKING TOGETHER Before After (68% perf boost)](https://reader034.vdocument.in/reader034/viewer/2022050409/5f86a4e63d69910a7157a09d/html5/thumbnails/60.jpg)
Java
void mul_add(int[] a, int[] b, int[] c) {
  for (int i = 0; i < 512; i++) {
    a[i] += a[i] * b[i];
  }
}

Scalar version
L:
  cmp w0, #0x200
  b.hs Exit
  add w4, w1, #0xc
  ldr w6, [x4, x0, lsl #2]
  add w5, w2, #0xc
  ldr w5, [x5, x0, lsl #2]
  madd w5, w6, w5, w6
  str w5, [x4, x0, lsl #2]
  add w0, w0, #0x1
  ldrh w16, [tr]
  cbz w16, L

Initial SIMD Version
L:
  cmp w0, #0x200
  b.hs Exit
  add w16, w1, #0xc
  add x16, x16, x0, lsl #2
  ld1 {v0.2s}, [x16]
  add w16, w2, #0xc
  add x16, x16, x0, lsl #2
  ld1 {v1.2s}, [x16]
  mul v1.2s, v0.2s, v1.2s
  add v0.2s, v0.2s, v1.2s
  add w16, w1, #0xc
  add x16, x16, x0, lsl #2
  st1 {v0.2s}, [x16]
  add w0, w0, #0x2
  ldrh w16, [tr]
  cbz w16, L
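The SIMD loop advances the index by 2 because each `.2s` vector register holds two 32-bit lanes. As a rough illustration, the sketch below emulates that two-lane loop in plain Java and checks it against the scalar loop. It is an emulation written for this explanation, not what the compiler emits. The class and method names are hypothetical, and the scalar cleanup loop is an addition so the sketch also handles odd lengths.

```java
import java.util.Arrays;

class MulAddLanes {
    /** Scalar reference: one element per iteration, as in the Java source. */
    static void mulAddScalar(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            a[i] += a[i] * b[i];
        }
    }

    /** Two elements per iteration, mirroring the ld1 / mul / add / st1 sequence. */
    static void mulAddTwoLanes(int[] a, int[] b) {
        int i = 0;
        for (; i + 2 <= a.length; i += 2) {
            int a0 = a[i], a1 = a[i + 1];      // ld1 {v0.2s}
            int b0 = b[i], b1 = b[i + 1];      // ld1 {v1.2s}
            int p0 = a0 * b0, p1 = a1 * b1;    // mul v1.2s, v0.2s, v1.2s
            a[i] = a0 + p0;                    // add v0.2s, v0.2s, v1.2s
            a[i + 1] = a1 + p1;                // st1 {v0.2s}
        }
        for (; i < a.length; i++) {            // scalar cleanup for an odd tail
            a[i] += a[i] * b[i];
        }
    }

    /** Runs both versions on the same input of length n and compares the results. */
    static boolean sameResult(int n) {
        int[] a = new int[n];
        int[] b = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i - 3;
            b[i] = 2 * i + 1;
        }
        int[] a2 = a.clone();
        mulAddScalar(a, b);
        mulAddTwoLanes(a2, b);
        return Arrays.equals(a, a2);
    }

    public static void main(String[] args) {
        System.out.println(sameResult(512));
    }
}
```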