The Occult / the Apple GPU
⌘ The Occult / the Apple GPU
Alyssa Rosenzweig
⌘ Introduction
⌘ The wizards
Dougall Johnson
Hector Martin
Sven Peter
Alyssa Rosenzweig
Designed by Asahi in Canada.
Assembled in… also Canada.
⌘ DCP
⌘ Hardware
Diabolical Clusterpuck
…Er, wait.
⌘ Hardware
Display Coprocessor
Manages the display controller
Has its own cursed coprocessor
7 megabytes of firmware
⌘ DCP
⌘ RTKit
Real Time Kit
Secret real-time operating system
Apple firmware (and AirPods)
Shared memory and mailbox
⌘ Firmware
Object-oriented C++
Remote procedure calls
Unstable ABI ⇒ maintenance nightmare
⌘ Linux
Goofy DRM/KMS driver
Atomic KMS → DCP calls
Haunted by IOSurface
⌘ Status
⌘ AGX
⌘ Hardware
Apple Graphics
Tiler
Dual-issue, scalar instruction set
Made for Metal
⌘ Mesa
Gallium3D driver
NIR compiler
⌘ Metal lacks OpenGL features
⌘ “Fun” with AGX
⌘ Divergence styles
Mali: Branches with hardware reconvergence
AMD: Compiler manages execution masks
Apple: Count control flow nesting
⌘ AGX divergence
⌘ AGX control flow
32 threads in a warp
Implicit 32-bit execution mask
Nesting counter in r0l (0 if active)
Warp-static jumps
Structured if, else, do…while
Control flow sets r0l and mask
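The bullets above can be read as a small per-thread state machine. The sketch below is a toy Python model, not hardware documentation: it assumes the execution mask is simply "r0l == 0" for each thread, and `exec_mask` and `jmp_exec_any` are illustrative helper names (the second borrowed from the instruction mnemonic used in the loop slides).

```python
WARP_SIZE = 32

def exec_mask(r0l):
    """Pack 'r0l == 0' for each of the 32 threads into a 32-bit integer (assumed model)."""
    assert len(r0l) == WARP_SIZE
    mask = 0
    for thread, depth in enumerate(r0l):
        if depth == 0:
            mask |= 1 << thread
    return mask

def jmp_exec_any(r0l):
    """Warp-static jump: the whole warp jumps if any thread is still active."""
    return exec_mask(r0l) != 0

r0l = [0] * WARP_SIZE   # every thread starts active
r0l[3] = 1              # thread 3 is one nesting level away from reactivation
r0l[7] = 2              # thread 7 is two levels away

print(f"{exec_mask(r0l):#010x}")  # 0xffffff77
print(jmp_exec_any(r0l))          # True
```

The jump is decided once for the warp rather than per thread, which is why per-thread divergence has to be expressed through the counter instead of through branches.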
⌘ NIR control flow
If-else
Infinite loop
Break, continue
⌘ Implementing if
if_icmp cond != 0 (n = 1)
...
pop_exec (n = 1)
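To see why one `if_icmp` and one `pop_exec` are enough, here is a toy Python model of the sequence above. The semantics are assumptions inferred from the slides, not a hardware reference: an active thread that fails the test gets its counter set to n, an already inactive thread has n added to its counter, and `pop_exec` subtracts n with saturation at 0.

```python
def if_icmp(r0l, cond, n=1):
    """Assumed semantics: an active thread (counter 0) that fails the test is
    parked at depth n; an already inactive thread sinks n levels deeper."""
    return [(0 if c else n) if r == 0 else r + n for r, c in zip(r0l, cond)]

def pop_exec(r0l, n=1):
    """Assumed semantics: every counter drops by n, saturating at 0."""
    return [max(r - n, 0) for r in r0l]

# Four threads for brevity (a real warp has 32). Thread 3 was already
# inactive before the if, for example because of an enclosing construct.
before = [0, 0, 0, 1]
cond = [True, False, True, True]

inside = if_icmp(before, cond, n=1)   # if_icmp cond != 0 (n = 1)
ran_body = [r == 0 for r in inside]   # "..." executes only where r0l == 0
after = pop_exec(inside, n=1)         # pop_exec (n = 1)

print(inside)    # [0, 1, 0, 2]
print(ran_body)  # [True, False, True, False]
print(after)     # [0, 0, 0, 1]
```

Thread 3 ends where it started, so the if is transparent to whatever construct deactivated it.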
⌘ Implementing if…else
if_icmp cond != 0 (n = 1)
...
else_icmp cond == 0 (n = 1)
...
pop_exec (n = 1)
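The same toy model extends to the else side. This Python sketch assumes that `else_icmp` parks the threads that ran the if side, and wakes the threads parked at exactly depth n when the else condition holds; those semantics are inferred from the slides and `select` is a hypothetical helper for illustration only.

```python
def if_icmp(r0l, cond, n=1):
    """Assumed: active threads failing the test park at n; inactive ones sink by n."""
    return [(0 if c else n) if r == 0 else r + n for r, c in zip(r0l, cond)]

def else_icmp(r0l, cond, n=1):
    """Assumed: swap which threads of the current level are active."""
    out = []
    for r, c in zip(r0l, cond):
        if r == 0:
            out.append(n)              # ran the if side: sit out the else side
        elif r == n:
            out.append(0 if c else n)  # skipped the if side at this level
        else:
            out.append(r)              # inactive at an outer level: untouched
    return out

def pop_exec(r0l, n=1):
    """Assumed: every counter drops by n, saturating at 0."""
    return [max(r - n, 0) for r in r0l]

def select(r0l, cond, then_value, else_value):
    """Per-thread `cond ? then_value : else_value` written as divergent control flow."""
    result = [None] * len(r0l)
    r0l = if_icmp(r0l, cond, 1)                     # if_icmp cond != 0 (n = 1)
    for t, r in enumerate(r0l):
        if r == 0:
            result[t] = then_value
    r0l = else_icmp(r0l, [not c for c in cond], 1)  # else_icmp cond == 0 (n = 1)
    for t, r in enumerate(r0l):
        if r == 0:
            result[t] = else_value
    return pop_exec(r0l, 1), result                 # pop_exec (n = 1)

# Thread 3 is inactive on entry and must stay untouched.
r0l, result = select([0, 0, 0, 1], [True, False, True, True], "then", "else")
print(r0l)     # [0, 0, 0, 1]
print(result)  # ['then', 'else', 'then', None]
```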
⌘ What about loops?
⌘ Implementing loops, take 0
No way to break!
start:
...
jmp_exec_any start
⌘ Implementing loops, take 1
start:
...
do_while true (n = 1)
jmp_exec_any start
Break:
mov r0l, #1
pop_exec (n = 0)
⌘ Implementing loops, take 2
Don’t clobber the execution mask.
push_exec (n = 1)
start:
...
do_while true (n = 1)
jmp_exec_any start
pop_exec (n = 1)
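A toy Python simulation of this take shows what the `push_exec`/`pop_exec` pair buys. The instruction semantics here are assumptions inferred from the slides: `push_exec` pushes inactive threads n levels deeper, `do_while` touches only threads parked shallower than n, and a break is modelled as writing 1 into the thread's counter. `run_loop` and `trip_counts` are hypothetical names.

```python
def push_exec(r0l, n=1):
    """Assumed: inactive threads sink n levels; active threads stay at 0."""
    return [r + n if r else 0 for r in r0l]

def do_while(r0l, cond, n=1):
    """Assumed: threads parked shallower than n rejoin if cond holds, else park at n."""
    return [(0 if c else n) if r < n else r for r, c in zip(r0l, cond)]

def pop_exec(r0l, n=1):
    """Assumed: every counter drops by n, saturating at 0."""
    return [max(r - n, 0) for r in r0l]

def run_loop(r0l, trip_counts):
    """Each active thread breaks after running the body trip_counts[t] times."""
    iterations = [0] * len(r0l)
    r0l = push_exec(r0l, 1)                         # push_exec (n = 1)
    while True:                                     # start:
        for t in range(len(r0l)):
            if r0l[t] == 0:
                iterations[t] += 1
                if iterations[t] == trip_counts[t]:
                    r0l[t] = 1                      # break: mov r0l, #1 ; pop_exec (n = 0)
        r0l = do_while(r0l, [True] * len(r0l), 1)   # do_while true (n = 1)
        if 0 not in r0l:                            # jmp_exec_any start
            break
    return pop_exec(r0l, 1), iterations             # pop_exec (n = 1)

# Thread 3 is inactive on entry; threads 0 to 2 break at different times.
r0l, iterations = run_loop([0, 0, 0, 1], [1, 3, 2, 5])
print(r0l)         # [0, 0, 0, 1]
print(iterations)  # [1, 3, 2, 0]
```

Without the outer pair, thread 3 would sit at depth 1 during the loop and be indistinguishable from a thread that had broken out, so there would be no correct way to restore the mask afterwards.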
⌘ Implementing loops, take 2
Implements loop { ... }
Can break out of multiple loops at once
What about continue?
⌘ Implementing loops, take 3
Quoth the Dougall:
“Continue is a break.”
do {
    do {
        ...
    } while (0);
} while (cond);
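The equivalence is easy to check on a CPU. Python has no do…while, so this sketch emulates each `do { } while` with `while True:` plus a trailing test; the function names and the "skip odd values" body are made up for illustration.

```python
def with_continue(values):
    """Reference loop: skip odd values using an ordinary continue."""
    total = 0
    for v in values:
        if v % 2:
            continue
        total += v
    return total

def with_nested_break(values):
    """Same loop with no continue: it becomes a break out of an inner
    run-once loop. Like the do...while it mimics, it needs a non-empty input."""
    total, i = 0, 0
    while True:                  # do {
        while True:              #     do {
            v = values[i]
            if v % 2:
                break            #         "continue" is a break from the inner loop
            total += v
            break                #     } while (0);
        i += 1
        if not i < len(values):  # } while (cond);
            break
    return total

values = [1, 2, 3, 4, 6]
print(with_continue(values))      # 12
print(with_nested_break(values))  # 12
```

Breaking out of the inner loop lands on the outer loop's condition check, which is exactly where a continue would go.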
⌘ Implementing loops, take 3
Model two nested loops in general.
push_exec (n = 2)
start:
...
do_while true (n = 2)
jmp_exec_any start
pop_exec (n = 2)
⌘ Implementing loops, take 3
Break:
mov r0l, #2
pop_exec (n = 0)
Continue:
mov r0l, #1
pop_exec (n = 0)
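Putting the two-level loop together with break and continue, here is a toy Python simulation. As before, the instruction semantics are assumptions inferred from the slides, and the per-thread decision of who breaks or continues is written as plain Python rather than modelled as AGX instructions. `run`, `limits` and `evens` are hypothetical names.

```python
def push_exec(r0l, n):
    """Assumed: inactive threads sink n levels; active threads stay at 0."""
    return [r + n if r else 0 for r in r0l]

def do_while(r0l, cond, n):
    """Assumed: threads parked shallower than n rejoin if cond holds, else park at n."""
    return [(0 if c else n) if r < n else r for r, c in zip(r0l, cond)]

def pop_exec(r0l, n):
    """Assumed: every counter drops by n, saturating at 0."""
    return [max(r - n, 0) for r in r0l]

def run(r0l, limits):
    """Per thread: count the even iteration numbers up to that thread's limit."""
    n = len(r0l)
    i = [0] * n
    evens = [0] * n
    r0l = push_exec(r0l, 2)                  # push_exec (n = 2)
    while True:                              # start:
        # First half of the body: decide who breaks and who continues.
        for t in range(n):
            if r0l[t] == 0:
                i[t] += 1
                if i[t] > limits[t]:
                    r0l[t] = 2               # break:    mov r0l, #2 ; pop_exec (n = 0)
                elif i[t] % 2:
                    r0l[t] = 1               # continue: mov r0l, #1 ; pop_exec (n = 0)
        # Second half: only threads that are still active reach it.
        for t in range(n):
            if r0l[t] == 0:
                evens[t] += 1
        r0l = do_while(r0l, [True] * n, 2)   # do_while true (n = 2)
        if 0 not in r0l:                     # jmp_exec_any start
            break
    return pop_exec(r0l, 2), evens           # pop_exec (n = 2)

# Thread 2 is inactive on entry and must come back out unchanged.
r0l, evens = run([0, 0, 1], [4, 1, 9])
print(r0l)    # [0, 0, 1]
print(evens)  # [2, 0, 0]
```

Threads parked at 1 are revived by `do_while` at the end of each iteration, while threads parked at 2 stay out until the final `pop_exec`.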
⌘ Implementing loops, take 4
Break:
mov r0l, #(nested_if_count + 2)
pop_exec (n = 0)
Continue:
mov r0l, #(nested_if_count + 1)
pop_exec (n = 0)
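The extra `nested_if_count` term matters because every enclosing if still runs its own `pop_exec` before the loop's `do_while` is reached. The toy Python simulation below places a break inside a single if; the semantics are assumptions inferred from the slides, and `run`, `limits` and `NESTED_IF_COUNT` are illustrative names.

```python
NESTED_IF_COUNT = 1  # the break below sits inside exactly one if

def push_exec(r0l, n):
    """Assumed: inactive threads sink n levels; active threads stay at 0."""
    return [r + n if r else 0 for r in r0l]

def if_icmp(r0l, cond, n=1):
    """Assumed: active threads failing the test park at n; inactive ones sink by n."""
    return [(0 if c else n) if r == 0 else r + n for r, c in zip(r0l, cond)]

def do_while(r0l, cond, n):
    """Assumed: threads parked shallower than n rejoin if cond holds, else park at n."""
    return [(0 if c else n) if r < n else r for r, c in zip(r0l, cond)]

def pop_exec(r0l, n):
    """Assumed: every counter drops by n, saturating at 0."""
    return [max(r - n, 0) for r in r0l]

def run(r0l, limits):
    """loop { trips++; if (trips >= limit) break; } for each active thread."""
    n = len(r0l)
    trips = [0] * n
    r0l = push_exec(r0l, 2)                                   # push_exec (n = 2)
    while True:                                               # start:
        for t in range(n):
            if r0l[t] == 0:
                trips[t] += 1
        hit = [trips[t] >= limits[t] for t in range(n)]
        r0l = if_icmp(r0l, hit, 1)                            # if_icmp (n = 1)
        # break: mov r0l, #(nested_if_count + 2) ; pop_exec (n = 0)
        r0l = [NESTED_IF_COUNT + 2 if r == 0 else r for r in r0l]
        r0l = pop_exec(r0l, 1)                                # pop_exec (n = 1) closes the if
        r0l = do_while(r0l, [True] * n, 2)                    # do_while true (n = 2)
        if 0 not in r0l:                                      # jmp_exec_any start
            break
    return pop_exec(r0l, 2), trips                            # pop_exec (n = 2)

# Thread 2 is inactive on entry and must come back out unchanged.
r0l, trips = run([0, 0, 1], [1, 3, 7])
print(r0l)    # [0, 0, 1]
print(trips)  # [1, 3, 0]
```

In this model, writing a plain 2 inside the if would leave the thread at 1 after the if's `pop_exec`, and `do_while` would then revive it as though it had executed a continue.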
⌘ Conclusion
⌘ Status
DCP driver downstream
AGX upstream in Mesa
Passing 95% of dEQP-GLES2
AGX kernel driver pending