The Occult / the Apple GPU Alyssa Rosenzweig

Post on 09-May-2022


Page 1: The Occult / the Apple GPU

⌘ The Occult / the Apple GPU
Alyssa Rosenzweig

⌘ Introduction

⌘ The wizards
Dougall Johnson
Hector Martin
Sven Peter
Alyssa Rosenzweig

Designed by Asahi in Canada.

Assembled in… also Canada.

⌘ DCP

⌘ Hardware
Diabolical Clusterpuck

…Er, wait.

⌘ Hardware
Display Coprocessor
Manages the display controller
Has its own cursed coprocessor
7 megabytes of firmware

⌘ DCP

⌘ RTKit
Real Time Kit
Secret real-time operating system
Apple firmware (and AirPods)
Shared memory and mailbox

⌘ Firmware
Object-oriented C++
Remote procedure calls
Unstable ABI ⇒ maintenance nightmare

⌘ Linux
Goofy DRM/KMS driver
Atomic KMS → DCP calls
Haunted by IOSurface

⌘ Status

⌘ AGX

⌘ Hardware
Apple Graphics
Tiler
Dual-issue, scalar instruction set
Made for Metal

⌘ Mesa
Gallium3D driver
NIR compiler

⌘ Metal lacks OpenGL features

⌘ “Fun” with AGX

⌘ Divergence styles
Mali: Branches with hardware reconvergence
AMD: Compiler manages execution masks
Apple: Count control flow nesting

⌘ AGX divergence

⌘ AGX control flow
32 threads in a warp
Implicit 32-bit execution mask
Nesting counter in r0l (0 if active)
Warp-static jumps
Structured if, else, do…while
Control flow sets r0l and mask
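The counter scheme on this slide can be modeled in a few lines. This is a minimal sketch, not a description of the hardware: the `Warp` class and its method names are hypothetical, and it assumes only that a thread is active exactly when its r0l counter is 0 and that the execution mask can be derived from those counters.

```python
from dataclasses import dataclass, field

WARP_SIZE = 32  # from the slide: 32 threads in a warp


@dataclass
class Warp:
    # One nesting counter per thread, standing in for r0l (hypothetical model).
    r0l: list = field(default_factory=lambda: [0] * WARP_SIZE)

    def exec_mask(self) -> int:
        """32-bit mask with bit i set when thread i is active (r0l == 0)."""
        mask = 0
        for i, count in enumerate(self.r0l):
            if count == 0:
                mask |= 1 << i
        return mask

    def any_active(self) -> bool:
        """Condition for a warp-static jump such as jmp_exec_any."""
        return self.exec_mask() != 0


warp = Warp()
warp.r0l[0] = 1                 # thread 0 is one nesting level away from active
print(hex(warp.exec_mask()))    # every bit set except bit 0
```

Keeping a counter per thread rather than a stack of masks means that leaving a nesting level is just a decrement.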

⌘ NIR control flow
If-else
Infinite loop
Break, continue

⌘ Implementing if

    if_icmp cond != 0 (n = 1)
    ...
    pop_exec (n = 1)
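To see how the counters move through this sequence, here is a small simulation. The semantics are assumptions made for illustration, not taken from the slides: `if_icmp` is modeled as setting the counter to n for active threads that fail the test and adding n to threads that were already inactive, and `pop_exec` as subtracting n with saturation at 0. The Python function names simply mirror the mnemonics.

```python
def if_icmp(r0l, cond, n=1):
    """Assumed model: active threads that fail the test get count n;
    threads that were already inactive sink n levels deeper."""
    return [(0 if c else n) if count == 0 else count + n
            for count, c in zip(r0l, cond)]


def pop_exec(r0l, n=1):
    """Assumed model: leave n nesting levels, saturating at 0 (active)."""
    return [max(count - n, 0) for count in r0l]


# Four threads for brevity; thread 3 was disabled by an enclosing construct.
before = [0, 0, 0, 1]
cond = [True, False, True, True]      # per-thread value of `cond != 0`

inside = if_icmp(before, cond)        # [0, 1, 0, 2]: threads 0 and 2 run the body
after = pop_exec(inside)              # [0, 0, 0, 1]: same as `before`
print(inside, after)
```

Thread 3 passes the test but stays inactive, because its counter only moves deeper and then back.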

⌘ Implementing if…else

    if_icmp cond != 0 (n = 1)
    ...
    else_icmp cond == 0 (n = 1)
    ...
    pop_exec (n = 1)
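The if…else sequence can be traced with the same kind of toy model. As before, the semantics are assumptions for illustration only: `else_icmp` is modeled as parking threads that ran the if side at count n and re-testing threads that sit at exactly count n, while threads disabled further out are left alone. The helper `active` is hypothetical.

```python
def if_icmp(r0l, cond, n=1):
    """Assumed model: failing active threads get n, inactive threads sink by n."""
    return [(0 if c else n) if count == 0 else count + n
            for count, c in zip(r0l, cond)]


def else_icmp(r0l, cond, n=1):
    """Assumed model of the else side at nesting depth n."""
    out = []
    for count, c in zip(r0l, cond):
        if count == 0:
            out.append(n)                 # ran the if side, so skip the else side
        elif count == n:
            out.append(0 if c else n)     # failed the if test at this level
        else:
            out.append(count)             # disabled further out, untouched
    return out


def pop_exec(r0l, n=1):
    """Assumed model: leave n nesting levels, saturating at 0."""
    return [max(count - n, 0) for count in r0l]


def active(r0l):
    """Indices of the threads whose counter is 0."""
    return [t for t, count in enumerate(r0l) if count == 0]


before = [0, 0, 0, 1]                     # thread 3 disabled by an outer construct
cond = [True, False, True, True]

then_side = if_icmp(before, cond)                        # cond != 0
else_side = else_icmp(then_side, [not c for c in cond])  # cond == 0
after = pop_exec(else_side)

print(active(then_side), active(else_side), after)
```

In this model every thread that was active on entry runs exactly one of the two sides, and the counters return to their starting values after the final `pop_exec`.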

⌘ What about loops?

⌘ Implementing loops, take 0
No way to break!

    start:
    ...
    jmp_exec_any start

⌘ Implementing loops, take 1

    start:
    ...
    do_while true (n = 1)
    jmp_exec_any start

Break:

    mov r0l, #1
    pop_exec (n = 0)

⌘ Implementing loops, take 2
Don’t clobber the execution mask.

    push_exec (n = 1)
    start:
    ...
    do_while true (n = 1)
    jmp_exec_any start
    pop_exec (n = 1)
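A toy run of this loop shape shows why the surrounding `push_exec` / `pop_exec` pair matters. The semantics below are assumptions for illustration: `push_exec` pushes already-inactive threads n levels deeper, `do_while` with a true condition reactivates threads whose counter is below n, and a break is the take 1 sequence `mov r0l, #1` followed by `pop_exec (n = 0)`. The break test itself is written directly in Python rather than with `if_icmp`, and `run_loop` and `limits` are hypothetical names.

```python
def push_exec(r0l, n):
    """Assumed model: inactive threads sink n levels, active ones stay at 0."""
    return [count + n if count else 0 for count in r0l]


def do_while(r0l, cond, n):
    """Assumed model: threads shallower than n are re-tested."""
    return [(0 if cond else n) if count < n else count for count in r0l]


def pop_exec(r0l, n):
    """Assumed model: leave n nesting levels, saturating at 0."""
    return [max(count - n, 0) for count in r0l]


def run_loop(r0l, limits):
    """Each active thread t breaks once it has run the body limits[t] times."""
    r0l = push_exec(r0l, 1)                       # push_exec (n = 1)
    trips = [0] * len(r0l)
    while True:                                   # start:
        for t, count in enumerate(r0l):
            if count != 0:
                continue                          # inactive threads skip the body
            trips[t] += 1
            if trips[t] >= limits[t]:
                r0l[t] = 1                        # break: mov r0l, #1 ; pop_exec (n = 0)
        r0l = do_while(r0l, True, 1)              # do_while true (n = 1)
        if not any(count == 0 for count in r0l):  # jmp_exec_any start
            break
    return pop_exec(r0l, 1), trips                # pop_exec (n = 1)


final, trips = run_loop([0, 0, 0, 1], limits=[1, 3, 2, 5])
print(final, trips)
```

Thread 3 enters with a counter of 1 from an enclosing construct. In this model the `push_exec` moves it to 2, so it cannot be confused with threads that broke out (counter 1), and the closing `pop_exec` restores every counter to its value before the loop.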

⌘ Implementing loops, take 2
Implements loop { ... }
Can break out of multiple loops at once
What about continue?

⌘ Implementing loops, take 3
Quoth the Dougall:

“Continue is a break.”

    do {
        do {
            ...
        } while (0);
    } while (cond);
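The same rewrite can be checked in ordinary code. Python has no do…while, so this sketch uses a single-trip `for _ in range(1)` as a stand-in for the inner `do { ... } while (0)`; the function names and the sum-the-odd-values task are made up for illustration.

```python
def sum_odd_continue(values):
    """Reference version: skip even values with `continue`."""
    total = 0
    for v in values:
        if v % 2 == 0:
            continue
        total += v
    return total


def sum_odd_break(values):
    """Same loop with `continue` rewritten as a break out of an inner loop."""
    total = 0
    for v in values:            # outer loop: } while (cond);
        for _ in range(1):      # inner loop: do { ... } while (0);
            if v % 2 == 0:
                break           # leaves only the inner, single-trip loop
            total += v
    return total


data = [1, 2, 3, 4, 5]
print(sum_odd_continue(data), sum_odd_break(data))
```

Both functions return the same result, since breaking out of the single-trip inner loop lands exactly where `continue` would: at the end of the outer loop body.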

⌘ Implementing loops, take 3
Model two nested loops in general.

    push_exec (n = 2)
    start:
    ...
    do_while true (n = 2)
    jmp_exec_any start
    pop_exec (n = 2)

⌘ Implementing loops, take 3

Break:

    mov r0l, #2
    pop_exec (n = 0)

Continue:

    mov r0l, #1
    pop_exec (n = 0)
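The two counter values can be traced through the n = 2 loop with a toy model. The semantics are assumptions for illustration: `do_while true (n = 2)` is modeled as reactivating every thread whose counter is below 2, which is what lets a counter of 1 act as continue while a counter of 2 acts as break. `run_loop` and the per-thread `scripts` are hypothetical.

```python
def push_exec(r0l, n):
    """Assumed model: inactive threads sink n levels, active ones stay at 0."""
    return [count + n if count else 0 for count in r0l]


def do_while(r0l, cond, n):
    """Assumed model: threads shallower than n are re-tested."""
    return [(0 if cond else n) if count < n else count for count in r0l]


def pop_exec(r0l, n):
    """Assumed model: leave n nesting levels, saturating at 0."""
    return [max(count - n, 0) for count in r0l]


def run_loop(r0l, scripts):
    """scripts[t][i] is what thread t does on iteration i:
    "run", "continue" or "break". Each script must end with "break"."""
    r0l = push_exec(r0l, 2)                       # push_exec (n = 2)
    tail_runs = [0] * len(r0l)
    step = 0
    while True:                                   # start:
        for t, script in enumerate(scripts):
            if r0l[t] != 0:
                continue
            if script[step] == "break":
                r0l[t] = 2                        # mov r0l, #2 ; pop_exec (n = 0)
            elif script[step] == "continue":
                r0l[t] = 1                        # mov r0l, #1 ; pop_exec (n = 0)
        for t, count in enumerate(r0l):
            if count == 0:
                tail_runs[t] += 1                 # rest of the body, active threads only
        r0l = do_while(r0l, True, 2)              # do_while true (n = 2)
        step += 1
        if not any(count == 0 for count in r0l):  # jmp_exec_any start
            break
    return pop_exec(r0l, 2), tail_runs            # pop_exec (n = 2)


scripts = [["continue", "run", "break"], ["run", "break"], []]
final, tail_runs = run_loop([0, 0, 1], scripts)
print(final, tail_runs)
```

Thread 0 skips the rest of the body on its first iteration but rejoins on the second, while a thread that breaks stays out until the final `pop_exec`.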

⌘ Implementing loops, take 4

Break:

    mov r0l, #(nested_if_count + 2)
    pop_exec (n = 0)

Continue:

    mov r0l, #(nested_if_count + 1)
    pop_exec (n = 0)
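The extra `nested_if_count` term can be sanity-checked with a little arithmetic. This sketch assumes each enclosing if ends with a `pop_exec (n = 1)` that decrements a nonzero counter by one; `jump_count` and `count_at_loop_end` are hypothetical helper names.

```python
def jump_count(kind, nested_if_count):
    """Immediate written to r0l for a jump nested inside that many ifs."""
    base = {"continue": 1, "break": 2}[kind]
    return nested_if_count + base


def pop_exec(count, n=1):
    """Assumed model: leave n nesting levels, saturating at 0."""
    return max(count - n, 0)


def count_at_loop_end(kind, nested_if_count):
    """Counter value once every enclosing if has been closed."""
    count = jump_count(kind, nested_if_count)
    for _ in range(nested_if_count):
        count = pop_exec(count)       # one pop_exec (n = 1) per enclosing if
    return count


for depth in range(4):
    print(depth, count_at_loop_end("break", depth), count_at_loop_end("continue", depth))
```

Whatever the depth, a breaking thread reaches the end of the loop body with a counter of 2 and a continuing thread with a counter of 1, which are the take 3 values.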

⌘ Conclusion

⌘ Status
DCP driver downstream
AGX upstream in Mesa
Passing 95% of dEQP-GLES2
AGX kernel driver pending

⌘ Thank you

Alyssa Rosenzweig
[email protected]