Analyzable and Practical Real-Time Gang Scheduling on Multicore Using RT-Gang. Waqar Ali, Michael Bechtel, Heechul Yun, University of Kansas


Analyzable and Practical Real-Time Gang Scheduling on Multicore Using RT-Gang

Waqar Ali, Michael Bechtel, Heechul Yun

University of Kansas


Outline

• RT-Gang

• Tutorial

• DeepPicar Case Study


Multicore Processors

• Provide high computing performance

• Needed for intelligent safety-critical real-time systems


Parallel Real-Time Tasks

• Many emerging workloads in AI, vision, and robotics are parallel real-time tasks

[Figures: DNN-based real-time control *; effect of parallelization on the DNN control task (annotations: 33%, 50%)]

* M. Bojarski, "End to End Learning for Self-Driving Cars." arXiv:1604.07316, 2016


Effect of Co-Scheduling

• DNN control task suffers >10X slowdown

– Due to interference in shared memory hierarchy

[Figure: Normalized execution time of DNN (Core 0,1) and BwWrite (Core 2,3), Solo vs. Corun; annotations: 5%, 10X. Diagram: Core1 to Core4 sharing the LLC and DRAM, with interference between DNN and BwWrite.]

It can be worse! (> 300X slowdown)*

* Michael G. Bechtel and Heechul Yun. “Denial-of-Service Attacks on Shared Cache in Multicore: Analysis and Prevention.” In RTAS, 2019


Observations

• Interference in shared memory hierarchy

– Can be very high and unpredictable

– Depends on the hardware (black box)

• Constructive sharing (Good)

– Between threads of a single parallel task

• Destructive sharing (Bad)

– Between threads of different tasks

• Goal: analyzable and efficient parallel real-time task scheduling framework for multicore

– By avoiding destructive sharing


RT-Gang

• One (parallel) real-time task, a gang, at a time

– Eliminate inter-task interference by construction

• Schedule best-effort tasks during slack with throttling

– Improve utilization with bounded impact on the RT tasks

* Waqar Ali and Heechul Yun. RT-Gang: Real-Time Gang Scheduling Framework for Safety-Critical Systems. In RTAS, 2019.


Safe Best-Effort Task Throttling

• Throttle the best-effort core(s) if they exceed a given bandwidth budget set by the RT task

[Figure: Basic throttling mechanism *. Core activity (computation and memory fetch) shown against the budget on a timeline marked 0, 1ms, 2ms.]

* Yun et al., “MemGuard: Memory Bandwidth Reservation System for Efficient Performance Isolation in Multi-core Platforms.” In RTAS, 2013

* W. Ali and H. Yun, “Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms.” In ECRTS, 2018
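
The budget check on this slide can be made concrete with a little arithmetic. The following is a minimal sketch, not the module's actual code. It assumes (none of this is stated on the slide) that the budget is given in MB/s, that it is enforced over a 1 ms period as the timeline marks suggest, and that each counted memory access moves one 64-byte cache line. The function name budget_events is hypothetical.

```shell
# Convert a bandwidth budget (MB/s) into the number of cache-line-sized
# memory accesses a core may issue per regulation period.
# Assumptions (not from the slides): 1 MB = 1024*1024 bytes, a default
# period of 1000 us, and 64-byte cache lines.
budget_events() {
    mbps="$1"
    period_us="${2:-1000}"
    line_bytes="${3:-64}"
    # Integer arithmetic: bytes per second -> bytes per period -> lines per period.
    echo $(( mbps * 1024 * 1024 * period_us / 1000000 / line_bytes ))
}

budget_events 100    # accesses allowed per 1 ms period at 100 MB/s
```

Under these assumptions, a best-effort core that reaches this count before the period ends would be stalled until the next period begins.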


Implementation

• Modified Linux’s RT scheduler

– Implemented as a “feature” of SCHED_FIFO (sched/rt.c)

• Best-effort task throttling

– A separate kernel module based on BWLOCK++ *

* W. Ali and H. Yun, “Protecting Real-Time GPU Kernels on Integrated CPU-GPU SoC Platforms.” In ECRTS, 2018
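
Since RT-Gang is described here as a feature of SCHED_FIFO, a task presumably has to run under that policy to be gang-scheduled. The sketch below shows one way to start such a task with the standard util-linux tools chrt and taskset. The helper name rt_launch, the DRY_RUN switch, the priority, the core list, and the program ./dnn_task are all hypothetical and chosen only for illustration.

```shell
# Start a program under SCHED_FIFO, pinned to a set of cores.
# Usage: rt_launch PRIORITY CORE_LIST COMMAND [ARGS...]
# If DRY_RUN is set to a non-empty value, the command is printed, not run.
rt_launch() {
    prio="$1"
    cores="$2"
    shift 2
    # chrt -f sets the SCHED_FIFO policy; taskset -c sets the CPU affinity.
    ${DRY_RUN:+echo} chrt -f "$prio" taskset -c "$cores" "$@"
}

# Print the command that would run a hypothetical DNN task on cores 0 and 1.
DRY_RUN=1 rt_launch 5 0,1 ./dnn_task
```

Without DRY_RUN the helper needs enough privilege to set a real-time policy, typically root.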


Outline

• RT-Gang

• Tutorial

• DeepPicar Case Study


Source Code Repository

• git clone https://github.com/CSL-KU/RT-Gang


Installation

• From the Linux kernel directory:

– patch -p1 < ../RT-Gang/rtgang-v4.19.patch

– Compile and install the kernel, then reboot

• To check if installed correctly:

– sudo cat /sys/kernel/debug/sched_features | grep RT_GANG_LOCK
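
The check above can be wrapped in a small helper that also reports whether the feature is switched on. This is a sketch that relies on the usual sched_features convention of listing a disabled feature with a NO_ prefix. The function name rtgang_state is hypothetical, and the optional file argument exists only so the helper can be tried on a saved copy of the listing.

```shell
# Report the RT-Gang state found in a sched_features listing.
# Prints one of: enabled, disabled, not-installed.
rtgang_state() {
    features_file="${1:-/sys/kernel/debug/sched_features}"
    if grep -qw 'NO_RT_GANG_LOCK' "$features_file"; then
        echo "disabled"          # feature exists but is switched off
    elif grep -qw 'RT_GANG_LOCK' "$features_file"; then
        echo "enabled"           # feature exists and is switched on
    else
        echo "not-installed"     # kernel was not patched with RT-Gang
    fi
}

# Usage from a root shell on the patched kernel:
#   rtgang_state
```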


Enable/Disable RT-Gang

• RT-Gang is enabled/disabled through the kernel's scheduler features interface (sched_features)
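
A sketch of the toggle, assuming the feature is named RT_GANG_LOCK as in the installation check and that it follows the standard sched_features convention: writing the feature name enables it and writing the name with a NO_ prefix disables it. The helper name rtgang_set is hypothetical, and the optional file argument is there only so the helper can be exercised on an ordinary file.

```shell
# Enable or disable RT-Gang through the scheduler features file.
# Usage: rtgang_set on|off [features_file]
rtgang_set() {
    mode="$1"
    features_file="${2:-/sys/kernel/debug/sched_features}"
    case "$mode" in
        on)  token="RT_GANG_LOCK" ;;
        off) token="NO_RT_GANG_LOCK" ;;
        *)   echo "usage: rtgang_set on|off [features_file]" >&2
             return 1 ;;
    esac
    # Writing the token switches the feature; on a real system this needs root.
    printf '%s\n' "$token" > "$features_file"
}

# Usage from a root shell on the patched kernel:
#   rtgang_set on
#   rtgang_set off
```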


Best-Effort Task Throttling

• Throttling is enabled through a kernel module

– cd RT-Gang/throttling/kernel_module

– make

– sudo insmod exe/bwlockmod.ko


Best-Effort Task Throttling

• Only occurs when a real-time task is running

– Without a real-time task

– With a real-time task


Outline

• RT-Gang

• Tutorial

• DeepPicar Case Study


DeepPicar

• A low-cost, small-scale replica of NVIDIA’s DAVE-2

• Uses the exact same DNN

• Runs on a Raspberry Pi 3 in real-time

* Bechtel et al. DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car. In RTCSA, 2018

https://github.com/mbechtel2/DeepPicar-v2


DNN based Real-Time Control

• DNN inference is the most compute-intensive part.

• Parallelized by TensorFlow to utilize multiple cores.


Experiment Setup

• DNN control task of DeepPicar (real-world RT)

• IsolBench BwWrite benchmark (synthetic RT)

• Parboil benchmarks (real-world BE)

Task      WCET (C ms)   Period (P ms)   # Threads
DNN       34            100             2
BwWrite   220           340             2
cutcp     ∞             N/A             4
lbm       ∞             N/A             4

[Figure: Core1 to Core4 sharing the LLC and DRAM; RT tasks: DNN and BwWrite; BE tasks: Parboil cutcp and lbm.]
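
Because RT-Gang runs only one real-time gang at a time, the two RT tasks share the machine in time, so a necessary (not sufficient) condition for meeting their deadlines is that their combined utilization C/P stays at or below 1. The sketch below checks that for the WCET and period values listed on this slide (34/100 and 220/340). The helper name total_utilization is hypothetical, and this is plain arithmetic, not the authors' schedulability analysis.

```shell
# Sum C/P over a list of (WCET, period) pairs given as arguments.
# Usage: total_utilization C1 P1 [C2 P2 ...]
total_utilization() {
    awk 'BEGIN {
        u = 0
        # ARGV[1..ARGC-1] hold the numbers; consume them two at a time.
        for (i = 1; i + 1 < ARGC; i += 2)
            u += ARGV[i] / ARGV[i + 1]
        printf "%.3f\n", u
    }' "$@"
}

total_utilization 34 100 220 340    # prints 0.987
```

The result is just under 1, which leaves little slack for the real-time tasks themselves; the best-effort Parboil tasks only run in whatever slack remains.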


Execution Time Distribution

• RT-Gang achieves deterministic timing


What does this look like in the real world?


CoSched (w/o RT-Gang)

https://youtu.be/Jm6KSDqlqiU

RT-Gang

https://youtu.be/pk0j063cUAs

Conclusion

• Parallel real-time task scheduling

– Hard to analyze on COTS multicore

– Due to interference in shared memory hierarchy

• RT-Gang

– Analyzable and efficient parallel real-time gang scheduling framework, implemented in Linux

– Avoid interference by construction

• Can protect critical real-time tasks


https://github.com/CSL-KU/rt-gang


Thank You!

Disclaimer:

This research is supported by NSF CNS 1718880, CNS 1815959, and NSA Science of Security initiative contract #H98230-18-D-0009.
