
Bachelor Informatica

An RTOS for embedded systems in Rust

Wicher Heldring

June 8, 2018

Supervisor: Drs. T.R. Walstra

Signed:

Informatica, Universiteit van Amsterdam

Abstract

Rust is a programming language that offers safety and reliability without sacrificing run-time performance. Rust can be used as a language to build operating systems. Examples of operating systems built in Rust include Tock and Redox. However, Rust has not yet been tested for its real-time qualities. This thesis investigates the hypothesis that it is possible to write a real-time operating system (RTOS) in Rust for embedded systems. The hypothesis is tested by measuring and comparing the performance of a Rust and a C scheduler. Results show that the Rust scheduler exhibits real-time characteristics similar to those of the original C scheduler. These results make it highly probable that it is possible to build an RTOS in Rust whose performance is competitive with C or C++ RTOSes. Further research is required to verify that writing an RTOS in Rust is feasible.


Contents

1 Introduction
    1.1 Related work

2 Theoretical background
    2.1 What is an RTOS?
        2.1.1 Causes of indeterminism
    2.2 Why use Rust instead of C or C++
        2.2.1 Undefined behavior
        2.2.2 Affine type system
        2.2.3 Rust without standard library
    2.3 Scheduling methods
        2.3.1 Cooperative scheduling
        2.3.2 Preemptive scheduling
        2.3.3 Round-robin scheduling
        2.3.4 Rate monotonic scheduling
        2.3.5 Earliest deadline first scheduling
    2.4 Mutexes
    2.5 Isolation
    2.6 Multi-threading in an RTOS

3 Environment and method
    3.1 ChibiOS
        3.1.1 Thread allocation
        3.1.2 Context switch
        3.1.3 ChibiOS scheduler
    3.2 STM32F3DISCOVERY
    3.3 ARM Toolchain

4 Experiments and results
    4.1 Experiments
    4.2 Results

5 Discussion and conclusion


CHAPTER 1

Introduction

Rust is a programming language for developing programs which are reliable and efficient [1]. It has a safe type system and a safe memory model. Decades of research to make operating systems safer have had mixed success. Two thirds of the Linux kernel bugs listed in Common Vulnerabilities and Exposures in 2017 can be attributed to the use of an unsafe language [2]. Most safe languages are not practical to use within an operating system kernel, because these languages do not offer control over when a program runs and how a program uses its memory. This control over where and when code runs is where Rust differs. Rust offers safety without sacrificing control over time and memory, and it achieves this by moving the safety checks to compile time. Currently, there are a few operating system kernels that are built with Rust, such as Tock [3] and Redox [4]. These OSes prove that it is possible to develop an OS in Rust.

An RTOS (real-time OS) contains a kernel which enables programs to execute within a deterministic time-frame. An RTOS ensures that tasks never miss their deadline. Tasks for an RTOS are developed to finish their work within this specific time-frame. If a program is still running after its predefined deadline, the RTOS has failed [5]. RTOSes are used for tasks where any delay could cause failure. Application areas include aerospace, aviation, life-support equipment, nuclear power systems and milling machines. Currently, most RTOSes are written in C or C++, because deterministic performance is the primary requirement of an RTOS. Languages that do not allow the programmer to specify what is running and when cannot be used for an RTOS, since it would be impossible to ensure that deadlines are never missed. An example of a common language feature that breaks this requirement is a garbage collector. A garbage collector runs alongside the program, and on a system that is not multi-threaded it has to run on the same core as the program to keep memory usage down. The garbage collector therefore interferes with execution.

This thesis will investigate the hypothesis that it is possible to write an RTOS in Rust for embedded systems. An existing C RTOS is compared to the same RTOS where the scheduler is replaced with a Rust scheduler. The hypothesis is tested by measuring and comparing the performance of the Rust and C scheduler.

1.1 Related work

Tock and Redox are OSes built in Rust. Redox is a Unix-like operating system that takes a microkernel approach. Redox is designed to be a general-purpose OS with a focus on safety, freedom, reliability, correctness, and pragmatism [4]. It is developed to run most Linux programs with minimal changes. If there is a trade-off to be made between correctness and compatibility with Linux, Redox chooses correctness over compatibility.

Tock is an embedded operating system designed to run concurrent and distrustful applications on low-power and low-memory microcontrollers. Tock makes a distinction between the core kernel, capsules, and processes. The core kernel consists of a hardware abstraction layer (HAL), a scheduler and some platform-specific configuration [3]. Capsules do not run in a separate hardware privilege mode but rely on the safety of Rust for isolation. Capsules can access each other and the kernel by calling exposed functions and accessing exposed members. Using Rust as a safety measure allows capsules to have no memory or run-time overhead. The downside of this system is that if a capsule hangs, it brings the entire system down, since capsules are cooperatively scheduled. Additionally, capsules cannot dynamically allocate memory, because dynamic allocation could cause a faulty capsule to starve the entire system of resources.

Tock processes are isolated from each other using a hardware memory protection unit. Processes are preemptively scheduled in such a way that a faulty process can never bring the system down. The downside is that processes incur a memory and run-time overhead. Processes in Tock allow for dynamic memory allocation. All capsules can request a certain amount of memory to be reserved for every process. Processes are not allowed to access this extra memory and can only access their own memory and stack space.


CHAPTER 2

Theoretical background

2.1 What is an RTOS?

A real-time operating system (RTOS) is an operating system that has to run in real-time. The definition of real-time is better explained by using the word deterministic. Usually, operating systems try to optimize for executing as much work as possible. A real-time operating system ensures tasks are always finished before a deadline, even if it has to sacrifice overall performance.

An RTOS has two primary requirements [6]. Logical correctness: the RTOS produces correct outputs, and temporal correctness: the RTOS produces the output at the correct time. RTOSes show up in high-risk scenarios. Failure of one of these requirements could have catastrophic consequences in some applications. A typical RTOS is hard real-time. Missing the deadline of a task, that is, not being temporally correct, is a failure. Soft real-time OSes allow occasional deadline misses. Examples of soft real-time applications include telephone switches and video games. Missing a deadline on these systems does not have catastrophic consequences [5]. In a soft real-time system deadlines should be met most of the time. For these systems, there is no precise definition of what most of the time means.

2.1.1 Causes of indeterminism

To have a deterministic system, every single layer in the system has to be deterministic. If one layer is not deterministic, every layer that depends on it will be non-deterministic as well. This is why in RTOSes it is of critical importance to be aware of all abstractions. Not only the software abstractions have to be taken into account; the hardware must also be capable of deterministic performance up to a certain degree. If, for instance, the cooling is not good enough, overheating can affect the deterministic characteristics of a system, because it causes loss of timely performance. Another hardware cause of indeterminism is an energy-saving processor: a processor that starts to underclock itself as soon as it has nothing to do. When critical code has to execute after a small amount of idle time, the processor has to notice that there is work to be done and will take some time to get up to speed. This temporary loss of performance could cause an RTOS to fail its task by missing its deadline.

Not only hardware can cause RTOSes to miss their deadlines; software also has to be written so that it runs deterministically. One of the problems in writing software for an RTOS is that memory allocation is not deterministic. It is preferred to allocate memory statically. However, it is not always possible to allocate everything statically. Depending on the choice of algorithm, the response time of a dynamic memory allocation can be high or even unbounded [7]. FreeRTOS solves this by supplying a set of different memory allocators which possess different response times and characteristics. The memory allocator can be specified at the start of the program to get optimal performance.
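The preference for statically reserved memory can be illustrated with a fixed-size block pool. The following Rust sketch is an illustration only and is not taken from FreeRTOS or ChibiOS; the names `Pool`, `alloc`, `release` and `get` are hypothetical. All storage is reserved up front, and both allocation and release take constant time, so the response time is bounded.

```rust
/// A pool of `N` fixed-size slots; all storage is reserved up front.
/// Hypothetical illustration, not taken from any existing RTOS.
struct Pool<const N: usize> {
    slots: [u32; N],
    /// Stack of free slot indices; the first `free_len` entries are valid.
    free_list: [usize; N],
    free_len: usize,
}

impl<const N: usize> Pool<N> {
    fn new() -> Self {
        let mut free_list = [0; N];
        for (i, entry) in free_list.iter_mut().enumerate() {
            *entry = i;
        }
        Pool { slots: [0; N], free_list, free_len: N }
    }

    /// O(1): pops a free slot index, or returns `None` when the pool is full.
    fn alloc(&mut self, value: u32) -> Option<usize> {
        if self.free_len == 0 {
            return None;
        }
        self.free_len -= 1;
        let idx = self.free_list[self.free_len];
        self.slots[idx] = value;
        Some(idx)
    }

    /// O(1): pushes the slot index back onto the free stack.
    /// This sketch does not guard against releasing a slot twice.
    fn release(&mut self, idx: usize) {
        self.free_list[self.free_len] = idx;
        self.free_len += 1;
    }

    /// Reads the value stored in slot `idx`.
    fn get(&self, idx: usize) -> u32 {
        self.slots[idx]
    }
}
```

With a `Pool<2>`, a third call to `alloc` returns `None` instead of taking an unbounded amount of time, which leaves the decision of how to handle exhaustion to the caller.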


2.2 Why use Rust instead of C or C++

2.2.1 Undefined behavior

One of the most common problems in C and C++ code is undefined behavior. Undefined behavior has the advantage that code can be optimized more easily: the compiler can assume the most efficient path in case of undefined behavior. For instance, when two numbers are added together in C, the result can overflow. In C, overflow is undefined on signed integers. In the following code, the output differs when optimizations are enabled.

    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int a = 0;

        // 'a' is set dynamically in a way that the compiler cannot optimize
        // the code away
        if (argc == 1) {
            // This is the maximum signed integer number
            a = 2147483647;
        }

        printf("a = %i\n", a);
        if (a + a < 0) {
            printf("Overflow!\n");
        } else {
            printf("No overflow\n");
        }
    }

    // Output without optimizations:
    a = 2147483647
    Overflow!

    // Output with optimizations (O3)
    a = 2147483647
    No overflow

This optimization is allowed because overflow of signed integers is undefined behavior in C. The compiler can reason that 'a' is either 0 or 2147483647. Because overflow is undefined, the compiler may assume that it never happens, so adding two non-negative numbers always yields a non-negative number. Thus a + a can never be smaller than 0, and the overflow case can never happen. Dead code elimination then removes the overflow branch, leaving only the statement printf("No overflow\n");.

Rust defines signed integer overflow. The same program in Rust cannot have the overflow case compiled away, because in release mode overflow is defined to be two's complement wraparound. In debug mode, Rust automatically adds checks that detect integer overflow. In the safe subset of Rust there is no undefined behavior, and the safe subset is restricted to make this possible. For this reason Rust additionally has an unsafe subset. Unsafe code can cause undefined behavior, but it allows programs to bypass certain restrictions. These bypasses include dereferencing raw pointers, which could point to invalid memory, and calling C functions.
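As a small illustration of defined overflow, the sketch below uses the standard library methods `wrapping_add` and `checked_add`, which make the intended overflow behavior explicit in both debug and release mode. The function names `wrapping_double` and `checked_double` are chosen for this example only.

```rust
/// Doubles `a` with explicit two's complement wraparound.
fn wrapping_double(a: i32) -> i32 {
    a.wrapping_add(a)
}

/// Doubles `a`, returning `None` if the result does not fit in an `i32`.
fn checked_double(a: i32) -> Option<i32> {
    a.checked_add(a)
}

/// Mirrors the C example: reports whether doubling `a` overflows.
fn describe(a: i32) -> &'static str {
    match checked_double(a) {
        Some(_) => "No overflow",
        None => "Overflow!",
    }
}
```

For the maximum value 2147483647, `wrapping_double` returns -2 and `describe` returns "Overflow!", regardless of the optimization level.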

Writing a program in C that does not invoke undefined behavior is hard. Rust programs can have undefined behavior, but the scope in which this undefined behavior can occur is limited to specific blocks of code which are marked unsafe. This ensures that, as long as these unsafe blocks are verified to cause no undefined behavior, there is no undefined behavior in the program. The absence of undefined behavior does not guarantee the absence of program errors, but it does ensure that the program will not crash because of unexpected behavior during run-time.

2.2.2 Affine type system

A distinct feature of Rust is the way the language implements references. References can be used and dereferenced in the safe subset of Rust. To guarantee that references never point to invalid memory, Rust has a concept called ownership. Ownership ensures, without a garbage collector, that any referenced object exists as long as there is a reference pointing to it. This guarantee is validated at compile time [8]. As a result, a safe Rust program can never perform invalid memory accesses, and this comes without any run-time overhead.

In Rust, there are two kinds of references: mutable and shared references. At any time, a variable can have either any number of shared references or one mutable reference. Rust tracks the lifetime of references at compile time. The lifetime of a reference must not exceed the lifetime of the referenced variable. For instance, the following code, whose equivalent would be accepted by a C or C++ compiler, fails to compile in Rust.

    fn sink(s: String) {
        println!("Received: {}", s);
    }

    fn main() {
        let a = String::from("Hello!");
        let b = &a;

        // 'a' is moved here, which invalidates the reference 'b'
        sink(a);

        // 'b' would point to invalid memory here
        println!("{}", b);
    }

    // Compiler gives the following compile error:
    // error[E0505]: cannot move out of `a` because it is borrowed

As soon as the function sink is called with 'a', the original 'a' no longer exists, since the value has been moved and now resides at a different point in memory. The reference 'b' would now point to a memory location that is no longer valid. Rust tracks the lifetimes of 'a' and 'b' and refuses to compile the program because of this.

In the previous code example, 'a' does not exist anymore as soon as the function sink is called. This is because Rust has an affine type system. An affine type system means that any variable can be used at most once. Note that referencing a variable does not count as using it. Using a variable is similar to moving in C++. While C++ leaves the original variable in a valid but unspecified state, in Rust the original variable is destroyed. Trying to do anything with the variable after it has been destroyed results in a compile-time error. In the previous example, sink takes the string itself instead of a reference to the string, which causes the original string to be destroyed. Another example of this affine type system occurs in the FnOnce trait.

    pub trait FnOnce<Args> {
        type Output;
        fn call_once(self, args: Args) -> Self::Output;
    }

Instead of taking a reference to 'self', FnOnce takes 'self' itself. After the function call_once has been executed, the variable on which it was called has been used and cannot be used again.
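The effect of taking 'self' by value can be seen with a closure that gives away a captured value. The sketch below is a minimal illustration; the helper `run_once` and the function `make_and_run` are hypothetical names.

```rust
/// Calls `f` exactly once; `f` is consumed by the call.
fn run_once<F: FnOnce() -> String>(f: F) -> String {
    f()
}

/// Builds a closure that can only be called once and runs it.
fn make_and_run() -> String {
    let greeting = String::from("Hello!");
    // The closure moves `greeting` out of its environment when called,
    // so it implements `FnOnce` but not `Fn` or `FnMut`.
    let consume = move || greeting;
    let result = run_once(consume);
    // run_once(consume); // error[E0382]: use of moved value: `consume`
    result
}
```

Uncommenting the second call to `run_once` makes the program fail to compile, because `consume` has already been used.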

These rules allow Rust to guarantee memory safety without sacrificing performance. Storing references and ensuring that all references go out of scope before the object itself is destroyed can be difficult. To work around this problem, it is possible to use handles instead of references. For instance, if there is a big chunk of text and a reference to a part of the text needs to be stored, it is better to store an offset into the text than a reference to the exact point in the text. When this offset needs to be transformed into a reference again, the reference can be recomputed with the offset and the text as input. Using handles allows the original text to move between owners.
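The handle approach could look as follows. This is a minimal sketch with hypothetical names (`Span`, `resolve`, `demo`); it stores byte offsets, so it assumes that the offsets fall on character boundaries of the text.

```rust
/// A handle to a part of a text: an offset and a length instead of a reference.
#[derive(Clone, Copy)]
struct Span {
    start: usize,
    len: usize,
}

/// Turns the handle back into a reference, given the text it refers to.
/// Panics if the span is out of range or not on a character boundary.
fn resolve(text: &str, span: Span) -> &str {
    &text[span.start..span.start + span.len]
}

/// Moves the text to a new owner and resolves the handle afterwards.
fn demo() -> String {
    let text = String::from("Hello, world!");
    let word = Span { start: 7, len: 5 };

    // A reference into `text` would forbid this move; the handle does not.
    let moved = text;
    resolve(&moved, word).to_string()
}
```

Because `Span` holds no reference, the borrow checker places no restriction on moving the text. The price is that the validity of the offset is only checked at run-time, when the handle is resolved.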

2.2.3 Rust without standard library

Since Rust competes as a systems language, it has the option to compile without its standard library. By default, a Rust program links to a standard library that uses features of the underlying operating system, such as networking, threading and file I/O. Adding the no_std attribute (#![no_std]) to a crate prevents it from linking to the standard library. This stops the code from depending on system calls that might not exist, and allows Rust to run on platforms where there is no OS or where the OS is not supported.

2.3 Scheduling methods

There are multiple ways to measure how RTOS schedulers perform. The most obvious is how long it takes until a task executes after its arrival. Another measure is stability under transient overload. Transient overload happens when a system has so many tasks to process at the same time that it is impossible to meet all deadlines. A further characteristic is the maximum utilization at which the scheduler will never miss deadlines. There are also qualitative characteristics, such as how a scheduler deals with a faulty program or whether a scheduler can be configured to allow different priorities for tasks.

When analyzing the performance of a scheduler, the cost of a context switch needs to be taken into account as well. A context switch can take a non-negligible amount of time. If the scheduler has to sort a list of tasks, this costs at best O(n log(n)). Depending on the time it takes to switch between tasks, it can sometimes be better to let a low priority task finish than to start the highest priority task immediately.

2.3.1 Cooperative scheduling

Cooperative scheduling is a method of scheduling where threads voluntarily yield control to the kernel, so that the kernel can schedule another task. A switch of tasks only happens if the running task yields control back to the kernel. A task yields control, for example, when it requests a lock on a mutex that is already locked, or when it starts sleeping.

A downside of cooperative scheduling is that threads sometimes execute lengthy calculations. The time before a thread yields can grow so large that a higher priority task misses its deadline.
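The principle of cooperative scheduling can be simulated in a few lines of Rust. The sketch below is an illustration under simplifying assumptions and not an actual kernel: tasks are plain structs, `run` stands for the work a task does until it yields, and the names `Task`, `Step` and `run_cooperative` are hypothetical. Every task is assumed to need at least one step.

```rust
use std::collections::VecDeque;

/// What a task reports when it hands control back to the scheduler.
#[derive(PartialEq)]
enum Step {
    Yield,
    Done,
}

/// A task that needs `remaining` steps (at least one) and yields after each.
struct Task {
    name: &'static str,
    remaining: u32,
}

impl Task {
    /// Runs until the task voluntarily gives up control.
    fn run(&mut self) -> Step {
        self.remaining -= 1;
        if self.remaining == 0 { Step::Done } else { Step::Yield }
    }
}

/// Runs all tasks to completion and returns the order in which they ran.
fn run_cooperative(mut queue: VecDeque<Task>) -> Vec<&'static str> {
    let mut trace = Vec::new();
    while let Some(mut task) = queue.pop_front() {
        trace.push(task.name);
        // The scheduler only regains control when `run` returns.
        if task.run() == Step::Yield {
            queue.push_back(task);
        }
    }
    trace
}
```

If `run` never returned, for instance because of an endless loop in a task, the `while` loop of the scheduler would never continue. This is the weakness described above.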

2.3.2 Preemptive scheduling

Preemptive scheduling is a method where the scheduler interrupts a running task. Instead of waiting for a task to voluntarily yield control back to the scheduler, an interrupt fires that gives control to the scheduler, which can then decide whether to continue running the current task or to run another task. A preemptive scheduler still behaves like a cooperative scheduler when a task voluntarily yields control back to the kernel.

There are multiple ways to decide which task needs to run when an interrupt triggers. A typical way is to resume the highest priority thread. The priority of a thread is assigned when the thread is created. During execution, the scheduler will always execute the highest priority active thread. This approach is called Fixed Priority Preemptive Scheduling (FPPS).

A downside of FPPS is the number of context switches. A possible way to reduce this number is a slack stealing algorithm [9]. A slack stealing algorithm allows a lower priority task to continue running if the higher priority task has enough slack, that is, if its deadline is far enough away that it can still run after the lower priority task finishes.


2.3.3 Round-robin scheduling

Round-robin scheduling distributes time-slices in a fair way to all tasks. A time-slice is the period in which a task can run between two interrupts. A round-robin scheduler works like a preemptive scheduler, except that an interrupt fires when a task exceeds a set amount of time. This interrupt switches the active task for another task of equal priority. If there is no task of equal priority, the task continues executing. Round-robin has a small extra run-time overhead, but multiple high priority tasks cannot starve each other.

Round-robin scheduling has the same downside as preemptive scheduling: as soon as CPU utilization becomes high, lower priority tasks will miss their deadlines.
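The distribution of time-slices among tasks of equal priority can be sketched as a small simulation. The function `round_robin` below is a hypothetical illustration that works on abstract time units; it does not model interrupts or tasks of different priorities.

```rust
use std::collections::VecDeque;

/// Simulates round-robin among tasks of equal priority.
/// Each entry is (name, remaining run time); `slice` is the time-slice length.
/// Returns (name, time run) for every time-slice that was handed out.
fn round_robin(tasks: &[(&'static str, u32)], slice: u32) -> Vec<(&'static str, u32)> {
    assert!(slice > 0, "a time-slice must be longer than zero");
    let mut queue: VecDeque<(&'static str, u32)> = tasks.iter().copied().collect();
    let mut trace = Vec::new();
    while let Some((name, remaining)) = queue.pop_front() {
        // A task runs until it finishes or its time-slice is used up.
        let run = remaining.min(slice);
        trace.push((name, run));
        if remaining > run {
            // Not finished: the task goes to the back of the queue.
            queue.push_back((name, remaining - run));
        }
    }
    trace
}
```

For two tasks A and B that need 5 and 2 time units and a time-slice of 2, the simulation hands out the slices A, B, A, A. Task B finishes after 4 time units instead of waiting for A to complete.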

2.3.4 Rate monotonic scheduling

When transient overload occurs, a scheduling algorithm will prefer to fail lower priority tasks over high priority tasks. However, the problem is that if the scheduler only focuses on the highest priority tasks, lower priority tasks are likely to miss their deadline during high CPU usage. A solution to this problem is rate-monotonic scheduling [10].

Rate-monotonic scheduling schedules the task with the highest priority first. The priority is defined as the inverse of the period of the task, so a task with a shorter period has a higher priority. A task with a long period will be preempted by any task with a shorter period.

A rate-monotonic scheduler is guaranteed to always meet all deadlines for all possible task start times if the condition holds that

\[ \sum_{i=1}^{n} \frac{C_i}{T_i} \le n\left(2^{1/n} - 1\right) \tag{2.1} \]

where $C_i$ is the execution time of task $i$ and $T_i$ is the period of task $i$ [11]. The right-hand side of the equation equals 1 for $n = 1$ and converges quickly to around 0.69. For large $n$ the utilization must therefore not exceed roughly 70%, or the deadlines are not guaranteed to be met. The left-hand side of the equation is the total utilization of the CPU. Note that in rate-monotonic scheduling the deadline of a task is the moment at which the same task is scheduled again.

This 70% is the worst case. The 70% worst-case bound is rarely reached during typical operation of an RTOS. A more realistic bound is over 90% [10].
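The bound in equation (2.1) is easy to evaluate numerically. The following sketch computes both sides of the inequality; the function names are hypothetical, each task is given as a pair (C, T) in the same time unit, and the task set is assumed to be non-empty. Note that the test is sufficient but not necessary: a task set that fails it may still be schedulable.

```rust
/// Total CPU utilization: the sum of C_i / T_i over all tasks.
/// Each task is (execution time C, period T) in the same time unit.
fn utilization(tasks: &[(f64, f64)]) -> f64 {
    tasks.iter().map(|(c, t)| c / t).sum()
}

/// Right-hand side of equation (2.1): n * (2^(1/n) - 1), for n >= 1.
fn rm_bound(n: usize) -> f64 {
    let n = n as f64;
    n * (2.0_f64.powf(1.0 / n) - 1.0)
}

/// Sufficient (not necessary) condition for rate-monotonic schedulability.
fn rm_guaranteed(tasks: &[(f64, f64)]) -> bool {
    utilization(tasks) <= rm_bound(tasks.len())
}
```

For two tasks the bound is about 0.83, so a task set with (C, T) pairs (1, 4) and (2, 8), which has a utilization of 0.5, passes the test, whereas (3, 4) and (2, 8), with a utilization of 1.0, does not.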

2.3.5 Earliest deadline first scheduling

Earliest deadline first (EDF) scheduling schedules the task that has the closest upcoming deadline. EDF is a perfect scheduling algorithm in the sense that EDF will always be able to schedule all tasks if and only if the total CPU utilization does not exceed 100%:

\[ \sum_{i=1}^{n} \frac{C_i}{T_i} \le 1 \tag{2.2} \]

If the total CPU utilization is larger than 1, no algorithm can meet all deadlines. EDF scheduling is rarely used in commercial RTOSes. The reason for this is that EDF scheduling has a few major downsides [12].

• It is less predictable. When a task is running, and a task with an earlier deadline preempts it, the task will be deferred until later.

• It is less controllable since there is no way to change the priority of a task.

• There is more scheduling overhead. In EDF scheduling, the scheduler needs to keep track of the next thread to run, which at best takes O(log(n)) time. Other scheduling methods can run in O(1).

• It has the domino effect.


A domino effect occurs when an EDF scheduler is running near 100% CPU utilization. When a deadline is missed, this deadline will affect other tasks which will also start to miss deadlines. Other algorithms might prioritize higher priority tasks, so that at least some tasks meet the deadline while lower priority tasks do not meet the deadline.
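The selection rule of EDF and the schedulability condition (2.2) can be sketched as follows. The names `Job`, `edf_pick` and `edf_schedulable` are hypothetical. For brevity the sketch selects the next job with a linear scan; a scheduler that aims for the O(log(n)) behavior mentioned above would keep the jobs in a priority queue instead.

```rust
/// A ready job with an absolute deadline, expressed in ticks.
#[derive(Debug, Clone, PartialEq)]
struct Job {
    name: &'static str,
    deadline: u64,
}

/// EDF: picks the ready job whose deadline is closest.
/// On equal deadlines the job that comes first in the slice is chosen.
fn edf_pick(ready: &[Job]) -> Option<&Job> {
    ready.iter().min_by_key(|job| job.deadline)
}

/// Condition (2.2): total utilization must not exceed 1.
/// Each task is (execution time C, period T) in the same time unit.
fn edf_schedulable(tasks: &[(f64, f64)]) -> bool {
    let total: f64 = tasks.iter().map(|(c, t)| c / t).sum();
    total <= 1.0
}
```

Unlike the rate-monotonic bound, the EDF condition accepts task sets up to full utilization, for example two tasks that each need half of the CPU.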

2.4 Mutexes

Sharing variables between multiple threads can cause problems because data races can occur. A data race happens, for instance, when two threads share a counter that both threads increment every five milliseconds. Incrementing a counter consists of three steps. The first step is to load the existing value of the counter. The second step is to increment the value. The final step is to write the resulting value back into memory. If the second thread preempts the first thread while it is in the middle of these steps, it will load the same value as the first thread and thus write the same result to memory. Although the counter should have been incremented twice, it is only incremented once. This effect is called a data race. To solve this problem, OSes have a primitive called a mutex (mutual exclusion). A mutex ensures that when a thread locks it, no other thread can lock it at the same time. Implementing a mutex brings about a new problem called priority inversion.
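The shared counter can be protected with a mutex as in the sketch below. It uses `std::sync::Mutex` and `std::thread` from the Rust standard library, so it illustrates the concept on a hosted system and not on the embedded target of this thesis; the function name `count_with_mutex` is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` threads that each increment a shared counter
/// `per_thread` times, and returns the final value of the counter.
fn count_with_mutex(threads: usize, per_thread: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // Load, increment and store all happen while the lock
                    // is held, so no other thread can interleave.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

In Rust the counter can only be reached through the lock, so forgetting to lock the mutex is a compile-time error rather than a data race at run-time.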

For priority inversion to occur, there must be at least three threads, A, B and C, with descending priority. If threads A and C share a mutex and this mutex is locked by thread C, A can preempt thread C and start working since it has a higher priority. As soon as A then tries to lock the mutex, it cannot do so, because C currently holds it. Without priority inheritance, thread A waits until the mutex is unlocked. Thread B now runs, since it has a higher priority than C. As a result, thread B effectively runs ahead of A, even though running C first would allow thread A to continue sooner. This event is called priority inversion because a thread with higher priority is waiting for a thread with lower priority [5].

To solve the problem of priority inversion, operating systems employ a strategy called priority inheritance. When thread A starts waiting for thread C, thread C inherits the priority of thread A. Instead of B, C will now run, since it has a higher priority. This strategy solves the problem of priority inversion.
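The scenario with threads A, B and C can be reproduced in a small model. The sketch below is hypothetical and greatly simplified: it only computes which runnable thread a fixed-priority scheduler would pick, with and without inheritance, and assumes that higher numbers mean higher priority.

```rust
/// A thread with a fixed base priority; higher numbers mean higher priority.
struct Thread {
    name: &'static str,
    base_priority: u8,
    /// Highest priority among the threads waiting on a mutex held by this thread.
    inherited: Option<u8>,
}

impl Thread {
    /// With priority inheritance, a mutex holder runs at the waiter's priority.
    fn effective_priority(&self) -> u8 {
        self.inherited
            .map_or(self.base_priority, |p| p.max(self.base_priority))
    }
}

/// Picks the runnable thread with the highest effective priority.
fn pick(runnable: &[Thread]) -> &Thread {
    runnable
        .iter()
        .max_by_key(|t| t.effective_priority())
        .expect("no runnable thread")
}

/// Thread A (priority 3) is blocked on a mutex held by C (priority 1),
/// so only B (priority 2) and C are runnable.
fn next_to_run(inheritance: bool) -> &'static str {
    let b = Thread { name: "B", base_priority: 2, inherited: None };
    let c = Thread {
        name: "C",
        base_priority: 1,
        inherited: if inheritance { Some(3) } else { None },
    };
    pick(&[b, c]).name
}
```

Without inheritance the model picks B, which is the priority inversion described above. With inheritance it picks C, so that C can release the mutex and A can continue.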

2.5 Isolation

In most RTOSes isolation is limited. The overhead of adding isolation is too significant for most RTOSes. Some RTOSes such as Xenomai allow for running real-time applications in user space. Other RTOSes like ChibiOS provide a variety of safety options to help development and debugging.

Isolation helps with debugging and with catching a range of errors that could cause complete system failure, since a bug in non-isolated code can cause writes to arbitrary memory. Such writes can cause segmentation faults or undefined behavior. Isolation prevents these errors from causing undefined behavior and allows the RTOS to continue, while stopping a single thread or notifying the developer of the error.

Isolation is provided by some RTOSes using the integrated memory management unit (MMU) or memory protection unit (MPU). An MPU protects memory during run-time by specifying memory regions that can be accessed during execution of a part of the RTOS. Any memory access outside the specified bounds will cause an interrupt. Using the MPU has some overhead, because the RTOS not only has to save the registers, but it also has to save the memory protection status. During a context switch, it needs to switch registers as well as update the MPU in order to protect the right memory.

An MMU virtualizes the entire address space in such a way that a running process cannot access any memory outside of its own address space. An MMU is more complicated than an MPU, so smaller processors tend to have an MPU, whereas fast processors tend to have an MMU. An MMU also has additional overhead over an MPU [13].

Figure 2.1: An example of priority inversion. Task A, the highest priority task, misses its deadline because it is waiting for the lower priority task C, which in turn is waiting for task B, whose priority lies between those of tasks A and C.

Figure 2.2: An example of priority inheritance. Task A, the highest priority task, makes its deadline because the lower priority task C inherits task A's priority.

2.6 Multi-threading in an RTOS

Multi-threading introduces new problems during the development of an RTOS. Depending on the processor, the system could either have symmetric multiprocessing (SMP) or asymmetric multiprocessing (AMP). The difference is that an SMP system shares memory between all processors, while an AMP system runs a separate instance of the RTOS on every core. These separate cores have their own memory and interrupts, which are not shared between cores. An external system has to be designed to facilitate communication between the cores.

During the implementation of an SMP RTOS, a few problems need to be considered. An SMP system has the possibility of running into cache thrashing [14]. Cache thrashing happens when a thread switches between cores and has to reload the internal cache of the new core. A solution to this problem is binding threads to specific cores, so that a thread no longer switches cores. Another problem is that the threads have to be appropriately distributed over the cores. A load-balancer can be developed that moves threads between cores as soon as utilization on one core is high while another core has less utilization at the same time.

The main advantage of an AMP system over an SMP system is that an AMP system can run in a heterogeneous environment. The cores do not have to run at the same speed. An AMP system needs a mechanism to communicate events between cores. This can lead to scalability issues as soon as there are more than two cores, since any event needs to be communicated using this mechanism, instead of using the system primitives that exist in an SMP environment. For legacy systems, an AMP system might be preferred as well, because it guarantees that the application will run like it would run in a single-threaded system.


CHAPTER 3

Environment and method

3.1 ChibiOS

ChibiOS is a small RTOS optimized for performance and code size. It supports most common RTOS features such as preemptive scheduling, semaphores, mutexes and message queues. Additionally, it has a hardware abstraction layer which supports a multitude of different interfaces. It is a GPLv3-licensed project and is free to use for teaching and hobby projects. For commercial usage, it is necessary to buy a license.

3.1.1 Thread allocation

There are two kinds of threads in ChibiOS: static threads and dynamic threads. For static threads, the amount of space needed for the stack must be specified at compile time. Only one thread can run in this static space at a time. For dynamic threads, the space needed for the thread data and the thread stack is allocated on the heap.

An allocated thread consists of thread data and stack space. The thread data includes the state the thread is in, the saved registers and also the thread data for the Rust scheduler. When a thread stops running, its register state is saved into this structure, and the registers of the next thread are loaded from that thread's structure. When a static or dynamic thread is allocated, an amount of stack space has to be specified. In case the thread uses more stack space than specified, it will overwrite the memory of the next allocated thread. ChibiOS has support for stack protection. Stack protection works by adding a few bytes at the end of the stack space. At every context switch, ChibiOS checks whether the values of these bytes are identical to their initial values. In case they are not identical, ChibiOS stops the kernel and reports the error. Using this check has a performance overhead, so it is better to use this protection only during development.
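The idea behind this kind of stack protection can be modelled in a few lines. The sketch below is not the ChibiOS implementation; the guard pattern, the number of guard bytes and the names `Stack` and `guard_intact` are assumptions made for illustration. The stack is modelled as growing downwards towards index 0, where the guard bytes are placed.

```rust
/// Pattern written into the guard bytes; the value is an arbitrary assumption.
const GUARD: u8 = 0xA5;
/// Number of guard bytes; also an assumption for this sketch.
const GUARD_LEN: usize = 4;

/// A simulated thread stack that grows downwards towards index 0,
/// with guard bytes placed at the low end.
struct Stack {
    bytes: Vec<u8>,
}

impl Stack {
    /// Creates a stack with `size` usable bytes plus the guard bytes.
    fn new(size: usize) -> Self {
        let mut bytes = vec![0u8; size + GUARD_LEN];
        bytes[..GUARD_LEN].fill(GUARD);
        Stack { bytes }
    }

    /// The check a kernel could run at every context switch:
    /// the guard bytes must still hold their initial pattern.
    fn guard_intact(&self) -> bool {
        self.bytes[..GUARD_LEN].iter().all(|&b| b == GUARD)
    }
}
```

A write that runs past the usable part of the stack changes at least one guard byte, which the next call to `guard_intact` detects. The check costs a few comparisons per context switch, which is the overhead mentioned above.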

3.1.2 Context switch

Whenever a thread is running, there are two ways to switch between threads: preemption and cooperative scheduling. Preemption occurs when a hardware subsystem such as a timer raises an interrupt. When this interrupt fires, it is possible to save the current state of the thread and switch to another thread. Preemption means that any piece of code can be interrupted in the middle of its execution. During critical sections of the code, the kernel disables the interrupts. In ChibiOS all scheduling procedures are critical. Interrupts are also disabled during the locking and unlocking of mutexes.

The other way for a context switch to happen is when a thread makes a system call that causes it to sleep, for instance when a thread tries to lock an already locked mutex or waits for a result from a hardware subsystem.

A context switch saves the current state of the registers in the storage allocated for the thread. The registers of the new thread are then loaded and overwrite the current registers. For the thread itself, the context switch is entirely transparent: the original thread is later resumed where the context switch happened. If a context switch happens because a thread was preempted, the only ways to notice it are to measure the time between statements or to observe that the environment changed. A shared variable, for example, could now hold a different value.
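The save-and-restore step can be modeled on the host with a short Rust sketch. This is a simulation under stated assumptions, not the ChibiOS port code, which performs the switch in architecture-specific assembly. The Registers layout, the State variants and the context_switch function are hypothetical names chosen for illustration.

```rust
/// Thread states used by this model (assumed set, not the ChibiOS states).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Ready,
    Running,
    Sleeping,
}

/// Register set saved per thread. The exact set is an assumption; a real
/// port decides which registers are saved by software.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct Registers {
    pub r4_r11: [u32; 8],
    pub sp: u32,
    pub lr: u32,
}

/// Thread data: the state plus the storage for the saved registers.
pub struct Thread {
    pub state: State,
    pub saved: Registers,
}

/// Saves the live registers into `from`, then loads the registers of `to`.
/// `from_state` is Ready after a preemption and Sleeping after a blocking call.
pub fn context_switch(cpu: &mut Registers, from: &mut Thread, from_state: State, to: &mut Thread) {
    from.saved = *cpu;
    from.state = from_state;
    *cpu = to.saved;
    to.state = State::Running;
}
```

Switching away from a thread and back again leaves its registers unchanged, which is why the switch is invisible to the thread itself.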

3.1.3 ChibiOS scheduler

ChibiOS has a preemptive scheduler that switches to a higher-priority task as soon as one becomes available, or when the current task yields control to the kernel. Additionally, it supports an optional round-robin extension that causes the processor to switch between threads of the same priority every n milliseconds. In the implementation described in this thesis, the preemptive scheduler is replaced with a Rust version.

The original ChibiOS implementation of the scheduler works by keeping a ready list of all active threads sorted by priority. When a thread is added, the scheduler iterates through the ready list using a linear search until it finds the first position where the priority of the new thread is higher than that of the next thread in the list. It inserts the thread at that point. To keep the measurements fair, the Rust scheduler uses the same strategy.
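The insertion strategy can be sketched in a few lines of Rust. This is a host-side model that uses a Vec for brevity and is not the scheduler code from this thesis. The names ReadyThread and insert_ready are hypothetical, and a kernel would more likely use an intrusive linked list than a growable vector.

```rust
/// Entry in the ready list (hypothetical type for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReadyThread {
    pub id: u32,
    pub priority: u8,
}

/// Inserts `new` into a list kept sorted from highest to lowest priority.
/// The linear search stops at the first entry with a lower priority than
/// `new`, so threads of equal priority keep their arrival order.
pub fn insert_ready(ready: &mut Vec<ReadyThread>, new: ReadyThread) {
    let mut pos = 0;
    while pos < ready.len() && ready[pos].priority >= new.priority {
        pos += 1;
    }
    ready.insert(pos, new);
}
```

Because the search is linear, the insertion time grows with the number of ready threads that have an equal or higher priority, which is one reason the number of worker threads is a parameter in the experiments.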

3.2 STM32F3DISCOVERY

The STM32F3DISCOVERY is the embedded development board used for measuring the difference in performance. It has 48 KB of RAM and a processor running at 72 MHz. It also provides an ST-LINK/V2 interface, which allows code running on the embedded device to be debugged from a connected computer using Serial Wire Debug.

Additionally, it has support for semihosting. Semihosting is a mechanism that allows communication over the ST-LINK/V2 interface with a debugger. For instance, the Rust code contains a function called 'bkpt'.

    pub fn bkpt() {
        match () {
            #[cfg(target_arch = "arm")]
            () => unsafe { asm!("bkpt" :::: "volatile") },
            #[cfg(not(target_arch = "arm"))]
            () => unimplemented!(),
        }
    }

This function causes the externally attached debugger to break whenever it is called. Semihosting also supports sending arbitrary data to the debugger.

Figure 3.1: An image of the STM32F3DISCOVERY


3.3 ARM Toolchain

The GNU ARM EABI toolchain is used to compile and link the RTOS. This is an open-source toolchain maintained by ARM that allows for compilation to Cortex-M and Cortex-R ARM processors. To compile the Rust code, the Rust compiler is used. This compiler generates object files from Rust source files. These Rust object files are linked with the C object files to build the image.


CHAPTER 4

Experiments and results

4.1 Experiments

A Rust implementation of the scheduler is compared with the original C version in ChibiOS to test how the Rust scheduler performs. Several benchmarks with different workloads are run to test the difference.

The benchmark uses an internal timer. During this test, the STM32F3DISCOVERY has a generator thread and a sender thread running. The generator thread wakes up every three milliseconds. After each wake-up, it calculates the difference between the current time and the last wake-up time and adds the result to a circular buffer. The sender thread wakes up every two milliseconds and sends the contents of the circular buffer over UART to the output channel. This setup is similar to how Cyclictest works, which is built to benchmark Real-Time Application Interface Linux (RTAI).
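The interaction between the generator thread and the circular buffer can be sketched as follows. This is a simplified, single-threaded Rust model under stated assumptions, not the benchmark code itself: the types Ring and Generator and their methods are hypothetical, timestamps are assumed to be 32-bit cycle counts, and the locking that a real generator and sender thread would need is left out.

```rust
/// Fixed-capacity circular buffer of wake-up intervals (hypothetical type).
pub struct Ring<const N: usize> {
    buf: [u32; N],
    head: usize, // index of the next write
    len: usize,  // number of stored values
}

impl<const N: usize> Ring<N> {
    pub const fn new() -> Self {
        Ring { buf: [0; N], head: 0, len: 0 }
    }

    /// Stores a value; returns false and drops it when the buffer is full.
    pub fn push(&mut self, value: u32) -> bool {
        if self.len == N {
            return false;
        }
        self.buf[self.head] = value;
        self.head = (self.head + 1) % N;
        self.len += 1;
        true
    }

    /// Removes the oldest value, as the sender thread would before transmitting.
    pub fn pop(&mut self) -> Option<u32> {
        if self.len == 0 {
            return None;
        }
        let tail = (self.head + N - self.len) % N;
        self.len -= 1;
        Some(self.buf[tail])
    }
}

/// Generator state: remembers the previous wake-up timestamp.
pub struct Generator {
    last_wakeup: Option<u32>,
}

impl Generator {
    pub const fn new() -> Self {
        Generator { last_wakeup: None }
    }

    /// Called on every wake-up with the current timestamp. wrapping_sub keeps
    /// the difference correct when the 32-bit counter overflows.
    pub fn on_wakeup<const N: usize>(&mut self, now: u32, out: &mut Ring<N>) {
        if let Some(last) = self.last_wakeup {
            let _ = out.push(now.wrapping_sub(last));
        }
        self.last_wakeup = Some(now);
    }
}
```

In this model the sender drains the buffer by calling pop until it returns None. Since the sender wakes up more often than the generator, a small buffer is enough as long as the UART keeps up.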

Additionally, there are some worker threads that generate stress on the system by generating a configurable amount of random numbers. Each worker thread toggles a random light on the board depending on the result of the generated work. Using the result in this way ensures that the work does not get optimized away by the compiler.

Multiple tests are run with different workloads. The parameters that change are the number of worker threads and how long each thread runs. The calculation these threads perform is a small random number generator (RNG). As soon as this RNG completes, it has a small chance to toggle one of the LED pins, depending on its result. The RNG takes a different RNG (its child RNG), executes it multiple times, XORs the results together, and returns the combined value. The number of times the RNG executes its child RNG can be modified to increase or decrease the workload per thread.
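A minimal Rust sketch of such a workload is given below. The thesis does not name the child RNG, so xorshift32 is used here purely as an assumption. The function names, the toggle probability of 1 in 1024 and the LED count of 8 are likewise illustrative choices.

```rust
/// Child RNG: one step of xorshift32 (assumed generator). The state must be nonzero.
pub fn xorshift32(state: &mut u32) -> u32 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    x
}

/// Parent RNG: runs the child `rounds` times and XORs the outputs together.
/// `rounds` is the knob that scales the work done per call.
pub fn workload(state: &mut u32, rounds: u32) -> u32 {
    let mut acc = 0;
    for _ in 0..rounds {
        acc ^= xorshift32(state);
    }
    acc
}

/// Number of LEDs that can be toggled (assumed value).
pub const LED_COUNT: u32 = 8;

/// Returns the LED to toggle with a chance of 1 in 1024 (assumed value).
/// Consuming the result keeps the compiler from removing the workload.
pub fn led_to_toggle(result: u32) -> Option<u32> {
    if result % 1024 == 0 {
        Some((result / 1024) % LED_COUNT)
    } else {
        None
    }
}
```

A worker thread would call workload in a loop and pass each result to led_to_toggle. Doubling rounds roughly doubles the time spent per iteration.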

Measurements are taken over an interval of about 5 minutes, which yields around 100,000 measurements. The first 100 and last 100 measurements are discarded, because starting and stopping the RTOS from the debug interface could influence them.
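The post-processing of the measurements can be sketched in Rust as follows. The thesis does not state how the statistics were computed, so this is an assumption-laden sketch: the names Summary and summarize are hypothetical, and the population standard deviation is used because the text does not say whether the sample or population form was applied.

```rust
/// Summary statistics over the kept measurements (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Summary {
    pub mean: f64,
    pub stdev: f64,
    pub max: u32,
}

/// Drops `trim` samples from both ends and summarizes the rest.
/// Returns None when nothing is left after trimming.
pub fn summarize(samples: &[u32], trim: usize) -> Option<Summary> {
    if samples.len() <= 2 * trim {
        return None;
    }
    let kept = &samples[trim..samples.len() - trim];
    let n = kept.len() as f64;
    let mean = kept.iter().map(|&x| x as f64).sum::<f64>() / n;
    // Population variance: mean of the squared deviations.
    let variance = kept
        .iter()
        .map(|&x| {
            let d = x as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    let max = *kept.iter().max()?;
    Some(Summary { mean, stdev: variance.sqrt(), max })
}
```

For the setup described above, the call would be summarize(&samples, 100), where samples holds the wake-up intervals in clock cycles.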

4.2 Results

Some of the graphs are split into two parts. The difference between these parts is 1/10 of a millisecond, which is the tick size configured for the OS. Every 1/10 of a millisecond, ChibiOS increases the tick count and executes any attached timers. Why the split appears only for some numbers of stress threads is unclear.
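The relation between these durations and the clock cycle counts in the figures can be checked with a small Rust helper. It assumes that the reported cycle counts are processor cycles at the 72 MHz clock of the board, which is consistent with 3.0 ms corresponding to 216000 cycles. The names CPU_HZ and micros_to_cycles are hypothetical.

```rust
/// Processor clock of the board in Hz (72 MHz).
pub const CPU_HZ: u64 = 72_000_000;

/// Converts a duration in microseconds to clock cycles at CPU_HZ.
pub const fn micros_to_cycles(micros: u64) -> u64 {
    micros * CPU_HZ / 1_000_000
}
```

Under this assumption, one OS tick of 100 microseconds equals 7200 cycles, the 3 ms generator period equals 216000 cycles, and a wake-up delayed by one tick lands at 223200 cycles.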

Additionally, some banding can be noticed in the graphs. This banding occurs roughly every 250 clock cycles. It shows up in every measurement except the case without worker threads. It is unclear what causes the banding. One hypothesis, that the interrupt only gets called at 250 clock cycle intervals, cannot explain it, since the C kernel does not show this problem. Additionally, the specification of the architecture used shows that an interrupt should take 12 clock cycles [15].


Figure 4.1: Measurements of running the schedulers with 64 workload threads. Banding can be noticed at 250 clock cycle intervals for the Rust scheduler. The C scheduler seems to have some banding as well, but this is a lot less pronounced.

Figure 4.2: Measurements of running the schedulers with 32 workload threads. There is a split where some measurements take around 3.0 ms (216000 clock cycles) and some take around 3.1 ms (223200 clock cycles). Both schedulers show this behavior.


In clock cycles    Average (C / Rust)    Stdev (C / Rust)    Max (C / Rust)
0 threads          215888 / 215890       2.85 / 3.50         215993 / 216019
16 threads         215782 / 215852       6.99 / 95.1         216087 / 216246
32 threads         215884 / 215830       80.1 / 79.3         216355 / 216397
64 threads         215887 / 215891       266 / 355           216882 / 216904

Table 4.1: Measurements of the scheduler performance for different numbers of worker threads. In each cell, the left-hand value is the C measurement and the right-hand value is the Rust measurement. If a graph consists of two parts, the leftmost part is chosen.

Another hypothesis is that the Rust scheduler has a loop in which every iteration takes about 250 clock cycles, and that this loop causes the banding in the measurements. Both schedulers use a loop to decide where the lower-priority task belongs on the ready list as soon as a higher-priority task preempts it. So the question remains why the C scheduler does not show similar banding. The hypothesis that the Rust loop causes the banding has not been tested.


CHAPTER 5

Discussion and conclusion

The results show that the determinism of the Rust scheduler is comparable to that of the original C scheduler. However, there are some minor differences. The start time of a task is more evenly distributed in the Rust scheduler: the C scheduler has fewer measurements close to the bounds than the Rust scheduler. Despite these minor differences, the results show that Rust is deterministic enough to replace C for simple programs. The current implementation is not complex enough to extrapolate these results, because it does not use the safety features of Rust. The question remains whether these results can be extrapolated to more complex programs.

In this thesis, the safety features of Rust have not been used thoroughly. The interoperability between C and Rust is unsafe, which means that all current operations contain an unsafe part, because the scheduler is called from C. Building an RTOS from scratch in Rust could eliminate these unsafe parts. Additionally, the Rust scheduler never encounters dangling references: a thread created at some point exists until the RTOS shuts down. Dangling pointers, one of the causes of errors in C that Rust could prevent, are therefore not tested, since all references are static by nature.

Using the safety features of Rust should not significantly impact the deterministic performance of the program. Most safety checks in Rust are done at compile time, so the run-time overhead of the safety features is expected to be negligible. This makes it highly probable that it is possible to build an RTOS in Rust with performance competitive with C or C++ RTOSes. Further research is required to verify that writing an RTOS in Rust is doable.


Bibliography

[1] Nicholas D Matsakis and Felix S Klock II. The Rust language. Ada Lett., 34(3):103–104, October 2014.

[2] Abhiram Balasubramanian, Marek S Baranowski, Anton Burtsev, Aurojit Panda, Zvonimir Rakamarić, and Leonid Ryzhyk. System programming in Rust: beyond safety. ACM SIGOPS Operating Systems Review, 51(1):94–99, 2017.

[3] Tock embedded operating system. https://www.tockos.org/.

[4] Redox - Your next(gen) OS. https://www.redox-os.org/.

[5] Jane WS Liu, Ajit Narayanan, and Quan Bai. Real-time systems. Citeseer, 2000.

[6] Joachim Wegener, Harmen Sthamer, Bryan F. Jones, and David E. Eyres. Testing real-time systems using genetic algorithms. Software Quality Journal, 6(2):127–135, 1997.

[7] Miguel Masmano, Ismael Ripoll, Alfons Crespo, and Jorge Real. TLSF: A new dynamic memory allocator for real-time systems. In Real-Time Systems, 2004. ECRTS 2004. Proceedings. 16th Euromicro Conference on Real-Time Systems, pages 79–88. IEEE, 2004.

[8] Amit Levy, Bradford Campbell, Branden Ghena, Pat Pannuto, Prabal Dutta, and Philip Levis. The case for writing a kernel in Rust. In Proceedings of the 8th Asia-Pacific Workshop on Systems, page 1. ACM, 2017.

[9] Robert I Davis, Ken W Tindell, and Alan Burns. Scheduling slack time in fixed priority pre-emptive systems. In Real-Time Systems Symposium, 1993., Proceedings., pages 222–231. IEEE, 1993.

[10] Lui Sha, Ragunathan Rajkumar, and Shirish S Sathaye. Generalized rate-monotonic scheduling theory: A framework for developing real-time systems. Proceedings of the IEEE, 82(1):68–82, 1994.

[11] C. L. Liu and James W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1):46–61, January 1973.

[12] Giuseppe Lipari. Earliest deadline first. Scuola Superiore Sant'Anna, Pisa-Italy, 2005.

[13] Bernhard HC Sputh and Eric Verhulst. VirtuosoNext: Fine-grain space and time partitioning RTOS for distributed heterogeneous systems.

[14] M. Vaidehi and TR Gopalakrishnan Nair. Multicore applications in real time systems. CoRR, abs/1001.3539, 2010.

[15] ARM. Cortex-M3 technical reference manual, ARM DDI 0337E, 2006. Also available at http://infocenter.arm.com/help/topic/com.arm.doc.ddi0337e/DDI0337E_cortex_m3_r1p1_trm.pdf.
