
A Novel Scheduling Framework Leveraging Hardware Cache Partitioning for Cache-Side-Channel Elimination in Clouds

Read Sprabery, University of Illinois, Urbana-Champaign, [email protected]
Konstantin Evchenko, University of Illinois, Urbana-Champaign, [email protected]
Abhilash Raj, Oregon State University, [email protected]
Rakesh B. Bobba, Oregon State University, [email protected]
Sibin Mohan, University of Illinois, Urbana-Champaign, [email protected]
Roy H. Campbell, University of Illinois, Urbana-Champaign, [email protected]

ABSTRACT
While many isolation mechanisms are available to cloud service providers, including virtual machines and containers, side-channels remain an important residual security vulnerability, particularly in the presence of shared caches and multicore processors. In this paper we present a hardware-software mechanism that improves the isolation of cloud processes in the presence of shared caches on multicore chips. Combining the Intel CAT architecture, which enables cache partitioning on the fly, with novel scheduling techniques and state-cleansing mechanisms, we enable cache-side-channel-free computing for Linux-based containers and virtual machines, in particular those managed by KVM. We present a preliminary evaluation of our system using a CPU-bound workload. Our system allows Simultaneous Multithreading (SMT) to remain enabled and does not require application-level changes.

KEYWORDS
cache, side-channels, scheduling, hardware partitioning, cloud computing

1 INTRODUCTION
Cache-based side-channel attacks (e.g., [13, 26, 27]) are a threat to computing environments where a diverse set of users share hardware resources. Such attacks take advantage of observable side-effects (on hardware) of the execution of software programs. A number of these attacks focus on differences in timing while accessing shared processor caches. Recently, researchers have adapted these cache-based side-channel attacks to cloud computing environments, especially Infrastructure-as-a-Service (IaaS) clouds (e.g., [18, 23, 30, 47]), and showed that secrets and sensitive information can be extracted across co-located virtual machines (VMs). Container frameworks such as Docker [7] that virtualize the operating system are even more susceptible to such attacks, since they share the underlying operating system (e.g., [48]).

Initial cache-based side-channel attacks focused on schedulers, either at the OS level or the Virtual Machine Monitor (VMM) layer [13, 26, 47]. Other approaches focused on resource sharing, e.g., using the processor core (for instance, SMT on a single core) to access shared L1 and L2 caches [27]. Recent attacks have focused on the Last-Level Cache (LLC) that is shared across multiple cores [18, 23, 43, 44, 48]; these make defenses much harder.

Many defenses against cache-side-channel attacks in cloud environments have also been proposed (e.g., [6, 19, 21, 22, 24, 25, 28, 31, 32, 37–40, 45, 51]). However, the proposed solutions suffer from a variety of drawbacks: (a) some are probabilistic [25, 37, 51]; (b) others do not protect applications when SMT is enabled [51]; (c) some require developers to re-write applications [19, 22, 31]; (d) others require hardware changes [39, 40], impacting deployability; (e) some depend on violating x86 semantics by modifying the resolution, accuracy or availability of timing instructions [21, 24, 38] and consequently require significant changes to the applications. Compiler-based [6] and page-coloring-based cache-partitioning [28, 32, 45] approaches have high overheads, making them impractical.

Defenses that eliminate cache-side-channel attacks, rather than merely frustrate the techniques employed by the attacker, are desirable. Shannon's noisy-channel coding theorem implies that information can still be transmitted reliably over a channel regardless of the amount of noise on it, albeit at a lower rate. Probabilistic defenses (e.g., [25, 37, 51]) may therefore decrease the bit-rate of attacks, but cannot fully eliminate them. In addition to providing a guaranteed defense, a solution must not severely impact (i) the performance of the applications or (ii) the utilization of the machine. In other words, defenses must minimize the performance impact of enforcing hard isolation to remain practical. For instance, disabling hyper-threading, which many existing solutions do [51], can have a significant performance impact on applications; to the best of our knowledge, every cloud provider enables hyperthreading. Furthermore, solutions must be easy to adopt. History has shown that solutions requiring additional development time (or significant changes to existing applications) have a harder time being adopted, as shown in the Return-Oriented-Programming (ROP) community [34]. Thus, solutions that require developers to make application-level changes [19, 22] may be challenging to apply to existing workloads.
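In information-theoretic terms (a standard formulation, not specific to this paper), one can model the side-channel as a noisy channel from the victim's activity X to the attacker's observations Y; its capacity is

    C = \max_{p(x)} I(X; Y),

which is strictly positive whenever Y is not statistically independent of X. Injected noise lowers C, and with it the achievable attack bit-rate, but the coding theorem guarantees that any rate below C remains achievable with arbitrarily small error; only C = 0, i.e., eliminating the statistical dependence altogether, closes the channel.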

In this paper, we present a framework designed to eliminate side-channel attacks in cloud computing systems that use multicore processors with a shared LLC. The proposed framework uses a combination of Commodity-Off-The-Shelf (COTS) hardware features along with novel scheduling techniques to eliminate cache-based side-channel attacks. In particular, we use Cache Allocation Technology (CAT) [17], which allows us to partition last-level caches at runtime. CAT, coupled with state cleansing between context switches and selective sharing of common libraries, removes any possibility of cache-timing-based side-channel attacks between different security domains. We implement a novel scheduling method, as an extension to the commonly used Completely Fair Scheduler (CFS) in Linux, to reduce the overheads inherent in any such cleansing operation. Our solution provides a transparent (from the perspective of the application developer or user) way to eliminate side-channel attacks, while still working on systems with hyperthreading enabled. It works with containers, kernel-based virtual machines (KVM) and any other schedulable entity that relies on the OS scheduler; for ease of exposition, in the rest of the paper we describe our framework using containers.

In summary, the proposed framework:
C1 Eliminates cache-based side-channel attacks for schedulable units (e.g., containers, KVMs);
C2 Requires no application-level changes;
C3 Allows providers to exploit hyperthreading; and
C4 Imposes modest performance overheads.

    2 SYSTEM AND ATTACK MODEL

    Figure 1: Hierarchical Cache in Modern Processors

2.1 System Model
We consider public Platform-as-a-Service (PaaS) or Infrastructure-as-a-Service (IaaS) cloud environments. Such environments allow for co-residency of multiple computational appliances (e.g., containers, VMs) belonging to potentially different security domains. We assume that the cloud computing infrastructure is built using commodity-off-the-shelf (COTS) components. In particular, we assume that the servers have multi-core processors with multiple levels of caches, some of which are shared (see Figure 1). We also assume that the servers have a runtime mechanism to partition the last-level shared cache.

For this work, we evaluated our approach using an Intel Haswell series processor that has a three-level cache hierarchy: private level-1 (L1) and level-2 (L2) caches for each core (64KB and 256KB respectively) and a last-level (L3) cache (20MB) that is shared among all the cores; this is the model of the system we use for the rest of this paper. For cache partitioning, we turned to Intel Cache Allocation Technology (CAT) [17], which allows us to partition the shared L3 cache. The CAT mechanism is configured using model-specific registers (MSRs); this can be carried out dynamically at runtime using software mechanisms. On our processor the maximum number of partitions is limited to four, but newer generations support more [16]. Intel CAT has been available on select Haswell series processors since late 2014 and continues to be available on select processor lines belonging to the Broadwell and Skylake micro-architectures that succeeded the Haswell micro-architecture.

While our implementation and evaluation used an Intel Haswell processor with Intel CAT technology, the proposed approach is generally applicable to any multi-core processor with a hierarchical cache, a shared last-level cache, and a mechanism to partition the shared last-level cache.

2.2 Attack Model
While there are many potential threats to security in public cloud environments (e.g., [5, 35]), the focus of this paper is on cache-based side-channel attacks (e.g., [46, 48]). At a high level, in cache-based side-channel attacks an attacker deduces information about the victim's computations by observing the victim's cache usage. The information deduced can range from high-level information, such as which tenant one is co-located with (e.g., [30]), to more fine-grained details, such as cryptographic keys or items in a shopping cart (e.g., [18, 48]).

In such attacks, an attacker process first needs to co-locate itself (i.e., get assigned to the same physical server) with the target or victim in the infrastructure. Methods both to achieve co-residency [30] and to thwart it (e.g., [1, 14, 15, 25, 49]) have been discussed in the literature. In this work we assume that an attacker is able to co-locate with the victim, and we focus on thwarting the side-channel attacks themselves. Our framework complements approaches that thwart co-residency.

There are primarily two techniques to exploit cache-based side-channels discussed in the literature, namely "Prime+Probe" attacks [27] and "Flush+Reload" attacks [13]. It is important to note that while these techniques are popular, they are only possible because of measurable interference. Our solution addresses these specific techniques as well as other techniques that leverage the cache as a side-channel.

(a) "Prime+Probe" attacks [27]: At a high level, the attacker 'primes' the entire cache with random data. After waiting for a certain time (so that the victim executes), the adversary then 'probes' the cache to see which of its own lines have been evicted, thus providing information on which lines were used by the victim. This information, combined with knowledge of the cache access patterns exhibited by victim programs (e.g., cryptographic algorithms), can be used to extract information (e.g., the cryptographic key being used) about the victim.
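To make the mechanism concrete, the following minimal C sketch (ours, for illustration only) shows the two phases; the buffer size, stride, and hit/miss threshold are machine-dependent assumptions, and a real attack builds eviction sets per cache set rather than sweeping one large buffer:

/* Minimal Prime+Probe sketch (illustrative; not a working attack). */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <x86intrin.h>

#define LINE  64                  /* cache-line size in bytes */
#define BUFSZ (8u * 1024 * 1024)  /* assumed large enough to cover the LLC */

int main(void) {
    volatile uint8_t *buf = malloc(BUFSZ);
    unsigned aux;

    /* Prime: touch one byte per line so the buffer fills the cache. */
    for (size_t i = 0; i < BUFSZ; i += LINE)
        buf[i] = 1;

    /* ... attacker yields here; the victim runs and evicts some lines ... */

    /* Probe: time each access; a slow (missed) line was evicted by the
     * victim, revealing which cache sets the victim touched. */
    for (size_t i = 0; i < BUFSZ; i += LINE) {
        uint64_t t0 = __rdtscp(&aux);
        (void)buf[i];
        uint64_t t1 = __rdtscp(&aux);
        if (t1 - t0 > 150)        /* assumed hit/miss threshold in cycles */
            printf("line at offset %zu was evicted\n", i);
    }
    free((void *)buf);
    return 0;
}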

(b) "Flush+Reload" attacks [13]: These are a modification of Prime+Probe attacks and leverage memory pages that are shared between the attacker and the victim. Sharing allows the attacker to 'flush' specific lines rather than prime the whole cache. The rest of the procedure is similar: after waiting for a certain amount of time, the attacker checks which of the lines corresponding to shared memory pages are in the cache (and hence were used by the victim). This method is more efficient and results in less noise for the attacker.
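One round of this attack can be sketched as follows (again ours, for illustration; target must lie in a page shared with the victim, e.g., a shared library, and the threshold is machine-dependent):

/* One Flush+Reload round (illustrative sketch). */
#include <stdint.h>
#include <x86intrin.h>

/* Returns nonzero if the victim touched `target` since the flush. */
static int victim_touched(volatile uint8_t *target, uint64_t threshold) {
    unsigned aux;
    _mm_clflush((const void *)target);  /* evict the line from the cache */
    _mm_mfence();                       /* order the flush before waiting */

    /* ... wait for the victim to run ... */

    uint64_t t0 = __rdtscp(&aux);
    (void)*target;                      /* reload the line and time it */
    uint64_t t1 = __rdtscp(&aux);
    return (t1 - t0) < threshold;       /* fast reload => victim accessed it */
}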

We consider both cross-core (i.e., attacker and victim running on different cores of the same processor) and same-core (i.e., attacker and victim running on the same core) side-channel attacks. Same-core attacks, as the name indicates, require the attacker to achieve co-residency on the same core and to be able to preempt the victim. They typically focus on the higher-level caches (L1 and L2) that are specific to the core. Cross-core attacks, on the other hand, only require the attacker to achieve co-residency on the same physical server. Their limitation is that they only allow the attacker to observe the victim's activities through the last-level cache, which is shared and thus noisy. However, it has been shown that such limitations can be overcome [23].

To clarify: the attacker is capable of achieving co-residency with a victim, can allocate an arbitrary number of resources, and can game both the cloud-level scheduler (placement) and the operating-system-level scheduler (preemption). However, we assume that the cloud infrastructure is trusted. That is, while the cloud scheduler may be gamed, we assume that both the attacker and the victim are authenticated with the cloud provider (e.g., for billing purposes). Additionally, we assume that the host kernel running either KVM or containers is trusted.

It is important to note that there are many other threats to cloud computing environments apart from cache-based side-channels. We limit the scope of this work to addressing cache-based side-channel attacks. Such attacks can be launched by any legitimate tenant without having to exploit any vulnerabilities and without compromising the underlying software or hardware.

    3 SYSTEM ARCHITECTURE

Figure 2: System Overview

Our framework logically partitions a host server into an isolated region (this can be extended to multiple isolated regions) and a shared region, as illustrated in Figure 2. Tenants are required to indicate to the cloud provider whether or not their containers need isolated execution. Containers designated as requiring isolated execution will be executed in the isolated partition of the host server, while all other containers will be executed in the shared partition. The 'isolated execution' designation guarantees that processes within the designated containers will not share cache resources with (i) processes from any container belonging to another tenant (or security domain), or (ii) processes belonging to any container that is not designated for isolated execution, irrespective of their ownership (or security domain).

Our design, discussed next, leverages (i) Intel CAT, processor affinities (or CPU pinning) and selective page sharing to provide spatial isolation, and (ii) co-scheduling with state cleansing to provide temporal isolation for designated containers.

3.1 Hardware-Enforced Spatial Isolation
Intel's CAT [17], currently available in COTS hardware in Intel's Xeon series processors, is designed to improve the performance of latency-sensitive real-time workloads by allowing the LLC to be partitioned into distinct regions. Each processor core can be restricted to allocating cache lines in a specified cache partition. Consequently, a processor can only evict cache lines within its assigned LLC partition, thus reducing the impact that processes running on that processor core can have on other cache regions, and vice versa. In particular, note that the ability to allocate cache lines (priming in Prime+Probe attacks) in a cache shared with the victim, and the ability to evict cache lines being used by the victim process (flushing in Flush+Reload attacks), are key steps in cache-based side-channel attacks. Therefore, ensuring that potential victim and attacker processes run on cores associated with different LLC cache partitions, as we propose in our framework, defeats some cache-side-channel attacks. Specifically, cross-core Prime+Probe attacks on a victim process using a different cache partition are eliminated.

Accordingly, we partition the LLC into two regions using Intel CAT and associate cores in the system with each partition, such that a core is assigned to one partition or the other but not both. We refer to these partitions as the isolated partition and the shared partition. Processes belonging to containers designated as needing isolated execution will be pinned (using processor affinity) to the cores associated with the isolated cache partition, and the rest will be pinned to cores associated with the shared cache partition. The maximum number of cache partitions available with Intel CAT is a hardware parameter, fixed for a given micro-architecture; the machine used for our testing allows for up to 4 distinct partitions, but newer machines have 16. However, the sizes of the partitions, the number of active partitions, and the core-to-partition assignment are configured in software and can be adjusted based on the demand for isolated execution and the needs of the expected workloads. Further, if there is no demand for shared execution, the host server could be partitioned into two (or more) isolated partitions.
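For concreteness, the following sketch (ours) shows one way an administrator could create such a split on a recent Linux kernel through the resctrl interface (kernel 4.10 or later), which exposes CAT without custom MSR programming; it assumes resctrl is already mounted at /sys/fs/resctrl, and the way masks and core IDs are illustrative, matching the 20-way LLC and core numbering used in Section 5 (4 ways isolated, 16 shared):

/* Sketch: isolated/shared LLC split via Linux resctrl (assumptions above). */
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

static void write_str(const char *path, const char *val) {
    FILE *f = fopen(path, "w");
    if (!f || fprintf(f, "%s\n", val) < 0) { perror(path); exit(1); }
    fclose(f);
}

int main(void) {
    /* A new directory under /sys/fs/resctrl is a new partition (CLOS). */
    if (mkdir("/sys/fs/resctrl/isolated", 0755) != 0)
        perror("mkdir");

    /* Give the isolated group 4 of 20 cache ways (a 4MB partition)... */
    write_str("/sys/fs/resctrl/isolated/schemata", "L3:0=0000f");
    /* ...pinned to two physical cores (four logical processors). */
    write_str("/sys/fs/resctrl/isolated/cpus_list", "4-5,20-21");

    /* The default (root) group keeps the remaining 16 ways for sharing. */
    write_str("/sys/fs/resctrl/schemata", "L3:0=ffff0");
    return 0;
}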

As previously discussed, this hardware-assisted spatial partitioning protects the containers running in the isolated partition against cross-core Prime+Probe cache-side-channel attacks from containers running in the shared partition. However, cross-core cache-side-channel attacks across cache partitions are not entirely eliminated, because Intel CAT, primarily designed to improve the fairness of cache sharing and the performance of real-time workloads, allows cache hits across partition boundaries in order to maximize the benefits of shared memory (e.g., shared libraries). In particular, if the victim and the attacker processes have shared memory (e.g., because of layered file systems used in container frameworks), an attacker can flush cache lines associated with the shared memory of interest from his LLC partition, wait a little while for the victim to execute, then read the same shared memory and measure the time taken to see if there was a cache hit. Since the attacker previously flushed the cache line associated with the shared memory from his LLC partition, a cache hit indicates that the victim executing in a different core and LLC partition has used or is using the library. While this limits the granularity of information an attacker can glean across partition boundaries, timing observations, and hence the side-channel, are not entirely eliminated. Further, cache-side-channel attacks from within an isolated partition continue to be viable. These will be addressed in the following subsections.

Figure 3: Defense against Shared-Memory Attacks. (a) Union Filesystem Architecture; (b) Modified Union Filesystem Architecture.

However, even the partial protection against cache-side-channel attacks obtained through this spatial partitioning comes at the cost of reduced LLC cache size and the associated potential reduction in performance. Fortunately, however, a reduction in cache size has been shown to have relatively little impact on modern cloud workloads [12]. In particular, minimal performance sensitivity to LLC size has been reported above cache sizes of 4–6MB for modern scale-out and traditional server workloads (see Section 4.3 and Figure 4 in [12]) that are typical in cloud environments.

3.2 Selective Page Sharing
As previously discussed, hardware-assisted spatial partitioning does not eliminate cross-core Flush+Reload-style cache-side-channel attacks when the attacker and the victim have shared memory pages. Modern container deployments have one primary source of shared memory. Since Docker is one of the most popular choices for building container images and running them on Linux platforms, we limit our discussion to it, but these concepts are similar in other container frameworks. Docker uses storage drivers that are built on top of Union Filesystems (UFS) to present a filesystem to a process inside a container from a stack of layers. Figure 3a shows how different layers are stacked and shared across different containers. Several different containers may use the same base components, so a way was needed to reduce the disk and memory usage of the common building blocks. Docker solves this problem by uniquely identifying each layer by its cryptographic hash and sharing the common ones between all containers built using a given layer ID.

Often there are multiple containers running the same image, which causes them to share all the layers except the uppermost writable layer. For example, two Apache Tomcat servers running on the same Docker installation using the same image would share all the binaries, including the Java Virtual Machine (JVM), the Apache Tomcat binary, the GnuPG binary, and the OpenSSL binary, among others. Only the topmost layer, containing writable elements such as Tomcat's log files, differs between containers.

To thwart Flush+Reload-style attacks across cache partitions, we eliminate cross-domain page sharing through selective layer duplication. That is, for containers designated for isolated execution, our system allows sharing of layers only among containers belonging to the same tenant (or security domain), but not otherwise (see Figure 3b). This is a reasonable trade-off, as it enables isolation between different tenants (or security domains) while limiting the increase in memory usage. In particular, the increase in memory usage will be a function of the number of tenants running on the server rather than the number of containers. We do not prevent traditional sharing of layers for containers running in the shared partition.

For VMs, the kernel same-page merging (KSM) module in Linux, used for memory deduplication, is the main source of shared pages. However, KSM and memory de-duplication in general come with their own security risks (e.g., [2, 3, 29, 33]). For instance, it has been shown that KSM can be leveraged to break ASLR [2], can enable Rowhammer [20] attacks across VMs [29], and can create a timing side-channel that can be used to detect the existence of software across VMs [33], much like the Flush+Reload-style attack discussed previously. Given the serious security concerns surrounding the use of KSM, we disable it in our framework.

Note that selective page sharing combined with hardware-assisted spatial partitioning eliminates cross-core cache-side-channel attacks across partitions. Cache-side-channel attacks from within an isolated partition continue to be a threat and are discussed next.

3.3 State Cleansing
Even with containers running in an isolated partition, an attacker allocated to the same isolated partition as the victim might be able to (i) observe the victim's LLC usage if scheduled to run on a different core than the victim but associated with the same partition, and (ii) even observe the victim's L1 and L2 usage if running on the same core as the victim (e.g., [47]). In the latter case, an attacker observes the cache usage of the victim by managing to frequently alternate execution with the victim process.

To thwart the latter kind of attack, we propose to cleanse the cache state when context switching between processes (containers) belonging to different tenants (or security domains). That is, if a process from one security domain (tenant), SD1, runs on a core, then a process belonging to another domain, SD2, must either run on a core assigned to a separate partition, or state cleansing must be performed on the partition during the transition from the SD1 process to the SD2 process. There currently exists no hardware instruction for per-partition cache invalidation; more details on how state cleansing is achieved are in Section 4. However, this does not prevent attacks from an attacker process that runs in parallel with the victim, either on the same core through simultaneous multi-threading (hyper-threading) or on a different core in the same partition.

A naïve solution would be to assign just one core to the isolated partition, disable hyper-threading, and perform state cleansing on every context switch. The performance cost of such an approach is unattractive. A mitigation would be to create multiple isolated partitions with a single core assigned to each. However, the number of cache partitions is finite (4 in our case), and such an approach would further fragment the LLC available to the shared partition. Further, many cloud workloads are multi-threaded.

3.4 Co-scheduling for Temporal Isolation
To address the aforementioned threat, we use a novel scheduling technique for temporal separation of security domains sharing a single cache partition. Co-scheduling container processes belonging to a given security domain across multiple processors amortizes the cost of state cleansing, but introduces additional complexity, which we address below.

Scale-out workloads with many threads, those commonly deployed on cloud infrastructure, motivate this approach. As thread counts for a security domain increase, the number of threads able to run per domain at any given time remains high. This allows us to drive up the utilization of cores assigned to a partition and only flush the partition when changing to the next domain. The complexities stem from needing to synchronize all isolated cores during domain changes; thus, any implementation of co-scheduling has to guarantee an exclusion property: no task belonging to security domain SDX can run on an isolated processor while a task from another domain, SDY, is running on a processor associated with the same isolated partition. Additionally, before a task from SDX can run, a state-cleansing event must occur. As shown in Figure 4, multiple cores can be utilized at once within a security domain, but state cleansing must then be performed as every core assigned to a given partition context-switches to the next security domain. The next security domain cannot run on any isolated processor until this process is complete.
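Stated formally (our notation, not the paper's): let dom(c, t) be the security domain of the task running on isolated core c at time t, with dom(c, t) = \bot if c is idle or running a trusted kernel task. The exclusion property requires, for all cores c_1, c_2 assigned to the same isolated partition and all times t:

    dom(c_1, t) \neq \bot \;\wedge\; dom(c_2, t) \neq \bot \;\Longrightarrow\; dom(c_1, t) = dom(c_2, t),

and a state-cleansing event must separate any two maximal intervals whose active domains differ.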

4 IMPLEMENTATION
Partitioning the LLC and associating cores with each partition does not require changes to the kernel or the operating system; it can be done by a system administrator as part of the machine configuration. Here we focus on the implementation of the rest of the components.

Figure 4: Co-scheduling Overview. The isolated environment is on the left: it consists of an isolated cache partition along with the processors assigned to that partition. Co-scheduling is used to group tasks belonging to the same security domain, and state-cleansing events occur when changing domains. Regular tasks are on the right in a separate cache partition and have no scheduling restrictions.

4.1 Co-Scheduling
Co-scheduling can enforce isolation between security domains, but any implementation must be precise. By precise, we mean that any form of "loose" or "lazy" co-scheduling is unacceptable. For example, consider a naïve implementation of co-scheduling as outlined in Figure 5.

Table 1: Per-Security-Domain Thread Allocations

Security Domain   Thread Count
ORG1              2
ORG2              3

Figure 5 shows a schedule instance of the configuration described in Table 1. The example is a situation in which 2 cores are associated with an isolated partition and are running containers belonging to two security domains. These cores may be two physical cores, or one physical core presented as two to the operating system (SMT). The defining characteristic in our example is the shared cache: for hyperthreaded cores, this is the L1, L2, and LLC; in the case of two physical cores, the shared cache is only the LLC. Cores 1 and 2 in our example are not cores on two separate sockets on the same motherboard.

Consider a situation in which Core1 initiates a domain transfer upon scheduling a thread from a conflicting domain, ORG2:THREAD1 in this example. Even if the scheduler invokes a flushing event, f1, there remains a window Δt3 during which cross-core, cross-domain attacks could be carried out. This is seen again after Core1 schedules ORG1:THREAD1 and ORG2:THREAD3, leading to durations Δt4 and Δt5 during which attacks remain feasible. While this situation is still better than the behavior of the default scheduler, it is not guaranteed to eliminate observable cache interference.

Figure 5: Limitations of Naïve Co-Scheduling


This example highlights the need for careful attention to detail when implementing co-scheduling for improved security on modern operating systems. Effective elimination of side-channels dictates that cross-core synchronization be performed before state cleansing occurs and before subsequent domain scheduling takes place.

To ensure that no two schedulable units belonging to different security domains run on an isolated cache partition simultaneously, we implement the core-synchronization protocol shown in Figure 6. The protocol works by making the first core in an isolated partition a leader core. The leader core is responsible for initiating domain changes, synchronizing cores, and flushing the cache (state cleansing). All isolated cores only schedule tasks belonging to the ACTIVE_SECURITY_DOMAIN. Note that while only 2 cores are shown in Figure 6, the approach works with any number of cores. In Section 5 we evaluate the protocol with 4 cores assigned to an isolated partition.

Figure 6: Strict Co-Scheduling Protocol

Isolated cores rely on two pieces of shared state to achieve strict synchronization; the leader core is the only core that can modify this state. The ROUND_OVER variable indicates to follower cores that a domain change is about to occur. A timer on the leader core initiates a domain change by modifying this variable and invoking the __schedule function on the leader core. The change-domain event fires every sysctl_sched_min_granularity. After setting the ROUND_OVER variable to true, the leader core issues a reschedule command to the follower cores and waits for them to send back acknowledgments. The acknowledgment is performed within the __schedule function on the follower cores. When the ROUND_OVER variable is set, partitioned cores can only run trusted processes: these are only kernel tasks, including ksoftirqd, the watchdog, and the idle task.

After receiving acknowledgments back from all follower cores, the leader then flushes the cache and updates ACTIVE_SECURITY_DOMAIN to point to the next security domain. Our system uses a separate task_group within the Linux kernel for each security domain. All task_groups at the same level are linked together in a linked list; we use this list to implement round-robin iteration through the security domains on the leader core. Run-queue checking is performed to ensure that a domain with runnable tasks is chosen.

Having chosen the next domain, the leader core sets ROUND_OVER to false and again issues a reschedule command to the follower cores. The __schedule function would eventually be invoked on the follower cores anyway, but we use the reschedule command to reduce the idle time of follower cores. This protocol corrects the problem presented in Figure 5, resulting in "strict" co-scheduling as seen in Figure 7.
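The round protocol can be summarized with the following simplified model (ours, not actual kernel code; the extern stubs stand in for kernel mechanisms named in the text: reschedule IPIs, acknowledgments performed inside __schedule, the eviction-set flush of Section 4.2, and round-robin iteration over per-domain task_groups):

/* Simplified sketch of the leader-core round protocol. */
#include <stdatomic.h>
#include <stdbool.h>

extern void resched_followers(void);      /* send reschedule IPIs (stub) */
extern void wait_for_follower_acks(void); /* acks arrive in __schedule() */
extern void flush_partition(void);        /* state cleansing (Section 4.2) */
extern int  next_runnable_domain(int sd); /* skips domains w/o runnable tasks */

static atomic_bool round_over = false;
static atomic_int  active_sd  = 0;        /* ACTIVE_SECURITY_DOMAIN */

/* Runs on the leader core; fired by a timer every
 * sysctl_sched_min_granularity. */
void leader_domain_change(void) {
    atomic_store(&round_over, true);      /* followers: trusted tasks only */
    resched_followers();
    wait_for_follower_acks();             /* old domain is off all cores */

    flush_partition();
    atomic_store(&active_sd,
                 next_runnable_domain(atomic_load(&active_sd)));

    atomic_store(&round_over, false);     /* open the next round */
    resched_followers();                  /* cut follower idle time */
}

/* Consulted by every isolated core when picking the next task to run. */
bool may_run(int task_sd) {
    return !atomic_load(&round_over) && task_sd == atomic_load(&active_sd);
}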

Figure 7: Synchronized Co-Scheduling

4.2 State Cleansing
The isolated cache partition must be cleansed, or flushed, before switching context to a different security domain. For a processor cache, state cleansing or flushing equates to invalidating the cache lines or evicting them, but no hardware mechanism exists to flush only the cache lines assigned to a single CAT partition; the WBINVD instruction invalidates the entire shared cache, disrupting processes in all partitions.

One way to implement state cleansing is for the user process to invoke the CLFLUSH instruction, which evicts the cache lines corresponding to a linear virtual address and can be invoked from user space. This can be done by the application process before it is switched out, as was done in [50]. However, this requires changes to user applications, which is not desirable. Another possibility is for the kernel to invoke CLFLUSH on the entire virtual address space. While this is guaranteed to work across processor generations, it is too costly. An optimization is to do it only on the valid virtual addresses of the task being switched out, as was done in [42]. However, this can still be a large range compared to the size of a cache partition (4–6MB).

Another approach is to create an eviction set: a set of addresses which, when loaded, is guaranteed to evict the entire cache partition. However, the memory-address-to-cache-line mapping is proprietary and subject to change across processor generations. Cache-side-channel attacks also have to contend with this challenge, and have addressed it by reverse-engineering the memory-address-to-cache-line mapping for a given micro-architecture (e.g., [23]). Apart from the one-time cost of reverse engineering, the cost of this approach is equal to loading memory the size of the cache partition. We adopt this approach.
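In outline, the resulting flush reduces to walking the precomputed eviction set (a sketch of ours; constructing the set itself requires the reverse-engineered mapping discussed above and is assumed done):

/* Simplified state-cleansing flush: the eviction set's addresses together
 * cover every line of the isolated cache partition. */
#include <stddef.h>
#include <stdint.h>

void flush_partition_lines(volatile uint8_t **eviction_set, size_t n) {
    /* Two passes as a heuristic against the undocumented replacement
     * policy; each load displaces whatever the previous domain cached. */
    for (int pass = 0; pass < 2; pass++)
        for (size_t i = 0; i < n; i++)
            (void)*eviction_set[i];
}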

We perform this state cleansing any time the security domain is changed, as shown by the protocol in Figure 6 and the co-scheduling overview in Figure 4. To reduce the performance impact when other domains lack runnable threads (due to blocking on I/O, etc.), flushing is only performed when ACTIVE_SECURITY_DOMAIN changes.


4.3 Selective Page Sharing
Docker uses Union Filesystems (UFS) to present a unified view of several different layers. Of the several storage drivers that Docker supports, such as btrfs [9], overlayfs [10], and AUFS [8], AUFS is one of the most mature. In a UFS, multiple directories on the host are unified in a single directory called a union mount, without replicating the actual contents of the individual directories; the contents (files or directories) of all the directories become visible at the union mount. Docker keeps a single copy of each layer on the host filesystem, and AUFS mounts all the layers at a single union mount point, which is then handed over to the container as its root file system. Each layer can be part of multiple union mounts and thus can be shared across different containers. An important point to note here is that this behavior is specific to AUFS and Docker.

Our implementation modifies Docker (v1.14.0-dev, compiled from source code available on GitHub), specifically the AUFS storage driver, to transparently allow selective sharing of filesystem layers. Docker's command-line client provides an option called --cgroup-parent, which sets the parent cgroup of the container to the value provided. We modified the AUFS driver to keep separate copies of each layer for each cgroup-parent. Each cgroup-parent maps to a separate security domain, and thus no two containers belonging to different security domains have any layers in common.
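For example (an illustrative invocation; the tenant names are hypothetical), containers started with docker run --cgroup-parent=/tenantA ... and docker run --cgroup-parent=/tenantB ... would, under the modified driver, resolve the same image layers to separate per-domain copies, so no file-backed page from a shared layer is ever mapped into containers of both domains.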

5 PERFORMANCE EVALUATION

5.1 Impact of Scheduler Changes
We evaluate feasibility using a CPU-bound workload to determine the impact on applications in the worst-case scenario. Consider a batch workload such as Hadoop, or a web-serving workload: we evaluate the case in which all threads have work and are not waiting for input.

The machine is configured as outlined in Section 2. We allocate 2 physical cores and 4 logical processors to an isolated cache region. The core IDs are 4, 20, 5, and 21 due to the way Linux numbers logical processors. The cache region is 4MB (4 cache ways out of the 20 available on the system). Each security domain is assigned 4 threads, and the number of domains is varied from 2 to 8. Each domain consists of 4 CPU-bound tasks, 1 for each logical processor. Measurements are taken using sar and pidstat at an interval of once per second for 100 seconds. Figures 9-11 show where processor time is spent under different scheduler configurations. It is clear that the overheads for a single logical processor are a function of the system and not of the number of security domains assigned to a partition. Follower cores in (b) and (c) can be seen idling during domain changes, but the overheads never exceed slightly above 10%, with the average case being slightly below 10% on follower processors. Flushing significantly increases the performance penalties, as can be seen from the differences between (c) and (b) in each of the figures. The leader core spends the most time executing in system space due to its responsibility to change domains and synchronize cores, so this was to be expected. In the future, we will investigate mechanisms to reduce system time on the leader core and idle time on follower cores. Figure 8b shows the overhead normalized per security domain (each domain representing an organization, with names testing[1-4] in this case) for the 4-security-domain case. The reduction in time spent in userspace per domain is small.

Figure 8: Overheads Per Domain. (a) Baseline; (b) Co-Scheduling, Flushing.

5.2 Impact of Shared Memory Reduction
By enabling selective sharing of base layers in Docker, we expect an increase in the memory footprint of containers, as there are multiple copies of certain pages that would otherwise be shared. To understand memory growth versus the number of security domains, we ran 2 experiments each with a web server (Apache Tomcat) and an in-memory database (Redis). We used smem to measure the proportional set size (PSS) per container, as it represents realistic memory usage by only adding the fair share of the total shared memory. To make sure that the code was resident in memory before we took measurements, we sent 100 requests to each of the Apache Tomcat servers and added 1000 random key-value pairs to each Redis server. We designed our experiments to be representative of real-world deployments of micro-services, where multiple containers each of a single type run in a distributed fashion.

Figure 9: 2 Security Domains, 4 Logical Processors in Isolated Partition - Per Core Utilization. (a) Baseline; (b) Co-Scheduling; (c) Co-Scheduling with Flushing.

Figure 10: 4 Security Domains, 4 Logical Processors in Isolated Partition - Per Core Utilization. (a) Baseline; (b) Co-Scheduling; (c) Co-Scheduling with Flushing.

Figure 11: 8 Security Domains, 4 Logical Processors in Isolated Partition - Per Core Utilization. (a) Baseline; (b) Co-Scheduling; (c) Co-Scheduling with Flushing.

In the first experiment, we measured how the average memory usage of each container increased as we increased the number of security domains from 1 to 4. Each security domain owns 5 containers. It was compared against the same number of containers running without modifications on Docker (i.e., all the containers share base layers). We measured the average memory usage across 50 runs. Figures 12a and 12b show that memory usage for Redis increased only about 0.45 MB per container, and for Apache Tomcat only about 1.71 MB.

In the second experiment, we ran 20 containers equally split between 1, 2 and 4 security domains and observed the increase in memory usage per container. Figures 12c and 12d show that memory usage for Redis increased only about 1.61 MB per container, and for Apache Tomcat about 3.26 MB.

The results support our claim that the additional cost of selective sharing of memory is negligible. The increase that we see arises from the duplicate copies of memory pages, one per security domain, each shared among all the containers within that domain.


Figure 12: Memory Overheads. (a) Unmodified vs. Modified (Redis); (b) Unmodified vs. Modified (Tomcat); (c) Memory Usage vs. No. of Security Domains (Redis); (d) Memory Usage vs. No. of Security Domains (Tomcat).

6 RELATED WORK

6.1 Cache Side-Channels
Cache side-channel attacks take advantage of the shared nature of processor resources, in particular the processor's Level-1, Level-2, and Last-Level caches. Prime+Probe attacks were first explored across Virtual Machines (VMs) by Osvik et al. [27] and shown to be practical in cross-core attacks via the LLC by Liu et al. [23]. Modern clouds are driving up machine utilization by offering container-based platforms; this increases revenue [36] while providing more performance to tenants [11, 41]. Cache side-channel attacks are already emerging on container-based infrastructure [48]. Flush+Reload attacks [18] are a real threat to cloud computing security and have been successfully deployed on public infrastructure [48]. All of these attacks fall under the broader class of access-based side-channels, in which an attacker can tell whether or not a given resource has been accessed by a victim in a given time period.

6.2 Existing Solutions
Existing approaches achieve single-core isolation by disabling hyperthreading. StealthMem is able to keep hyperthreading enabled, but the authors do not sufficiently address cross-thread scheduling issues [19], and it requires application developers to make code changes. Disabling hyperthreading significantly reduces the throughput not only of the "secure" workload but of the entire machine. Disabling hyperthreading is untenable, as the economic model behind cloud computing dictates high per-machine utilization [36]. Cutting whole-machine utilization by even 20% (the impact of hyperthreading in 2005 [4]) is too costly for cloud computing. We argue that while, for some tenants, a 20% overhead may be a reasonable trade-off for increased security, there is little value in forcing all tenants to pay that performance penalty. CATalyst [22] follows a defense model similar to StealthMem's, but uses Intel's CAT technology, instead of software-based page coloring, to assign virtual pages to sensitive variables.

CACHEBAR [51] defends against Flush+Reload attacks by duplicating memory pages on access from separate processes, a scheme it calls Copy-On-Read. Since Linux's KSM de-duplicates pages at regular intervals, CACHEBAR also modifies the behavior of KSM to preserve Copy-On-Read. To defend against Prime+Probe attacks, CACHEBAR modifies memory allocation based on which cache lines a memory region maps to, so that the attacking process loses visibility into the victim process. It provides only probabilistic guarantees of defense against a Prime+Probe attack.

Some solutions are probabilistic [25, 37, 51] or do not protect applications when SMT is enabled [51]. Others require developers to re-write applications [19, 22, 31] or rely on costly hardware changes [39, 40]. Other hardware approaches violate x86 semantics by modifying the resolution, accuracy or availability of timing instructions [21, 24, 38]. Finally, cache-coloring approaches [28, 32, 45] have impractically high overheads. Solutions like Nomad [25], while probabilistic, complement our approach: Nomad works in the cloud scheduler to reduce the co-residency of different security domains. Our solution could be used in conjunction with Nomad to provide hard isolation when co-residency restrictions are not possible.

7 CONCLUSIONS
We have presented a hardware/software mechanism that eliminates cache-based side-channels for schedulable units belonging to separate security domains. Unlike many existing solutions, our solution allows SMT to remain enabled and does not require application-level changes: a user simply notifies the provider that a given workload should be run in isolation. Our solution eliminates an attacker's ability to use the cache as a noisy communication channel and does not rely on probabilistic methods that merely decrease the granularity of information available on the channel. We implemented our system on top of the Linux scheduler and presented an evaluation of the system under a CPU-bound workload. Our system has a worst-case reduction in utilization of 9.8% in the case of 2 security domains, and only 2.97% and 1.68% decreases in utilization for the 4- and 8-security-domain configurations, respectively (with 4 logical processors assigned to an isolated partition).

REFERENCES
[1] Yossi Azar, Seny Kamara, Ishai Menache, Mariana Raykova, and Bruce Shepard. 2014. Co-Location-Resistant Clouds. In Proceedings of the 6th Edition of the ACM Workshop on Cloud Computing Security (CCSW '14). ACM, New York, NY, USA, 9–20. https://doi.org/10.1145/2664168.2664179
[2] Antonio Barresi, Kaveh Razavi, Mathias Payer, and Thomas R. Gross. 2015. CAIN: Silently Breaking ASLR in the Cloud. In Proceedings of the 9th USENIX Conference on Offensive Technologies (WOOT '15). USENIX Association, Berkeley, CA, USA. http://dl.acm.org/citation.cfm?id=2831211.2831224
[3] Erik Bosman, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2016. Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector. In 2016 IEEE Symposium on Security and Privacy (SP). 987–1004. https://doi.org/10.1109/SP.2016.63
[4] James R. Bulpin and Ian Pratt. 2005. Hyper-Threading Aware Process Scheduling Heuristics. In USENIX Annual Technical Conference, General Track. 399–402.
[5] Shakeel Butt, H. Andrés Lagar-Cavilla, Abhinav Srivastava, and Vinod Ganapathy. 2012. Self-service cloud computing. In Proceedings of the 2012 ACM Conference on Computer and Communications Security. ACM, 253–264.
[6] Bart Coppens, Ingrid Verbauwhede, Koen De Bosschere, and Bjorn De Sutter. 2009. Practical mitigations for timing-based side-channel attacks on modern x86 processors. In Security and Privacy, 2009 30th IEEE Symposium on. IEEE, 45–60.
[7] Docker. https://www.docker.com
[8] Docker Documentation. 2017. AUFS storage driver. https://docs.docker.com/engine/userguide/storagedriver/aufs-driver/#how-the-aufs-storage-driver-works. (Accessed on 06/08/2017).
[9] Docker Documentation. 2017. BTRFS storage driver. https://docs.docker.com/engine/userguide/storagedriver/btrfs-driver/. (Accessed on 06/08/2017).
[10] Docker Documentation. 2017. OverlayFS storage driver. https://docs.docker.com/engine/userguide/storagedriver/overlayfs-driver/. (Accessed on 06/08/2017).
[11] Wes Felter, Alexandre Ferreira, Ram Rajamony, and Juan Rubio. 2015. An updated performance comparison of virtual machines and Linux containers. In Performance Analysis of Systems and Software (ISPASS), 2015 IEEE International Symposium on. IEEE, 171–172.
[12] Michael Ferdman, Almutaz Adileh, Onur Kocberber, Stavros Volos, Mohammad Alisafaee, Djordje Jevdjic, Cansu Kaynak, Adrian Daniel Popescu, Anastasia Ailamaki, and Babak Falsafi. 2012. Clearing the clouds: a study of emerging scale-out workloads on modern hardware. In ACM SIGPLAN Notices, Vol. 47. ACM, 37–48.
[13] David Gullasch, Endre Bangerter, and Stephan Krenn. 2011. Cache games: bringing access-based cache attacks on AES to practice. In Security and Privacy (SP), 2011 IEEE Symposium on. IEEE, 490–505.
[14] Yi Han, Tansu Alpcan, Jeffrey Chan, and Christopher Leckie. 2013. Security Games for Virtual Machine Allocation in Cloud Computing. In 4th International Conference on Decision and Game Theory for Security (GameSec 2013). Springer-Verlag, New York, NY, USA, 99–118. https://doi.org/10.1007/978-3-319-02786-9_7
[15] Yi Han, Jeffrey Chan, Tansu Alpcan, and Christopher Leckie. 2014. Virtual machine allocation policies against co-resident attacks in cloud computing. In Communications (ICC), 2014 IEEE International Conference on. IEEE, 786–792.
[16] Intel. 2017. Cache Allocation Technology. https://01.org/cache-monitoring-technology?page=1. (Accessed on 06/08/2017).
[17] Intel. 2017. Cache Monitoring Technology and Cache Allocation Technology. https://www.intel.com/content/www/us/en/communications/cache-monitoring-cache-allocation-technologies.html. (Accessed on 06/08/2017).
[18] Gorka Irazoqui, Mehmet Sinan Inci, Thomas Eisenbarth, and Berk Sunar. 2014. Wait a minute! A fast, cross-VM attack on AES. In International Workshop on Recent Advances in Intrusion Detection. Springer, 299–319.
[19] Taesoo Kim, Marcus Peinado, and Gloria Mainar-Ruiz. 2012. STEALTHMEM: System-Level Protection Against Cache-Based Side Channel Attacks in the Cloud. In USENIX Security Symposium. 189–204.
[20] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. 2014. Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA '14). IEEE Press, Piscataway, NJ, USA, 361–372. http://dl.acm.org/citation.cfm?id=2665671.2665726
[21] Peng Li, Debin Gao, and Michael K. Reiter. 2013. Mitigating access-driven timing channels in clouds using StopWatch. In Dependable Systems and Networks (DSN), 2013 43rd Annual IEEE/IFIP International Conference on. IEEE, 1–12.
[22] Fangfei Liu, Qian Ge, Yuval Yarom, Frank Mckeen, Carlos Rozas, Gernot Heiser, and Ruby B. Lee. 2016. CATalyst: Defeating Last-Level Cache Side Channel Attacks in Cloud Computing. In High Performance Computer Architecture (HPCA), 2016 IEEE International Symposium on. IEEE, 406–418.
[23] Fangfei Liu, Yuval Yarom, Qian Ge, Gernot Heiser, and Ruby B. Lee. 2015. Last-level cache side-channel attacks are practical. In Security and Privacy (SP), 2015 IEEE Symposium on. IEEE, 605–622.
[24] Robert Martin, John Demme, and Simha Sethumadhavan. 2012. TimeWarp: rethinking timekeeping and performance monitoring mechanisms to mitigate side-channel attacks. ACM SIGARCH Computer Architecture News 40, 3 (2012), 118–129.
[25] Soo-Jin Moon, Vyas Sekar, and Michael K. Reiter. 2015. Nomad: Mitigating arbitrary cloud side channels via provider-assisted migration. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security. ACM, 1595–1606.
[26] Michael Neve and Jean-Pierre Seifert. 2006. Advances on access-driven cache attacks on AES. In International Workshop on Selected Areas in Cryptography. Springer, 147–162.
[27] Dag Arne Osvik, Adi Shamir, and Eran Tromer. 2006. Cache attacks and countermeasures: the case of AES. In Cryptographers' Track at the RSA Conference. Springer, 1–20.
[28] Himanshu Raj, Ripal Nathuji, Abhishek Singh, and Paul England. 2009. Resource management for isolation enhanced cloud services. In Proceedings of the 2009 ACM Workshop on Cloud Computing Security. ACM, 77–84.
[29] Kaveh Razavi, Ben Gras, Erik Bosman, Bart Preneel, Cristiano Giuffrida, and Herbert Bos. 2016. Flip Feng Shui: Hammering a needle in the software stack. In Proceedings of the 25th USENIX Security Symposium.
[30] Thomas Ristenpart, Eran Tromer, Hovav Shacham, and Stefan Savage. 2009. Hey, you, get off of my cloud: exploring information leakage in third-party compute clouds. In Proceedings of the 16th ACM Conference on Computer and Communications Security. ACM, 199–212.
[31] Bruno Rodrigues, Fernando Magno Quintão Pereira, and Diego F. Aranha. 2016. Sparse representation of implicit flows with applications to side-channel detection. In Proceedings of the 25th International Conference on Compiler Construction. ACM, 110–120.
[32] Jicheng Shi, Xiang Song, Haibo Chen, and Binyu Zang. 2011. Limiting cache-based side-channel in multi-tenant cloud using dynamic page coloring. In Dependable Systems and Networks Workshops (DSN-W), 2011 IEEE/IFIP 41st International Conference on. IEEE, 194–199.
[33] Kuniyasu Suzaki, Kengo Iijima, Toshiki Yagi, and Cyrille Artho. 2011. Memory Deduplication As a Threat to the Guest OS. In Proceedings of the Fourth European Workshop on System Security (EUROSEC '11). ACM, New York, NY, USA, Article 1, 6 pages. https://doi.org/10.1145/1972551.1972552
[34] Laszlo Szekeres, Mathias Payer, Tao Wei, and Dawn Song. 2013. SoK: Eternal war in memory. In Security and Privacy (SP), 2013 IEEE Symposium on. IEEE, 48–62.
[35] Hassan Takabi, James B. D. Joshi, and Gail-Joon Ahn. 2010. Security and privacy challenges in cloud computing environments. IEEE Security & Privacy 8, 6 (2010), 24–31.
[36] Asoke K. Talukder, Lawrence Zimmerman, et al. 2010. Cloud economics: Principles, costs, and benefits. In Cloud Computing. Springer, 343–360.
[37] Venkatanathan Varadarajan, Thomas Ristenpart, and Michael M. Swift. 2014. Scheduler-based Defenses against Cross-VM Side-channels. In USENIX Security. 687–702.
[38] Bhanu C. Vattikonda, Sambit Das, and Hovav Shacham. 2011. Eliminating fine grained timers in Xen. In Proceedings of the 3rd ACM Workshop on Cloud Computing Security. ACM, 41–46.
[39] Zhenghong Wang and Ruby B. Lee. 2007. New cache designs for thwarting software cache-based side channel attacks. In ACM SIGARCH Computer Architecture News, Vol. 35. ACM, 494–505.
[40] Zhenghong Wang and Ruby B. Lee. 2008. A novel cache architecture with enhanced performance and security. In Microarchitecture, 2008. MICRO-41. 2008 41st IEEE/ACM International Symposium on. IEEE, 83–93.
[41] Miguel G. Xavier, Marcelo V. Neves, Fabio D. Rossi, Tiago C. Ferreto, Timoteo Lange, and Cesar A. F. De Rose. 2013. Performance evaluation of container-based virtualization for high performance computing environments. In Parallel, Distributed and Network-Based Processing (PDP), 2013 21st Euromicro International Conference on. IEEE, 233–240.
[42] Meng Xu, Linh T. X. Phan, Hyon-Young Choi, and Insup Lee. 2017. vCAT: Dynamic Cache Management Using CAT Virtualization. (2017).
[43] Yuval Yarom and Naomi Benger. 2014. Recovering OpenSSL ECDSA Nonces Using the FLUSH+RELOAD Cache Side-channel Attack. IACR Cryptology ePrint Archive 2014 (2014), 140.
[44] Yuval Yarom and Katrina Falkner. 2014. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In USENIX Security, Vol. 2014. 719–732.
[45] Ying Ye, Richard West, Zhuoqun Cheng, and Ye Li. 2014. Coloris: a dynamic cache partitioning system using page coloring. In Proceedings of the 23rd International Conference on Parallel Architectures and Compilation. ACM, 381–392.
[46] Chao Zhang, Tao Wei, Zhaofeng Chen, Lei Duan, Laszlo Szekeres, Stephen McCamant, Dawn Song, and Wei Zou. 2013. Practical control flow integrity and randomization for binary executables. In Security and Privacy (SP), 2013 IEEE Symposium on. IEEE, 559–573.
[47] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. 2012. Cross-VM side channels and their use to extract private keys. In Proceedings of the 2012 ACM Conference on Computer and Communications Security. ACM, 305–316.
[48] Yinqian Zhang, Ari Juels, Michael K. Reiter, and Thomas Ristenpart. 2014. Cross-tenant side-channel attacks in PaaS clouds. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 990–1003.
[49] Yulong Zhang, Min Li, Kun Bai, Meng Yu, and Wanyu Zang. 2012. Incentive compatible moving target defense against VM-colocation attacks in clouds. In IFIP International Information Security Conference. Springer, 388–399.
[50] Yinqian Zhang and Michael K. Reiter. 2013. Düppel: Retrofitting commodity operating systems to mitigate cache side channels in the cloud. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security. ACM, 827–838.
[51] Ziqiao Zhou, Michael K. Reiter, and Yinqian Zhang. 2016. A software approach to defeating side channels in last-level caches. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 871–882.
