
Memory DoS Attacks in Multi-tenant Clouds: Severity and Mitigation

Tianwei Zhang
Princeton University
[email protected]

Yinqian Zhang
The Ohio State University
[email protected]

Ruby B. Lee
Princeton University
[email protected]

ABSTRACT

In cloud computing, network Denial of Service (DoS) attacks are well studied and defenses have been implemented, but severe DoS attacks on a victim’s working memory by a single hostile VM are not well understood. Memory DoS attacks are Denial of Service (or Degradation of Service) attacks caused by contention for hardware memory resources on a cloud server. Despite the strong memory isolation techniques for virtual machines (VMs) enforced by the software virtualization layer in cloud servers, the underlying hardware memory layers are still shared by the VMs and can be exploited by a clever attacker in a hostile VM co-located on the same server as the victim VM, denying the victim the working memory he needs. We first show quantitatively the severity of contention on different memory resources. We then show that a malicious cloud customer can mount low-cost attacks to cause severe performance degradation for a Hadoop distributed application, and 38× delay in response time for an E-commerce website in the Amazon EC2 cloud.

Then, we design an effective, new defense against these memory DoS attacks, using a statistical metric to detect their existence and execution throttling to mitigate the attack damage. We achieve this by a novel re-purposing of existing hardware performance counters and duty cycle modulation for security, rather than for improving performance or power consumption. We implement a full prototype on the OpenStack cloud system. Our evaluations show that this defense system can effectively defeat memory DoS attacks with negligible performance overhead.

1. INTRODUCTION

Public Infrastructure-as-a-Service (IaaS) clouds provide elastic computing resources on demand to customers at low cost. Anyone with a credit card may host scalable applications in these computing environments, and become a tenant of the cloud. To maximize resource utilization, cloud providers schedule virtual machines (VMs) leased by different tenants on the same physical machine, sharing the same hardware resources.

While software isolation techniques, like VM virtualization, carefully isolate memory pages (virtual and physical), most of the underlying hardware memory hierarchy is still shared by all VMs running on the same physical machine in a multi-tenant cloud environment. Malicious VMs can exploit the multi-tenancy feature to intentionally cause severe contention on the shared memory resources to conduct Denial-of-Service (DoS) attacks against other VMs sharing the resources. Moreover, it has been shown that a malicious cloud customer can intentionally co-locate his VMs with victim VMs to run on the same physical machine [36, 39, 46]; this co-location attack can serve as a first step for performing memory DoS attacks against an arbitrary target.

The severity of memory resource contention has been seriously underestimated. While it is tempting to presume that the level of interference caused by resource contention is modest, and that in the worst case the resulting performance degradation is isolated to one compute node, we show this is not the case. We present advanced attack techniques that, when exploited by malicious VMs, can induce much more intense memory contention than normal applications could, and can degrade the performance of VMs on multiple nodes.

To demonstrate that our attacks work on real applications in real-world settings, we applied them to two case studies conducted in a commercial IaaS cloud, Amazon Elastic Compute Cloud (EC2). We show that even if the attacker only has one VM co-located with one of the many VMs of the target multi-node application, significant performance degradation can be caused to the entire application, rather than just to a single node. In our first case study, we show that when the adversary co-locates one VM with one node of a 20-node distributed Hadoop application, he may cause up to 3.7× slowdown of the entire distributed application. Our second case study shows that our attacks can slow down the response latency of an E-commerce application (consisting of load balancers, web servers, database servers and memory caching servers) by up to 38 times, and reduce the throughput of the servers down to 13%.

Despite the severity of the attacks, neither current cloud providers nor the research literature offer any solutions to memory DoS attacks. Our communication with cloud providers suggests such issues are not currently addressed, in part because the attack techniques presented in this paper are non-conventional, and existing solutions to network-based DDoS attacks do not help. Research studies have not explored defenses against adversarial memory contention either. As will be discussed in Sec. 6.2, existing solutions [13, 19, 47, 53, 55] only aim to enhance performance isolation between benign applications. The intentional memory abuses that characterize memory DoS attacks defeat these solutions.

Therefore, a large portion of this paper is devoted to the design and implementation of a novel and effective approach to detect and mitigate all known types of memory DoS attacks with low overhead. Our detection strategy provides a generalized method for detecting deviations from the baseline behavior of the victim VM due to memory DoS attacks. We collect the baseline behaviors of the monitored VM at runtime, by creating a pseudo isolated period, without completely pausing co-tenant VMs. This provides periodic (re)establishment of baseline behaviors that adapt to changes in program phases and workload characteristics. Once memory DoS attacks are detected, we show how malicious VMs can be identified and their attacks mitigated, using a novel form of selective execution throttling.
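To make the selective execution throttling concrete, below is a minimal sketch (not the paper’s implementation) of driving Intel duty cycle modulation from software on Linux, by writing the IA32_CLOCK_MODULATION MSR (0x19A) of the core running a suspected attacker vCPU through the msr kernel module. The target CPU number and duty-cycle level are illustrative, and root privileges are assumed.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write one MSR on one CPU via /dev/cpu/<cpu>/msr (requires the msr
     * kernel module and root). */
    static int wrmsr_on_cpu(int cpu, uint32_t reg, uint64_t val) {
        char path[64];
        snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
        int fd = open(path, O_WRONLY);
        if (fd < 0) return -1;
        ssize_t n = pwrite(fd, &val, sizeof(val), reg);
        close(fd);
        return (n == sizeof(val)) ? 0 : -1;
    }

    int main(void) {
        int cpu = 2;              /* core running the suspected attacker vCPU */
        /* IA32_CLOCK_MODULATION (0x19A): bit 4 enables on-demand clock
         * modulation; bits 3:1 select the duty cycle (1 => ~12.5%). */
        uint64_t throttle = (1u << 4) | (1u << 1);
        if (wrmsr_on_cpu(cpu, 0x19A, throttle) != 0) perror("wrmsr");
        sleep(1);                 /* observe the victim while the core is slowed */
        wrmsr_on_cpu(cpu, 0x19A, 0);   /* restore full speed */
        return 0;
    }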

We implemented a prototype of our defense solution on the open-source OpenStack cloud software, and extensively evaluated its effectiveness and efficiency. Our evaluation shows that we can accurately detect memory DoS attacks and promptly and effectively mitigate them. The performance overhead of persistent performance monitoring is lower than 5%, which is low enough for use in production public clouds. Because our solution does not require modifications of CPU hardware, the hypervisor or guest operating systems, it minimally impacts existing cloud implementations. Therefore, we envision our solution can be rapidly deployed in public clouds as a new security service to customers who require higher security assurances (as in Security-on-Demand clouds [24, 48]).
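As an illustration of how existing hardware counters can be read with low overhead from software, here is a minimal Linux sketch (not the paper’s monitoring service) that counts a monitored thread’s LLC read misses over one interval using perf_event_open; the event choice and the 100 ms interval are placeholders.

    #include <linux/perf_event.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_HW_CACHE;       /* hardware cache events */
        attr.config = PERF_COUNT_HW_CACHE_LL
                    | (PERF_COUNT_HW_CACHE_OP_READ << 8)
                    | (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
        attr.disabled = 1;
        /* pid = 0: monitor the calling thread (a real monitor would pass
         * the vCPU thread's tid); cpu = -1: on any CPU. */
        int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0) { perror("perf_event_open"); return 1; }
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
        usleep(100000);                       /* one sampling interval */
        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
        uint64_t misses = 0;
        read(fd, &misses, sizeof(misses));
        printf("LLC read misses in interval: %llu\n",
               (unsigned long long)misses);
        close(fd);
        return 0;
    }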

In summary, the contributions of this paper are:

• A set of attack techniques to perform memory DoS attacks, and measurement of the severity of the resulting Degradation of Service (DoS) to the victim VM.
• Demonstration of the severity of memory DoS attacks in public clouds (Amazon EC2) against Hadoop applications and E-commerce websites.
• A novel, generalizable attack detection method that detects abnormal probability distribution deviations at runtime, and adapts to program phase changes and different workload inputs.
• A novel method for detecting the attack VM using selective execution throttling.
• A new, rapidly deployable defense for all memory DoS attacks, with accurate detection and low overhead, using existing hardware processor features.

We first discuss our threat model and the background of memory resources in Sec. 2. Techniques to perform memory DoS attacks are presented in Sec. 3. We show the power of these attacks in two case studies conducted in Amazon EC2 in Sec. 4. Our new defense techniques are described and evaluated in Sec. 5. We summarize related work in Sec. 6 and conclude in Sec. 7.

2. BACKGROUND

2.1 Threat Model and Assumptions

We consider security threats from malicious tenants of public IaaS clouds. We assume the adversary has the ability to launch at least one VM on the cloud servers on which the victim VMs are running. Techniques required to do so have been studied [36, 39, 46], and are orthogonal to our work. The adversary can run any program inside his own VM. We do not assume that the adversary can send network packets to the victim directly, thus resource-freeing attacks [38] or network-based DoS attacks [28] are not applicable. We do not consider attacks from the cloud providers, or any attacks requiring direct control of privileged software.

We assume the software and hardware isolation mechanisms function correctly as designed. A hypervisor virtualizes and manages the hardware resources (see Figure 1) so that each VM thinks it has the entire computer. A server can have multiple processor packages, where all processor cores in a package share a Last Level Cache (LLC), while L1 and L2 caches are private to a processor core and not shared by different cores. All processor packages share the Integrated Memory Controller (IMC), the inter-package bus and the main memory storage (DRAM chips). Each VM is designated a disjoint set of virtual CPUs (vCPUs), which can be scheduled to operate on any physical cores based on the hypervisor’s scheduling algorithms. A program running on a vCPU may use all the hardware resources available to the physical core it runs on. Hence, different VMs may simultaneously share the same hardware caches, buses, memory channels and DRAM bank buffers. We assume the cloud provider may schedule VMs from different customers on the same server (as co-tenants), but likely on different physical cores. As is the case today, software-based VM isolation by the hypervisor only isolates accesses to virtual and physical memory pages, but not to the underlying hardware memory resources shared by the physical cores.

[Figure 1: An attacker VM (with 2 vCPUs) and a victim VM share multiple layers of memory resources: cores and shared LLCs in each package, the IMCs, the inter-package bus, and DRAM.]

2.2 Hardware Memory Resources

Figure 2 shows the hardware memory resources in modern computers. Using Intel processors as examples, modern X86-64 processors usually consist of multiple processor packages, each of which consists of several physical processor cores. Each physical core can execute one or two hardware threads in parallel with the support of Hyper-Threading Technology. A hierarchical memory subsystem, from the top to the bottom, is composed of different levels of storage-based components (e.g., caches, the DRAMs). These memory components are inter-connected by a variety of scheduling-based components (e.g., memory buses and controllers), with various schedulers arbitrating their communications. Memory resources shared by different cores are described below:

[Figure 2: Shared storage-based and scheduling-based hardware memory resources in multi-core cloud servers. Scheduling-based resources include the LLC ring bus, QuickPath Interconnect, memory controller bus, bank and channel schedulers, and DRAM bus; storage-based resources include the LLC, DRAM bank buffers and DRAM banks.]

Last Level Caches (LLC). An LLC is shared by all cores in one package (older processors may have one package supported by multiple LLCs). Intel LLCs usually adopt an inclusive cache policy: every cache line maintained in the upper-level caches (i.e., core-private Level 1 and Level 2 caches in each core, not shown in Figure 2) also has a copy in the LLC. In other words, when a cache line in the LLC is evicted, so are the copies in the upper-level caches. A subsequent access to the memory block mapped to this cache line will result in an LLC miss, which will lead to the much slower main memory access. On recent Intel processors (since Nehalem), LLCs are split into multiple slices, each of which is associated with one physical core, although every core may use the entire LLC. Intel employs static hash mapping algorithms to translate the physical address of a cache line to one of the LLC slices that contains the cache line. These mappings are unique for each processor model and are not released to the public, so it is harder for attackers to generate LLC contention using the method from [43].

Memory Buses. Intel uses a ring bus topology to interconnect components in the processor package, e.g., processor cores, LLC slices, Integrated Memory Controllers (IMCs), QuickPath Interconnect (QPI) agents, etc. The high-speed QPI provides point-to-point interconnections between different processor packages, and between each processor package and I/O devices. The memory controller bus connects the LLC slices to the bank schedulers in the IMC, and the DRAM bus connects the IMC’s schedulers to the DRAM banks. Current memory bus designs with high bandwidth make it very difficult for attackers to saturate the memory buses. Also, the elimination of bus locking operations for normal atomic operations makes bus locking attacks via normal atomic operations (e.g., [43]) less effective. However, some exotic atomic bus locking operations still exist.

DRAM banks. Each DRAM package consists of several banks, each of which can be thought of as a two-dimensional data array with multiple rows and columns. Each bank has a bank buffer to hold the most recently used row to speed up DRAM accesses. A memory access to a DRAM bank may either be served in the bank buffer, which is a buffer-hit (fast), or in the bank itself, which is a buffer-miss (slow).

Integrated Memory Controllers (IMC). Each processor package contains one or multiple IMCs. The communications between an IMC and the portion of DRAM it controls are supported by multiple memory channels, each of which serves a set of DRAM banks. When the processor wants to access data in the DRAM, it first calculates the bank that stores the data based on the physical address, then it sends the memory request to the IMC that controls the bank. The processor can request data from the IMC in the local package, as well as in a different package via QPI. The IMCs implement a bank priority queue for each bank they serve, to buffer the memory requests to this bank. A bank scheduler is used to schedule requests in the bank priority queue, typically using a First-Ready-First-Come-First-Serve algorithm that gives highest scheduling priority to requests that lead to a buffer-hit in the DRAM bank buffer, and then to the request that arrived earliest. Once requests are scheduled by the bank scheduler, a channel scheduler will further schedule them, among requests from other bank schedulers, to multiplex the requests onto a shared memory channel. The channel scheduler usually adopts a First-Come-First-Serve algorithm, which favors earlier requests. Modern DRAM and IMCs can handle a large number of requests concurrently, so it is less effective to flood the DRAM and IMCs to generate severe contention, as shown in [32].

3. MEMORY DOS ATTACKS

3.1 Fundamental Attack Strategies

We have classified all memory resources into either storage-based or scheduling-based resources. This helps us formulate the following two fundamental attack strategies for memory DoS attacks:

• Storage-based contention attack. The fundamental attack strategy to cause contention on storage-based resources is to reduce the probability that the victim’s data is found in an upper-level memory resource (faster), thus forcing it to fetch the data from a lower-level resource (slower).
• Scheduling-based contention attack. The fundamental attack strategy on a scheduling-based resource is to decrease the probability that the victim’s requests are selected by the scheduler, e.g., by locking the scheduling-based resources temporarily, tricking the scheduler into improving the priority of the attacker’s requests, or overwhelming the scheduler by submitting a huge number of requests simultaneously.

We systematically show how memory DoS attacks can be constructed on different layers of memory resources (LLC in Sec. 3.2, bus in Sec. 3.3, memory controller and DRAM in Sec. 3.4). For each memory component, we first study the basic techniques the attacker can use to generate resource contention and affect the victim’s performance. We measure the effectiveness of the attack techniques on the victim VM with different vCPU locations and program features. Then we propose some practical attacks and evaluate their impacts on real-world benchmarks.

Testbed configuration. To demonstrate the severity of different types of memory DoS attacks, we use a server representative of many cloud servers, configured as shown in Table 1. We use Intel processors, since they are the most common in cloud servers, but the attack methods we propose are general, and applicable to other processors and platforms as well.

Table 1: Testbed Configuration

Server: Dell PowerEdge R720
Processor packages: Two 2.9GHz Intel Xeon E5-2667 (Sandy Bridge)
Cores per package: 6 physical cores, or 12 hardware threads with Hyper-Threading
Level 1 and Level 2 caches: Core-private; L1 I and L1 D: each 32KB, 8-way set-associative; L2 cache: 256KB, 8-way set-associative
Last Level Cache (LLC): 15MB, 20-way set-associative, shared by cores in package, divided into 6 slices of 2.5MB each; one slice per core
Physical memory: Eight 8GB DRAMs, divided into 8 channels and 1024 banks
Hypervisor: Xen version 4.1.0
VM’s OS: Ubuntu 12.04 Linux, with 3.13 kernel

In each of the following experiments, we launched two VMs, one as the attacker and the other as the victim. By default, each VM was assigned a single vCPU. We selected a mixed set of benchmarks for the victim: (1) We use a modified stream program [31, 32] as a micro benchmark to explore the effectiveness of the attacks on victims with different features. This program allocates two array buffers of the same size, one as the source and the other as the destination. It copies data from the source to the destination in loops repeatedly, either in a sequential manner (resulting in a program with high memory locality) or in a random manner (low memory locality). We chose this benchmark because it is memory-intensive and allows us to alter the size of memory footprints and the locality of memory resources. (2) To fully evaluate the attack effects on real-world applications, we choose 8 macro benchmarks (6 from SPEC2006 [10] and 2 from PARSEC [16]) and cryptographic applications based on OpenSSL as the victim programs. Each experiment was repeated 10 times, and the mean values and standard deviations are reported.
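For concreteness, the following is a minimal sketch of such a victim microbenchmark, under our reading of the description above: it copies a source array to a destination array repeatedly, either in sequential order (high locality) or through a randomly shuffled index array (low locality). The buffer size n is the knob that sets the memory footprint.

    #include <stdlib.h>
    #include <string.h>

    /* Copy src to dst forever; sequential index order gives high memory
     * locality, a shuffled order gives low locality. */
    static void run_victim(size_t n, int high_locality) {
        char *src = malloc(n), *dst = malloc(n);
        size_t *idx = malloc(n * sizeof(size_t));
        memset(src, 1, n);
        for (size_t i = 0; i < n; i++) idx[i] = i;
        if (!high_locality) {                     /* Fisher-Yates shuffle */
            for (size_t i = n - 1; i > 0; i--) {
                size_t j = (size_t)rand() % (i + 1);
                size_t t = idx[i]; idx[i] = idx[j]; idx[j] = t;
            }
        }
        for (;;)                                  /* the copy loop */
            for (size_t i = 0; i < n; i++)
                dst[idx[i]] = src[idx[i]];
    }

    int main(void) { run_victim(10 << 20, 0); }   /* e.g., 10MB, low locality */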

3.2 Cache Contention (Storage Resources)

Of the storage-based contention attacks, we found that LLC contention results in the most severe performance degradation. The root vulnerability is that an LLC is shared by all cores of the same CPU package, without access control or quota enforcement. Therefore a program in one VM can evict LLC cache lines belonging to another VM. Moreover, inclusive LLCs (e.g., most modern Intel LLCs) will propagate these cache line evictions to core-private L1 and L2 caches, further aggravating the interference between programs (or VMs) in CPU caches.

3.2.1 Contention Study

Cache cleansing. To cause LLC contention, the adversary can allocate a memory buffer to cover the entire LLC. By accessing one memory address per memory block in the buffer, the adversary can cleanse the entire cache and evict all of the victim’s data from the LLC to the DRAM.
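A minimal sketch of this basic cleansing loop (our illustration, using the LLC size from Table 1) is shown below; note that, as the next paragraph explains, a naively allocated buffer like this one suffers self-conflicts, because the mapping from its addresses to LLC slices is unknown.

    #include <stdlib.h>

    #define LLC_SIZE  (15UL << 20)   /* 15MB LLC, per Table 1 */
    #define LINE_SIZE 64             /* assumed cache line size */

    int main(void) {
        volatile char *buf = malloc(LLC_SIZE);
        for (;;)                                          /* cleanse repeatedly */
            for (size_t i = 0; i < LLC_SIZE; i += LINE_SIZE)
                buf[i]++;             /* one access per memory block */
    }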

The optimal buffer used by the attacker should map exactly to the LLC, meaning it can fill up each cache set in each LLC slice without self-conflicts (i.e., without evicting earlier lines loaded from this buffer). For example, for an LLC with ns slices, nc sets in each slice, and nw-way set-associativity, the attacker would like ns × nc × nw memory blocks to cover all cache lines of all sets in all slices. Two challenges make this task difficult for the attacker: the host physical addresses of the buffer, which index the cache slices, are unknown to the attacker, and the mapping from physical memory addresses to LLC slices is not publicly known.
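As a concrete illustration using the Table 1 testbed (and assuming 64-byte cache lines): each 2.5MB, 20-way slice holds 2.5MB / (20 × 64B) = 2048 sets, so the attacker needs ns × nc × nw = 6 × 2048 × 20 = 245,760 cache-line-sized blocks, i.e., exactly 15MB of carefully chosen, non-contiguous blocks.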

Mapping LLC cache slices: To overcome these challenges, the attacker can first allocate a 1GB Hugepage, which is guaranteed to have contiguous host physical addresses; thus he need not worry about virtual-to-physical page translations, which he does not know. Then for each LLC cache set Si in all slices, the attacker sets up an empty group Gi, and starts the following loop: (i) select block Ak from the Hugepage, which is mapped to set Si by the same index bits in the memory address; (ii) add Ak to Gi; (iii) access all the blocks in Gi; and (iv) measure the average access latency per block. A longer latency indicates that block Ak causes a self-conflict with other blocks in Gi, so it is removed from Gi. The above loop is repeated until there are ns × nw blocks in Gi, which can exactly fill up set Si in all slices. Next the attacker needs to distinguish which blocks in Gi belong to each slice: he first selects a new block An mapped to set Si from the Hugepage, and adds it to Gi. This should cause a self-conflict. Then he executes the following loop: (i) select one block Am from Gi; (ii) remove it from Gi; (iii) access all the blocks in Gi; and (iv) measure the average access latency. A short latency indicates that block Am can eliminate the self-conflict caused by An, so it belongs to the same slice as An. The attacker keeps doing this until he discovers nw blocks that belong to the same slice as An. These blocks form the group that can fill up set Si in one slice. The above procedure is repeated until the blocks in Gi are divided into ns slices for set Si. After conducting the above process for each cache set, the attacker obtains a memory buffer with non-consecutive blocks that map exactly to the LLC. Such self-conflict elimination is also useful in improving side-channel attacks [27].
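The first loop above can be sketched as follows (our illustration; access_and_time and next_block_for_set are hypothetical helpers that time a traversal of the group, e.g., with rdtsc, and hand out Hugepage blocks whose set-index bits match Si):

    #include <stddef.h>

    extern unsigned long access_and_time(char **group, size_t n);
    extern char *next_block_for_set(size_t si);

    /* Grow a group of blocks mapping to set si until it exactly fills that
     * set in all slices (target = ns * nw blocks), dropping any block whose
     * addition raises the mean per-block latency (a self-conflict). */
    size_t build_group(size_t si, char **group, size_t target,
                       unsigned long conflict_threshold) {
        size_t n = 0;
        while (n < target) {
            group[n++] = next_block_for_set(si);
            if (access_and_time(group, n) > conflict_threshold)
                n--;                 /* self-conflict: remove the new block */
        }
        return n;
    }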

To test the effectiveness of cache cleansing, we arranged the attacker VM and victim VM on the same processor package, thus sharing the LLC and all memory resources in lower layers. The adversary first identified the memory buffer that maps to the LLC. Then he cleansed the whole LLC repeatedly. The resulting performance degradation of the victim application is shown in Figure 3. The victim suffered the most significant performance degradation when the victim’s buffer size is around 10MB (1.8× slowdown for the high locality program, and 5.5× slowdown for the low locality program). When the buffer size is smaller than 5MB (data are stored mainly in upper-level caches), or larger than 25MB (data are stored mainly in the DRAM), the impact of cache contention on the LLC is negligible.

[Figure 3: Performance slowdown (normalized execution time) due to LLC cleansing contention. We use “H-x” or “L-x” to denote that the victim program has high or low memory locality and has a buffer size of x.]


The results can be explained as follows: the maximum performance degradation is achieved on victims with memory footprints smaller than, but close to, the LLC size (15MB), because such a victim suffers the least from self-conflicts in the LLC and the most from the attacker’s LLC cleansing. Moreover, as a low locality program accesses its data in a random order, hardware prefetching is less effective in enhancing the program’s access speed, so the program accesses the cache at a relatively lower rate, and its data will be evicted out of the LLC by the attacker with higher probability. That is why LLC cleansing has a larger impact on low locality programs than on high locality programs.

Takeaways. LLC contention is most effective when (1) the attacker and victim VMs share the same LLC, (2) the victim program’s memory footprint is about the size of the LLC, and (3) the victim program has lower memory locality.

3.2.2 Practical Attack Evaluation

We improve this attack by increasing the cleansing speed, and the accuracy of evicting (thus contending with) the victim’s data.

Multi-threaded LLC cleansing. To speed up the LLC cleansing, the adversary may split the cleansing task into n threads, with each running on a separate vCPU and cleansing a non-overlapping 1/n of the LLC simultaneously. This effectively increases the cleansing speed by n times.
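A minimal pthread sketch of this splitting (our illustration, with placeholder sizes; a real attack would use the slice-mapped buffer built above rather than a plain allocation):

    #include <pthread.h>
    #include <stdlib.h>

    #define NTHREADS  4
    #define LLC_SIZE  (15UL << 20)
    #define LINE_SIZE 64

    static volatile char *buf;

    /* Each thread repeatedly touches a disjoint 1/NTHREADS of the buffer. */
    static void *cleanse(void *arg) {
        size_t id = (size_t)arg, chunk = LLC_SIZE / NTHREADS;
        for (;;)
            for (size_t i = id * chunk; i < (id + 1) * chunk; i += LINE_SIZE)
                buf[i]++;
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        buf = malloc(LLC_SIZE);
        for (size_t i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, cleanse, (void *)i);
        pthread_join(t[0], NULL);    /* threads run until killed */
        return 0;
    }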

In our experiment, the attacker VM and the victim VM were arranged to share the LLC. The attacker VM was assigned 4 vCPUs. It first prepared the memory buffer that exactly mapped to the LLC. Then he cleansed the LLC with (1) one vCPU, or (2) 4 vCPUs (each cleansing 1/4 of the LLC). Figure 4 shows that the attack can cause 1.05 ∼ 1.6× slowdown to the victim VM when using one thread, and 1.12 ∼ 2.03× slowdown when using four threads.

[Figure 4: Performance slowdown (normalized execution time) due to the multi-threaded LLC cleansing attack, on mcf, gobmk, omnetpp, astar, soplex, lbm, canneal and streamcluster, with 1-thread and 4-thread cleansing.]

Adaptive LLC cleansing. The basic LLC cleansing technique does not work when the victim’s program has a memory footprint (<1MB) that is much smaller than the LLC (e.g., 15MB), since it takes a long time to finish one complete LLC cleansing, during which most of the memory accesses do not induce contention with the victim. To achieve finer-grained attacks, we developed a cache probing technique to pinpoint the cache sets in the LLC that map to the victim’s memory footprint, and cleanse only these selected sets.

The attacker first allocates a memory buffer covering the entire LLC in his own VM. Then he conducts cache probing in two steps: (1) In the Discover Stage, while the victim program runs, for each cache set the attacker accesses some cache lines belonging to this set and figures out the maximum number of cache lines that can be accessed without causing cache conflicts. If this number is smaller than the set associativity, this cache set will be selected for the adaptive cleansing attack, because the victim has frequently occupied some cache lines in this set. (2) In the Attack Stage, the attacker keeps accessing these selected cache sets to cleanse the victim’s data. Algorithm 1 shows the steps to perform the adaptive LLC cleansing.

Algorithm 1: Adaptive LLC cleansing

Input:
    cache_set{}: all the sets in the LLC
    cache_buffer{}: covers the entire LLC
    cache_assoc_num: the associativity of the LLC

begin
    /* Discover Stage */
    victim_set{} = ∅
    for each set i in cache_set{} do
        Find j such that accessing j cache lines in set i from
        cache_buffer{} causes no cache conflict (low access time),
        but accessing j+1 cache lines in set i from cache_buffer{}
        causes a cache conflict (high access time)
        if j < cache_assoc_num then
            add i to victim_set{}
        end
    end
    /* Attack Stage */
    while attack is not finished do
        for each set i in victim_set{} do
            access cache_assoc_num cache lines in set i from cache_buffer{}
        end
    end
end

Figure 5 shows the results of the attacker’s multi-threaded adaptive cleansing attacks against victim applications performing cryptographic operations. While the basic cleansing did not have any effect, the adaptive attacks can achieve around 1.12 to 1.4 times runtime slowdown with 1 vCPU, and up to 4.4× slowdown with 4 vCPUs.

[Figure 5: Performance slowdown (normalized execution time) due to adaptive LLC cleansing attacks on AES, BLOWFISH, RC4, RSA, DSA, HMAC, MD5 and SHA, comparing complete LLC cleansing (1 and 4 threads) against adaptive LLC cleansing (1 and 4 threads).]

3.3 Bus Contention (Scheduling Resources)

The availability of internal memory buses can be compromised by overwhelming or temporarily locking down the buses. We study the effects of these techniques.

3.3.1 Contention Study

[Figure 6: Performance slowdown (normalized execution time) due to bus saturation contention, (a) same package and (b) different packages. We use “H-xc” or “L-xc” to denote the configuration in which the victim program has high or low locality, and both the attacker and victim use x cores to contend for the bus.]

Bus saturation. One intuitive approach for an adversary is to create numerous memory requests to saturate the buses [43]. However, the bus bandwidth in modern processors may be too high for a single VM to saturate.

To examine the effectiveness of bus saturation contention, we conducted two sets of experiments. In the first set, the victim VM and the attacker VM were located in the same processor package but on different physical cores (Figure 6a). They accessed different parts of the LLC, without touching the DRAM. Therefore the attacker VM causes contention on the ring bus that connects the LLC slices, without causing contention in the LLC itself. In the second set, the victim VM and the attacker VM were pinned on different processor packages (Figure 6b). They accessed different memory channels, without inducing contention in the memory controller and DRAM modules. Therefore the attacker and victim VMs only contend on the buses that connect the LLCs and IMCs, as well as the QPI buses. The attacker and victim were assigned increasing numbers of vCPUs to cause more bus traffic. The results in Figure 6 show that these buses were hardly saturated and the impact on the victim’s performance was negligible in all cases.

Bus locking. To deny the victim service from a scheduling resource, the adversary can temporarily lock down the internal memory buses. Intel processors provide locked atomic operations for managing shared data structures between multi-processors [6]. Before the Intel Pentium (P5) processor, locked atomic operations always generated LOCK signals on the internal buses to achieve operation atomicity, so other memory accesses were blocked until the locked atomic operation completed. For processor families after the P6, the bus lock is transformed into a cache lock: the cache line is locked instead of the bus, and the cache coherency mechanism is used to ensure operation atomicity. This causes much smaller scheduling lockdown times.

However, we have found two exotic atomic operations the adversary can still use to lock the internal memory buses: (1) Locked atomic accesses to unaligned memory blocks: the processor has to fetch two adjacent cache lines to complete an unaligned memory access. To guarantee the atomicity of accessing the two adjacent cache lines, the processor will flush in-flight memory accesses issued earlier, and block memory accesses to the bus, until the unaligned memory access is finished. (2) Locked atomic accesses to uncacheable memory blocks: when uncached memory pages are accessed in atomic operations, the cache coherency mechanism does not work. Hence, the memory bus must be locked to guarantee atomicity. Listings 1 and 2 show the code for issuing unaligned and uncached atomic operations. The two programs keep performing an atomic addition of a constant (x) to a memory block (block_addr) (lines 5 – 11): in line 7, the lock prefix indicates this operation is atomic, and the instruction xaddl indicates it is an addition. The first operand is the register eax, which stores x (line 9). The second operand is the first parameter of line 9 (the data denoted by the address block_addr). The result is loaded back into the register eax (line 8). In Listing 1, we set this memory block to be unaligned (line 4). In Listing 2, we added a new system call to set the page table entries of the memory buffer as cache-disabled (line 2).

To evaluate the effects of bus locking contention, we chose the footprint size of the victim program as (1) 8KB, with which the L1 cache was under-utilized, (2) 64KB, with which the L1 cache was over-utilized but the L2 cache was under-utilized, (3) 512KB, with which the L2 cache was over-utilized but the LLC was under-utilized, and (4) 30MB, with which the LLC was over-utilized. The attacker VM kept issuing unaligned atomic or uncached atomic memory accesses to lock the memory buses. For comparison, we also ran another group of experiments, where the attacker kept issuing normal locked memory accesses. We considered two scenarios: (1) the attacker and victim shared the same processor package, but ran on different cores; (2) they were scheduled on different processor packages. The normalized execution time of the victim program is shown in Figure 7.

We observe that the victim’s performance was significantly affected when its buffer size was larger than the L2 caches. This is because the attacker, who kept requesting atomic, unaligned memory accesses, was only able to lock the buses within its physical core, the ring buses around the LLC in each package, the QPI, and the buses from each package to the DRAM. So when the victim’s buffer size was smaller than the L2 caches, it fetched data from the private caches in its own core without being affected by the attacker. However, when the victim’s buffer size was larger than the L2 caches, its accesses to the LLC were delayed by the bus locking operations, and its performance degraded (up to 6× slowdown for high locality victim programs and 7× slowdown for low locality victim programs).

Takeaways. We explored two approaches to bus contention. Saturating internal buses is unlikely to cause noticeable performance degradation. Bus locking shows promise when the victim program makes heavy use of the shared LLC or lower-layer memory resources, whether the victim VM and attacker VM are on the same processor package or on different packages.

3.3.2 Practical Attack Evaluation

To evaluate the effectiveness of atomic locking attacks on real-world applications, we scheduled the attacker VM and victim VM on different processor packages. The attacker VM kept generating atomic locking signals by (1) requesting unaligned atomic memory accesses, or (2) requesting uncached atomic memory accesses. The normalized execution time of the victim program is shown in Figure 8. We observe that the victim’s performance can be degraded by as much as 7 times when the attacker conducts exotic atomic operations.


Listing 1: Attack using unaligned atomic operations

 1 char *buffer = mmap(0, BUFFER_SIZE, PROT_READ|PROT_WRITE,
                       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
 2
 3 int x = 0x0;
 4 int *block_addr = (int *)(buffer + CACHE_LINE_SIZE - 1);
 5 while (1) {
 6     __asm__(
 7         "lock; xaddl %%eax, %1\n\t"
 8         :"=a"(x)
 9         :"m"(*block_addr), "a"(x)
10         :"memory");
11 }

Listing 2: Attack using uncached atomic operations

 1 char *buffer = mmap(0, BUFFER_SIZE, PROT_READ|PROT_WRITE,
                       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
 2 syscall(__NR_UnCached, (unsigned long)buffer);
 3 int x = 0x0;
 4 int *block_addr = (int *)buffer;
 5 while (1) {
 6     __asm__(
 7         "lock; xaddl %%eax, %1\n\t"
 8         :"=a"(x)
 9         :"m"(*block_addr), "a"(x)
10         :"memory");
11 }

[Figure 7: Performance slowdown (normalized execution time) due to bus locking contention, (a) same package and (b) different packages, comparing normal, unaligned and uncached atomic accesses. We use “H-x” or “L-x” to denote that the victim program has high or low memory locality and has a buffer size of x.]

[Figure 8: Performance slowdown (normalized execution time) due to bus locking attacks on mcf, gobmk, omnetpp, astar, soplex, lbm, canneal and streamcluster, with unaligned and uncached atomic accesses.]

3.4 Memory Contention (Combined Resources)

An IMC uses a bank scheduler and a channel scheduler to select the memory requests for each DRAM access. Therefore an adversary may contend on these two schedulers by frequently issuing memory requests that result in bank buffer hits, to boost his priority in the scheduler. Moreover, each memory bank is equipped with only one bank buffer to hold the recently used bank row, so the adversary can easily induce storage-based contention on bank buffers by frequently occupying them with his own data.

3.4.1 Contention Study

Memory flooding. Since channel and bank schedulers use First-Come-First-Serve policies, an attacker can send a large number of memory requests to flood the target memory channels or DRAM banks. These requests will contend on the scheduling-based resources with the victim’s memory requests. In addition, the attacker can issue requests in sequential order to achieve high row-hit locality, and thus high priority in the bank scheduler, to further increase the effect of flooding. Furthermore, when the adversary keeps flooding the IMCs, these memory requests can also evict the victim’s data out of the DRAM bank buffers. The victim’s bank buffer hit rate is decreased and its performance is further degraded.

To demonstrate the effects of DRAM contention, we configured one attacker VM to run a memory flooding program, which kept accessing memory blocks in the same DRAM bank directly, without going through the caches (i.e., uncached accesses). The victim VM did exactly the same, with either high or low memory locality. We conducted two sets of experiments: (1) the two VMs access the same bank in the same channel (“Same bank” in Figure 9); (2) the two VMs access two different banks in the same channel (“Same channel” in Figure 9). To alter the memory request rate issued by the two VMs, we also changed the number of vCPUs in the attacker and victim VMs. The normalized execution time of the victim program is shown in Figure 9.

Three types of contention were observed in these experiments. First, channel scheduling contention was observed when the attacker and the victim access different banks in the same channel. It was enhanced with an increased number of attacker and victim vCPUs, which increases the memory request rate (around 1.2× slowdown for “H-5c” and “L-5c”). Second, bank scheduling contention was observed when the attacker and victim accessed the same DRAM bank. When the memory request rate was increased, the victim’s performance was further degraded by an additional 70% and 25% for “H-5c” and “L-5c”, respectively. Third, contention in DRAM bank buffers was observed when we compare the “Same bank” results in Figure 9 between high locality and low locality victim programs: low locality victims already suffer from row-misses, and the additional performance degradation in high locality victims is due to bank buffer contention (1.9× slowdown for H-5c versus 1.45× slowdown for L-5c).

[Figure 9: Performance degradation (normalized execution time) due to memory channel and bank contention (“Same channel” vs. “Same bank”). We use “H-xc” or “L-xc” to denote the configuration in which the victim program has high or low locality, and both the attacker and victim use x cores to contend for the memory.]

We also consider the overall effect of memory flooding contention. In this experiment, the victim VM runs a high locality or low locality stream benchmark on its only vCPU. The attacker VM allocates a memory buffer 20× the size of the LLC and runs a stream program which keeps accessing memory blocks sequentially in this buffer, to generate contention in every channel and every bank. To increase bus traffic, the attacker employed multiple vCPUs to perform the attack simultaneously. The performance degradation, as we can see in Figure 10, was significant when the victim’s memory access footprint was mostly in the DRAM and more vCPUs of the attacker VM were used in the attack. The attacker can use 8 vCPUs to induce about 1.5× slowdown on a victim with a buffer size larger than the LLC.
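A minimal per-thread sketch of this flooding program (our illustration, with sizes from Table 1):

    #include <stdlib.h>

    #define LLC_SIZE  (15UL << 20)
    #define BUF_SIZE  (20 * LLC_SIZE)   /* 20x the LLC, as in the experiment */
    #define LINE_SIZE 64

    int main(void) {
        volatile char *buf = malloc(BUF_SIZE);
        for (;;)                         /* sequential sweeps give high row-hit */
            for (size_t i = 0; i < BUF_SIZE; i += LINE_SIZE)
                buf[i]++;                /* locality, i.e., scheduler priority  */
    }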

Takeaways. Contention can be induced in channel schedulers, bank schedulers and bank buffers between programs from different processor packages. This contention is especially significant when the victim program’s memory footprint is larger than the LLC.

3.4.2 Practical Attack Evaluation

We evaluate two advanced memory flooding attacks.

Multi-threaded memory flooding. The attacker can use more threads to increase the memory flooding speed, as we demonstrated in the previous section. We evaluated this attack on real-world applications. The attacker and victim VMs were located in two different processor packages, so they only share the IMCs and DRAM. The attacker VM issues frequent, highly localized memory requests to flood every DRAM bank and every channel. To increase bus traffic, the attacker employed 8 vCPUs to perform the attack simultaneously. Figure 11 shows that the victim experiences up to a 1.22× runtime slowdown when the attacker uses 8 vCPUs to generate contention (Complete memory flooding bars).

[Figure 10: Performance degradation (normalized execution time) due to memory flooding contention, for attackers with 1, 2, 4 and 8 vCPUs. We use “H-x” or “L-x” to denote that the victim program has high or low memory locality and has a buffer size of x.]

[Figure 11: Performance overhead (normalized execution time) due to multi-threaded and adaptive memory flooding attacks (8 threads each) on mcf, gobmk, omnetpp, astar, soplex, lbm, canneal and streamcluster.]

Adaptive memory flooding. For a software program with a smaller memory footprint, only a few memory channels will be involved in its memory accesses. We developed a novel approach with which an adversary may identify the memory channels that are more frequently used by a victim program. To achieve this, the attacker needs to reverse engineer the unrevealed algorithms that map physical memory addresses to memory banks and channels, in order to accurately direct the flows of the memory request flood.

Mapping DRAM banks and channels: The attacker can leverage the methods of Liu et al. [29] to identify the bits in physical memory addresses that index the DRAM banks. The attacker first allocates a 1GB Hugepage with contiguous physical addresses, which avoids the unknown translations from guest virtual addresses to machine physical addresses. Then he selects two memory blocks from the Hugepage whose physical addresses differ in only one bit. He then flushes these two blocks out of the caches and accesses them from the DRAM alternately. A low latency indicates that these two memory blocks are served by two banks, as there is no contention on the bank buffers. In this way, the attacker is able to identify all the bank bits. Next, the attacker needs to identify the channel bits among the bank bits. We design a novel algorithm, shown in Algorithm 2, to achieve this goal. The attacker selects two groups of memory blocks from the Hugepage, whose bank indexes differ in only one bit. The attacker then allocates two threads to access the two groups simultaneously. If the differing bank index bit is also a channel index bit, then the two groups will be in two different channels, and a shorter access time will be observed, since there is no channel contention.
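The bank-bit test can be sketched as follows (our illustration; flush and now_ns are hypothetical helpers wrapping clflush and a rdtsc-style clock, and base is assumed to point into the 1GB Hugepage):

    #include <stdint.h>

    extern void flush(volatile char *p);     /* clflush wrapper */
    extern uint64_t now_ns(void);            /* monotonic clock */

    /* Return 1 if physical-address bit b looks like a bank index bit: two
     * blocks differing only in bit b land in different banks, so alternating
     * DRAM accesses to them avoid row-buffer conflicts and stay fast. */
    int is_bank_bit(volatile char *base, int b, uint64_t threshold_ns) {
        volatile char *p = base, *q = base + (1UL << b);
        enum { ROUNDS = 100000 };
        uint64_t t0 = now_ns();
        for (int i = 0; i < ROUNDS; i++) {
            flush(p); flush(q);              /* force both accesses to DRAM */
            (void)*p; (void)*q;              /* alternate between the blocks */
        }
        return (now_ns() - t0) / ROUNDS < threshold_ns;
    }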

Algorithm 2: Discovering channel index bits

Input:
    bank_bit{}: bank index bits
    memory_buffer{}: a memory buffer
Output:
    channel_bit{}

begin
    channel_bit{} = ∅
    for each bit i ∈ bank_bit{} do
        buffer_A{} = memory_buffer{}
        buffer_B{} = memory_buffer{}
        /* keep in buffer_A only blocks with bit i = 0 and all other
           bank bits = 0 */
        for each memory block d_a ∈ buffer_A{} do
            m_a = physical address of d_a
            if (m_a’s bit i) ≠ 0 then
                delete d_a from buffer_A{}; continue
            end
            for each bit j ∈ bank_bit{} with j ≠ i do
                if (m_a’s bit j) ≠ 0 then
                    delete d_a from buffer_A{}; break
                end
            end
        end
        /* keep in buffer_B only blocks with bit i = 1 and all other
           bank bits = 0 */
        for each memory block d_b ∈ buffer_B{} do
            m_b = physical address of d_b
            if (m_b’s bit i) ≠ 1 then
                delete d_b from buffer_B{}; continue
            end
            for each bit j ∈ bank_bit{} with j ≠ i do
                if (m_b’s bit j) ≠ 0 then
                    delete d_b from buffer_B{}; break
                end
            end
        end
        thread A: /* access buffer_A in an infinite loop */
        while (true) do
            for each memory block d_a ∈ buffer_A{} do
                access d_a (uncached)
            end
        end
        thread B: /* access buffer_B N times and measure the time */
        for k = 0 to N-1 do
            for each memory block d_b ∈ buffer_B{} do
                access d_b (uncached)
            end
        end
        total_time = thread B’s execution time
        if total_time < Threshold then
            add i to channel_bit{}
        end
    end
    return channel_bit{}
end

Then the attacker performs two stages: in the Discover Stage, the attacker keeps accessing each memory channel a number of times and measures his own memory access time to infer contention from the victim program. By identifying the channels with a longer access time, the attacker can detect which channels are heavily used by the victim. Note that the attacker only needs to discover the channels used by the victim; he does not need to know the exact value of the channel index bits for a given channel. In the Attack Stage, the attacker floods these selected memory channels. Algorithm 3 shows the two steps to conduct adaptive memory flooding attacks.

Algorithm 3: Adaptive memory flooding

Input:
    memory_channel{}: all the channels in the memory
    memory_buffer{}

begin
    /* Discover Stage */
    victim_channel{} = ∅
    for each channel i in memory_channel{} do
        access the addresses belonging to channel i from
        memory_buffer{}, and measure the total time (repeat
        for a number of times)
        if total_time is high then
            add i to victim_channel{}
        end
    end
    /* Attack Stage */
    while attack is not finished do
        for each channel i in victim_channel{} do
            access the addresses belonging to channel i from
            memory_buffer{}
        end
    end
end

Figure 11 shows the results when the attacker VM uses 8 vCPUs to generate contention in the selected memory channels that are heavily used by the victim. These adaptive memory flooding attacks cause 3% ∼ 44% slowdown, while indiscriminately flooding the entire memory causes only 0.07% ∼ 22% slowdown.

4. CASE STUDIES IN AMAZON EC2

We now evaluate our memory DoS attacks in a real cloud environment, Amazon EC2. We provide two case studies: memory DoS attacks against distributed applications, and against E-commerce websites.

Legal and ethical considerations. As our attacks only involve memory accesses within the attacker VM’s own address space, the experiments we conducted in this section conformed with the EC2 customer agreement. Nevertheless, we put forth our best efforts in reducing the duration of the attacks to minimize the impact on other users in the cloud.

VM configurations. We chose the same configuration for the attacker and victim VMs: t2.medium instances with 2 vCPUs, 4GB memory and 8GB disk. Each VM ran Ubuntu Server 14.04 LTS with Linux kernel version 3.13.0-48-generic, in full virtualization mode. All VMs were launched in the us-east-1c region. Information exposed through lscpu indicated that these VMs were running on 2.5GHz Intel Xeon E5-2670 processors, with 32KB L1D and L1I caches, a 256KB L2 cache, and a shared 25MB LLC.

For all the experiments in this section, the attacker employs the exotic atomic locking (Sec. 3.3) and LLC cleansing attacks (Sec. 3.2), where each of the 2 attacker vCPUs was used to keep locking the memory and cleansing the LLC. Memory contention attacks (Sec. 3.4) are not used, since they cause much lower performance degradation (availability loss) to the victim.

VM co-location in EC2. The memory DoS attacks require the attacker and victim VMs to be co-located on the same machine. Past work [36, 39, 46] has proven the feasibility of such co-location attacks in public clouds. While cloud providers adopt new technologies (e.g., Virtual Private Cloud [3]) to mitigate the prior attacks of [36], new ways to test and detect co-location were discovered in [39, 46]. Specifically, Varadarajan et al. [39] achieved co-location in Amazon EC2, Google Compute Engine and Microsoft Azure at low cost (less than $8) in the order of minutes. They verified co-location with various VM configurations, launch delays between attacker and victim, launch times of day, datacenter locations, etc. Xu et al. [46] used similar ideas to achieve co-location in EC2 Virtual Private Cloud. We also applied these techniques to achieve co-location in Amazon EC2. In our experiments, we simultaneously launched a large number of attacker VMs in the same region as the victim VM. A machine outside EC2 under our control sent requests to static web pages hosted in the target victim VM. Each time, we selected one attacker VM to conduct memory DoS attacks and measured the victim VM’s response latency. Delayed HTTP responses from the victim VM indicated that this attacker was sharing the machine with the victim.
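The latency probe can be sketched as below (our illustration; the URL is a placeholder, curl performs the fetch, and the 2x threshold is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Time one HTTP fetch of a static page, in milliseconds. */
    static double fetch_ms(const char *url) {
        char cmd[256];
        struct timespec a, b;
        snprintf(cmd, sizeof(cmd), "curl -s -o /dev/null %s", url);
        clock_gettime(CLOCK_MONOTONIC, &a);
        system(cmd);
        clock_gettime(CLOCK_MONOTONIC, &b);
        return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
    }

    int main(void) {
        const char *url = "http://victim.example.com/index.html";
        double base = fetch_ms(url);   /* baseline: no candidate attacking */
        /* ... signal one candidate attacker VM to start its attack ... */
        double t = fetch_ms(url);
        if (t > 2 * base)              /* markedly delayed responses */
            printf("candidate likely co-located (%.1f ms vs %.1f ms)\n",
                   t, base);
        return 0;
    }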

4.1 Attacking Distributed Applications

We evaluate memory DoS attacks on a multi-node distributed application deployed in a cluster of VMs, where each VM is deployed as one node. We show how much performance degradation an adversary can induce to the victim cluster with minimal cost, using a single co-located attacker VM.

Experiment settings. We used Hadoop as the victim system. Hadoop consists of two layers: MapReduce for data processing, and the Hadoop Distributed File System (HDFS) for data storage. A Hadoop cluster includes a single master node and multiple slave nodes. The master node acts as both the Job Tracker for scheduling map or reduce jobs and the NameNode for hosting HDFS indexes. Each slave node acts as both the Task Tracker for conducting the map or reduce operations and the DataNode for storing data blocks in HDFS. We deployed the Hadoop system with different numbers of VMs (5, 10, 15 or 20), where one VM was selected as the master node and the rest were the slave nodes.

The attacker only used one VM to attack the cluster. He either co-located the malicious VM with the master node or with one of the slave nodes. We ran four different Hadoop benchmarks to test how much performance degradation the single attacker VM can cause to the Hadoop cluster. Each experiment was repeated 5 times. Figure 12 shows the mean values of normalized execution time and one standard deviation.

MRBench: This benchmark tests the performance of the MapReduce layer of the Hadoop system: it runs a small MapReduce job of text processing a number of times. We set the number of mappers and reducers to the number of slave nodes for each experiment. Figure 12a shows that attacking a slave node is more effective, since the slave node is busy with the map and reduce tasks. In a large Hadoop cluster with 20 nodes, attacking just one slave node introduces a 2.5× slowdown to the entire distributed system.

TestDFSIO: We use TestDFSIO to evaluate HDFS performance. This benchmark writes and reads files stored in HDFS. We configured it to operate on n files with a size of 500MB each, where n is the number of slave nodes in the Hadoop cluster. Figure 12b shows that attacking the slave node is effective: the adversary can achieve about a 2× slowdown.

NNBench: This program is also used to benchmark HDFS in Hadoop. It generates HDFS-related management requests on the master node of HDFS. We configured it to operate on 200n small files, where n is the number of slave nodes in the Hadoop cluster. Since the master node is heavily used for serving the HDFS requests, attacking the master node can introduce up to a 3.4× slowdown to the whole Hadoop system, as shown in Figure 12c.

TeraSort: We use this benchmark to test the overall performance of both the MapReduce and HDFS layers in the Hadoop cluster. TeraSort generates a large set of data and uses map/reduce operations to sort the data. For each experiment, we set the number of mappers and reducers to n, and the size of the data to be sorted to 100n MB, where n is the number of slave nodes in the Hadoop cluster. Figure 12d shows that attacking the slave node is very effective: it can bring a 2.8× ∼ 3.7× slowdown to the entire Hadoop system.

Summary. The adversary can deny working memory availability to the victim VM and thus degrade an important distributed system's performance with minimal cost: it can use just one VM to interfere with one of 20 nodes in the large cluster. The slowdown of a single victim node can cause up to a 3.7× slowdown of the whole system.

4.2 Attacking E-Commerce Websites

A web application consists of load balancers, web servers, database servers and memory caching servers. Memory DoS attacks can disturb an E-commerce web application by attacking its various components.

Experiment settings. We chose a popular open source E-commerce web application, Magento [7], as the target of the attack. The victim application consists of five VMs: a load balancer based on Pound for balancing network requests; two Apache web servers to process and deliver web requests; a MySQL database server to store customer and merchandise information; and a Memcached server to speed up database transactions. The five VMs were hosted on different cloud servers in EC2. The adversary is able to co-locate his VMs with one or multiple VMs that host the victim application. We measure the application's latency and throughput to evaluate the effectiveness of the attack.

Latency. We launched a client on a local machine outside of EC2. The client employed httperf [12] to send HTTP requests to the load balancer at different rates (connections per second), and we measured the average response time. We evaluated the attack from one or all co-located VMs. Each experiment was repeated 10 times, and the mean and standard deviation of the latency are reported in Figure 13a. This shows that memory contention on the database, load balancer or Memcached servers does not have much impact on the overall performance of the web application, with only up to 2× degradation. This is probably because these servers were not heavily used in these cases. Memory DoS attacks on the web servers were the most effective (17× degradation). When the adversary can co-locate with all victim servers and each attacker VM induces contention with its victim, the web server's HTTP response time was delayed by 38×, for a request rate of 50 connections per second.

Server throughput. Figure 13b shows the results of another experiment, where we measured the throughput of each victim VM individually, under memory DoS attacks. We used ApacheBench [1] to evaluate the load balancer and web servers, SysBench [11] to evaluate the database server, and memtier_benchmark [8] to evaluate the Memcached server. This shows memory DoS attacks on these servers were effective: the throughput can be reduced to only 13% ∼ 70% under malicious contention by the attacker.

Figure 12: Performance slowdown of the Hadoop applications due to memory DoS attacks. [Four panels, (a) MRBench, (b) TestDFSIO, (c) NNBench and (d) TeraSort, plot normalized execution time against the number of nodes (5, 10, 15 and 20), comparing attacks on the master node with attacks on one slave node.]

Figure 13: Latency and throughput of the Magento application due to memory DoS attacks. [Panel (a) plots normalized latency against the request rate (10 to 50 connections per second) when attacking the load-balancer, web, database or memcached servers, or all of them; panel (b) plots the normalized throughput of each server.]

Summary. The adversary can compromise the quality of an E-commerce service and cause financial loss in two ways: (1) long response latency will affect customers' satisfaction and make them leave this E-commerce website [35]; (2) it can cause throughput degradation, reducing the number of transactions completed in a unit of time. The cost of these attacks is relatively cheap: the adversary only needs a few VMs to perform the attacks, with each t2.medium instance costing $0.052 per hour.

5. DEFENSE AGAINST MEMORY DOS ATTACKS
We propose a novel, general-purpose approach to detecting and mitigating memory DoS attacks in the cloud. Unlike some past work, our defense does not require prior profiling of the memory resource usage of the applications. Our defense can be provided by the cloud providers as a new security service to customers. We denote as Protected VMs those VMs for which the cloud customers require protection. To detect memory DoS attacks, lightweight statistical tests are performed frequently to monitor performance changes of the Protected VMs (Sec. 5.1). To mitigate the attacks, execution throttling is used to reduce the impact of the attacks (Sec. 5.2).

Figure 14: Probability distributions of the Protected VM's memory bandwidth. [Four panels, Web, Database, Memcached and Load-balancer, plot the probability distribution of memory bandwidth (GBps) without contention, with LLC cleansing, and with atomic locking.]

A novelty of our approach is the combined use of two existing hardware features: event counting using hardware performance counters controllable via the Performance Monitoring Unit (PMU), and duty cycle modulation controllable through the IA32_CLOCK_MODULATION Model Specific Register (MSR).

5.1 Detection Method

The key insight in detecting memory DoS attacks is that such attacks are caused by abnormal resource contention between Protected VMs and attacker VMs, and such resource contention can significantly alter the memory usage of the Protected VM, which can be observed by the cloud provider. We postulate that the statistics of accesses to memory resources, by a phase of a software program, follow certain probability distributions. When a memory DoS attack happens, these probability distributions will change. Figure 14 shows the probability distributions of the Protected VM's memory access statistics, without attacks (black), and with two kinds of attacks (gray and shaded), when it runs one of the four applications introduced in Sec. 4.2, i.e., the Apache web server, MySQL database, Memcached and Pound load-balancer. When an attacker is present, the probability distribution of the Protected VM's memory access statistics (in this case, memory bandwidth in GigaBytes per second) changes significantly.

In practice, only samples drawn from the underlying probability distribution are observable. Therefore, the provider's task is to collect two sets of samples: $[X^R_1, X^R_2, ..., X^R_{n_R}]$ are reference samples collected from the probability distribution when we are sure that there are no attacks; $[X^M_1, X^M_2, ..., X^M_{n_M}]$ are monitored samples collected from the Protected VM at runtime, when attacks may occur. If these two sets of samples are not drawn from the same distribution, we can conclude that the performance of the Protected VM is hindered by its neighboring VMs. When the distance between the two distributions is large, we may conclude the Protected VM is under some memory DoS attack.

We propose to use the two-sample Kolmogorov-Smirnov (KS) test [30] as a metric for whether two samples belong to the same probability distribution. The KS statistic is defined in Equation 1, where $F_n(x)$ is the empirical distribution function of the samples $[X_1, X_2, ..., X_n]$, and $\sup$ is the supremum function (i.e., returning the maximum value). Superscripts $M$ and $R$ denote the monitored samples and reference samples, respectively. $n_M$ and $n_R$ are the numbers of monitored samples and reference samples.

$$D_{n_M,n_R} = \sup_x \left| F^M_{n_M}(x) - F^R_{n_R}(x) \right| \quad (1)$$

$$D^{\alpha}_{n_M,n_R} = \sqrt{\frac{n_M + n_R}{n_M \times n_R}} \times \sqrt{-0.5 \times \ln\left(\frac{\alpha}{2}\right)} \quad (2)$$

Null hypothesis for KS test. We establish the null hypothesis that currently monitored samples are drawn from the same distribution as the reference samples. Benign performance contention with non-attacking, co-tenant VMs will not alter the probability distribution of the Protected VM's monitored samples significantly, so the KS statistic is small and the null hypothesis holds. Equation 2 introduces $\alpha$: we can reject the null hypothesis with confidence level $1-\alpha$ if the KS statistic, $D_{n_M,n_R}$, is greater than the predetermined critical value $D^{\alpha}_{n_M,n_R}$. Then, the cloud provider can assume, with confidence level $1-\alpha$, that a memory DoS attack exists, and trigger a mitigation strategy.
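As a concrete illustration, the following Python sketch computes the KS statistic of Equation 1 and the critical value of Equation 2, and applies the decision rule above. It is a minimal re-implementation for exposition, not our prototype code.

    import bisect
    import math

    def ks_statistic(ref, mon):
        # Two-sample KS statistic (Equation 1): the supremum distance
        # between the two empirical CDFs, attained at a sample point.
        ref_s, mon_s = sorted(ref), sorted(mon)
        d = 0.0
        for x in ref_s + mon_s:
            f_r = bisect.bisect_right(ref_s, x) / len(ref_s)
            f_m = bisect.bisect_right(mon_s, x) / len(mon_s)
            d = max(d, abs(f_m - f_r))
        return d

    def ks_critical(n_m, n_r, alpha=0.001):
        # Critical value D^alpha at confidence level 1 - alpha (Equation 2).
        return math.sqrt((n_m + n_r) / (n_m * n_r)) * \
               math.sqrt(-0.5 * math.log(alpha / 2))

    def detect(reference, monitored, alpha=0.001):
        # Reject the null hypothesis, i.e., flag a potential memory DoS attack.
        return ks_statistic(reference, monitored) > \
               ks_critical(len(monitored), len(reference), alpha)

With $n_M = n_R = 100$ and $\alpha = 0.001$, ks_critical returns approximately 0.276, which matches the rejection threshold used later in Sec. 5.3; SciPy's scipy.stats.ks_2samp computes the same statistic.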

While monitored samples, $X^M_i$, are simply collected at runtime, reference samples, $X^R_i$, ideally should be collected when the Protected VM is not affected by other co-located VMs. The technical challenge here is that if these samples are collected offline, we need to assume the memory access statistics of the VM never change during its lifetime, which is unrealistic. If samples are collected at runtime, all the co-locating VMs need to be paused during sample collection, which, if performed frequently, can cause significant performance overhead to benign, co-located VMs.

Pseudo Isolated Reference Sampling. To address this technical challenge, we use execution throttling to collect the reference samples at runtime. The basic idea is to throttle down the execution speed of other VMs, but maintain the Protected VM's speed during the reference sampling stage. This can reduce the co-located VMs' interference without pausing them.

Execution throttling is based on a feature provided in Intel processors called duty cycle modulation [6], which is designed to regulate each core's execution speed and power consumption. The processor allows software to assign "duty cycles" to each CPU core: the core will be active during these duty cycles, and inactive during the non-duty cycles. For example, the duty cycle of a core can be set from 16/16 (no throttling), 15/16, 14/16, ..., down to 1/16 (maximum throttling). Each core uses a model specific register (MSR), IA32_CLOCK_MODULATION, to control the duty cycle ratio: bit 4 of this MSR denotes whether duty cycle modulation is enabled for this core; bits 0-3 represent the number of 1/16ths of the total CPU cycles set as duty cycles.
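To illustrate duty cycle modulation, the following is a minimal sketch that writes IA32_CLOCK_MODULATION (MSR 0x19A) through the Linux msr driver. It assumes the msr kernel module is loaded and root privileges; the helper name is ours, and our prototype's Regulator issues the equivalent wrmsr directly.

    import struct

    IA32_CLOCK_MODULATION = 0x19A  # MSR address (Intel SDM, Vol. 3)

    def set_duty_cycle(cpu, sixteenths):
        # Throttle a core to sixteenths/16 of its cycles (1..15), or restore
        # full speed with sixteenths = 16. Bit 4 enables modulation; bits 0-3
        # encode the duty cycle in units of 1/16, as described above.
        value = 0x0 if sixteenths >= 16 else (0x10 | (sixteenths & 0xF))
        with open("/dev/cpu/%d/msr" % cpu, "wb") as msr:
            msr.seek(IA32_CLOCK_MODULATION)
            msr.write(struct.pack("<Q", value))

    # Example: maximum throttling of core 3 during a reference sampling window.
    # set_duty_cycle(3, 1)    # run at 1/16 duty cycle
    # ... collect reference samples ...
    # set_duty_cycle(3, 16)   # restore full speed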

In execution throttling, the execution speed of other VMs will be throttled down and very little contention is induced on the Protected VM. As such, reference samples collected during the execution throttling stage are drawn from a quasi contention-free distribution.

Figure 15a illustrates the high-level strategy for monitoring Protected VMs. The reference samples are collected during the reference sampling periods ($W_R$), where other VMs' execution speeds are throttled down. The monitored samples are collected during the monitored sampling periods ($W_M$), where co-located VMs run normally, without execution throttling. KS tests are performed right after each monitored sample is collected, and the probability distribution divergence is estimated by comparing with the most recent reference samples. Monitored samples are collected periodically at a time interval of $L_M$, and reference samples are collected periodically at a time interval of $L_R$. We can also randomize the intervals $L_M$ and $L_R$ for each period to prevent the attacker from reverse-engineering the detection scheme and scheduling the attack phases to avoid detection.

If the KS test results reject the null hypothesis, it may be because the Protected VM is in a different execution phase with different memory access statistics, or it may be due to memory DoS attacks. To rule out the first possibility, double checking automatically occurs since reference samples are re-collected and updated after a time interval of $L_R$. If the deviation of the probability distribution still exists, attacks can be confirmed.
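Putting the detection pieces together, a minimal sketch of the monitoring loop might look as follows. Here collect_samples, throttle_down and throttle_up are assumed wrappers around the PMU sampling and duty-cycle-modulation helpers sketched above, detect is the KS-based test from the earlier sketch, the interval constants follow the parameters given in Sec. 5.3, and the jitter added to $L_R$ is illustrative of the randomization discussed above.

    import random
    import time

    L_R, L_M = 30.0, 2.0       # sampling intervals (Sec. 5.3)
    CONSECUTIVE_REJECTS = 4    # rejections required before an alarm (Sec. 5.3)

    def monitor(protected_vm, other_vms):
        reference, next_ref, rejects = None, 0.0, 0
        while True:
            if time.time() >= next_ref:
                # Pseudo isolated reference sampling: throttle co-located VMs,
                # sample the Protected VM for W_R, then restore full speed.
                throttle_down(other_vms)
                reference = collect_samples(protected_vm)   # W_R = 1s window
                throttle_up(other_vms)
                next_ref = time.time() + L_R + random.uniform(-5, 5)
            monitored = collect_samples(protected_vm)       # W_M = 1s window
            rejects = rejects + 1 if detect(reference, monitored) else 0
            if rejects >= CONSECUTIVE_REJECTS:
                return True  # alarm: refresh references, identify the attacker
            time.sleep(L_M)  # (L_M can also be randomized)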

5.2 Mitigation Method

The cloud provider has several methods to mitigate the attack. One is VM migration, which can be achieved either by reassigning the vCPUs of a VM to a different CPU package, when the memory resource being contended is in the same package (e.g., LLC), or by migrating the entire VM to another server, when the memory resource contended is shared system-wide (e.g., memory bus). However, such VM migration cannot completely eliminate the attacker VM's impact on other VMs.

An alternative approach is to identify the attacker VM, and then employ execution throttling to reduce the execution speed of the malicious VM, while the cloud provider conducts further investigation and/or notifies the owner of the suspected attacker VM of the observed resource abuse activities.

Identifying the attacker VM. Once memory DoS attacks are detected, to mitigate the threat, the cloud provider needs to identify which of the co-located VMs is conducting the attack. Here we propose a novel approach to identify malicious VMs based on selective execution throttling in a binary search manner: first, half of the co-located VMs keep normal execution speed while the rest of the VMs are throttled down during reference sampling periods (Figure 15b, 2nd Reference Sampling period). If, in this case, reference samples and monitored samples are drawn from the same distribution, then there are malicious VMs among the ones not throttled down during the reference sampling period. Then, we select half of the remaining VMs to be throttled while all the other VMs run at normal speed, to collect the next reference samples. In Figure 15b, this is the 3rd Reference Sampling period, where only VM3 is throttled.


Figure 15: Illustration of monitoring the Protected VM (a) and identifying the attack VM (b). [Panel (a) shows reference sampling windows ($W_R$, repeated every $L_R$), during which co-located VMs 1-4 are throttled, and monitored sampling windows ($W_M$, repeated every $L_M$), during which all VMs run normally. Panel (b) shows selective throttling identifying co-located VM3 as the attacker. A blue check mark means the null hypothesis is accepted; a red cross means it is rejected.]

Since the subsequent monitored samples have a different distribution compared to this reference sample, VM3 is identified as the attack VM. Note that if there are multiple attacker VMs on the server, we can use the above procedure to find one VM each time, and repeat it until all the attacker VMs are found. By organizing this search for the attacker VM or VMs as a binary search, the time taken to identify the source of memory contention is $O(\log n)$, where $n$ is the number of co-tenant VMs on the Protected VM's server. Algorithm 4 shows how to detect attacker VMs using this selective execution throttling.

Algorithm 4: Identifying and mitigating the attacker VMs that cause severe resource contention.

Input: VM[1,...,n]  /* set of co-tenant VMs */

function IdentifyAttacker(sub_VM)  /* sub_VM: set of VMs to identify */
    if sub_VM.length() = 1 then
        return sub_VM[0]
    else
        imin = 0
        imax = sub_VM.length() - 1
        imid = ⌈(imin + imax)/2⌉
        ThrottleDown(sub_VM[0,...,imid-1])
        reference_sample = DataCollect()
        ThrottleUp(sub_VM[0,...,imid-1])
        monitor_sample = DataCollect()
        result = KSTest(reference_sample, monitor_sample)
        if result = Reject then
            return IdentifyAttacker(sub_VM[0,...,imid-1])
        else
            return IdentifyAttacker(sub_VM[imid,...,imax])
        end
    end
end

begin
    vm = IdentifyAttacker(VM)
    ThrottleDown([vm])
end
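For concreteness, a direct Python transcription of Algorithm 4 is sketched below, reusing the hypothetical throttle_down, throttle_up, collect_samples and detect helpers from the earlier sketches.

    def identify_attacker(vms, protected_vm):
        # Binary search: throttle one half, re-sample, and recurse into the
        # half whose throttling changes the Protected VM's distribution.
        if len(vms) == 1:
            return vms[0]
        mid = len(vms) // 2            # = ceil((imin + imax)/2) in Algorithm 4
        throttle_down(vms[:mid])
        reference = collect_samples(protected_vm)
        throttle_up(vms[:mid])
        monitored = collect_samples(protected_vm)
        if detect(reference, monitored):   # rejected: the attacker was throttled
            return identify_attacker(vms[:mid], protected_vm)
        return identify_attacker(vms[mid:], protected_vm)

    # attacker = identify_attacker(co_tenant_vms, protected_vm)
    # throttle_down([attacker])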

5.3 Implementation

We implement a prototype system of our proposed defense on the OpenStack platform. Figure 16 shows the defense architecture overview. We adopt the CloudMonatt architecture from [48]. Specifically, the system includes three types of servers. The Cloud Controller is the cloud manager that manages the VMs. It has a Policy Validation Module to receive and analyze customers' requests. It also has a Response Module, which can throttle down the attacker VMs' execution speed to mitigate memory DoS attacks.

Figure 16: Architecture overview. [The Customer interacts with the Cloud Controller, which hosts the Policy Validation Module and the Response Module. The Attestation Server hosts the Verification Module. Each Cloud Server runs the Protected VM and other VMs above the hypervisor; the host OS contains the Detector and Regulator modules, which use the PMU kernel interface to the Hardware Performance Counters (HPCs) and the IA32_CLOCK_MODULATION MSR.]

The Attestation Server is a centralized server for monitoring and detection of memory DoS attacks. It has a Verification Module that receives the Protected VM's performance probability distribution, detects memory DoS attacks and identifies malicious VMs.

On each of the cloud servers, we use the KVM hypervisor, which is the default setup for OpenStack. Other virtualization platforms, such as Xen and Hyper-V, can also be used. Two software modules are installed on the host OS. A Detector measures the memory access characteristics of the Protected VM using the Performance Monitoring Unit (PMU), which is commonly available in most modern processors. A PMU provides a set of Hardware Performance Counters to count hardware-related events. In our implementation, we use the Linux kernel API perf_event to measure the memory access statistics, i.e., the number of LLC accesses per sampling period. A Regulator is in charge of controlling VMs' execution speed. It uses the wrmsr instruction to modify the IA32_CLOCK_MODULATION MSR to control the duty cycle ratio.
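As an illustration of the Detector's sampling, the following sketch counts the LLC accesses of the Protected VM's process in one 10ms window (matching the sampling period given below). For brevity it shells out to the perf(1) tool, whereas our prototype uses the perf_event kernel API directly; the event names are illustrative and vary across CPUs.

    import subprocess

    def sample_llc_accesses(pid, window_s=0.01):
        # One 10 ms sample of the target process's LLC accesses. perf(1)
        # prints event counts on stderr, e.g. "  12,345  LLC-loads".
        out = subprocess.run(
            ["perf", "stat", "-e", "LLC-loads,LLC-stores",
             "-p", str(pid), "--", "sleep", str(window_s)],
            capture_output=True, text=True).stderr
        total = 0
        for line in out.splitlines():
            parts = line.split()
            if len(parts) >= 2 and parts[1] in ("LLC-loads", "LLC-stores"):
                try:
                    total += int(parts[0].replace(",", ""))
                except ValueError:
                    pass  # "<not supported>" or "<not counted>"
        return total

    # e.g., one sampling period of n = 100 samples:
    # samples = [sample_llc_accesses(vm_pid) for _ in range(100)]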

In our implementation, the parameters involved in reference and monitored sampling are as follows: $W_R = W_M = 1$s, $L_M = 2$s, $L_R = 30$s. These values were selected to strike a balance between the performance overhead due to execution throttling and detection accuracy. In each sampling period, $n = 100$ samples are collected, each during a period of 10ms. We chose 10ms because it is short enough to provide accurate measurements, and long enough to return stable results. In the KS tests, the confidence level, $1-\alpha$, is set to 0.999, and the threshold to reject the null hypothesis is $D^{\alpha} = 0.276$ (given $\alpha = 0.001$). If 4 consecutive KS statistics larger than 0.276 are observed (the choice of 4 is elaborated in Sec. 5.4), it is assured that the Protected VM's memory access statistics have changed. Then, to confirm that such changes are due to memory DoS attacks, reference samples will be refreshed and the malicious VM will be identified.

5.4 Evaluation

Our lab testbed comprised three servers. A Dell R210II server (equipped with one quad-core, 3.30GHz Intel Xeon E3-1230v2 processor with 8MB LLC) was configured as the Cloud Controller as well as the Attestation Server. Two Dell PowerEdge R720 servers (one with two six-core, 2.90GHz Intel Xeon E5-2667 processors with 15MB LLC, the other with one eight-core, 2.90GHz Intel Xeon E5-2690 processor with 20MB LLC) were deployed to function as VM hosting servers.

Detection accuracy. We deployed a Protected VM sharing a cloud server with 8 other VMs. Among these 8 VMs, one VM was an attacker VM conducting a multi-threaded LLC cleansing attack with 4 threads (Sec. 3.2), or an atomic locking attack (Sec. 3.3). The remaining 7 VMs were benign VMs running common Linux utilities. The Protected VM ran one of the web, database, Memcached or load-balancer applications of the Magento application (Sec. 4.2). The experiments consisted of four stages; the KS statistics of each of the four workloads during the four stages, under the two types of attacks, are shown in Figure 17.

In stage I, the Protected VM runs while the attacker is idle. The KS statistic in this stage is relatively low, so we accept the null hypothesis that the memory accesses of the reference and monitored samples follow the same probability distribution. In stage II, the attacker VM conducts the LLC cleansing or atomic locking attacks. We observe that the KS statistic is much higher than 0.276. The null hypothesis is rejected, signaling detection of potential memory DoS attacks. In stage III, the cloud provider runs three rounds of reference resampling to pinpoint the malicious VM. Resource contention mitigation is performed in stage IV: the cloud provider throttles down the attacker VM's execution speed. After this stage, the KS statistic falls back to normal, which suggests that the attacks are mitigated.

We also evaluated the false positive rates and false negative rates of two different criteria for identifying a memory access anomaly: 1 abnormal KS statistic (larger than the critical value $D^{\alpha}$) or 4 consecutive abnormal KS statistics. Figure 18a shows the true positive rate of LLC cleansing and atomic locking attack detection, at different confidence levels $1-\alpha$. We observe that the true positive rate is always one (thus zero false negatives), regardless of the detection criterion (1 vs. 4 abnormal KS tests). Figure 18b shows the false positive rate, which can be caused by background noise due to other VMs' executions. This figure shows that using 4 consecutive abnormal KS statistics significantly reduces the false positive rate.

Effectiveness of mitigation. We evaluated the effectiveness of execution-throttling-based mitigation. The Protected VM runs the cloud benchmarks from the Magento application while the attacker VM runs LLC cleansing or atomic locking attacks. We chose different duty cycle ratios for the attacker VM. Figures 19a and 19b show the normalized performance of the Protected VM with different throttling ratios, under LLC cleansing and atomic locking attacks, respectively. The x-axis shows the duty cycle (x × 1/16) given to the co-located VMs, going from no throttling on the left to maximum throttling on the right of each figure. The y-axis shows the Protected VM's response latency (for web and load-balancer) or throughput (for Memcached and database) normalized to the values without attack. A high latency or a small throughput indicates that the performance of the Protected VM is highly affected by the attacker VM. We can see that a smaller throttling ratio can effectively reduce the attacker's impact on the victim's performance. When the ratio is set to 1/16, the victim's performance degradation caused by the attacker is kept within 12% (compared to 23% ∼ 50% degradation with no throttling) for LLC cleansing attacks. It is within 14% for atomic locking attacks (compared to 7× degradation with no throttling).

Latency increase and mitigation. We chose a latency-critical application, the Magento E-commerce application, as the target victim. One Apache web server was selected as the Protected VM, co-located with an attacker and 7 benign VMs running Linux utilities. Figure 20 shows the response latency with and without our defense. The detection phase does not affect the Protected VM's performance (stage I), since the PMU collects monitored samples without interrupting the VM's execution. In stage II, the attack occurs and the defense system detects that the Protected VM's performance is degraded. In stage III, attacker VM identification is done. After throttling down the attacker VM in stage IV, the Protected VM's performance is no longer affected by the memory DoS attacks. The latency during the attack in stage II increases significantly, but returns to normal after mitigation in stage IV.

We also evaluated the performance overhead on co-located VMs due to execution throttling in the detection step. We launched one VM running one of the eight SPEC2006 or PARSEC benchmarks. Then we periodically throttled down this VM every 10s, 20s or 30s. Each throttling lasted for 1s (the same value as for $W_R$ and $W_M$ used earlier). The normalized performance of this VM is shown in Figure 21. We can see that when the server throttles this VM every 10s, the performance penalty can be around 10%. However, when the interval is set to 30s (our implementation choice), this penalty is smaller than 5%.

6. RELATED WORK

6.1 Resource Contention Attacks

Cloud DoS attacks. [28] proposed a DoS attack which can deplete the victim's network bandwidth from its subnet. [15] proposed a network-initiated DoS attack which causes contention in the shared Network Interface Controller. [23] proposed cascading performance attacks which exhaust the hypervisor's I/O processing capability. [14] exploited VM migration to degrade the hypervisor's performance. Our work is different as we exploit failure of isolation in the hardware memory subsystem (which has not been addressed by cloud providers), and not attacks on networks or hypervisors.

Cloud resource stealing attacks. [38] proposed the resource-freeing attack, where a malicious VM can steal one type of resource from the co-located victim VM by increasing this VM's usage of other types of resources. [54] designed a CPU resource attack where an attacker VM can exploit the boost mechanism in the Xen credit scheduler to obtain more CPU resources than paid for. Our attacks do not steal extra cloud resources. Rather, we aim to induce the maximum performance degradation to the co-located victim VM targets.

Figure 17: KS statistics of the Protected VM for detecting and mitigating memory DoS attacks. [Panels (a) LLC cleansing attack and (b) atomic locking attack plot the KS statistic $D$ over time for the Web, Database, Memcached and Load-balancer workloads, across the four stages I-IV.]

Figure 18: Detection accuracy. [Panel (a) plots the true positive rate and panel (b) the false positive rate against the confidence level $1-\alpha$ (0.9 to 0.999, i.e., $D^{\alpha}$ from 0.17 to 0.28), comparing the criteria of checking 1 sample versus 4 consecutive samples.]

Hardware resource contention studies. [21] studied the effect of trace cache evictions on the victim's execution with Hyper-Threading enabled in an Intel Pentium 4 Xeon processor. [43] explored frequently flushing shared L2 caches on multicore platforms to slow down a victim program, and also studied saturation and locking of the buses that connect the L1/L2 caches and the main memory. [32] studied contention attacks on the schedulers of memory controllers. However, due to advances in computer hardware design, caches and DRAMs are larger and their management policies more sophisticated, so these prior attacks may not work in modern cloud settings.

Timing channels in clouds. Prior studies showed that shared memory resources can be exploited by an attacker to extract crypto keys from the victim VM using cache side-channel attacks in cloud settings [27, 51, 52], or to transmit information, using cache operations [36, 45] or bus activities [44], in covert channel communications between two VMs. Unlike side-channel attacks, our memory DoS attacks aim to maximize the effects of resource contention, while resource contention is an unintended side-effect of side-channel attacks. To maximize contention, we addressed various new challenges, e.g., finding which attacks cause the greatest resource contention (exotic bus locking versus memory controller attacks), maximizing the frequency of resource depletion, and minimizing self-contention. To the best of our knowledge, we are the first to show that similar attack strategies (enhanced for resource contention) can be used as availability attacks as well as confidentiality attacks.

6.2 Eliminating Resource Contention

VM performance monitoring. Public clouds offer performance monitoring services for customers' VMs and applications, e.g., Amazon CloudWatch [2], Microsoft Azure Application Insights [9], Google Stackdriver [4], etc. However, these services only monitor CPU usage, network traffic and disk bandwidth, but not low-level memory usage. To measure a VM's performance without contention for reference sampling, past work offers three ways: (1) collecting the VM's performance characteristics before it is deployed in the cloud [19, 53]; (2) measuring the performance of other VMs which run similar tasks [34, 50]; (3) measuring the Protected VM while pausing all other co-located VMs [22, 47]. The drawback of (1) and (2) is that they only work for programs with predictable and stable performance characteristics, and do not support arbitrary programs running in the Protected VM. The problem with (3) is the significant performance overhead inflicted on co-located VMs. In contrast, we use novel execution throttling of the co-located VMs to collect the Protected VM's baseline (reference) measurements with negligible performance overhead. While execution throttling has been used to achieve resource fairness in prior work [20, 49], using it to collect reference samples at runtime is, to our knowledge, novel.

QoS-aware VM scheduling. Prior research proposes to predict interference between different applications (or VMs) by profiling their resource usage offline and then statically scheduling them to different servers if co-locating them would lead to excessive resource contention [19, 47, 53]. The underlying assumption is that applications (or VMs), when deployed on the cloud servers, will not change their resource usage patterns. Unfortunately, these approaches fall short in defending against malicious applications, which can reduce their resource use during the profiling stage and then run memory DoS attacks when deployed, thus evading these QoS scheduling mechanisms.

Load-triggered VM migration. Some studies propose to monitor the resource consumption of guest VMs or the entire server in real-time, and migrate VMs to different processor packages or servers when there is severe resource contention [13, 17, 40, 55]. By doing so, these approaches can dynamically balance the workload among multiple packages or servers when some of them are overloaded, and achieve an optimal resource allocation. While they work well for performance optimization of a set of fully-loaded servers, they fail to detect carefully-crafted memory DoS attacks. First, the metrics in their methods cannot be used to detect the existence of memory DoS attacks. These works measure the LLC miss rate [17, 55] or memory bandwidth [13, 40] of guest VMs or the whole server. A high LLC miss rate or high memory bandwidth indicates severe resource contention for the VMs or servers. However, a memory DoS attack does not need to cause a high LLC miss rate or high memory bandwidth in order to degrade a victim's performance. For instance, atomic locking attacks lock the bus temporarily but frequently, which could incur decreased LLC accesses and LLC misses in the victim VM. Our experiments show the victim VM's LLC miss rate does not change, and its memory bandwidth is even decreased. So such micro-architectural measurements can never reveal severe resource contention, or trigger VM migration in the above approaches. Second, these approaches aim to balance the system's performance, so they cannot guarantee choosing the victim or attacker VMs for migration when they achieve optimal workload placement. For instance, adaptive LLC cleansing attacks can increase victim VMs' LLC miss rates, but the above approaches have no means of determining that the victim VM has the strongest need for migration. It is possible that another VM has an even higher LLC miss rate than the victim VM, due to its internal execution behavior rather than contention with the attacker. The above approaches will then migrate this VM instead of the victim VM, as it has the highest miss rate, and the victim VM will still suffer the LLC cleansing attack.

Figure 19: Normalized performance of the Protected VM with throttling of memory DoS attacks. [Panels (a) throttling LLC cleansing attacks and (b) throttling atomic locking attacks plot the Protected VM's normalized latency (Web, Load-balancer) or normalized throughput (Database, Memcached) against the throttling ratio (x × 1/16), from 16 down to 1.]

Figure 20: Request latency of the Magento application. [Panels (a) LLC cleansing attack and (b) atomic locking attack plot the response latency (ms) over time, with and without our defense, across the four stages I-IV.]

Figure 21: Performance overhead of co-located VMs due to monitoring. [Normalized execution time of mcf, gobmk, omnetpp, astar, soplex, lbm, canneal and streamcluster when throttled every 10s, 20s or 30s.]

Performance isolation. While cloud providers can offer single-tenant machines to customers with high demand for security and performance, disallowing resource sharing by VMs will lead to low resource utilization and thus is at odds with the cloud business model. Another option is to partition memory resources to enforce performance isolation on shared resources (e.g., the LLC [5, 18, 25, 26, 37, 42], or DRAM [32, 33, 41]). These works aim to achieve fairness between different domains and provide fair QoS. However, they cannot effectively defeat memory DoS attacks. For cache partitioning, software page coloring methods [26] can cause significant wastage of LLC space, while hardware cache partitioning mechanisms have insufficient partitions (e.g., Intel Cache Allocation Technology [5] only provides four QoS partitions on the LLC). Furthermore, LLC cache partitioning methods cannot resolve atomic locking attacks.

To summarize, existing solutions fail to address memory DoS attacks because they assume benign applications with non-malicious behaviors. Also, they are often tailored to only one type of attack, so they cannot be generalized to all memory DoS attacks, unlike our proposed defense.

7. CONCLUSIONS


We presented memory DoS attacks, in which a malicious VM intentionally induces memory resource contention to degrade the performance of co-located victim VMs. We proposed several advanced techniques for conducting such attacks, and demonstrated the severity of the resulting performance degradation. Our attacks work on modern memory systems in cloud servers, for which prior attacks on older memory systems are often ineffective. We evaluated our attacks against two commonly used applications in a public cloud, Amazon EC2, and showed that the adversary can cause significant performance degradation not only to co-located VMs, but to the entire distributed application.

We then designed a novel and generalizable method that can detect and mitigate all known memory DoS attacks. Our approach collects the Protected VM's reference and monitored behaviors at runtime using the Performance Monitoring Unit. This is done by establishing a pseudo isolated collection environment, using the duty cycle modulation feature to throttle the co-resident VMs while collecting reference samples. Statistical tests are performed to detect differing performance probability distributions between reference and monitored samples, with desired confidence levels. Our evaluation shows this defense can detect and defeat memory DoS attacks with very low performance overhead.

8. REFERENCES

[1] ab - The Apache Software Foundation. http://httpd.apache.org/docs/2.2/programs/ab.html.

[2] Amazon CloudWatch. https://aws.amazon.com/cloudwatch/.

[3] Amazon Virtual Private Cloud. https://aws.amazon.com/vpc/.

[4] Google Stackdriver. https://cloud.google.com/stackdriver/.

[5] Improving real-time performance by utilizing cache allocation technology. http://www.intel.com/content/www/us/en/communications/cache-allocation-technology-white-paper.html.

[6] Intel 64 and IA-32 architectures software developer's manual, volume 3: System programming guide. http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html.

[7] Magento: eCommerce software and eCommerce platform. http://www.magento.com/.

[8] memtier_benchmark. https://github.com/RedisLabs/memtier_benchmark.

[9] Microsoft Azure Application Insights. https://azure.microsoft.com/en-us/services/application-insights/.

[10] SPEC CPU 2006. https://www.spec.org/cpu2006/.

[11] SysBench: a system performance benchmark. https://launchpad.net/sysbench/.

[12] Welcome to the httperf homepage. http://www.hpl.hp.com/research/linux/httperf/.

[13] J. Ahn, C. Kim, J. Han, Y.-R. Choi, and J. Huh. Dynamic virtual machine scheduling in clouds for architectural shared resources. In USENIX Conference on Hot Topics in Cloud Computing, 2012.

[14] S. Alarifi and S. D. Wolthusen. Robust coordination of cloud-internal denial of service attacks. In Intl. Conf. on Cloud and Green Computing, 2013.

[15] H. S. Bedi and S. Shiva. Securing cloud infrastructure against co-resident DoS attacks using game theoretic defense mechanisms. In Intl. Conf. on Advances in Computing, Communications and Informatics, 2012.

[16] C. Bienia. Benchmarking Modern Multiprocessors. PhD thesis, Princeton University, 2011.

[17] S. Blagodurov, S. Zhuravlev, A. Fedorova, and A. Kamali. A case for NUMA-aware contention management on multicore systems. In ACM Intl. Conf. on Parallel Architectures and Compilation Techniques.

[18] H. Cook, M. Moreto, S. Bird, K. Dao, D. A. Patterson, and K. Asanovic. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. In Intl. Symp. on Computer Architecture, 2013.

[19] C. Delimitrou and C. Kozyrakis. Paragon: QoS-aware scheduling for heterogeneous datacenters. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, 2013.

[20] E. Ebrahimi, C. J. Lee, O. Mutlu, and Y. N. Patt. Fairness via source throttling: A configurable and high-performance fairness substrate for multi-core memory systems. In Architectural Support for Programming Languages and Operating Systems, 2010.

[21] D. Grunwald and S. Ghiasi. Microarchitectural denial of service: Insuring microarchitectural fairness. In ACM/IEEE Intl. Symp. on Microarchitecture, 2002.

[22] A. Gupta, J. Sampson, and M. B. Taylor. Quality time: A simple online technique for quantifying multicore execution efficiency. In IEEE Intl. Symp. on Performance Analysis of Systems and Software, 2014.

[23] Q. Huang and P. P. Lee. An experimental study of cascading performance interference in a virtualized environment. SIGMETRICS Perform. Eval. Rev., 2013.

[24] P. Jamkhedkar, J. Szefer, D. Perez-Botero, T. Zhang, G. Triolo, and R. B. Lee. A framework for realizing security on demand in cloud computing. In Conf. on Cloud Computing Technology and Science, 2013.

[25] D. Kim, H. Kim, and J. Huh. vCache: Providing a transparent view of the LLC in virtualized environments. Computer Architecture Letters, 2014.

[26] T. Kim, M. Peinado, and G. Mainar-Ruiz. StealthMem: System-level protection against cache-based side channel attacks in the cloud. In USENIX Security Symp., 2012.

[27] F. Liu, Y. Yarom, Q. Ge, G. Heiser, and R. B. Lee. Last-level cache side-channel attacks are practical. In IEEE Symp. on Security and Privacy, 2015.

[28] H. Liu. A new form of DoS attack in a cloud and its avoidance mechanism. In ACM Workshop on Cloud Computing Security, 2010.

[29] L. Liu, Z. Cui, M. Xing, Y. Bao, M. Chen, and C. Wu. A software memory partition approach for eliminating bank-level interference in multicore systems. In Intl. Conf. on Parallel Architectures and Compilation Techniques, 2012.

[30] F. J. Massey Jr. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 1951.

[31] J. D. McCalpin. STREAM: Sustainable memory bandwidth in high performance computers. http://www.cs.virginia.edu/stream/.

[32] T. Moscibroda and O. Mutlu. Memory performance attacks: Denial of memory service in multi-core systems. In USENIX Security Symp., 2007.

[33] S. P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, and T. Moscibroda. Reducing memory interference in multicore systems via application-aware memory channel partitioning. In ACM/IEEE Intl. Symp. on Microarchitecture, 2011.

[34] D. Novakovic, N. Vasic, S. Novakovic, D. Kostic, and R. Bianchini. DeepDive: Transparently identifying and managing performance interference in virtualized environments. In USENIX Annual Technical Conference, 2013.

[35] N. Poggi, D. Carrera, R. Gavalda, and E. Ayguade. Non-intrusive estimation of QoS degradation impact on e-commerce user satisfaction. In IEEE Intl. Symp. on Network Computing and Applications, 2011.

[36] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. Hey, you, get off of my cloud: Exploring information leakage in third-party compute clouds. In ACM Conf. on Computer and Communications Security, 2009.

[37] D. Sanchez and C. Kozyrakis. Vantage: Scalable and efficient fine-grain cache partitioning. In ACM Intl. Symp. on Computer Architecture, 2011.

[38] V. Varadarajan, T. Kooburat, B. Farley, T. Ristenpart, and M. M. Swift. Resource-freeing attacks: Improve your cloud performance (at your neighbor's expense). In ACM Conf. on Computer and Communications Security, 2012.

[39] V. Varadarajan, Y. Zhang, T. Ristenpart, and M. Swift. A placement vulnerability study in multi-tenant public clouds. In USENIX Security Symp., 2015.

[40] H. Wang, C. Isci, L. Subramanian, J. Choi, D. Qian, and O. Mutlu. A-DRM: Architecture-aware distributed resource management of virtualized clusters. In ACM Intl. Conference on Virtual Execution Environments, 2015.

[41] Y. Wang, A. Ferraiuolo, and G. E. Suh. Timing channel protection for a shared memory controller. In IEEE Intl. Symp. on High Performance Computer Architecture, 2014.

[42] Z. Wang and R. B. Lee. New cache designs for thwarting software cache-based side channel attacks. In ACM Intl. Symp. on Computer Architecture, 2007.

[43] D. H. Woo and H.-H. S. Lee. Analyzing performance vulnerability due to resource denial-of-service attack on chip multiprocessors. In Workshop on Chip Multiprocessor Memory Systems and Interconnects, 2007.

[44] Z. Wu, Z. Xu, and H. Wang. Whispers in the hyper-space: High-speed covert channel attacks in the cloud. In USENIX Security Symp., 2012.

[45] Y. Xu, M. Bailey, F. Jahanian, K. Joshi, M. Hiltunen, and R. Schlichting. An exploration of L2 cache covert channels in virtualized environments. In ACM Workshop on Cloud Computing Security, 2011.

[46] Z. Xu, H. Wang, and Z. Wu. A measurement study on co-residence threat inside the cloud. In USENIX Security Symp., 2015.

[47] H. Yang, A. Breslow, J. Mars, and L. Tang. Bubble-flux: Precise online QoS management for increased utilization in warehouse scale computers. In ACM Intl. Symp. on Computer Architecture, 2013.

[48] T. Zhang and R. B. Lee. CloudMonatt: An architecture for security health monitoring and attestation of virtual machines in cloud computing. In ACM Intl. Symp. on Computer Architecture, 2015.

[49] X. Zhang, S. Dwarkadas, and K. Shen. Hardware execution throttling for multi-core resource management. In USENIX Annual Technical Conference, 2009.

[50] X. Zhang, E. Tune, R. Hagmann, R. Jnagal, V. Gokhale, and J. Wilkes. CPI2: CPU performance isolation for shared compute clusters. In ACM European Conf. on Computer Systems, 2013.

[51] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-VM side channels and their use to extract private keys. In ACM Conf. on Computer and Communications Security, 2012.

[52] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-tenant side-channel attacks in PaaS clouds. In ACM Conf. on Computer and Communications Security, 2014.

[53] Y. Zhang, M. A. Laurenzano, J. Mars, and L. Tang. SMiTe: Precise QoS prediction on real-system SMT processors to improve utilization in warehouse scale computers. In IEEE/ACM Intl. Symp. on Microarchitecture, 2014.

[54] F. Zhou, M. Goel, P. Desnoyers, and R. Sundaram. Scheduler vulnerabilities and coordinated attacks in cloud computing. In IEEE Intl. Symp. on Network Computing and Applications, 2011.

[55] S. Zhuravlev, S. Blagodurov, and A. Fedorova. Addressing shared resource contention in multicore processors via scheduling. In Intl. Conf. on Architectural Support for Programming Languages and Operating Systems, 2010.