mason.gmu.edu/~wconnel2/dissertationfinal.pdf
A QUANTITATIVE FRAMEWORK FOR CYBER MOVING TARGET DEFENSES
by
Warren J. Connell
A Dissertation
Submitted to the Graduate Faculty
of
George Mason University
In Partial Fulfillment of
The Requirements for the Degree
of
Doctor of Philosophy
Information Technology
Committee:
Dr. Massimiliano Albanese, Dissertation Co-Director
Dr. Daniel A. Menasce, Co-Director
Dr. Sushil Jajodia, Committee Member
Dr. Rajesh Ganesan, Committee Member
Dr. Stephen Nash, Department Chair
Dr. Kenneth S. Ball, Dean, Volgenau School of Engineering
Date:
Fall Semester 2017
George Mason University
Fairfax, VA
A Quantitative Framework for Cyber Moving Target Defenses
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy at George Mason University
By
Warren J. Connell
Master of Science
Wright State University, 2011
Bachelor of Science
University of Nebraska, 2007
Director: Dr. Massimiliano Albanese, Professor
Department of Information Science and Technology
Co-director: Dr. Daniel A. Menasce, Professor
Department of Computer Science
Fall Semester 2017
George Mason University
Fairfax, VA
Copyright © 2017 by Warren J. Connell
All Rights Reserved
Dedication
To all the leaders and mentors I’ve had in the Air Force who have guided me and given me
opportunities over the last 20 years.
Acknowledgments
I would like to thank my dissertation directors: Dr. Albanese for his time and patience
preparing me for the world of academia and Dr. Menasce for his invaluable guidance and
direction. I would also like to thank the rest of my dissertation committee and my fellow
PhD students for their comments and sharing their experiences with me. Thanks to my
friends for their fellowship and helping keep me sane–I wouldn’t be here without you. And
finally, thanks to my wife Kayleen, who is no stranger to the life of an academic widow.
Table of Contents
Page
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Research Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Background and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Moving Target Defense . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Dynamic Runtime Environments . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Dynamic Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Dynamic Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.4 Dynamic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.5 Dynamic Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.6 MTD Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Attack Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Self-Protecting Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 MTD Quantification Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Threat Model and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Quantification Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.1 Mathematical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 4-Layer Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.3 Computing MTD effectiveness . . . . . . . . . . . . . . . . . . . . . 25
3.3 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Comparing MTDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Selecting Optimal Defenses . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 Combining MTDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5.1 Experimental Testbed . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5.2 Experimental Results and Observations . . . . . . . . . . . . . . . . 37
3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4 Performance Modeling of Moving Target Defenses . . . . . . . . . . . . . . . . . 42
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Quantitative Analysis of MTDs . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2.1 Reconfiguration Model . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.2 Response Time Model . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.3 Analysis of Attack Success Probability . . . . . . . . . . . . . . . . . 53
4.3 Simulation and Experimental Testbed . . . . . . . . . . . . . . . . . . . . . 55
4.4 Numerical Results and Validation . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4.1 Reconfiguration Model . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4.2 Response Time Model . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.3 Optimal Reconfiguration Rate . . . . . . . . . . . . . . . . . . . . . 63
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5 Performance Modeling of Moving Target Defenses With Reconfiguration Limits . 67
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Updated Analytic Model Overview . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Reconfiguration Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3.1 Core Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3.2 Drop Reconfiguration Requests Policy . . . . . . . . . . . . . . . . . 71
5.3.3 Wait Reconfiguration Requests Policy . . . . . . . . . . . . . . . . . 74
5.4 Response Time Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.5 Combined Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.6 Simulation and Experimental Testbed . . . . . . . . . . . . . . . . . . . . . 85
5.7 Numerical Results and Validation . . . . . . . . . . . . . . . . . . . . . . . . 85
5.7.1 Analytic Model Results . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.7.2 Validation with Simulation Results . . . . . . . . . . . . . . . . . . . 90
5.7.3 Validation of the Simulation with Experimental Results . . . . . . . 96
5.7.4 Determining the Optimal Reconfiguration Rate . . . . . . . . . . . . 96
6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
List of Tables
Table Page
2.1 Encryption Settings with Fitness Values and Overhead . . . . . . . . . . . . 16
2.2 Authentication Settings with Fitness Values and Overhead . . . . . . . . . . 16
2.3 Response Times and Fitness Values . . . . . . . . . . . . . . . . . . . . . . . 16
3.1 Sample Case Study Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Improvement from Adding MTDs . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Case Study Optimal Configuration . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Attacker Success Rates for all Combinations of Interarrival Rates . . . . . . 40
3.5 Availability for all Combinations of Interarrival Rates . . . . . . . . . . . . 41
4.1 Summary of Variable Names and Descriptions . . . . . . . . . . . . . . . . . 45
4.2 Values of Variables used in Numerical Results . . . . . . . . . . . . . . . . . 58
4.3 Comparison of Availability Results. . . . . . . . . . . . . . . . . . . . . . . . 61
5.1 Summary of Variable Names and Descriptions . . . . . . . . . . . . . . . . . 69
5.2 Example of the Aggregate Departure Rate for c = 10 and c∗ = 4 . . . . . . 79
5.3 Values of Variables Used in Simulation Results . . . . . . . . . . . . . . . . 86
5.4 Comparison of Simulation and Experimental Results for Availability . . . . 96
5.5 Comparison of Simulation and Experimental Results for Response Time . . 97
List of Figures
Figure Page
2.1 Suggested Methods for MTD Quantification . . . . . . . . . . . . . . . . . . 11
2.2 Probabilistic Attack Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Connection Loss and Attacker’s Success as a Function of Shuffle Rate . . . 15
2.4 Sample Sigmoid Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1 Quantification Framework Layers . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Computing MTD Effectiveness . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Case Study Quantification Framework . . . . . . . . . . . . . . . . . . . . . 29
3.4 Case Study Optimal Configuration . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Experimental Setup for Combined MTD Experiments . . . . . . . . . . . . 36
3.6 Histogram of Number of VMs Compromised for Service Rotation . . . . . . 38
3.7 Monitor Results for Service Rotation . . . . . . . . . . . . . . . . . . . . . . 38
3.8 Attacker Success Rate and Availability for Service and IP Rotation . . . . . 39
3.9 Comparison of Service and IP Rotation on Attacker Success Rate and Avail-
ability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1 Queuing Representation of the Reference Scenario . . . . . . . . . . . . . . 44
4.2 Analytic Model Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Reconfiguration Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 CTMC for the Reconfiguration Model . . . . . . . . . . . . . . . . . . . . . 48
4.5 CTMC for the Response Time Model . . . . . . . . . . . . . . . . . . . . . . 51
4.6 Probability of Success Ps vs. Time for Ts = 10 . . . . . . . . . . . . . . . . 54
4.7 Experimental Setup for Quantitative Analysis . . . . . . . . . . . . . . . . . 56
4.8 Control Flow and Movement for Quantitative Analysis . . . . . . . . . . . . 57
4.9 Distribution of the Number of Resources Being Reconfigured for c = 20 . . 59
4.10 Availability vs. Reconfiguration Rate α . . . . . . . . . . . . . . . . . . . . 60
4.11 Comparison of Number of Resources being Reconfigured (α = 0.02 rec/sec) 60
4.12 Number of Available Resources and Response Time for Two Trials with Dif-
fering Values of α . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.13 Response Time: Simulation vs. Analytical Model with Stability . . . . . . . 64
4.14 Optimization Analysis to Find the Maximum Feasible Reconfiguration Rate
(α) for c = 20 and S = 60 sec . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.1 Analytic Model Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2 Target and Effective Reconfiguration Rate . . . . . . . . . . . . . . . . . . . 71
5.3 Flowchart of the Reconfiguration Cycle under the Drop Policy . . . . . . . 72
5.4 State Transition Diagram of the Markov Chain for the Reconfiguration Model
under the Drop policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.5 Flowchart of the Reconfiguration Cycle under the Wait Policy . . . . . . . . 75
5.6 State Transition Diagram of the Markov Chain for the Reconfiguration Model
under the Wait Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.7 State Transition Diagram for the Response Time Model . . . . . . . . . . . 78
5.8 Average Availability and Resource Utilization for Drop and Wait Policies . 87
5.9 Average Response Time and Resource Age for Drop and Wait Policies . . . 88
5.10 Availability for Varying Levels of c∗ . . . . . . . . . . . . . . . . . . . . . . 88
5.11 Average Resource Utilization for Varying Levels of c∗ . . . . . . . . . . . . . 89
5.12 Average Response Time for Varying Levels of c∗ . . . . . . . . . . . . . . . 90
5.13 Probability Distributions of pk and pk for Varying Levels of α . . . . . . . . 91
5.14 Comparison of pk Between Simulation and Analytical Model for Drop Policy 91
5.15 Comparison of pk Between Simulation and Analytical Model for Wait Policy 92
5.16 Comparison of Availability and Response Time Between Simulation and An-
alytical Model for Drop Policy . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.17 Comparison of Average Resource Age and Drop Percentage Between Simu-
lation and Analytical Model for Drop Policy . . . . . . . . . . . . . . . . . . 93
5.18 Comparison of Effective Reconfiguration Rate Between Simulation and An-
alytical Models for Drop Policy . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.19 Comparison of Availability and Response Time Between Simulation and An-
alytical Model for Wait Policy . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.20 Comparison of Average Resource Age and Reconfiguration Delay Between
Simulation and Analytical Model for Wait Policy . . . . . . . . . . . . . . . 95
5.21 Comparison of Effective Reconfiguration Rate Between Simulation and An-
alytical Models for Wait Policy . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.22 Utility Values of Various Weight Combinations for Drop Policy . . . . . . . 98
Abstract
A QUANTITATIVE FRAMEWORK FOR CYBER MOVING TARGET DEFENSES
Warren J. Connell, PhD
George Mason University, 2017
Dissertation Directors: Dr. Massimiliano Albanese / Dr. Daniel A. Menasce
Moving Target Defenses (MTDs) are techniques used to defend computer networks that
seek to delay or prevent attacks during any phase of the cyber kill chain by dynamically
changing the makeup of the systems or network such that an effective attack cannot be
planned or executed. There are a variety of methods available to implement MTDs, such
as dynamically changing network addresses, memory addresses, user-level services, or even
operating systems or data. These changes can take the form of changing signatures or
outward appearance, or actual changes in network configuration or software.
Although many schemes are described in the literature, there is no universal method to
measure their effectiveness. Likewise, there is very little uniformity in how the overhead of
these techniques is measured, if it is even mentioned at all. These factors make it difficult,
if not impossible, to effectively compare MTDs. Therefore, a quantification framework for
MTDs is needed to properly compare MTDs or optimize their performance.
Additionally, many MTDs have a limited scope that usually only covers a subset of
potential attack vectors with no single solution that offers protection in every scenario.
Ideally, several techniques could be combined to provide defense-in-depth, but integration
is often lacking and the lack of universal metrics for evaluating performance prevents us
from assessing the combined impact of multiple techniques.
This work presents a framework for comparing different MTDs or the combined effects
of a set of MTDs by calculating a utility value as a function of the impact the MTD has
on the attacker’s success rate or level of additional effort required. It also calculates a
utility value as a function of the overhead. The weighted average of these utility values can
then be used to compute an aggregate utility value. This model is then tested by several
experiments that compare a variety of MTDs, observing their combined effect, and finding
optimal settings for each MTD.
The proposed framework fulfills the need for a systematic approach to compare MTDs
with one another despite their diversity and make an optimal selection of techniques for a
given scenario. The framework may also be used to find an optimal combination of settings
for those MTDs and adapt their settings for changing external conditions. The model is
not only designed to accommodate existing MTD techniques, but can be extended to work
with any future techniques that may appear. It may also guide future research efforts
by identifying commonly-used MTDs for integration or potentially identify focus areas for
MTD development to address common gaps in coverage.
To further support this concept, we also propose a quantitative analytic model for
assessing the resource availability and performance of MTDs, and a method for determining
the reconfiguration rate that maximizes a utility function that incorporates the tradeoffs
between the attacker’s success probability and response time. This model may be used to
evaluate an individual MTD or used in conjunction with the MTD quantification framework.
The analytic results are validated by simulation and experimentation.
Chapter 1: Introduction
1.1 Background and Motivation
Moving Target Defenses (MTDs) are cyber defenses that seek to dynamically change some
aspect of the system being defended, thus removing the adversary’s advantage of being
able to study the target system to find vulnerabilities and plan their attack. By working
proactively to disrupt or delay an attacker, MTDs can offer some measure of protection
against unknown (“zero-day”) or even exposed vulnerabilities. As a result, MTD offers a
great potential in turning the asymmetry typical of a cyber security landscape in favor of the
defender and has been heralded as a “game changer” in this field of research [1]. Since the
term first surfaced in the literature, a myriad of techniques has been developed, each
targeting different aspects of a system.
However, too often each of the proposed techniques only addresses a narrow subset of
potential attack vectors and different techniques tend to measure their effectiveness in differ-
ent and often incompatible ways. Additionally, in order to provide a comprehensive security
solution, using multiple techniques in conjunction with each other should be considered, but
this raises new issues in terms of optimal selection of a subset of available techniques.
Although some survey papers note where certain MTDs might not work well together [2],
or give a qualitative estimate of their effectiveness and cost [3], a quantitative framework
that can accommodate any existing or future MTDs is essential if this area of research is
to progress past specialized, isolated solutions. The primary research problem this work
addresses is the design and validation of a model to quantify the performance of diverse
MTDs as well as their costs. This work also explores other methods for calculating overall
utility and finding the optimal choice and settings for these MTDs.
1.2 Thesis
It is possible to quantify the performance of MTDs by analytically predicting their effective-
ness and response time, and to use this quantification to determine the optimal configuration
for any combination of varying MTDs.
To do this, we must find a way to map MTDs and their settings to a utility value
that captures their effectiveness. We do this by noting that in the reconnaissance phase
of the cyber kill chain, MTDs primarily act by disrupting some portion of the attacker’s
knowledge. Thus, we can map MTDs to knowledge of various aspects of the system. From
there, that knowledge is then leveraged to exploit software weaknesses and we can map that
knowledge to the classes of exploits they enable. Finally, based on the overall probability of
each exploit occurring at a given service, we can arrive at a value that captures the overall
effectiveness of the MTD.
To determine the level of attacker disruption we can analyze an individual MTD to pre-
dict resource age or other measures. For a shuffling or rotation-based MTD, the disruption
is reflected in the average age of a resource, because a smaller window decreases the chance
of success. To determine cost, we can analytically determine response time, which captures
the effects of increased memory, runtime, or bandwidth requirements. We can also use
these values to determine a utility value that represents the tradeoffs of a particular MTD
configuration.
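A minimal sketch of such a utility computation follows, assuming both component utilities are normalized to [0, 1] and the weights are chosen by the analyst (the equal weights and the specific inputs are illustrative, not the dissertation's calibrated values):

```python
def aggregate_utility(u_effectiveness, u_overhead, w_e=0.5, w_o=0.5):
    """Weighted average of an effectiveness utility (how strongly the MTD
    disrupts the attacker) and an overhead utility (how little it costs in
    response time). Both utilities are assumed to lie in [0, 1]."""
    assert abs(w_e + w_o - 1.0) < 1e-9  # weights must sum to one
    return w_e * u_effectiveness + w_o * u_overhead

# Example: strong attacker disruption (0.8) but only moderate overhead utility (0.6)
print(aggregate_utility(0.8, 0.6))  # ≈ 0.7 with equal weights
```

Changing the weights shifts the optimum toward security or performance, which is how a single configuration can be tuned for different operating environments.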
1.3 Research Approach
To address these issues, I present a 4-layer model for MTD quantification that captures
the relationships between MTDs, knowledge, software weaknesses, and individual services.
By expressing the effects that MTDs have on required knowledge as a probability, we can
propagate those values to also calculate the chances of a software weakness being exploited
and determine an overall value for the effectiveness of the MTD.
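The propagation described above can be sketched roughly as follows. The knowledge labels, probabilities, and independence assumptions here are illustrative placeholders, not values from the framework's case study:

```python
def p_weakness(required_knowledge, p_known):
    """Probability a software weakness is exploitable, assuming the attacker
    must hold every required piece of knowledge and knowledge items are
    independent. Knowledge not protected by any MTD is assumed known."""
    p = 1.0
    for k in required_knowledge:
        p *= p_known.get(k, 1.0)
    return p

def p_service(weaknesses, p_known):
    """Probability at least one of a service's weaknesses is exploitable
    (independence across weaknesses assumed)."""
    p_none = 1.0
    for required in weaknesses:
        p_none *= 1.0 - p_weakness(required, p_known)
    return 1.0 - p_none

# Hypothetical example: an MTD leaves the attacker's network-address knowledge
# valid only 30% of the time, while memory layout is unprotected.
p_known = {"ip_address": 0.3, "memory_layout": 1.0}
weaknesses = [["ip_address"], ["ip_address", "memory_layout"]]
print(p_service(weaknesses, p_known))  # ≈ 0.51
```

One minus this value can then serve as the effectiveness score for the MTD (or set of MTDs) protecting that service.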
I also present a method to determine the characteristics of an MTD by using Continuous
Time Markov Chains to model the effects of the reconfiguration rate on a system’s security
and response time. This model is validated by the use of simulations and experiments.
From there, I formulate a utility function that takes effectiveness and cost into account,
which can then be used to find an optimal selection and configuration of MTDs for a given
scenario.
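The flavor of this analysis can be sketched with a deliberately simplified birth-death chain (my own simplification; the chains developed in Chapters 4 and 5 are richer): each of c resources independently enters reconfiguration at rate α, and each reconfiguration completes at rate μ = 1/S.

```python
import math

def availability(c, alpha, mu):
    """Steady-state availability of a pool of c resources where each available
    resource enters reconfiguration at rate alpha and each reconfiguration
    completes at rate mu (mean duration 1/mu). With state k = number of
    resources under reconfiguration, the chain is birth-death with birth rate
    (c - k) * alpha and death rate k * mu, so the stationary distribution is
    binomial with success probability alpha / (alpha + mu)."""
    r = alpha / mu
    weights = [math.comb(c, k) * r**k for k in range(c + 1)]
    total = sum(weights)
    p = [w / total for w in weights]
    return sum((c - k) * pk for k, pk in enumerate(p)) / c

# c = 20 resources, alpha = 0.02 rec/sec, mean reconfiguration time S = 60 sec
print(availability(20, 0.02, 1 / 60))  # ≈ 0.4545, i.e. mu / (alpha + mu)
```

For this independent-resource chain the closed form μ/(α + μ) matches the numerical solution, a useful sanity check before moving to models with reconfiguration limits.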
1.4 Contributions
This work fulfills the pressing need for a unified framework to comparatively measure
MTDs. We present a novel framework that captures the relationships between available
MTDs and the information such MTDs may affect through probabilistic measures. It also
captures the relationships between services, their software weaknesses, and the knowledge
required to exploit such weaknesses to probabilistically determine the effectiveness of any
given technique or set of techniques, regardless of how they operate.
Likewise, the choice of response time as the primary measure of system overhead cap-
tures the cost of MTDs in a straightforward manner. The model allows for evaluation and
comparison of multiple concurrent MTDs that are required to protect against varied threats
and determine where they might synergize or conflict. Accounting for multiple settings and
measuring their effect on effectiveness and cost is also possible using the model.
While the model is simple in concept, it also lends itself to several extensions. If a
threat model tends to prioritize certain classes of attacks or a service is specifically more
vulnerable to certain attacks, this can be accounted for by using a weighting factor. Likewise,
since effectiveness is based on probability values, they can also be used to make informed
calculations of risk.
Furthermore, the framework has the following desirable attributes:
• Generality: Any existing MTD should be able to fit within the framework. The
relationship between an MTD and the knowledge it protects serves as the interface
that enables plugging that MTD into the framework.
• Extensibility: Any future MTD must also be able to fit within the framework,
regardless of how it operates. New MTDs, areas of knowledge to disrupt, or even
classes of software weaknesses can be added to the framework.
• Resilience: Because the framework covers general classes of software weaknesses
rather than specific vulnerabilities, it is less vulnerable to unknown threats and 0-day
attacks.
• Flexibility: The framework is simple and intuitive and can be used in many possible
ways. It may be used for rough estimates of utility values or for more fine-grained
estimates when more fidelity is required.
• Practicality: The framework does not ignore the issue of cost or overhead when
determining the utility of a technique. This can be either incorporated as a simple
constraint or into the overall utility of a proposed solution.
The analytic model for determining MTD effectiveness and cost also makes several im-
portant contributions: (i) The use of Continuous Time Markov Chains to measure MTD
security and performance. Although Markov Chains have been widely used in computer
science since their introduction, their application in capturing the performance of MTDs is
novel. (ii) A method for determining the reconfiguration rate that maximizes a utility func-
tion that incorporates the tradeoffs between the attacker’s success probability and response
time. (iii) The findings for effectiveness and response time values from this model can serve
as inputs to the quantification framework previously described.
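Contribution (ii) can be illustrated with a toy grid search over the reconfiguration rate. Every formula below — the availability approximation, the M/M/1-style response time, and the exponential attacker-success model with an assumed attack duration — is an illustrative stand-in for the dissertation's actual models, and all parameter values are hypothetical:

```python
import math

def optimal_alpha(alphas, w=0.5, mu_rec=1/60, c=20, lam=15.0, mu_srv=1.0, t_attack=1000.0):
    """Toy search for the reconfiguration rate alpha that maximizes
    U = w*(1 - Ps) + (1 - w)*U_perf: attacker success probability Ps falls
    as alpha grows, while response time rises. Returns (best_alpha, utility)."""
    best = None
    for a in alphas:
        avail = mu_rec / (a + mu_rec)        # fraction of resources available
        c_eff = c * avail                    # effective number of servers
        if c_eff * mu_srv <= lam:            # unstable: infinite response time
            continue
        resp = 1.0 / (c_eff * mu_srv - lam)  # M/M/1-style response time
        ps = math.exp(-a * t_attack)         # attack fails if resource rotates first
        u = w * (1.0 - ps) + (1.0 - w) / (1.0 + resp)
        if best is None or u > best[1]:
            best = (a, u)
    return best

print(optimal_alpha([0.001, 0.005, 0.01, 0.02, 0.05]))
```

Note how the search discards rates that would starve the service of capacity, mirroring the stability constraint that motivates the reconfiguration limits of Chapter 5.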
1.5 Organization
This dissertation is organized as follows. Chapter 2 covers background and various forms
of moving target defenses, as well as some background on autonomic systems and attack
graphs. Chapter 3 introduces the problem statement and covers the development of the
quantification framework and experiments showing how it might be applied. Chapter 4 de-
scribes an analytical model for MTDs using Markov Chains to determine effectiveness and
response time, the simulations and experiments used to validate the model, and findings
regarding the stability of the model. Chapter 5 further improves the model, introducing
policies to limit reconfigurations and preserve response time, further simulations and ex-
periments to validate the analysis, and formulation of a utility function to determine an
optimal reconfiguration rate. The dissertation concludes in Chapter 6 with a summary of
findings and discussion of future work.
Chapter 2: Background and Related Work
2.1 Moving Target Defense
Moving Target Defense was first introduced in a series of papers that modeled a system’s
security as a function of its exposed attack surface and showed how MTDs increased diversity
based on software and network transformations [1]. Later papers expanded on this concept,
incorporating aspects of game theory, where an attacker or defender may adopt different
strategies based on the actions of the other [4], or introducing machine learning into MTD
behavior [5].
Since its introduction, a myriad of MTD techniques have been developed in the litera-
ture, each targeting different aspects of a system. Today, they are generally organized by
type according to a taxonomy published by Lincoln Labs [2][6] into the following categories:
• Dynamic Runtime Environments
• Dynamic Platforms
• Dynamic Software
• Dynamic Data
• Dynamic Networks
Although the MTD taxonomy described covers most MTDs as they apply to conven-
tional computer systems, MTD techniques have also been applied on several other plat-
forms that don’t fall neatly into those categories. For example, MTDs have been studied
in resource-constrained environments such as tactical network devices or FPGAs [7], cyber-
physical systems [8], and wireless sensor networks [9][10].
2.1.1 Dynamic Runtime Environments
Dynamic Runtime Environments involve changing the environment presented to an appli-
cation dynamically. This is typically done at a very low level and consists of two major cat-
egories: Address Space Layout Randomization (ASLR) and Instruction Set Randomization
(ISR). ASLR protects against buffer overflow attacks by randomizing key locations of mem-
ory [11] and are some of most mature and widely-adopted forms of MTD in use today. Since
its introduction, many improvements have been proposed, such as changing the focus
of the MTD from preventing invalid memory accesses to offering unpredictable results [12]
or by randomizing instructions on the fly to improve entropy [13]. Another technique that
incorporates aspects of address randomization in its protection is DieHard [14][15], which
also protects against heap buffer overflows by increasing the space between elements,
maintaining multiple replicas of the heap, and using voting to ensure control is not subverted.
ISR works to mitigate Return-Oriented Programming (ROP) and code injection attacks
that ASLR does not protect against [16] by ensuring injected code is not immediately
compatible with the target, often by performing simple encryption or adding some additional
required label to each opcode. This can be done at compile time [17] or performed at runtime
in an emulator [18][19]. It is noted that ISR techniques can often be used in conjunction
with ASLR techniques to supplement each other [2].
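To make the protection concrete, here is a back-of-envelope sketch (my own illustration, not from the cited works) of a brute-force attacker guessing an ASLR-randomized base address:

```python
def aslr_success_probability(entropy_bits, guesses):
    """Probability that at least one of `guesses` independent uniform guesses
    hits the randomized base address when ASLR provides `entropy_bits` bits
    of entropy. Back-of-envelope model: real attackers may do better (e.g.,
    via information leaks) or worse (crash-on-miss detection)."""
    p_single = 2.0 ** -entropy_bits
    return 1.0 - (1.0 - p_single) ** guesses

# With 28 bits of entropy (the default mmap randomization on 64-bit Linux),
# a million blind guesses still succeed well under 1% of the time.
print(aslr_success_probability(28, 1_000_000))  # ≈ 0.0037
```

This kind of entropy argument is one reason randomization-based MTDs are usually evaluated by attacker work factor rather than binary success or failure.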
2.1.2 Dynamic Platforms
Dynamic Platform MTDs operate at a slightly higher level of abstraction than Dynamic
Runtime Environments by changing platforms such as OS version, OS instance, or CPU
architecture dynamically. Virtualization is relied upon heavily to implement these techniques.
One method would be to operate using multiple distributions of the Linux OS and ro-
tate between them [20], or by designating roles for each VM and shuffling them between
hosts [21].
Another way to realize Dynamic Platforms would be to use multivariant systems, a setup
where multiple variations of an OS are run at the same time and monitored for any diver-
gence [22]. The variants are specifically crafted so that a malicious attacker attempting to
divert control would only do so on one of the variants, which would then be easily detected
and reverted to a known good state.
Making OS changes on a regular interval can be disruptive to running applications, but an
MTD can accomplish this by first taking a snapshot of the current state, execution state,
open files, and network sockets [23]. Other MTDs use similar methods of snapshotting
system images and replacing them with known good copies if tampering is detected or to
disrupt an attacker’s persistence on a system [24][25][26].
2.1.3 Dynamic Software
MTDs classified as Dynamic Software often operate similarly to Dynamic Platforms, except
that the focus is more on the application level than the OS level. The grouping, order, format,
or the actual instructions within an application’s code can be changed dynamically. One
software approach would be to generalize DieHard for individual applications instead of just
OS use [27]. Other approaches in this category include multivariant approaches that run
several different versions of software to prevent all machines being compromised by the same
exploit [28]. A simpler implementation of this approach uses a single replica compiled with
the stack working in the opposite direction so that an exploit cannot work on both [29].
Another Dynamic Software method would be to implement some sort of shuffling or rotation
among the software currently being executed [30].
2.1.4 Dynamic Data
Dynamic Data MTDs tend to be even more application-specific, focusing primarily on
making some sort of continuous transformation to the format, syntax, encoding, or repre-
sentation of an application’s data. This might take the form of altering HTML tags from a
web server to thwart bots (but allowing legitimate users to render them correctly) [31] or
adding additional required keywords to SQL commands and table names to prevent SQL
Injection attacks [32].
2.1.5 Dynamic Networks
Dynamic Network MTDs involve changing network addresses or other properties dynamically.
Dynamic Networks are one of the most widely studied areas of Moving Target Defense, as
most cyber attackers use computer networks as an attack vector and network MTDs can be
implemented at a level of abstraction above individual systems or applications. This sort
of protection is attractive because if an attacker cannot even find the system they want to
target on the network, then the defense would be considered effective.
Perhaps the earliest and most oft-cited example of a Dynamic Network MTD would be IP
hopping [33][34], but many other variants exist. For example, a scheme could include decoy
nodes and shuffle them regularly along with actual nodes to further delay attackers [35].
Instead of changing the target system’s IP addresses directly, an MTD can be implemented
as a series of rotating proxies that know the actual address of the target [36].
An improvement on the IP-hopping scheme is Random Host Mutation [37][38] which is
implemented at the DNS server and maps ephemeral IP addresses (eIP) to real IP addresses
rIP). This technique randomizes host-to IP bindings based on source identity and time [39]
and is able to maintain connection states. The technique also has the ability to adapt to an
attacker by moving hosts to addresses with a lower probability of being scanned or moving
nodes to addresses that have already been scanned [40].
Instead of centralizing operation of the MTD, it is possible to implement it across an en-
tire network by using a hypervisor to rewrite packets at each node to make each network hop
dynamic. The Self-Shielding Dynamic Network (SDNA) protocol also allows for encryption,
authentication, and redirection to a honeypot for unauthenticated users [41][42][43].
Besides actually changing IP addresses, a network MTD can also take other actions
to virtually affect the network and disrupt attackers. For example, an MTD might only
manipulate an attacker’s view of the network, using some sort of protocol scrubber [44] or the
dynamic defense could come in the form of lightweight sensors that are able to move around
the network and swarm around any areas where there are potential discrepancies [45].
It is worth noting that network MTDs also take advantage of evolving technology. IPv6
offers a vastly larger address space and therefore greater entropy to techniques that use
it. MT6D uses the IPv6 address space to create an encrypted tunnel that uses a range of
addresses and ensures protection as well as privacy [46][47]. This technique is also applicable
to embedded systems on the smart grid using IPv6 [48] or as part of a hybrid approach
with a mix of static and dynamic IP addresses [49] to protect mobile-enabled systems.
2.1.6 MTD Quantification
With the great amount of variety in MTD techniques, it is not surprising to find that
they are often quantified in completely different ways. One paper suggests dividing MTD
techniques into “low-level” and “high-level” methods, with low-level methods (such as those
dealing with the runtime environment or OS) tending to have their effectiveness measured via
attack experiments, while high-level methods look at the system as a whole and compute
effectiveness via simulation and/or probability models [50].
that several different methods be used to measure effectiveness and cost of MTDs, as seen
in Figure 2.1 [3].
The analytic method provides a precise measure of effectiveness if the attack model
allows for it. For example, an MTD that dynamically re-maps the association between
systems and their addresses to avoid probes looking for a vulnerable system can use a
probabilistic urn model to calculate its effectiveness [51]. In the static case, the probability
of at least one successful probe given k probes and v vulnerable machines out of n machines
is:
\[ P(X_k > 0) = 1 - P(X_k = 0) = 1 - \frac{\binom{n-v}{k}}{\binom{n}{k}} \tag{2.1} \]
And if all the systems and addresses are completely shuffled between probes, this
probability becomes:
[Figure content: a matrix from the expert survey [3] rating quantification methods (analytics, math- or data-based models, testbed networks, simulations, red teaming, expert surveys/elicitations, and operational networks) as good, bad, or sometimes appropriate for measuring effectiveness, implementation costs, performance costs, usability, and security priority.]
Figure 2.1: Suggested Methods for MTD Quantification
\[ P(X_k > 0) = 1 - P(X_k = 0) = 1 - \left(1 - \frac{v}{n}\right)^{k} \tag{2.2} \]
However, most MTDs do not easily fit into such a mathematical model and must have
their effectiveness assessed by simulation. This is usually depicted as some form of chart
showing the attacker's success rate. This success rate may be interpreted and displayed in a
number of ways and usually contains some reference to the static case and multiple settings
for the dynamic case. For example:
• Attacker success rate for various settings [21]
• Attacker success rate over time [52]
• Asset survival rate over time [53]
• Number of completed attacks over the number of attacks attempted [54]
• Ratio of infected hosts over time [38][40]
Instead of using a single metric of attacker’s success rate to measure MTD effectiveness,
several other metrics can be derived from an MTD’s effects. The authors of Random Host
Mutation introduce the metrics of Deterrence, Deception, and Detectability to MTDs in
their work [39]:
Deterrence (Π) measures the cost to the attacker in terms of additional time taken to
carry out an attack and is the ratio of the time Tm required with MTD active to the time
Ts required in the static case.
Π = Tm/Ts (2.3)
Deception (Ω) is the ratio of targets an attacker misses due to the effects of an MTD, where
N is the number of targets discovered out of M total targets:
Ω = N/M (2.4)
Detectability (Ψ) is the ratio of the number of probes Rm required with an MTD active
to the number of probes Rs required in the static case. This represents the case where
presence of an MTD may require the attacker to make more probes or other illegitimate
actions that could be detected.
Ψ = Rm/Rs (2.5)
These metrics provide a different point of view of MTD effectiveness than strict preven-
tion of attacks and show their effectiveness in disrupting and delaying attackers and can be
useful in this quantification work.
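As a sketch of how these ratios would be computed from measurements (the numbers below are illustrative, not taken from [39]):

```python
def deterrence(t_mtd: float, t_static: float) -> float:
    """Pi = Tm / Ts (Eq. 2.3): factor by which the MTD inflates attack time."""
    return t_mtd / t_static

def deception(n_discovered: int, m_total: int) -> float:
    """Omega = N / M (Eq. 2.4): fraction of the M targets the attacker discovers."""
    return n_discovered / m_total

def detectability(r_mtd: int, r_static: int) -> float:
    """Psi = Rm / Rs (Eq. 2.5): factor by which the MTD inflates the probe
    count, giving the defender more chances to detect the attacker."""
    return r_mtd / r_static

# Hypothetical measurements: an attack that took 2 hours now takes 9,
# only 3 of 10 targets are discovered, and 500 probes replace 100.
assert deterrence(9.0, 2.0) == 4.5
assert deception(3, 10) == 0.3
assert detectability(500, 100) == 5.0
```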
Another multi-dimensional metric, introduced by Siege Technologies’ Cyber Quantifica-
tion Framework, measures MTDs based on 4 metrics: Productivity, Success, Confidentiality,
and Integrity, for both the defender and the attacker, for a total of eight different metrics [55].
While several authors quantify their MTD effectiveness and do so in similar ways, far
fewer papers on MTDs report their costs in a uniform manner. Examples of reported costs
include varying execution, memory, or network overhead, or additional hardware costs [2].
Other effects might also be reported. For example, a technique designed to proactively
defend against Distributed Denial of Service (DDoS) attacks measures its cost in terms
of packet loss and additional storage resources required to run [56]. A network shuffling
technique may have a trade-off of dropped connections depending on its shuffle rate [51] or
additional latency or overhead on throughput [36].
An ideal quantification method must be able to take effectiveness and cost into account.
One paper characterizes the power of MTD both with and without cost as a factor and
optimizes utility with regard to cost but does not tie the effectiveness and cost functions to
any specific measure [57].
Finally, it is worth noting that in nearly every case where a new metric is introduced, it is
only ever applied by its authors to a single technique. One expert survey does provide a thorough assessment
of the effectiveness and cost of many techniques across the spectrum of existing MTDs [3].
However, the survey is qualitative in nature and potentially subject to reviewer bias.
2.2 Attack Graphs
This work is also inspired by much of the existing work on attack graphs, which define
the possible preconditions, state transitions, and post-conditions for all possible attacks
on a network [58][59]. Attack graphs are a well-researched area of computer security, with
several automated tools available to generate them for a given network and find all possible
attack paths [60][61].
Several extensions to the attack graph model have been proposed. Of particular interest
to this work are probabilistic attack graphs that label each potential transition state with
the probability of success as seen in Figure 2.2. In this way, we can calculate the overall
probability of attack success by propagating each probability through each possible attack
path [62].
Probabilistic attack graphs have been incorporated in the design of MTDs. One method
assigns roles (e.g., Authorizer, Planner, TargetDB) to hosts on the network and migrates
the roles between hosts. Knowledge of the attack paths between different roles and of the
probabilities of attacks succeeding informs decisions to migrate roles and aids in validating
the results [21].
Figure 2.2: Probabilistic Attack Graph
However, it should be noted that attack graphs have several disadvantages that must
be taken into consideration: they are often tied to specific vulnerabilities, and certain
MTDs have the potential to drastically change the attack surface such that it would require
generating an entirely new attack graph for that particular state.
2.3 Self-Protecting Systems
This work in Moving Target Defense is inspired by related work in autonomic computing,
particularly in self-protecting systems. Autonomic computing systems are self-configuring,
self-optimizing, self-healing, and self-protecting [63]. As autonomic systems change their
security mechanisms in response to their environment, this concept can be seen as a form
of moving target defense [64]. This could be realized by automating command and control
of network defenses and MTDs to react to attackers [65][66], finding ways to combine
MTDs [67][68][69], or incorporating game theory concepts into defender actions [70][71][72].
With regard to quantification, in order to change their settings effectively, self-protecting
systems must be able to quantify both their effectiveness and their cost or overhead so as
to provide an accurate measure of their utility. For example, increasing the shuffle rate of
an MTD might decrease the attacker's success but also impose a cost of connection loss, as
seen in Figure 2.3 [51].
Figure 2.3: Connection Loss and Attacker’s Success as a Function of Shuffle Rate
One such application is described in an autonomic system that assigns utility values
for various security settings in a streaming media application and changes those settings
to optimize security. For example, the type and strength of encryption or authentication
method may produce different fitness values F and differing amounts of overhead, with the
overall response time also having a fitness value assigned, as seen in Tables 2.1-2.3 [73]:
Table 2.1: Encryption Settings with Fitness Values and Overhead

Configuration  Encryption Algorithm  Key Length  F(conf)  Performance Overhead
(A)            DES                   56          0.2      0.2
(B)            AES                   128         0.3      0.3
(C)            Blowfish              128         0.4      0.4
(D)            Blowfish              448         0.5      0.5
Table 2.2: Authentication Settings with Fitness Values and Overhead

Configuration  Authentication Method  Strength        F(auth)  Performance Overhead
(A)            Password               8               0.2      0.1
(B)            Password               16              0.3      0.2
(C)            SIM-based (EAP)        1 (COMP 128-1)  0.5      0.4
(D)            SIM-based (EAP)        3 (COMP 128-3)  0.6      0.9
Table 2.3: Response Times and Fitness Values

Configuration  Response Time  F(lat)
(A)            t < 100 ms     1
(B)            t > 100 ms     0.75
(C)            t > 1 s        0.5
(D)            t > 4 s        0.25
(E)            t > 10 s       0
Similar techniques have also been used in conjunction with Intrusion Detection Systems
(IDS), either to adjust their thresholds based on a Receiver Operating Characteristic (ROC)
curve [74], or to find the optimal configuration of multiple available IDSes that balances
security with quality of service (QoS). To find this optimal configuration, we must separately
determine utility values based on security and QoS and combine them into a global utility
value.
To determine the utility obtained from the addition of security, each security mechanism
has a particular detection rate, and if multiple IDSes are used, an exponential average
is used to generate a lower bound for the actual combined detection rate. This detection
rate is used to generate the security utility for a role r.

\[ U^S_r(\vec{\rho}_r) = \sum_{j=1}^{A} a_{r,j} \left( \ln \sum_{i=1}^{N} e^{d_{i,j}} \times \varepsilon_{r,i} \right) \tag{2.6} \]
Likewise, QoS also contributes to the global utility function and is derived from a sigmoid
function, which is calculated as a function of the estimated response time Tr, a service level
objective (SLO) that must be met (σr), and a parameter that determines the shape and
steepness of the sigmoid (δ). The parameter κr may also be used and is chosen such that
when Tr = 0, U^T_r = 1.

\[ U^T_r(T_r) = \kappa_r \, \frac{e^{\delta(\sigma_r - T_r)}}{1 + e^{\delta(\sigma_r - T_r)}} \tag{2.7} \]
The sigmoid function roughly approximates the unit step function and gives a value
between 0 and 1 based on whether the input met or did not meet the target value and by
how much. In the absence of the parameter κr, when σr = Tr, UTr = 0.5. Several sigmoids
are shown in Figure 2.4 to illustrate how the parameters can be tuned with input from
stakeholders to best meet their requirements.
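A minimal sketch of the sigmoid in Eq. 2.7 (parameter values are illustrative), with κr chosen so that a zero response time yields utility 1:

```python
from math import exp

def qos_utility(t_r: float, slo: float, delta: float,
                normalize: bool = True) -> float:
    """Sigmoid QoS utility of Eq. 2.7.  When `normalize` is set,
    kappa is chosen so that U(0) = 1; otherwise kappa = 1 and the
    utility is exactly 0.5 when the response time equals the SLO."""
    kappa = (1.0 + exp(delta * slo)) / exp(delta * slo) if normalize else 1.0
    s = exp(delta * (slo - t_r))
    return kappa * s / (1.0 + s)

# With a 1-second SLO and steepness delta = 4: beating the SLO gives a
# utility near 1, meeting it exactly gives 0.5 (unnormalized), and
# badly missing it drives the utility toward 0.
```

Larger values of δ make the step between "SLO met" and "SLO missed" sharper, which is how stakeholder input can be encoded.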
The total utility for both the combination of security mechanisms and the QoS is the
sum of the individual utility values, weighted by the values wsr and wtr.
Figure 2.4: Sample Sigmoid Functions
\[ U^S_{\text{total}}(\vec{\rho}) = \sum_{\forall r} w^s_r \, U^S_r(\vec{\rho}) \qquad U^T_{\text{total}}(\vec{T}) = \sum_{\forall r} w^t_r \, U^T_r(T_r) \tag{2.8} \]
Likewise, the utility functions are then combined using a weighted sum to determine
a global utility function. The weights α and β are chosen such that α ≥ 0, β ≥ 0, and
α+ β = 1.
\[ U_g(\vec{\rho}, \vec{T}) = \alpha \cdot U^T_{\text{total}}(\vec{T}) + \beta \cdot U^S_{\text{total}}(\vec{\rho}) \tag{2.9} \]
Once the global utility is calculated, it is sent to an autonomic security manager compo-
nent that changes the security policies to maximize utility given the current environment.
For example, a spike in workload might cause the system to lower security requirements to
maintain a target response time, giving it more flexibility than a static defense [75][76].
Chapter 3: MTD Quantification Framework
This chapter covers our threat model, underlying assumptions, and an overview of the quan-
tification framework, including the mathematical model. Two case studies with applications
that exemplify how the model might be used and how MTD effectiveness is computed are
also included.
3.1 Threat Model and Assumptions
The general nature of the model lets us make very broad, worst-case assumptions about the
cyber threats we are trying to protect against. These assumptions drive much of the design
of the model and will also be noted again later in this chapter where applicable.
We assume that attackers can exploit any possible attack vector against the defender.
Most techniques described in the literature only protect against a narrow subset of possible
attacks, and no single MTD can protect against all possible attack vectors. The model
handles this by incorporating the notion of combining multiple MTDs to provide a defense-in-depth
solution against any potential attack vector.
We also make the worst-case assumption that no static defense will ever succeed in
stopping attackers: an attacker has virtually unlimited time to plan and execute an attack,
and unknown zero-day vulnerabilities that can evade static defenses will always exist.
Only MTDs are considered to have an effect on the attacker’s success rate, and even then,
an MTD may not be perfect in its defense.
Finally, we must assume that attackers can be stopped or at least slowed down by pre-
venting them from acquiring accurate knowledge about the target system. The primary
focus here is on the reconnaissance phase, when that knowledge is gathered in order to plan
and execute attacks. The defender’s goal can be achieved by either preventing attackers
from accessing that knowledge altogether or by delaying them from acquiring the knowl-
edge until it is no longer useful. This is one of the primary strengths of MTDs as proactive
defenses that can shift the balance of power back to the defender.
We also make several additional simplifying assumptions throughout this chapter that
are summarized here. Future work will allow for revision of many of these assumptions in
order to further generalize the approach.
We assume that services and weaknesses as we define them are time-invariant. We also
assume that services and knowledge blocks as we define them are independent, but multiple
services with dependencies could be modeled. We currently assume that each MTD has
a predefined optimal configuration of its parameters, and that if multiple MTDs affect a
knowledge block, they do not interact and only the most effective one is considered.
3.2 Quantification Framework
It is the goal of this work to develop a unified framework to evaluate the joint effect of
multiple techniques with respect to both effectiveness and cost/overhead. By developing
the capability to quantify MTD techniques, we can also compare any two techniques or sets
of techniques and determine an optimal deployment.
As shown in Figure 3.1, the MTD quantification framework consists of four layers: (i) a
time-invariant service layer that represents the set S of services to be protected; (ii) a
weakness layer that represents the set W of general classes of weaknesses that may be
exploited; (iii) a knowledge layer that represents the set K of all possible blocks of knowledge
required to exploit those weaknesses; and (iv) an MTD layer that represents the set M of
available MTD techniques.
As a motivating example, we consider a SQL service running with an overly simplified
set of weaknesses, required knowledge blocks, and three MTDs available to protect it, as
seen in Figure 3.1.
[Figure content: service S1 (SQL DB); weaknesses W1 (SQL Injection) and W2 (Buffer Overflow); knowledge blocks K1 Knows(service), K2 Knows(IP), K3 Knows(memory); MTDs M1 (Service Rotation), M2 (IP Rotation), M3 (ASLR), arranged in the four layers.]
Figure 3.1: Quantification Framework Layers
3.2.1 Mathematical Model
The proposed MTD quantification framework can be formally defined as a 7-tuple
(S,RSW ,W,RWK ,K,RKM ,M) to capture the relationships between the different layers,
where:
• S, W, K, M are the sets of services, weaknesses, knowledge blocks, and MTD tech-
niques, respectively,
• RSW ⊆ S ×W represents relationships between services and the common weaknesses
they are vulnerable to,
• RWK ⊆ W×K represents relationships between weaknesses and the knowledge blocks
required for an attacker to exploit them, and
• RKM ⊆ K × M represents relationships between knowledge blocks and the MTD
techniques that affect them.
The proposed model induces a k-partite graph (with k = 4): G = (S ∪ W ∪ K ∪ M, RSW ∪ RWK ∪ RKM).
3.2.2 4-Layer Model
This k-partite graph can be represented as a 4-layer model, which is described in greater
detail here.
The first layer represents the set S of services we wish to protect. From the attackers’
point of view, it could also represent a goal state they wish to reach by exploiting a weakness.
We assume that the services are time-invariant, i.e., the nature of the services does not
change over time, and they cannot be taken down to prevent attacks, as this action would
result in a denial-of-service.
For the sake of presentation, we only consider one service in all of the case studies in
this work, but the model could be extended to consider multiple services running with
dependencies between them, similar to how an exploit chain might occur within attack
graphs.
The second layer represents the set of weaknesses W that services are vulnerable to.
We choose general classes of weaknesses rather than specific vulnerabilities because there
are too many vulnerabilities to enumerate and, depending on the MTD used, the specific
vulnerabilities may change over time. Using general weaknesses when building the model
makes them time-invariant.
The examples used in this work draw these weaknesses primarily from MITRE’s Com-
mon Weakness Enumeration (CWE) project [77], particularly from those known as the
“Top 25 Most Dangerous Software Errors.” Although many of these errors are primarily
the result of bad coding practices and are better solved at development time, errors such
as SQL Injection, OS Injection, and Classic Buffer Overflow are often addressed at runtime
by MTDs and make for good general categories of weaknesses.
The Microsoft STRIDE Threat Model [78] has also been used as a source of general
threats to draw from in MTD research [79] and can fill in areas where CWE may be
lacking. For example, Information Disclosure (eavesdropping) and Denial of Service are
not specifically addressed by CWE.
This example shows two weaknesses, SQL Injection and Buffer Overflow. More weak-
nesses such as OS Injection might also be included in a more complex example, while other
weaknesses, such as Cross-Site Scripting, would not be applicable to this service.
The third layer represents the specific knowledge blocks K required to effectively exploit
a weakness. This knowledge might be required to plan an attack even when no MTD is
deployed (such as a victim’s IP address) or it may be an additional piece of information
required due to the use of an MTD. For example, SQLRand adds a keyword to SQL com-
mands, which must be known for an illegitimate user to perform a SQL injection [32]. We
assume that each knowledge block at this layer is independent, and that they must be ac-
quired using different methods. For example, IP address and port number should not both
be chosen as knowledge blocks, as a method to determine one would also reveal the other.
The relationship between the knowledge layer and weakness layer is many-to-many. A
weakness could have several required pieces of knowledge to exploit it, or a knowledge block
may be key to exploiting several weaknesses. This layer of the model may also be extended
over time, as new MTDs are developed which disrupt new and different areas of an attacker’s
knowledge.
In this example, we assume that, in order to execute a SQL Injection attack, the attacker
must know something about the service being run (e.g., name and version of the specific
database software) and the victim’s IP address. In order to execute a Buffer Overflow attack,
an attacker must know the IP address and some information about the vulnerable memory
locations of the system. A higher-fidelity version of this model may take a knowledge block
and break it into smaller, more specific items that are specifically targeted by available
MTDs.
The fourth and last layer of the model represents the set M of available MTDs that
can be implemented to disrupt the attacker’s knowledge required to exploit weaknesses. We
assume that, when using static defenses – i.e., no MTD deployed – an attacker will acquire
all of the knowledge necessary to exploit a weakness with probability P = 1, and we label
the edge (Kj ,Mi) ∈ RKM between an MTD technique Mi and a knowledge block Kj with
the probability Pi,j that an attacker will succeed in acquiring knowledge block Kj despite
the deployment of technique Mi.
For example, if technique M1 in Figure 3.1 (Service Rotation) reduces an attacker’s
likelihood of acquiring knowledge block K1 (i.e., correct version of the service) by 60%, we
would label that edge as P1,1 = 0.4. If an MTD delays an attacker by some factor, we can
also express that as a probability that the attacker will not find the correct information in
a timely manner. For example, an MTD that expands addressable memory by a factor of
10 might reduce the attacker’s probability of success to 0.1, so Pi,j = 0.1.
The exact methodology for determining the value of Pi,j may vary from MTD to MTD
and is a separate line of research. For the purposes of this chapter, we assume that such
optimal configuration has already been identified for each available MTD technique, along
with the corresponding value of Pi,j and the corresponding cost.
Part of the future work involves developing a general approach to modeling the relation-
ship between cost and effectiveness of MTD techniques, as we vary the values of a technique’s
tunable parameters and other aspects of the attacker/defender interaction. Ultimately, this
approach will enable us to identify the optimal configuration for each technique.
Expressing MTD effectiveness in terms of the probability an attacker will succeed in
acquiring required knowledge normalizes the values across multiple diverse techniques in the
[0,1] range, with a theoretically perfect MTD yielding Pi,j = 0, and a completely ineffective
MTD yielding Pi,j = 1. These edge weights are the first values used when computing the
overall effectiveness of an MTD or set of MTDs.
In this example, we apply a service rotation MTD scheme to disrupt knowledge of what
version of service is actually running at any given time, and naïvely assume that rotating
between 4 services reduces the attacker's probability of correctly knowing which service is
running to P1,1 = 0.25. We apply an IP address rotation scheme to mask the victim's
IP address. We know from the literature that perfect shuffling limits the attacker's
likelihood of guessing the correct IP address to at most 0.63 [51], so we use a conservative
estimate for effectiveness and estimate P2,2 = 0.75. Finally, to protect the knowledge of the
memory layout, we use an ASLR scheme that expands the addressable memory such that
the attacker has only a P3,3 = 0.1 probability of having the correct information.
3.2.3 Computing MTD effectiveness
We measure an MTD’s effectiveness starting from the top layer of the model and work our
way down to find the overall probability of attacker’s success. First, we define P (Kj) as the
probability that the attacker has the correct information about knowledge block Kj . Then,
we calculate values of P (Kj) for each knowledge block in layer 3, based on the MTDs that
affect them. If there is no MTD or that MTD is not active, we assume that the attacker is
guaranteed to obtain that information, i.e., P (Kj) = 1.
In this example, each knowledge block has only one MTD that affects it. If multiple
MTDs affect a knowledge block, we can make the simplifying assumption that the resulting
effect is equal to the effect of the best-performing MTD. Therefore:
\[ P(K_j) = \begin{cases} 1, & \text{if } \nexists\, M_i \in M \text{ s.t. } (K_j, M_i) \in R_{KM} \\ \min\limits_{M_i \in M \,\text{s.t.}\, (K_j, M_i) \in R_{KM}} P_{i,j}, & \text{otherwise} \end{cases} \tag{3.1} \]
A possible improvement to the model would be to capture the effect of multiple MTDs
acting on the same knowledge block by using some function that would show either di-
minishing returns or other interactions of multiple MTDs acting on the same knowledge
blocks.
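Eq. 3.1 can be sketched directly; the edge map below encodes the running example (MTD names and probability estimates as given above):

```python
# Edges of R_KM with their P_ij labels, keyed by knowledge block.
# Only MTDs that are currently deployed should appear here.
R_KM = {
    "Knows(service)": {"M1 Service Rotation": 0.25},
    "Knows(IP)":      {"M2 IP Rotation": 0.75},
    "Knows(memory)":  {"M3 ASLR": 0.10},
}

def p_knowledge(block: str, edges=R_KM) -> float:
    """Eq. 3.1: P(K_j) is 1 when no MTD disrupts the block, and the
    best (minimum) P_ij among the covering MTDs otherwise."""
    covering = edges.get(block, {})
    return min(covering.values()) if covering else 1.0

assert p_knowledge("Knows(service)") == 0.25
assert p_knowledge("Knows(port)") == 1.0   # hypothetical uncovered block
```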
Next, we determine the probability P (Wk) that an attacker has gained all the knowledge
required to exploit a given weakness Wk. Since each knowledge block is independent, this
is simply the product of the probabilities associated with all knowledge blocks leading to it.
\[ P(W_k) = \prod_{K_j \in K \,\text{s.t.}\, (W_k, K_j) \in R_{WK}} P(K_j) \tag{3.2} \]
In this example, when calculating P (W1) and P (W2) for SQL Injection and Buffer
Overflow, respectively, we obtain:
P (W1) = 0.25 · 0.75 = 0.1875
P (W2) = 0.75 · 0.10 = 0.075
Finally, we must determine the defender’s utility U gained by deploying MTD tech-
niques, based on the reduced probability of exploit for each class of weaknesses. One
potential utility measure could be a function of the probability P (Sl) that an attacker
can compromise a service Sl by exploiting any of the weaknesses leading to it. P (Sl) can
be computed as the probability of the union of non-mutually exclusive events, using the
Inclusion-Exclusion principle. With respect to our running example, P (S1) can be com-
puted as follows:
P(S1) = P(W1 ∪ W2) = P(W1) + P(W2) − P(W1 ∩ W2) (3.3)
Because W1 and W2 are not necessarily independent (as we see in this example), we
cannot assume P (W1 ∩W2) = P (W1) · P (W2). Instead, we must express each P (W ) in
terms of its corresponding independent knowledge blocks Kj ,
P (W1) = P (K1) · P (K2)
P (W2) = P (K2) · P (K3)
P (W1 ∩W2) = P (K1) · P (K2) · P (K3)
and then express P (S1) as a function of probabilities P (Kj):
P (S1) = P (K1) · P (K2) + P (K2) · P (K3)− P (K1) · P (K2) · P (K3)
which results in
P (S1) = 0.25 · 0.75 + 0.75 · 0.1− 0.25 · 0.75 · 0.1 = 0.244
For graphs with 3 or more weaknesses, we can expand Eq. 3.3 to the generalized form
of the Inclusion-Exclusion Principle [80]:
\[ P\left( \bigcup_{W_k \in W} W_k \right) = \sum_{i=1}^{|W|} (-1)^{i-1} \cdot \sum_{W^* \in 2^{W} \,\text{s.t.}\, |W^*| = i} P\left( \bigcap_{W_j \in W^*} W_j \right) \]
We can then compute P(Sl) programmatically based on the graph model using the following
algorithm:

Algorithm 1 ComputeGoalProbability(W, K, RWK)
Input: A set of weaknesses W, a set of knowledge blocks K, and the set RWK of edges between them
Output: P(Sl), the probability of at least one weakness being exploited
 1: P(Sl) ← 0
 2: for i = 1 to |W| do
 3:   for all W* ∈ 2^W s.t. |W*| = i do
 4:     K* ← {Kj ∈ K | ∃ Wk ∈ W* s.t. (Wk, Kj) ∈ RWK}
 5:     prod ← ∏ Kj∈K* P(Kj)
 6:     P(Sl) ← P(Sl) + (−1)^(i−1) · prod
 7:   end for
 8: end for
 9: return P(Sl)
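A direct transcription of Algorithm 1 in Python (a sketch; the data structures are illustrative). Taking the union K* of the knowledge blocks required by each subset of weaknesses is what correctly handles shared, and therefore dependent, knowledge blocks:

```python
from itertools import combinations
from math import prod

def compute_goal_probability(weaknesses: dict, p_k: dict) -> float:
    """Algorithm 1: inclusion-exclusion over all non-empty subsets of
    weaknesses.  `weaknesses` maps each weakness W_k to the set of
    knowledge blocks it requires; `p_k` maps each block K_j to P(K_j)."""
    names = list(weaknesses)
    p_goal = 0.0
    for i in range(1, len(names) + 1):
        for subset in combinations(names, i):
            # K*: union of required blocks; shared blocks counted once.
            k_star = set().union(*(weaknesses[w] for w in subset))
            p_goal += (-1) ** (i - 1) * prod(p_k[k] for k in k_star)
    return p_goal

# Running example: P(S1) = 0.1875 + 0.075 - 0.01875 = 0.24375
p_k = {"K1": 0.25, "K2": 0.75, "K3": 0.10}
weaknesses = {"SQL Injection": {"K1", "K2"}, "Buffer Overflow": {"K2", "K3"}}
```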
Finding the probability of the union of multiple events is an NP-hard problem that
cannot be solved in better than O(2^n) time [80]. However, the general nature of the
weaknesses represented in layer 2 of the model should naturally limit their number and keep
the running time of the algorithm manageable, as opposed to vulnerabilities, which may
number in the thousands.
Once we obtain P (Sl), we can easily calculate the utility function U = 1 − P (Sl) or
use P (Sl) as the input to another utility function, such as a sigmoid with an inflection
[Figure content: the layered graph of Figure 3.1 annotated with P1,1 = 0.25, P2,2 = 0.75, P3,3 = 0.1; P(K1) = 0.25, P(K2) = 0.75, P(K3) = 0.1; P(W1) = 0.188, P(W2) = 0.075; P(S1) = 0.244 and U = 0.756.]
Figure 3.2: Computing MTD Effectiveness
point centered around a desired effectiveness, such as those commonly used in autonomic
computing [81]. The complete computations for each of the values in this example are
shown in Figure 3.2.
Note that this choice of utility function relies on the expectation that at least one
knowledge block for each weakness receives some measure of protection; otherwise, the
attacker is guaranteed to exploit that weakness, reducing the utility to 0. A utility
function that may avoid this issue is a weighted average of the probabilities of exploiting
each weakness, similar to a measure of risk.
3.3 Experimental Evaluation
We now present a more complex example which demonstrates the capabilities of the model.
As seen in Figure 3.3, we keep the same basic service but protect it against two additional
classes of weaknesses, OS Injection (from the CWE [77]) and Eavesdropping (related to In-
formation Disclosure from the STRIDE model [78]). Because there are now four weaknesses
that contribute to the utility function, we must perform the additional calculations under
the Inclusion-Exclusion Principle to compute the union of events that lead the attacker to
compromise the service.
[Figure content: the expanded four-layer graph. Eleven MTDs (M1 Service Rotation, M2 Intrusion-Tolerant Systems, M3 SQLRand, M4 IP Rotation/MOTAG, M5 OS Rotation, M6 Mutable Networks, M7 Multivariant Systems, M8 ASLR, M9 TALENT, M10 Reverse Stack Execution, M11 Distraction Cluster) disrupt ten knowledge blocks, which feed the four weaknesses (SQL Injection, OS Injection, Buffer Overflow, Eavesdropping) leading to service S1 (SQL DB). Each MTD carries a cost C, and the objective is to maximize U within a given budget.]
Figure 3.3: Case Study Quantification Framework
In several cases, the knowledge blocks required to exploit a weakness have been ex-
panded to provide more detail or to fit the specific MTDs selected for the case study. For
example, knowledge block Knows(Memory) has been broken into separate blocks related to
system call mapping, memory address, and stack direction, and SQL Injection now requires
knowledge of keywords appended to SQL commands and some knowledge of the database
schema, both of which are disrupted by SQLRand.
Most importantly, we can now observe the many-to-many relationships between weak-
nesses, knowledge blocks, and MTDs and conclude that finding the optimal solution is no
longer trivial. However, as long as we have accurate values of Pi,j for each MTD and some
cost constraint, we can determine the final utility as a function of the selected MTDs using
the steps previously shown, and find an optimal solution using a problem-solving method of
our choice, such as stochastic hill climbing or evolutionary methods.
Table 3.1: Sample Case Study Evaluation
MTD                              Pi,j             Cost  Active?  Pi,j (effective)  Cost (effective)
M1 (Service Rotation)            P1,1 = 0.500     15    No       1.000             0
M2 (Intrusion Tolerant Systems)  P2,1 = 0.900     25    No       1.000             0
                                 P2,4 = 0.900                    1.000
                                 P2,5 = 0.900                    1.000
M3 (SQLRand)                     P3,2 = 0.300     20    No       1.000             0
                                 P3,3 = 0.300                    1.000
M4 (IP Rotation/MOTAG)           P4,4 = 0.900     25    No       1.000             0
M5 (OS Rotation)                 P5,5 = 0.700     15    No       1.000             0
M6 (Mutable Networks)            P6,4 = 0.500     20    Yes      0.500             20
                                 P6,10 = 0.500                   0.500
M7 (Multivariant Systems)        P7,6 = 0.500     20    No       1.000             0
                                 P7,8 = 0.500                    1.000
M8 (ASLR)                        P8,7 = 0.500     10    Yes      0.500             10
M9 (TALENT)                      P9,5 = 0.500     20    No       1.000             0
                                 P9,9 = 0.500                    1.000
M10 (Reverse Stack Execution)    P10,8 = 0.500    20    No       1.000             0
M11 (Distraction Cluster)        P11,10 = 0.500   20    No       1.000             0

Knowledge probabilities: Knows(application) 1.000; Knows(keyword) 1.000; Knows(DBschema) 1.000; Knows(IP) 0.500; Knows(OS) 1.000; Knows(syscall_mapping) 1.000; Knows(mem_address) 0.500; Knows(stack_dir) 1.000; Knows(instr_set) 1.000; Knows(path) 0.500.
Chance of attack success: SQL Injection 0.500; OS Injection 0.250; Buffer Overflow 0.250; Eavesdropping 0.250.
Total cost: 30 (budget: 120). Cost levels: High = 25, Medium = 15, Low = 5. Effectiveness levels: High = 0.3, Medium = 0.5, Low = 0.9.
Chance of attacker success: 0.500. Utility: 0.500.
As a proof of concept, we can take the model in Figure 3.3 and perform all the necessary
computations programmatically. For the purpose of this example, we use qualitative values
for Pi,j and cost from an expert survey [3], which estimates the relative effectiveness and
cost of several MTD techniques by grouping them into categories of Low, Medium, or High.
Whether or not an MTD is active can be treated as a Boolean variable, with an inactive MTD
implying an attacker's probability of success of 1 and a cost of 0.
The values from a sample MTD setup are shown in Table 3.1. The interim calculations
for the probabilities of each knowledge block being acquired and each weakness being able
to be exploited are also shown.
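A minimal sketch of these interim calculations (hypothetical helper names; the values mirror the structure of Table 3.1, not its exact wiring): an inactive MTD contributes an effective probability of 1, and knowledge blocks are assumed to be acquired independently.

```python
def knowledge_prob(disruptors, active):
    """P(Kj): probability the attacker acquires knowledge block j.
    `disruptors` maps an MTD name to its P_i,j for this block; an inactive
    MTD contributes an effective probability of 1 (no disruption)."""
    p = 1.0
    for mtd, p_ij in disruptors.items():
        if mtd in active:
            p *= p_ij
    return p

def weakness_prob(required, p_k):
    """P(Wk): probability the attacker holds all knowledge blocks the
    weakness requires, assuming blocks are acquired independently."""
    p = 1.0
    for block in required:
        p *= p_k[block]
    return p
```

With only M6 active, `knowledge_prob({'M6': 0.5, 'M4': 0.9}, {'M6'})` yields 0.5, matching the effective-value convention for inactive MTDs.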
3.4 Applications
Now that we have a method to measure the effectiveness of an MTD deployment, we can
compare MTDs even if they affect vastly different aspects of a system by comparing their
overall results on the general classes of exploitable weaknesses. We show examples of two
different ways the framework may be applied to observe how MTDs might affect the overall
security of a service.
3.4.1 Comparing MTDs
Given a set M of MTD techniques, we would like to identify the one which adds the highest
overall utility.
With respect to the example of Figure 3.3, we start from a baseline deployment including
M6 (Mutable Networks) and M8 (ASLR) to ensure we have a starting utility value to
compare with. The starting deployment is the same one shown earlier in Table 3.1. We
then measure the updated utility value after individually adding each of the other MTDs
to the baseline deployment, and examine the results reported in Table 3.2.
Table 3.2: Improvement from Adding MTDs
MTD                              Utility  Delta
M1 (Service Rotation)            0.5625   0.0625
M2 (Intrusion Tolerant Systems)  0.513    0.013
M3 (SQLRand)                     0.614    0.114
All Others                       0.500    0.000
From the results, we find that, given the preexisting condition of M6 and M8 being
active, M3 (SQLRand) offers the greatest increase in utility, with M1, M2, and M3 being
the only ones offering any increase at all. To explain these results, we observe that there is
a lower bound on P(S1) that translates into an upper bound on U:
$$P(S_1) \geq \max\big(P(W_1), P(W_2), P(W_3), P(W_4)\big) \tag{3.4}$$
In other words, the overall defense can only be as strong as the protection against
exploitation of its most vulnerable weakness, which in turn benefits from the deployment
of multiple MTDs. Therefore, given the baseline conditions, only an MTD that affects the
most vulnerable weakness will yield any improvement in the utility value. This procedure
could be used iteratively in an attempt to find an optimal solution in a greedy manner, but
there would have to be some way to handle cases where no MTD adds any utility (such as
random selection).
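The greedy procedure just described might be sketched as follows (a hypothetical illustration; the utility function is supplied by the framework, and random selection breaks plateaus where no single MTD adds utility):

```python
import random

def greedy_select(mtds, cost, utility, budget):
    """Iteratively add the affordable MTD with the largest utility gain;
    when no addition improves utility, fall back to a random affordable
    choice so the procedure can continue exploring."""
    deployed = set()
    remaining = set(mtds)
    while remaining:
        spent = sum(cost[m] for m in deployed)
        affordable = [m for m in remaining if spent + cost[m] <= budget]
        if not affordable:
            break
        base = utility(deployed)
        gains = {m: utility(deployed | {m}) - base for m in affordable}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            best = random.choice(affordable)  # plateau: pick at random
        deployed.add(best)
        remaining.remove(best)
    return deployed
```

Because it only ever evaluates one-step additions, this procedure is cheap but, as noted above, is not guaranteed to find the global optimum.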
3.4.2 Selecting Optimal Defenses
Given that we have a tool that can evaluate the utility of any configuration, we can also
solve for the optimal selection of MTDs, given the constraints that the presence of each
MTD is a Boolean variable (either present or not) and that the sum of the costs of selected
MTDs be under a given budget. Formally, we can express this as:
$$\text{Maximize } U(m_1, m_2, \cdots, m_n) \quad \text{s.t.} \quad \sum_{i=1}^{n} cost(M_i) \cdot m_i \leq budget, \qquad m_i \in \{0, 1\} \;\; \forall i \tag{3.5}$$
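Since each mi is Boolean and the example has only eleven MTDs, Eq. 3.5 can even be solved by exhaustive search (a sketch with hypothetical names; the dissertation itself uses a GRG nonlinear solver for this step):

```python
from itertools import product

def optimal_selection(cost, utility, budget):
    """Brute-force Eq. 3.5: maximize U over all Boolean deployment vectors
    whose total cost stays within the budget (2^n vectors; fine for n ~ 11)."""
    names = sorted(cost)
    best_set, best_u = set(), float("-inf")
    for bits in product((0, 1), repeat=len(names)):
        chosen = {n for n, b in zip(names, bits) if b}
        if sum(cost[n] for n in chosen) > budget:
            continue  # violates the budget constraint
        u = utility(chosen)
        if u > best_u:
            best_set, best_u = chosen, u
    return best_set, best_u
```

For larger MTD portfolios the 2^n enumeration becomes infeasible, which is why solver-based or stochastic search methods are used instead.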
For the purpose of evaluating the framework and making the problem interesting, we
select a value for the budget (120) approximately halfway between 0 and the total cost of
deploying all available MTDs (i.e., 210). This choice ensured that a solution with utility
greater than 0 would be found and that approximately half the MTDs would be chosen
as part of the optimal solution. This example was solved using the Generalized Reduced
Gradient Nonlinear algorithm [82] with random restarts to avoid converging to local maxima.
After solving, we obtain an optimal solution with the selected MTDs highlighted graphically
in Figure 3.4 and full results shown in Table 3.3.
[Figure: the model of Figure 3.3 with the selected MTDs highlighted, yielding P(W1) = 0.023, P(W2) = 0.063, P(W3) = 0.063, P(W4) = 0.25, P(S1) = 0.313, and U = 0.687.]
Figure 3.4: Case Study Optimal Configuration
From here we can observe that our choice of utility function forces the selection of
a variety of MTDs such that each weakness has at least one MTD affecting one of its
knowledge blocks, spreading protection evenly over the four weaknesses.
Visually, we can also observe that an MTD with the ability to affect multiple knowledge
blocks is inherently more powerful than one that only affects one. However, if their cost is
too high or effectiveness too low, it will still not be chosen as part of an optimal solution.
Similarly, an MTD that only affects one knowledge block may be chosen if it is effective,
low-cost, or affects a knowledge block that still receives relatively weak protection from
other MTDs.
Table 3.3: Case Study Optimal Configuration
MTD                              Pi,j             Cost  Active?  Pi,j (effective)  Cost (effective)
M1 (Service Rotation)            P1,1 = 0.500     15    Yes      0.500             15
M2 (Intrusion Tolerant Systems)  P2,1 = 0.900     25    No       1.000             0
                                 P2,4 = 0.900                    1.000
                                 P2,5 = 0.900                    1.000
M3 (SQLRand)                     P3,2 = 0.300     20    Yes      0.300             20
                                 P3,3 = 0.300                    0.300
M4 (IP Rotation/MOTAG)           P4,4 = 0.900     25    No       1.000             0
M5 (OS Rotation)                 P5,5 = 0.700     15    No       1.000             0
M6 (Mutable Networks)            P6,4 = 0.500     20    Yes      0.500             20
                                 P6,10 = 0.500                   0.500
M7 (Multivariant Systems)        P7,6 = 0.500     20    Yes      0.500             20
                                 P7,8 = 0.500                    0.500
M8 (ASLR)                        P8,7 = 0.500     10    Yes      0.500             10
M9 (TALENT)                      P9,5 = 0.500     20    Yes      0.500             20
                                 P9,9 = 0.500                    0.500
M10 (Reverse Stack Execution)    P10,8 = 0.500    20    No       1.000             0
M11 (Distraction Cluster)        P11,10 = 0.500   20    No       1.000             0

Knowledge probabilities: Knows(application) 0.500; Knows(keyword) 0.300; Knows(DBschema) 0.300; Knows(IP) 0.500; Knows(OS) 0.500; Knows(syscall_mapping) 0.500; Knows(mem_address) 0.500; Knows(stack_dir) 0.500; Knows(instr_set) 0.500; Knows(path) 0.500.
Chance of attack success: SQL Injection 0.023; OS Injection 0.063; Buffer Overflow 0.063; Eavesdropping 0.250.
Total cost: 105 (budget: 120). Cost levels: High = 25, Medium = 15, Low = 5. Effectiveness levels: High = 0.3, Medium = 0.5, Low = 0.9.
Chance of attacker success: 0.313. Utility: 0.687.
3.5 Combining MTDs
One of the challenges in implementing the framework is determining the values of
Pi,j for each MTD and capturing the interactions between MTDs.
We present here experiments performed as a proof of concept to show how multiple
MTDs might be combined and their effects measured, along with results. Based on this
testbed, we can measure the attacker’s success rate in a scenario with actual attacks being
made against an MTD to observe if there are any interactions between MTDs and validate
future analytical results.
3.5.1 Experimental Testbed
Our experimental testbed uses the open-source Citrix XenServer1 environment to create and
manage all of our virtual machines. We created six instances of the Metasploitable VM2 for
our targets which contain a number of open vulnerabilities to test against. For our attack
platform, we used a separate server using Kali Linux3 which comes with the Metasploit
Framework, a popular penetration testing platform. Metasploit contains a variety of exploits
that can be scripted from the command-line interface, several of which are effective against
the unpatched version of the Apache web service that comes on Metasploitable. We also
created an independent process to monitor the web server from the point of view of a
legitimate user to see the MTD’s effect on system availability. A block diagram of the setup
is shown in Figure 3.5.
The scripted attack against the web server follows a straightforward attack pattern.
First, the attacker scans the network to obtain a list of all reachable IP addresses and open
ports on the network. If a VM appears with more than four open ports, it is determined to
be a candidate for attack. A more detailed scan is then performed against port 80 on each
candidate VM to identify the web service running on it. If the service is determined to be
the vulnerable Apache service, the script then configures and launches an attack against the
service. If the attack is successfully able to achieve a shell session, the attack is considered
a success. In each trial, we record the number of successes (out of six possible attempts).
We implemented service rotation by installing two additional web services on each VM:
1. Available at https://xenserver.org/
2. Available at https://sourceforge.net/projects/metasploitable/
3. Available at https://www.kali.org/downloads/
[Figure: block diagram showing the attacker machine and the monitor connected to a VM host running six target VMs.]
Figure 3.5: Experimental Setup for Combined MTD Experiments
lighttpd4 and NGINX5. Only one of the three web services runs at a time, but all are
capable of serving the same PHP content. A script on each VM independently
reconfigures the VM by stopping the current web service and starting a new one at an
exponentially distributed random interval. The downtime for changing services is relatively
short, and we tested settings with average interarrival rates of 120, 60, 30, and 10 seconds,
in addition to the static case.
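The rotation script's core logic might look like the following dry-run sketch (hypothetical function name; the actual start/stop commands are left as comments):

```python
import random

SERVICES = ("apache2", "nginx", "lighttpd")

def next_rotation(current, mean_interval, rng=random):
    """Sample an exponentially distributed delay and pick a replacement
    web service different from the one currently running."""
    delay = rng.expovariate(1.0 / mean_interval)
    new_service = rng.choice([s for s in SERVICES if s != current])
    # On the VM itself, the script would then run something like:
    #   service <current> stop && service <new_service> start
    return delay, new_service
```

Sampling from an exponential distribution makes reconfiguration times memoryless, so an attacker cannot predict the next rotation from the time elapsed since the last one.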
We implemented IP rotation by making use of a feature in DHCP to assign new IP
addresses. The IP rotation script on each VM takes the interface down, uses a utility to
randomly change the MAC address, and brings the interface back up again. The DHCP
server, seeing a new MAC address, assigns a new IP address. DHCP still maintains all
address lease records, preventing duplicate IP addresses and re-using abandoned IP leases.
The process of rehoming an IP address is much longer than for a service, so we tested
4. https://www.lighttpd.net/
5. https://www.nginx.com/resources/wiki/
settings with average interarrival rates of 120, 60, and 30 seconds, in addition to the static
case.
To prevent the MTDs from interfering with each other, both MTDs ran in separate
threads with a locking mechanism to prevent one MTD from starting a reconfiguration
while the other was still performing one. For each combination of settings, we performed
100 trials and measured attacker success using two different metrics: the average number
of successful exploits (out of six possible), and the overall chance of attacker success, defined
as the number of times out of 100 the attacker was able to compromise at least one VM. The
monitoring node checks the web server every 0.1 sec and calculates availability as the number of
successful requests divided by the total number of requests. The monitor also records the
service running on the node to verify that each service accounts for approximately 33.3%
of the uptime.
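The locking arrangement can be sketched as two threads sharing one lock (hypothetical names; the `reconfigure` callback stands in for the actual service or IP change):

```python
import random
import threading
import time

reconfig_lock = threading.Lock()

def mtd_loop(name, mean_interval, reconfigure, n_rotations, rng):
    """One MTD thread: sleep an exponentially distributed interval, then
    reconfigure while holding the shared lock, so the other MTD cannot
    start a reconfiguration until this one finishes."""
    for _ in range(n_rotations):
        time.sleep(rng.expovariate(1.0 / mean_interval))
        with reconfig_lock:
            reconfigure(name)
```

As the results below suggest, this serialization means the two MTDs are not truly independent: a long reconfiguration by one MTD delays the other.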
3.5.2 Experimental Results and Observations
For service rotations, we measured the number of VMs (out of six) successfully compromised per trial and charted their distributions on a histogram, shown in Figure 3.6. As
we can see, in the case where the interarrival rate = 120 sec, the distribution centers around
two VMs, which is similar to what would be expected of a static case with two vulnerable
VMs out of six available. However, as the reconfiguration rate increases, the distribution
moves to the left as more trials result in fewer VMs compromised.
Two charts showing the overall results from the monitor process are shown in Fig-
ures 3.7a and 3.7b. We observe that each chart shows each service had on average the same
uptime as the others, and that the trials with an average interarrival rate of 60 seconds
had a higher overall availability than those with an average interarrival rate of 10 seconds.
Other results from the monitor process for other interarrival rates showed similar results in
the average share of each web service.
Figures 3.8a and 3.8b show the overall attacker success rate (defined as the probability
that one or more VMs are compromised) and availability for each setting of the individual
[Figure: histogram of the number of successful attacks per trial for interarrival rates of 10, 30, 60, and 120 seconds.]
Figure 3.6: Histogram of Number of VMs Compromised for Service Rotation
[Figure: pie charts of monitored uptime share. (a) Interarrival rate = 60 sec: Apache2 31.36%, Nginx 33.31%, Lighttpd 34.44%, Unavailable 0.89%. (b) Interarrival rate = 10 sec: Apache2 31.48%, Nginx 30.95%, Lighttpd 32.97%, Unavailable 4.60%.]
Figure 3.7: Monitor Results for Service Rotation
MTDs. We note that in both cases, the MTDs are effective at reducing the attacker’s success
rate. Service rotation reduces attacker’s success rate to 54%, while IP rotation reduces it
to 29%. However, for each increase in reconfiguration rate, there is also a corresponding
decrease in availability. Service rotation reduces availability as low as 95.4%, while IP
rotation reduces it as low as 74.3%.
[Figure: bar charts of attacker success rate and availability for each interarrival-rate setting of (a) service rotation and (b) IP rotation.]
Figure 3.8: Attacker Success Rate and Availability for Service and IP Rotation
Figures 3.9a and 3.9b show the same data while directly comparing values for the two
MTDs side by side. Here we also observe that while IP rotation is much more effective than
service rotation, it also has the drawback of a far greater reduction in availability.
Complete data for each combination of MTD settings is shown in Tables 3.4 and 3.5.
Here we observe that in most cases, the earlier trends of attacker success rate and availability
both dropping as reconfiguration rates increase hold true as well. However, we also
observe that when both MTDs are active, the attacker success rate and availability may
not change, or may even rise. This may be due to large confidence intervals or to the fact that the
locking mechanism that prevents both MTDs from reconfiguring at once means that the
two MTDs are not truly independent. For example, IP rotation is observed to be the more
[Figure: side-by-side comparison of service rotation and IP rotation across interarrival-rate settings for (a) attacker success rate and (b) availability.]

Figure 3.9: Comparison of Service and IP Rotation on Attacker Success Rate and Availability
expensive operation, costing several seconds of downtime for each reconfiguration and impacting availability accordingly. But if more frequent service rotations are occurring,
the costly IP rotation operations might be delayed, reducing the overall effective reconfiguration rate and lessening the effect on attacker success rate and availability. An area for
future research would be to predict these delays and compute the effective reconfiguration
rate, along with more experimental trials to validate that analysis and reduce the observed
error.
Table 3.4: Attacker Success Rates for all Combinations of Interarrival Rates
                    IP Rotation
Service Rotation    Static        120s          60s           30s
Static              1.000±0.000   0.880±0.064   0.610±0.096   0.290±0.089
120s                0.860±0.068   0.640±0.094   0.550±0.098   0.220±0.081
60s                 0.900±0.059   0.550±0.098   0.430±0.097   0.140±0.068
30s                 0.750±0.085   0.550±0.098   0.490±0.098   0.150±0.070
10s                 0.540±0.085   0.280±0.088   0.210±0.080   0.110±0.061
Table 3.5: Availability for all Combinations of Interarrival Rates
                    IP Rotation
Service Rotation    Static        120s          60s           30s
Static              1.000±0.000   0.915±0.003   0.819±0.005   0.743±0.005
120s                0.997±0.000   0.909±0.002   0.838±0.003   0.692±0.003
60s                 0.991±0.001   0.793±0.003   0.782±0.003   0.647±0.003
30s                 0.981±0.001   0.855±0.002   0.832±0.003   0.711±0.003
10s                 0.954±0.001   0.807±0.003   0.761±0.003   0.710±0.003
3.6 Conclusions
We have introduced an MTD quantification framework that has the potential to model
numerous classes of MTDs. However, the challenges of fully realizing this framework are
twofold: for each MTD, we must determine the probability of knowledge disruption Pi,j
individually, and we must also model the MTD cost. Cost could be defined as actual cost
of acquisition or in terms of overhead or performance cost.
We have already shown as an experimental proof of concept how the effectiveness of
multiple MTDs might be measured, along with their effect on system availability. This is
explored further in the next chapters where we present an analytical model that can be
used to predict the effectiveness of MTDs that perform periodic reconfigurations, as well as
their impact on system availability and response time, as response time is a frequently used
metric when determining performance cost [73,81].
Chapter 4: Performance Modeling of Moving Target Defenses
4.1 Introduction
To determine the individual effectiveness and cost of an MTD in our quantification framework, we perform a mathematical analysis of the MTD in question to predict its performance. While this may vary from technique to technique, in this chapter we present a model
that can accurately predict the effects of MTDs that are based on periodic reconfigurations
and that represents a wide class of the techniques available in the literature.
We consider here a reconfiguration scheme with multiple identical resources serving
requests. Occasionally, at random intervals, we reconfigure those resources in some way.
We use Continuous Time Markov Chains (CTMC) to determine the probability distribution
of the number of resources that are being reconfigured and then use this distribution to
determine the probability distribution of the number of service requests in the system. This
distribution is then used to compute the average number of service requests in the system
and the average response time. The distribution of resources being reconfigured can also
be used to compute the average age of a resource (i.e., the average time between the last
reconfiguration event and the next). This interval is used to determine the probability that
an attacker succeeds during that time.
In our simulations and experiments, we also observe a major difference between predicted
steady-state response times and actual response times, caused by periodic states of instability.
We introduce a metric that quantifies this phenomenon, as well as an optimization
method that finds the optimal reconfiguration rate subject to minimum stability and security
requirements.
4.2 Quantitative Analysis of MTDs
The computing environment we consider in this chapter consists of c similar resources (e.g.,
VMs) available to serve incoming service requests that arrive at an average rate λ, join a
single queue, and are served by any of the available resources, with an average service time
S. A generic MTD technique consists in each resource occasionally, at random intervals,
reconfiguring itself independently of the other resources. Thus, each resource handles service
requests as well as reconfiguration requests. While a resource is being reconfigured, it is not
available to handle service requests. Without reconfigurations, the system behaves exactly
like an G/G/c queue [83].
Now, assume that each resource is reconfigured at an average rate of α. As an example,
a reconfiguration could entail swapping out a VM with a clean instance, similar to how
SCIT [84] operates: the VM then comes back online with a new IP address, implementing
a form of IP hopping. These reconfigurations make it more difficult for an attacker to learn
about the VMs and disrupt the attacker's persistence in the system. The attacker's success
probability is a function of the average reconfiguration rate. The reconfiguration rate α
also affects the average number of resources available to serve requests (see Figure 4.1).
Reconfigurations reduce resource availability and ultimately increase queuing and response
time.
While these qualitative tradeoffs are intuitive and not surprising, there is a need for
quantitative models for determining the impact of the reconfiguration rate on resource
availability, response time of service requests, and attacker’s success probability. We use
Continuous Time Markov Chains (CTMC) to compute the probability distribution of the
number of resources being reconfigured as a function of α and other parameters and then use
that distribution to determine resource availability and response time, among other metrics.
Markov chains have been used for many decades to study various aspects of computer and
communication systems. The novelty in each case is in how the state of a CTMC should
be defined to represent the system to be analyzed.
Figure 4.2, the framework for our analytic models, shows the reconfiguration model R
[Figure: resources cycle among three states: available for use, in use by a service request, and being reconfigured; requests arrive at rate λ.]
Figure 4.1: Queuing Representation of the Reference Scenario
at the top and, at the bottom, the performance model S. The reconfiguration model takes
as inputs the rate α at which resources are reconfigured, the average reconfiguration time
S, and the number of resources c, and produces as outputs the availability of resources, the
average number of resources available, and the probability distribution pk of the number
of available resources. This distribution, along with the number of resources, the average
arrival rate of service requests, and the average service time of requests are inputs to the
performance model, which produces the probability distribution Pk of the number of
service requests in the system and the average response time of requests.
We analyze this generic MTD in three steps: (i) analysis of the effect of the reconfiguration rate α on the probability distribution of available resources; (ii) analysis of the effect
of that availability on response time; and (iii) determination of the attacker's probability of
success based on the reconfiguration rate.
4.2.1 Reconfiguration Model
Figure 4.3 helps explain the basic equations that govern an MTD process. Table 4.1 summarizes the names and descriptions of all variables defined here.
[Figure: the reconfiguration model (R) takes the reconfiguration rate, reconfiguration time, and number of resources as inputs, and outputs the availability, the average number of available resources, and the probability distribution of the number of available resources. That distribution, together with the request arrival rate and the request average service time, feeds the performance model (S), which outputs the average response time.]
Figure 4.2: Analytic Model Framework
Table 4.1: Summary of Variable Names and Descriptions
Variable  Description
Ps(t)     Probability that t time units are needed for a successful attack
Ts        Time needed for an attacker to succeed; Ps(Ts) = 1
c         Number of resources
c̄         Average number of resources not being reconfigured
N̄         Average number of resources being reconfigured
α         Reconfiguration rate (in reconfigurations/sec)
S̄         Average time to reconfigure a resource
X         Reconfiguration throughput, i.e., the aggregate rate at which resources start a reconfiguration operation
λ         Average arrival rate of requests to use a resource
T         Average time a request spends using a resource
R         Average response time of requests
Consider that there are c resources (e.g., VMs) that are reconfigured at regular time
intervals. Each resource cycles through a period in which it is available for use and a period
in which it is being reconfigured (see Figure 4.3). Resources are reconfigured independently
of one another at a rate of α reconfigurations per time unit. Thus, the average time a
resource is available for use between the end of a reconfiguration operation and the start
of the next is 1/α. We refer to this as the average age of a resource, which is our primary
security metric used in determining the likelihood of attacker’s success, described later in
Section 4.2.3.
Let $\bar{c}$ be the average number of resources available for use (i.e., not being reconfigured)
and $\bar{N}$ be the average number of resources being reconfigured. Thus,

$$c = \bar{c} + \bar{N} \tag{4.1}$$
Applying Little's Law [83] to the set of available resources, we obtain

$$\bar{c} = X \times (1/\alpha) \tag{4.2}$$
where X is the system's reconfiguration throughput (or throughput for short), i.e., the
aggregate rate at which resources complete their reconfiguration.
Let $\bar{S}$ be the average time it takes for a resource to complete the reconfiguration process. For example, a reconfiguration process could include the time to complete all transactions in progress at a server, change its configuration file, shut down the server,
and reboot it. Applying Little's Law [83] to the set of resources being reconfigured, we
obtain:

$$\bar{N} = X \times \bar{S} \tag{4.3}$$
Adding Eqs. 4.2 and 4.3 and combining the result with Eq. 4.1, we obtain:

$$c = \bar{c} + \bar{N} = X(\bar{S} + 1/\alpha) \tag{4.4}$$

We can rewrite Eq. 4.4 in order to express the reconfiguration rate α as a function of
the number of resources c, the time to reconfigure $\bar{S}$, and the throughput X:

$$\alpha = X/(c - \bar{S} \times X) \tag{4.5}$$
[Figure: the c resources cycle between being available for use and passing through the reconfiguration process; each resource starts a reconfiguration at rate α, reconfigurations complete at aggregate throughput X, and $\bar{N}$ resources are in the reconfiguration box at any time.]
Figure 4.3: Reconfiguration Cycle
We use the CTMC of Figure 4.4 to compute X. The state k (k = 0, · · · , c) in this
CTMC represents the number of resources in the reconfiguration box of Figure 4.3. Thus,
the number of available resources is c− k.
[Figure: birth-death CTMC with states 0, 1, ..., c; the transition rate from state k to k+1 is α(c−k), and from state k to k−1 is k/S̄.]

Figure 4.4: CTMC for the Reconfiguration Model

An expression for $p_k$ ($k = 0, \cdots, c$) is obtained by using the general birth-death equation
for Markov Chains [83]:

$$p_k = p_0 \prod_{i=0}^{k-1} \frac{\gamma_i}{\mu_{i+1}} \qquad k = 1, \cdots, c \tag{4.6}$$

$$p_0 = \left[1 + \sum_{k=1}^{c} \prod_{i=0}^{k-1} \frac{\gamma_i}{\mu_{i+1}}\right]^{-1} \tag{4.7}$$

where $\gamma_k = \alpha \cdot (c - k)$ for $k = 0, \cdots, c-1$ is the aggregate rate at which resources are
reconfigured when there are k resources being reconfigured, and $\mu_k = k/\bar{S}$ for $k = 1, \cdots, c$
is the aggregate rate at which resources complete their reconfiguration when there are k
resources being reconfigured. Using the expressions for $\gamma_k$ and $\mu_k$ in Eqs. 4.6 and 4.7, we
obtain

$$p_k = p_0 \prod_{i=0}^{k-1} \frac{\alpha \cdot (c-i)}{(i+1)/\bar{S}} = p_0 \,(\alpha\bar{S})^k \binom{c}{k} \qquad k = 1, \cdots, c \tag{4.8}$$

An expression for $p_0$ is obtained by noting that the sum of all probabilities is equal to 1.
Thus,

$$p_0 = \left[1 + \sum_{k=1}^{c} (\alpha\bar{S})^k \binom{c}{k}\right]^{-1} \tag{4.9}$$
The values of $p_k$ can be easily computed because the summation needed to compute $p_0$
is finite. Given $p_k$ and $p_0$, one can then compute the average throughput X as

$$X = \sum_{k=1}^{c} (k/\bar{S}) \cdot p_k = \frac{1}{\bar{S}} \sum_{k=1}^{c} k \cdot p_k \tag{4.10}$$

The average number of available resources can now be computed by combining Eqs. 4.2
and 4.10:

$$\bar{c} = \frac{X}{\alpha} = \frac{1}{\alpha \cdot \bar{S}} \sum_{k=1}^{c} k \cdot p_k \tag{4.11}$$

The availability A of the set of resources is then given by the fraction of resources
available for use, i.e.,

$$A = \bar{c}/c = X/(\alpha \cdot c) \tag{4.12}$$

It turns out that the availability does not depend on the number of resources but only
on the product of the reconfiguration rate and the reconfiguration time. This can be seen
by combining Eqs. 4.2, 4.4, and 4.12:

$$A = \bar{c}/c = (X/\alpha)/c = (1 + \alpha \cdot \bar{S})^{-1} \tag{4.13}$$

When there is no reconfiguration (i.e., α = 0), the availability is 1, as expected. Eq. 4.13
can be used to determine the values of the product $\alpha \cdot \bar{S}$ necessary to guarantee an availability
greater than or equal to some value $A_{min}$:

$$\alpha \cdot \bar{S} \leq \frac{1}{A_{min}} - 1 \tag{4.14}$$
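The reconfiguration model is straightforward to evaluate numerically; the sketch below (hypothetical function name) implements Eqs. 4.8 to 4.12 and can be checked against the closed form of Eq. 4.13:

```python
from math import comb

def reconfiguration_model(c, alpha, S_bar):
    """Evaluate the reconfiguration model: returns (p, X, c_bar, A), where
    p[k] is the probability that k resources are being reconfigured."""
    a = alpha * S_bar
    p0 = 1.0 / sum(comb(c, k) * a**k for k in range(c + 1))  # Eq. 4.9
    p = [p0 * comb(c, k) * a**k for k in range(c + 1)]       # Eq. 4.8
    X = sum(k * pk for k, pk in enumerate(p)) / S_bar        # Eq. 4.10
    c_bar = X / alpha                                        # Eq. 4.11
    A = c_bar / c                                            # Eq. 4.12
    return p, X, c_bar, A
```

For any c, the resulting availability matches Eq. 4.13, A = (1 + αS̄)^(-1), which provides a quick sanity check on the implementation.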
4.2.2 Response Time Model
The c resources are used for some computational purpose and requests to use any of them
arrive at a rate of λ requests per unit of time and are served by any of the available
resources. If no resources are available, a request has to wait in a queue. The number of
resources available for service varies from 0 to c due to reconfiguration (see Figure 4.1). The
probability that k resources are available for service is given by the probability pc−k that
c− k resources are being reconfigured (see Eq. 4.8).
We use the CTMC of Figure 4.5 with an infinite number of states, where a state k =
0, 1, 2, · · · represents the number of service requests in the system, either using one of the
available resources or waiting for one. Note that the system of Figure 4.1 is similar to an
M/M/c queuing system with an important difference. In an M/M/c model, the rate at
which transactions complete is kµ for k = 1, · · · , c and cµ for k > c where µ is the average
rate at which a request completes from a resource. In our case, as explained above, the
transaction completion rate has to consider that resources may be in the process of being
reconfigured. Thus, we follow an approach similar to the development of the results for
the M/M/c queue [83], with a modification in the average transaction completion rate.
Consider the following additional notation:
• $P_k$: probability that there are k requests in the system (either being serviced or in the waiting line).

• $\pi_j$: probability that j resources are available for use (i.e., not being reconfigured); thus $\pi_j = p_{c-j}$.

• $\mu\delta_k$: average request departure rate at state k. The value of $\delta_k$ is

$$\delta_k = \sum_{j=1}^{k} j\,\pi_j \qquad k = 1, \cdots, c \tag{4.15}$$

because the departure rate is µ if only one resource is available (which happens with probability $\pi_1$), 2µ if only two resources are available (which happens with probability $\pi_2$), $\cdots$, and kµ if k resources are available (which happens with probability $\pi_k$). Note that $\delta_c = \sum_{j=1}^{c} j\,\pi_j = \bar{c}$.

• $\mu = 1/T$: average service rate of each resource.

• $\rho = \lambda/(\mu\,\bar{c})$: average utilization of the resources.
[Figure: birth-death CTMC with states 0, 1, 2, ...; the arrival rate from state k to k+1 is λ, and the departure rate from state k to k−1 is µδk for k ≤ c and µδc for k > c.]

Figure 4.5: CTMC for the Response Time Model
As Figure 4.5 shows, the transition rate from state k to k + 1 is λ, the average request
arrival rate, and the transition rate βk from a state k to state k − 1 is given by
$$\beta_k = \begin{cases} \mu\,\delta_k & k < c \\ \mu\,\delta_c & k \geq c \end{cases} \tag{4.16}$$

We can now use the generalized birth-death equations (see Eqs. 4.6 and 4.7) to solve for
$P_k$ and $P_0$. We have to break down the expression for $P_k$ into two parts (for $k = 1, \cdots, c$
and $k > c$) because $\beta_k$ has two expressions. Hence,

$$P_k = P_0 \prod_{i=0}^{k-1} \frac{\lambda}{\mu\,\delta_{i+1}} = P_0\, \frac{(\lambda/\mu)^k}{\prod_{i=0}^{k-1}\delta_{i+1}} \qquad k = 0, \cdots, c \tag{4.17}$$
and
Pk = P0 Πc−1i=0
λ
µ δi+1Πk−1i=c
λ
µ δc= P0
ρk cc
Πc−1i=0δi+1
k = c+ 1, · · · (4.18)
P0 can now be computed as

    P_0 = \left[ 1 + \sum_{k=1}^{c} \frac{(\lambda/\mu)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} + \sum_{k=c+1}^{\infty} \frac{\rho^k\,\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \right]^{-1} \qquad (4.19)

If we move \bar{c}^{\,c} / \prod_{i=0}^{c-1} \delta_{i+1} out of the infinite summation in the above expression, we are left with the following geometric series, which converges for ρ < 1:

    \sum_{k=c+1}^{\infty} \rho^k = \frac{\rho^{c+1}}{1-\rho} \qquad (4.20)
Note that ρ < 1 is a necessary but not sufficient condition for the system to be stable, as discussed in Section 4.4.2. P0 can now be written as the following easily computable expression:
    P_0 = \left[ 1 + \sum_{k=1}^{c} \frac{(\lambda/\mu)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} + \frac{\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \cdot \frac{\rho^{c+1}}{1-\rho} \right]^{-1} \qquad (4.21)
The average number of requests in the system can be computed as

    N_s = \sum_{k=1}^{\infty} k\,P_k = \sum_{k=1}^{c} k\,P_k + \sum_{k=c+1}^{\infty} k\,P_k \qquad (4.22)
The infinite summation in Eq. 4.22 can be written as

    P_0\,\frac{\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \left[ \sum_{k=c+1}^{\infty} k\,\rho^k \right] = P_0\,\frac{\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \left[ \rho\,\frac{\partial}{\partial\rho} \sum_{k=c+1}^{\infty} \rho^k \right]

and is equal to

    P_0\,\frac{\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \cdot \frac{\rho^{c+1}}{1-\rho} \left[ \frac{\rho}{1-\rho} + 1 + c \right] \qquad (4.23)
Thus, Eqs. 4.22 and 4.23 allow us to compute Ns. Finally, using Little's Law, we compute the average response time R as R = Ns/λ.
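The computation implied by Eqs. 4.15-4.23 can be sketched in a few lines of Python. This is an illustrative sketch with our own naming, not code from the dissertation; the availability probabilities πj are assumed to have already been obtained from the reconfiguration model, and all δk must be positive.

```python
def response_time(lam, T, pi):
    """Average response time R via Eqs. 4.15-4.23 and Little's Law.
    lam: arrival rate; T: average service time; pi[j]: probability that
    j of the c resources are available (j = 0, ..., c)."""
    c = len(pi) - 1
    mu = 1.0 / T                          # average service rate per resource
    # Eq. 4.15: delta[k] = sum_{j=1..k} j*pi[j]; delta[c] = c_bar
    delta = [sum(j * pi[j] for j in range(1, k + 1)) for k in range(c + 1)]
    c_bar = delta[c]                      # average number of available resources
    rho = lam / (mu * c_bar)              # average utilization
    assert 0 < rho < 1, "requires rho < 1"
    prod = [1.0] * (c + 1)                # prod[k] = delta_1 * ... * delta_k
    for k in range(1, c + 1):
        prod[k] = prod[k - 1] * delta[k]
    finite = sum((lam / mu) ** k / prod[k] for k in range(1, c + 1))
    coef = c_bar ** c / prod[c]           # c_bar^c / (delta_1 * ... * delta_c)
    P0 = 1.0 / (1.0 + finite + coef * rho ** (c + 1) / (1.0 - rho))   # Eq. 4.21
    Ns = sum(k * P0 * (lam / mu) ** k / prod[k] for k in range(1, c + 1))
    Ns += P0 * coef * rho ** (c + 1) / (1.0 - rho) * (rho / (1.0 - rho) + 1 + c)  # Eq. 4.23
    return Ns / lam                       # Little's Law: R = Ns / lambda
```

For c = 1 this reduces to an M/M/1 queue with service rate µπ1, which provides a quick sanity check of the implementation.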
4.2.3 Analysis of Attack Success Probability
The time required for an attacker to acquire sufficient knowledge during the reconnaissance phase is a key factor in determining whether an attack succeeds. We define the probability that an attacker succeeds as a function of the time available to complete the reconnaissance phase. The probability Ps(t) that an attacker succeeds in attacking a resource within t time units is important in determining the required reconfiguration rate, i.e., the rate at which resources need to be reconfigured.
Figure 4.6 shows two examples of Ps(t): linear and exponential functions. The linear
function, Ps(t) = t/Ts, indicates that the probability of attack success increases linearly with
time and reaches 1 (i.e., success) at time Ts. The exponential function (see for instance
Eq. 4.24) indicates a situation in which the attacker initially accumulates knowledge at a
low rate and then becomes exponentially more knowledgeable over time and succeeds at
time Ts.
    P_s(t) = 1 - \frac{1 - e^{t - T_s}}{1 - e^{-T_s}} \qquad (4.24)
As an example, consider an IP sweep combined with a port scan, where the attacker’s
Figure 4.6: Probability of Success Ps vs. Time for Ts = 10
goal is to discover the IP address of the machine running a specific service within the target network. The attack consists of sequentially scanning all IP addresses in a given range. Assuming an IP space of n addresses and that t∗ time units are required to scan a single IP, we obtain Ts = n · t∗ and

    P_s(t) = \frac{t}{T_s} = \frac{t}{n \cdot t^*}

As another example, consider the following DoS attack. The attacker initially compromises n hosts, which takes t∗ time units. Then, each of the newly compromised hosts compromises n additional hosts, which takes an additional t∗ time units. At any given time t, the total number of compromised hosts, including the attacker's machine, is

    N(t) = 1 + n + n^2 + \ldots + n^k = \frac{1 - n^{k+1}}{1 - n}, \qquad k = \lfloor t/t^* \rfloor

We can assume that the attacker's success probability is proportional to the aggregate amount of flood traffic that compromised hosts can send to the victim, compared to the victim's capacity to handle incoming traffic. Let V denote the volume of traffic the victim can handle per time unit and let v denote the amount of traffic each compromised node can send per time unit. Then,

    P_s(t) = \min\left(1, \frac{N(t) \cdot v}{V}\right) = \min\left(1, \frac{v}{V} \cdot \frac{1 - n^{\lfloor t/t^* \rfloor + 1}}{1 - n}\right)
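The three success-probability profiles discussed above can be written as simple functions. This is an illustrative sketch; the function and parameter names are ours, not from the dissertation.

```python
import math

def ps_linear(t, Ts):
    """Linear knowledge accumulation: Ps(t) = t/Ts, reaching 1 at t = Ts."""
    return min(1.0, t / Ts)

def ps_exponential(t, Ts):
    """Eq. 4.24: knowledge accumulates slowly at first; Ps(Ts) = 1."""
    return 1.0 - (1.0 - math.exp(t - Ts)) / (1.0 - math.exp(-Ts))

def ps_dos(t, t_star, n, v, V):
    """DoS example: success proportional to aggregate flood traffic of the
    N(t) compromised hosts (v per host) vs. the victim's capacity V."""
    k = int(t // t_star)
    N = (n ** (k + 1) - 1) // (n - 1)    # N(t) = 1 + n + ... + n^k
    return min(1.0, N * v / V)
```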
4.3 Simulation and Experimental Testbed
Our analytical results were validated by simulation and by experiments that implemented a
generic MTD. We implemented the simulation using SimPy1, a process-based discrete-event
simulation framework based on standard Python. SimPy supports multiple processes that
contend for access to a resource and automatically handles queuing of events if a resource
is busy, making it ideal for our purposes. Additionally, we used SimPy as a real-time event
generator to control VM reconfigurations and implement a fully operational MTD. For our
VM environment, we used Citrix’s open-source XenServer platform2, which offers pooling of
resources, the ability to quickly clone VMs for reconfiguration, and a command-line interface
that is compatible with our simulation framework.
Our MTD controller runs on a separate server and starts an independent process for
each VM – either a simulated VM or an actual VM in the XenServer pool – that generates
reconfiguration requests. Our experiments show that S is normally distributed, so we used a normal distribution for S with the same mean and standard deviation as observed in the experiments.
Reconfigurations may consist of a number of possible actions, including changing the
IP address or software. In our experiments, we remove a VM instance from the virtual
network and replace it with a fresh copy, similar to how SCIT operates [84]. The fresh copy
also has a new IP address obtained from DHCP, enabling a basic IP-hopping scheme. The
reconfiguration process also collects statistics such as any possible sources of internal delays
within the implementation.
The MTD controller also serves as a traffic generator that creates service requests. Each
service request is an independent process with exponentially distributed interarrival times
with an average arrival rate equal to λ and average service time T in the simulations.
For the experiments, an HTTP request is sent to an idle VM, which has a scripted delay
on the HTTP response with average time T to simulate the time to process a generic
1 Available at https://simpy.readthedocs.io/en/3.0/
2 Available at https://xenserver.org/
service request. Each process records the time at which it was generated, began service,
and completed according to the environment’s internal clock. These records are used to
compute queue time, service time, and response time and are maintained for each request.
We also collect statistics from a separate monitor process that operates at set intervals
to gather information about the number of resources idle, in use, and being reconfigured,
as well as the current queue length. An overview of the system and processes is shown in
Figure 4.7.
Figure 4.7: Experimental Setup: a) c independent processes to generate reconfiguration requests (arrival rate α), b) process to generate independent service requests (arrival rate λ), c) monitor process (every 0.01 sec)
The pool of VMs is tracked using three separate states for the VMs: idle, in use (i.e.,
serving a request), or being reconfigured. All requests for a VM must first acquire a shared
resource that gives them access to the pool of idle VMs. A priority queue is used, giving
priority to reconfiguration requests so that reconfiguration is not unnecessarily delayed;
however, reconfiguration requests for a specific VM will not preempt a request currently
being served. Instead, the reconfiguration request flags that VM for reconfiguration and
then releases its lock on the idle pool before waiting for that resource to appear in the pool
of VMs to reconfigure. When service requests receive access to the idle pool, they remove
a random VM from the pool and place it in the pool of VMs in use. Once the request completes, if that VM is flagged for reconfiguration, it is placed in the reconfiguration pool, where the pending reconfiguration request will pick it up; otherwise, it is placed back in the idle pool. In the event that a service request finds no VMs in the idle pool, it
waits for one to appear. This additional wait is included in the overall queue time. The
overall flow of control and VM state transitions is shown in Figure 4.8.
Figure 4.8: Control Flow and VM Movement: a) incoming requests, b) priority queue, c) resource lock on idle pool, d) idle VM pool, e) VMs in use, f) reconfiguring VMs
Each iteration of the simulation lasted 6,000 seconds, with no statistics recorded in the
first 1,000 seconds to allow the system to achieve steady-state. Thirty runs were performed
for values of α from 0.001 to 0.050 to obtain the mean, standard deviation, and 95%
confidence intervals for the mean for each statistic. For the experiments, each run is limited
to 600 seconds with statistics recorded after the first 60 seconds for select values of α. The
values of the other input parameters used in the simulations and experiments are given in
Table 4.2.
Table 4.2: Values of Variables used in Numerical Results
Variable | Value
Ts       | 300 sec
c        | 20
α        | from 0.001 to 0.050 rec/sec
S        | 120 sec
λ        | 10 requests/sec
T        | 0.5 sec
4.4 Numerical Results and Validation
This section presents several numerical results starting with those obtained from the re-
configuration model along with validation of the model. We then cover the performance
model, including some interesting findings from our implementation of the model. Finally,
we show how the two models can be used to find an optimal value of the reconfiguration
rate that considers tradeoffs between response time and security.
4.4.1 Reconfiguration Model
Figure 4.9 shows the distribution of the number k of resources being reconfigured for four
values of the reconfiguration rate α, out of a total of 20 resources. The graphs show that
as α increases from 0.005 to 0.04 rec/sec, the probability distribution moves to the right.
The average number of resources being reconfigured is 7.50 for α = 0.005 rec/sec, going up
to 16.55 for α = 0.04 rec/sec.
The reconfiguration probabilities pk are used to compute the availability. Figure 4.10
shows three availability curves as a function of the reconfiguration rate α for values of the
reconfiguration time S equal to 60, 90, and 120 seconds, respectively. As the reconfiguration
rate increases, the availability decreases in a non-linear fashion and, as the reconfiguration
time increases, the availability decreases for the same value of α. As the reconfiguration
rate tends to zero, the availability tends to 1 because all resources are available for use.
Figure 4.9: Distribution of the Number of Resources Being Reconfigured for c = 20
Note that, as we indicated in Section 4.2, the availability does not depend on the number
of resources c.
We validated the analytic model against the simulation and experiments, with the mean and standard deviation of the reconfiguration times measured in the experiments. We find
that the probability distribution generated by the simulation closely matches that of the
analytical model, as seen in Figure 4.11.
The theoretical availability results match very well the results obtained by the simulations and experiments, as shown in Table 4.3: for the same range of values of α used in Figure 4.10, the absolute percentage relative error between the model and the simulation does not exceed 2.29%, and the error between the model and the experimental results does not exceed 9.62%.
Figure 4.10: Availability vs. Reconfiguration Rate α
Figure 4.11: Comparison of Number of Resources being Reconfigured (α = 0.02 rec/sec)
Table 4.3: Comparison of Availability Results.
α     | Model | Simulation ± ½ 95% CI | Abs. % Error | Experimental ± ½ 95% CI | Abs. % Error
0.005 | 0.696 | 0.694 ± 0.004         | 0.29         | 0.686 ± 0.015           | 1.46
0.010 | 0.467 | 0.466 ± 0.004         | 0.21         | 0.465 ± 0.017           | 0.43
0.015 | 0.329 | 0.330 ± 0.003         | 0.30         | 0.318 ± 0.017           | 3.46
0.020 | 0.253 | 0.255 ± 0.003         | 0.78         | 0.247 ± 0.011           | 2.43
0.030 | 0.171 | 0.175 ± 0.002         | 2.29         | 0.156 ± 0.007           | 9.62
0.040 | 0.132 | 0.133 ± 0.002         | 0.75         | 0.121 ± 0.007           | 9.09
4.4.2 Response Time Model
We now summarize our results about the response time and then explain them in detail.
The key conclusions are: (i) the response time model closely matches the simulation results
for a range of values of α for which the system is stable most of the time (we formally
define stability later); (ii) for larger values of α there is a high variation of the utilization around its average ρ = λT/c̄, which causes the system to become unstable (i.e., ρ > 1) for non-negligible fractions of time; as a consequence, the queue and the response time grow without bound.

As indicated in Section 4.2, ρ must be less than 1. As ρ tends to 1, the response time tends to infinity because of the term (1 − ρ) that appears in the denominator of the expression for the average number of requests in the system. Because λ and T are assumed constant in this section (λ = 10 requests/sec and T = 0.5 sec), the variation of ρ depends on c̄, which decreases with the availability, which in turn decreases as α increases (see Table 4.3). Thus, ρ < 1 ⇒ c̄ > λT ⇒ c̄ > 5.
Before we present the variation of the response time as a function of α, it is instructive
to compare graphs of run-time data captured for α = 0.005 and α = 0.015 in Figures 4.12a
and 4.12b, respectively. From these data, we observe that the coefficient of variation (COV) of both the number of available resources and the response time increases with α. For α = 0.005, the COV for available resources is 0.123 and for response time is 1.007; for α = 0.015, these values go up to 0.287 and 1.676, respectively. More importantly, we also notice that in both
Figure 4.12: Number of Available Resources and Response Time for Two Trials with Differing Values of α: (a) α = 0.005; (b) α = 0.015
cases, c̄ > 5, as denoted by the dashed line in the graphs, and thus ρ < 1. However, for α = 0.015, there are periods of time where there are 5 or fewer resources available, causing
a spike in response time. During these periods, ρ ≥ 1 and the queue of service requests
grows infinitely. Furthermore, even as the number of available resources returns above the
minimum required, there is a lagging effect on response times returning to normal as there
are built-up service requests in the queue. Thus, a metric such as ρ alone does not capture
well the effect of episodic instability. To better quantify this effect, we introduce a metric
ω, which we call stability, defined as the fraction of time the system is in a stable state (i.e.,
ρ < 1):
    \omega = \sum_{k \in N} p_k \qquad (4.25)

where N = \{ k \in \{0, 1, \ldots, c\} : \lambda T/(c - k) < 1 \}. Because ω depends on the probabilities
pk, it is a function of α, and we use ω(α) to denote that relationship. Then, a system is
stable for a given set of parameters if ω(α) ≈ 1 because it is almost never in a situation
where ρ > 1. The algorithm to compute ω(α) is listed below.
Algorithm 2 ComputeStability(c, λ, T, pk)
Input: Resource count c, arrival rate λ, service time T, probability distribution pk
Output: Stability ω
1: ω ← 0
2: for k = 0 → c do
3:     if λT/(c − k) < 1 then
4:         ω ← ω + pk
5:     end if
6: end for
7: return ω
Figure 4.13 shows ω superimposed over response time results obtained through simula-
tion and from the analytic model. As we can see, for low values of α, ω is very close to 1 and
the simulation matches the analytic results. As α increases, we observe that the response
time is very sensitive to small decreases in stability. When α = 0.015 rec/sec, ω = 0.775,
which means that the system is unstable 22.5% of the time with requests rapidly building
up in the queue, causing a higher than expected value and variance in the response time.
A possible solution is to limit the number of resources reconfiguring at any one time, to ensure that there are sufficient resources available to handle the expected workload; this approach is discussed in the next chapter.
4.4.3 Optimal Reconfiguration Rate
The model presented in Section 4.2 allows one to answer a variety of “what-if” questions
such as “How does the resource availability vary with the time needed to reconfigure a re-
source?” or “How does the average response time of service requests vary with the average
reconfiguration rate?” Additionally, one can solve optimization problems such as maximiz-
ing the reconfiguration rate subject to the following constraints: (i) the stability must be
greater than or equal to a threshold ωmin, and (ii) the average response time must be less
than or equal to a threshold Rmax. More precisely,
Maximize α
s.t. ω ≥ ωmin and R ≤ Rmax
Figure 4.13: Response Time: Simulation vs. Analytical Model with Stability
Because the stability decreases monotonically with α and the response time R increases
monotonically with α (see Figure 4.13), the maximum feasible value αmax of α is
αmax = min(αω, αR) (4.26)
where
    \alpha_\omega = \arg\max_\alpha \{\omega(\alpha) \ge \omega_{min}\}
    \alpha_R = \arg\max_\alpha \{R(\alpha) \le R_{max}\} \qquad (4.27)
Consider Figure 4.14 and S = 60 sec, c = 20, ωmin = 0.9, and Rmax = 0.75 sec.
Then, αω = 0.023 rec/sec to satisfy the stability constraint. However, in order to satisfy
the response time constraint, αR = 0.036 rec/sec as illustrated in Figure 4.14. Therefore,
α ≤ min (0.023, 0.036) = 0.023 rec/sec. This means that each resource will be available, on
average, for 1/α = 1/0.023 = 43.5 seconds before it is reconfigured.
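Because ω(α) decreases and R(α) increases monotonically, αmax can be found by a simple search over a grid of candidate rates. In the sketch below, omega_of and r_of are stand-ins for evaluations of the stability and response time models; the names are ours.

```python
def max_reconfig_rate(omega_of, r_of, omega_min, r_max, alphas):
    """Eq. 4.26: alpha_max = min(alpha_omega, alpha_R), by grid search.
    Returns None when no candidate rate satisfies both constraints."""
    feasible_w = [a for a in alphas if omega_of(a) >= omega_min]  # stability
    feasible_r = [a for a in alphas if r_of(a) <= r_max]          # response time
    if not feasible_w or not feasible_r:
        return None
    return min(max(feasible_w), max(feasible_r))
```

With the actual model curves of Figure 4.14, the stability constraint binds first, yielding αmax = 0.023 rec/sec as in the example above.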
Figure 4.14: Optimization Analysis to Find the Maximum Feasible Reconfiguration Rate (α) for c = 20 and S = 60 sec
We now consider the interplay of the maximum reconfiguration rate α = 0.023 rec/sec
obtained in the optimization example above and the probability of a successful attack.
Consider a linear function for the probability Ps(t) with Ts = 100 sec. In that case, the
probability that an attacker succeeds after 43.5 sec is 43.5/100 = 0.435. However, if the
knowledge accumulation rate has the exponential form of Eq. 4.24, the probability of a
successful attack at t = 43.5 is
    P_s(43.5) = 1 - \frac{1 - e^{43.5 - 100}}{1 - e^{-100}} \approx 0 \qquad (4.28)
In other words, the optimal reconfiguration rate yields a relatively large probability that the attacker is successful if knowledge can be accumulated linearly, but a close-to-zero probability of success when knowledge is accumulated at a very slow pace initially.
4.5 Conclusions
The results presented in this chapter are encouraging and indicate that this is a promising research direction. However, as mentioned earlier, a possible solution to address the instability
issue is to limit the number of resources reconfiguring at any one time to ensure that there
are sufficient resources available to handle the expected workload.
Two potential policies to enforce a minimum number of available resources are described
in the following chapter, followed by a mathematical analysis of both, with simulations and
experiments to validate the updated model. We also define a utility function that combines
the trade-off between response time and attack success probability and use it to find the
optimal reconfiguration rate that maximizes the system’s utility.
Chapter 5: Performance Modeling of Moving Target
Defenses With Reconfiguration Limits
5.1 Introduction
In the previous chapter, we used Continuous Time Markov Chains (CTMCs) to predict the distribution of the number of available resources, and the corresponding response time, for MTDs that periodically reconfigure resources, removing them from use. Our simulations and experiments validated part of this analysis, but we also observed response times trending upward far sooner than initially anticipated because the system was periodically overloaded, despite having enough resources on average to meet the expected workload.
In this chapter, we introduce two possible policies that enforce a minimum number of
available resources, followed by a mathematical analysis of both, again using CTMCs, and
results of simulations and experiments that validate the updated model. We also introduce a
utility-based method to allow users to control the trade-off between availability and security
and determine the optimal reconfiguration rate.
5.2 Updated Analytic Model Overview
To ensure that there is always a minimum number of resources available to handle service
requests, we consider policies that limit the maximum number c∗ of resources being re-
configured. If c∗ resources are being reconfigured, additional reconfiguration requests may
either be dropped (drop policy) or queued (wait policy). We analyze this generic MTD in
three steps: (i) analysis of the effect of the reconfiguration rate α on the probability distri-
bution of available resources; (ii) analysis of the effect of that availability on response time;
and (iii) calculation of the effective reconfiguration rate and determination of the attacker’s
probability of success.
Our analytic model is derived from the queuing representation shown in previous chapter
in Figure 4.1. As seen in Figure 5.1, the model is similar to the model shown previously.
However, the reconfiguration model and performance model also now take as an additional
input the maximum number of resources c∗ that can be reconfigured at the same time.
The reconfiguration and performance models are solved using CTMCs, as explained in the next two sections. In addition, a new component of the model iteratively combines the results of the reconfiguration and performance models, as explained in Section 5.5. Table 5.1 contains the names and descriptions of all variables, including new variables used in our analysis of the two new reconfiguration policies.
Figure 5.1: Analytic Model Framework
68
Table 5.1: Summary of Variable Names and Descriptions
Variable | Description
Ps(t)    | Probability that an attacker needs t time units to launch a successful attack
Ts       | Time needed for an attacker to succeed; Ps(Ts) = 1
c        | Number of resources
c∗       | Maximum number of resources that can be in the process of being reconfigured
ca       | Minimum number of resources that are available for use; ca = c − c∗
c̄        | Average number of resources not being reconfigured
nr       | Average number of resources being reconfigured
nqr      | Average number of waiting reconfiguration requests (this number is zero for the drop policy)
α        | Target reconfiguration rate (measured in rec/sec)
α′       | Effective reconfiguration rate (measured in rec/sec)
S        | Average time to reconfigure a resource
pk       | Probability that there are k reconfiguration requests in the system
p̄k       | Probability that k resources are being reconfigured
Pk       | Probability that there are k service requests in the system (being served or waiting to be served)
λ        | Average arrival rate of service requests
T        | Average time a service request spends using a resource
R        | Average response time of service requests
5.3 Reconfiguration Models
This section presents the models for the drop and wait reconfiguration policies. The policies
are both closely based on the CTMCs of the previous reconfiguration model, with slight
modifications necessary to account for the new policies. The following subsection describes
core results that apply to both reconfiguration policies, while in later subsections we provide
specific results for each policy.
5.3.1 Core Results
For both policies, each resource cycles through periods in which it is available for use or is being reconfigured. We use k (k = 0, . . . , c) to denote the number of resources being reconfigured. Several useful results can be obtained from the probabilities p̄k that k resources are being reconfigured. These probabilities are a function of the reconfiguration rate α, the average time S to reconfigure a resource, the number of resources c, and the maximum number c∗ of resources that can be reconfigured at the same time. Note that c∗
is a parameter set by the system administrators to control the tradeoff between performance and availability, as we discuss later. Thus,

    \bar{p}_k = f(k, \alpha, S, c, c^*) \qquad (5.1)
To derive the core results, we assume we know the values of p̄k, and then show in subsequent sections how these probabilities can be obtained for each reconfiguration policy. Let c̄ be the average number of resources available for use (i.e., not being reconfigured) and nr the average number of resources being reconfigured. Thus,

    c = \bar{c} + n_r \qquad (5.2)

But nr can be obtained from the probabilities p̄k as

    n_r = \sum_{k=1}^{c^*} k\,\bar{p}_k \qquad (5.3)

The availability A of the set of resources is given by the fraction of resources available for use, i.e.,

    A = \frac{\bar{c}}{c} = 1 - \frac{\sum_{k=1}^{c^*} k\,\bar{p}_k}{c} \qquad (5.4)
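Given the distribution p̄k, the quantities in Eqs. 5.2-5.4 follow in a few lines (a sketch with our own naming):

```python
def availability(p_bar, c):
    """Eqs. 5.2-5.4: returns (n_r, c_bar, A), given p_bar[k], the probability
    that k of the c resources are being reconfigured (k = 0, ..., c_star)."""
    n_r = sum(k * pk for k, pk in enumerate(p_bar))   # Eq. 5.3
    c_bar = c - n_r                                   # Eq. 5.2: c = c_bar + n_r
    return n_r, c_bar, c_bar / c                      # Eq. 5.4: A = c_bar / c
```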
While α is the target reconfiguration rate, it cannot always be achieved because the
start of a reconfiguration may be delayed for some time d due to a reconfiguration request
being dropped or queued. This is illustrated in Figure 5.2, which shows the effective time
between reconfigurations. This time is also the average age of each of the resources and
defines the effectiveness of the MTD.
1/α′ = 1/α+ d (5.5)
Therefore, the effective reconfiguration rate is α′ = α/(1 + α · d). It turns out that the
value of the delay d depends primarily on the results of the reconfiguration model R, but
is also influenced by whether the resource is idle or not when a reconfiguration is scheduled
to start, which depends on the results of the performance model S. The cyclic dependency
between the two models is addressed in detail in Section 5.5. The next section derives the
equations for the reconfiguration model assuming that the effective reconfiguration rate is
equal to the target rate (i.e., d = 0). Then, in Section 5.4, we derive the performance model
for service requests as a function of the probability distribution of the number of resources
being reconfigured.
Figure 5.2: Target and Effective Reconfiguration Rate
5.3.2 Drop Reconfiguration Requests Policy
Figure 5.3 illustrates the flowchart of the reconfiguration process for the drop policy. If
the number of resources k being reconfigured is equal to c∗, a request to reconfigure is
dropped and a new reconfiguration request is generated after 1/α time units on average.
If the threshold c∗ has not been reached and the resource to be reconfigured is idle (i.e.,
not handling a service request), k is incremented by 1, the resource is reconfigured, k is
decremented by 1, and a new reconfiguration request is generated after 1/α time units on
average. If the resource to be reconfigured is not idle, the reconfiguration has to wait for
the resource to become available.
Figure 5.3: Flowchart of the Reconfiguration Cycle under the Drop Policy
We now use the CTMC of Figure 5.4 to compute p̄k (k = 0, . . . , c∗), the probability that k resources are being reconfigured. The state k in the CTMC of Figure 5.4 represents the number of reconfiguration requests in the system, which in the case of the drop policy is also the number of resources being reconfigured, so p̄k = pk, k = 0, . . . , c∗.
An expression for pk (k = 0, . . . , c∗) is obtained by using the general birth-death equation for CTMCs [83]:

    p_k = p_0 \prod_{i=0}^{k-1} \frac{\gamma_i}{\mu_{i+1}}, \qquad k = 1, \ldots, c^* \qquad (5.6)

    p_0 = \left[ 1 + \sum_{k=1}^{c^*} \prod_{i=0}^{k-1} \frac{\gamma_i}{\mu_{i+1}} \right]^{-1} \qquad (5.7)
where γk = α · (c − k), for k = 0, . . . , c∗ − 1, is the aggregate rate at which resources are
Figure 5.4: State Transition Diagram of the Markov Chain for the Reconfiguration Model under the Drop Policy
reconfigured when there are k resources being reconfigured, and µk = k/S, for k = 1, . . . , c∗, is the aggregate rate at which resources complete reconfiguration when there are k resources being reconfigured. Using the expressions for γk and µk in Eqs. 5.6 and 5.7, we obtain

    p_k = p_0 \prod_{i=0}^{k-1} \frac{\alpha(c-i)}{(i+1)/S} = p_0\,(\alpha S)^k \binom{c}{k}, \qquad k = 1, \ldots, c^* \qquad (5.8)
An expression for p0 is obtained by noting that the sum of all probabilities is equal to
1. Thus,
    p_0 = \left[ 1 + \sum_{k=1}^{c^*} (\alpha S)^k \binom{c}{k} \right]^{-1} \qquad (5.9)
The values of pk can be easily computed because the summation needed to compute p0
is finite.
In the drop policy, a reconfiguration request is dropped if it arrives when the number of resources being reconfigured is equal to c∗. Thus, the drop probability pd can be computed as the ratio of the rate of reconfiguration requests arriving at state k = c∗, weighted by the probability of being in that state, to the sum of the aggregate rates γk = α(c − k) of reconfiguration requests across all states k = 0, . . . , c∗. Thus,

    p_d = \frac{p_{c^*}\,\alpha\,(c - c^*)}{\sum_{k=0}^{c^*} p_k\,\alpha\,(c-k)} = \frac{p_{c^*}\,(c - c^*)}{\sum_{k=0}^{c^*} p_k\,(c-k)} \qquad (5.10)
We can now compute the average age of a resource, i.e., the average time it takes for a resource to be reconfigured after its last reconfiguration. The probability that a reconfiguration request is dropped exactly j times is p_d^j · (1 − pd). If a reconfiguration request is dropped exactly j times, the average age of the resource will be (j + 1) · 1/α because 1/α is the average time between successive reconfiguration requests. Thus, the average age of a resource under the drop policy is

    age_d = \sum_{j=0}^{\infty} \frac{j+1}{\alpha}\,p_d^{\,j}\,(1 - p_d) = \frac{1}{\alpha\,(1 - p_d)} \qquad (5.11)
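Eqs. 5.8-5.11 can be evaluated directly. The sketch below (our naming) returns the distribution pk, the drop probability pd, and the average resource age:

```python
from math import comb

def drop_policy(alpha, S, c, c_star):
    """Drop policy: p_k (Eqs. 5.8-5.9), drop probability p_d (Eq. 5.10),
    and average resource age (Eq. 5.11)."""
    w = [(alpha * S) ** k * comb(c, k) for k in range(c_star + 1)]   # Eq. 5.8
    p0 = 1.0 / sum(w)                                                # Eq. 5.9
    p = [x * p0 for x in w]
    pd = (p[c_star] * (c - c_star)
          / sum(p[k] * (c - k) for k in range(c_star + 1)))          # Eq. 5.10
    age = 1.0 / (alpha * (1.0 - pd))                                 # Eq. 5.11
    return p, pd, age
```

Since pd > 0 whenever c∗ < c, the average age under the drop policy is always larger than the target 1/α.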
5.3.3 Wait Reconfiguration Requests Policy
The flowchart of the reconfiguration cycle under the wait policy is depicted in Figure 5.5.
This flowchart is very similar to that of Figure 5.3, with the difference that when the threshold c∗ has been reached, the reconfiguration request is not dropped. Instead, it waits until the number k of resources being reconfigured drops below c∗.
To analyze the wait policy, we consider the CTMC of Figure 5.6 in which the state
k (k = 0, . . . , c) represents the number of reconfiguration requests in the system, either
being processed or waiting to be processed. As before, pk is the probability that there are
k reconfiguration requests in the system.
An expression for pk (k = 0, . . . , c) is obtained by using the general birth-death equation
for Markov Chains given by Eqs 5.6 and 5.7, where γk = α(c − k), for k = 0, . . . , c − 1,
is the aggregate rate at which reconfiguration requests are generated when there are k
reconfiguration requests in the system and the aggregate reconfiguration completion rate
µk for k = 1, . . . , c is given by
Figure 5.5: Flowchart of the Reconfiguration Cycle under the Wait Policy
Figure 5.6: State Transition Diagram of the Markov Chain for the Reconfiguration Modelunder the Wait Policy
75
\mu_k = \begin{cases} k/S & k = 1, \dots, c^* \\ c^*/S & k = c^* + 1, \dots, c \end{cases} \qquad (5.12)
Using the expressions for γk and µk in Eqs. 5.6 and 5.7 we obtain
p_k = p_0 \cdot \prod_{i=0}^{k-1} \frac{\alpha (c - i)}{(i+1)/S} = p_0 \, (\alpha S)^k \binom{c}{k}, \qquad k = 1, \dots, c^* \qquad (5.13)
and
p_k = p_0 \cdot \prod_{i=0}^{c^*-1} \frac{\alpha (c - i)}{(i+1)/S} \cdot \prod_{i=c^*}^{k-1} \frac{\alpha (c - i)}{c^*/S} = p_0 \, \frac{(\alpha S)^k \, c!}{c^*! \, (c^*)^{k-c^*} \, (c-k)!}, \qquad k = c^*+1, \dots, c \qquad (5.14)
An expression for p0 is obtained by noting that the sum of all probabilities is equal to
1. Thus,
p_0 = (1 + S_1 + S_2)^{-1} \qquad (5.15)
where
S_1 = \sum_{k=1}^{c^*} (\alpha S)^k \binom{c}{k} \qquad (5.16)
and
S_2 = \frac{c!}{c^*!} \sum_{k=c^*+1}^{c} \frac{(\alpha S)^k}{(c^*)^{k-c^*} \, (c-k)!} \qquad (5.17)
The values of pk can be easily computed because the summations needed to compute p0 are finite. The values of \bar{p}_k, the probability that k resources are being reconfigured, can be computed as a function of pk as \bar{p}_k = p_k for k = 0, \dots, c^* - 1 and \bar{p}_{c^*} = \sum_{k=c^*}^{c} p_k. In fact, when the number of reconfiguration requests in the system is smaller than c∗, all reconfiguration requests cause a resource to be reconfigured. When a reconfiguration request finds the number of resources being reconfigured equal to the threshold, and this happens with probability \sum_{k=c^*}^{c} p_k, the request has to wait.
One can compute the throughput X_r of reconfiguration requests as a function of \bar{p}_k as

X_r = \frac{1}{S} \sum_{k=1}^{c^*} k \cdot \bar{p}_k \qquad (5.18)
and the average number N_r of reconfiguration requests in the system as

N_r = \sum_{k=1}^{c} k \cdot p_k \qquad (5.19)
Using Little's law, we can then determine the average time in the system for reconfiguration requests as R_r = N_r / X_r. This corresponds to the sum of the average reconfiguration time S and the average reconfiguration delay d. Thus,

d = \frac{N_r}{X_r} - S \qquad (5.20)
We can now determine the average age of each resource under the wait policy as follows.
After a reconfiguration request completes, it takes 1/α time units on average for the next
reconfiguration request to arrive. But, the next request may have to wait. The arrival of a
reconfiguration request can occur anytime within the reconfiguration delay d. On average,
that arrival will have to wait d/2 time units. Thus, the average age of a resource agew is
given by
age_w = 1/\alpha + d/2 \qquad (5.21)
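Putting Eqs. 5.13–5.21 together, the wait-policy metrics can be sketched as follows (a minimal illustration; the function name and return structure are ours, not the dissertation's):

```python
import math

def wait_policy_metrics(c, c_star, alpha, S):
    """State probabilities and derived metrics for the wait policy.

    c: number of resources, c_star: reconfiguration threshold,
    alpha: per-resource reconfiguration request rate,
    S: mean reconfiguration time."""
    # Unnormalized state probabilities q[k] (q[0] = 1), Eqs. 5.13-5.14.
    q = [1.0]
    for k in range(1, c + 1):
        if k <= c_star:
            q.append((alpha * S) ** k * math.comb(c, k))
        else:
            q.append((alpha * S) ** k * math.factorial(c)
                     / (math.factorial(c_star)
                        * c_star ** (k - c_star)
                        * math.factorial(c - k)))
    p0 = 1.0 / sum(q)                  # Eq. 5.15 (with Eqs. 5.16-5.17)
    p = [p0 * v for v in q]
    # Probability that k resources are being reconfigured (p-bar).
    p_bar = p[:c_star] + [sum(p[c_star:])]
    Xr = sum(k * p_bar[k] for k in range(1, c_star + 1)) / S   # Eq. 5.18
    Nr = sum(k * p[k] for k in range(1, c + 1))                # Eq. 5.19
    d = Nr / Xr - S                                            # Eq. 5.20
    age_w = 1.0 / alpha + d / 2.0                              # Eq. 5.21
    return p_bar, Xr, d, age_w
```

With the chapter's parameters (c = 20, c∗ = 14, S = 120 sec), α = 0.005 yields an average of about 7.5 resources being reconfigured, consistent with the numbers reported in Section 5.7.1.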
5.4 Response Time Model
For the performance model, we use the CTMC of Figure 5.7 with an infinite number of
states where a state k = 0, 1, 2, . . . represents the number of service requests in the system,
either using one of the available resources or waiting for one. Service requests are assumed
to come from a Poisson process at an average rate λ and complete at a rate µδk, where
µ = 1/T (the request completion rate at a resource) and δk is derived from the probability
distribution obtained from the reconfiguration model. Note that the queue of Figure 4.1 is
similar to an M/M/c queuing system with an important difference. In an M/M/c model,
the rate at which transactions complete is kµ for k = 1, . . . , c and cµ for k > c. In our case
we need to adapt the transaction completion rate to take into account the resources that
may be in the process of being reconfigured. Thus, we follow an approach similar to the
derivation of the M/M/c queue results [83], with a modification in the average transaction
completion rate.
[Markov chain: states 0, 1, 2, ..., c, k, k+1, ...; arrival rate λ between successive states; departure rates µδ1, µδ2, ..., µδc, then µδc for all states beyond c.]

Figure 5.7: State Transition Diagram for the Response Time Model
Consider the following additional notation: (i) Pk, the probability that there are k
requests in the system, either being processed or waiting for an available resource; (ii)
µ = 1/T, the average service rate of each resource; and (iii) ρ = λ/(µ c), the average utilization of the resources. We now provide and explain an expression for δk, the multiplier of the resource service rate µ in the CTMC of Figure 5.7. Before providing a general expression, we discuss a numerical example. Let c = 10 and c∗ = 4. Therefore, there
are ca = c − c∗ = 6 resources always available for service requests because at most 4
resources can be reconfigured at the same time. Thus, when the number of service requests
in the system is at most ca, the average aggregate departure rate is equal to the number of
requests multiplied by µ (i.e., δk = k for k = 1, . . . , 6). Consider, for example, that there
are 8 service requests in the system, thus ca < k < c. If 0, 1, or 2 resources are being
reconfigured, and this happens with probability p0 + p1 + p2, there are enough resources for
all service requests in the system, and the aggregate departure rate is 8µ. If three resources
are being reconfigured, and this happens with probability p3, there is only one resource,
beyond the six, that can be used for service requests. So, the aggregate departure rate is
(6 + 1)µ = 7µ. For the same reason, if four resources are being reconfigured, there are only
6 available resources and the aggregate departure rate is 6µ. Table 5.2 shows the departure
rates for all states in the example considered here.
Table 5.2: Example of the Aggregate Departure Rate for c = 10 and c∗ = 4

State          Departure rate
k = 1, ..., 6  kµ
7              6µ + µ(p0 + p1 + p2 + p3)
8              6µ + µp3 + 2µ(p0 + p1 + p2)
9              6µ + µp3 + 2µp2 + 3µ(p0 + p1)
10             6µ + µp3 + 2µp2 + 3µp1 + 4µp0
k = 11, ...    6µ + µp3 + 2µp2 + 3µp1 + 4µp0
The expression for δk can be generalized as shown below. Note that δk = δc for k =
c + 1, . . ., and that δc is the average number of resources that are not being reconfigured
(e.g., see the expression for state 10 in Table 5.2), and can be used to serve service requests.
The ratio ρ · c/δc = λ/(µ · δc) can be interpreted as the effective utilization of the resources that are not being reconfigured; the system is stable only when this ratio is below 1.
\delta_k = \begin{cases}
k & k = 1, \dots, c_a \\
c_a + \sum_{j=1}^{k-c_a-1} j \cdot p_{c^*-j} + (k - c_a) \sum_{j=0}^{c-k} p_j & k = c_a+1, \dots, c \\
c_a + \sum_{j=1}^{c^*} j \cdot p_{c^*-j} & k = c+1, \dots
\end{cases} \qquad (5.22)
As Figure 5.7 shows, the transition rate from state k to k + 1 is λ, the average arrival rate of requests to the system, and the transition rate βk from a state k to state k − 1 is
given by
\beta_k = \begin{cases} \mu \, \delta_k & k < c \\ \mu \, \delta_c & k \ge c \end{cases} \qquad (5.23)
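The piecewise definition of δk (Eq. 5.22) translates directly into code, and checking it against the numerical example of Table 5.2 (c = 10, c∗ = 4) makes a useful sanity test. The function name and the uniform test distribution below are illustrative:

```python
def delta(k, c, c_star, p):
    """Effective service-rate multiplier delta_k of Eq. 5.22.

    p[j]: probability that j resources are being reconfigured
    (j = 0, ..., c_star)."""
    ca = c - c_star                  # resources always available
    if k <= ca:
        return float(k)
    if k <= c:
        return (ca
                + sum(j * p[c_star - j] for j in range(1, k - ca))
                + (k - ca) * sum(p[j] for j in range(0, c - k + 1)))
    # k > c: average number of resources not being reconfigured
    return ca + sum(j * p[c_star - j] for j in range(1, c_star + 1))
```

For instance, with a uniform distribution p = [0.2]*5, state 7 gives δ7 = 6 + (p0 + p1 + p2 + p3) = 6.8, matching the second row of Table 5.2.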
We can now use the generalized birth-death equations (see Eqs. 5.6 and 5.7) to solve for
Pk and P0. We have to break down the expression for Pk into two parts (for k = 1, . . . , c
and k > c) because βk has two expressions. Hence, for k = 1, . . . , c
P_k = P_0 \prod_{i=0}^{k-1} \frac{\lambda}{\mu \, \delta_{i+1}} = P_0 \, \frac{(\lambda/\mu)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} = P_0 \, \frac{(\rho c)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} \qquad (5.24)
and, for k = c+ 1, . . .
P_k = P_0 \prod_{i=0}^{c-1} \frac{\lambda}{\mu \, \delta_{i+1}} \prod_{i=c}^{k-1} \frac{\lambda}{\mu \, \delta_c} = P_0 \, \frac{(\rho c)^k}{\delta_c^{\,k-c} \prod_{i=0}^{c-1} \delta_{i+1}} = P_0 \, \frac{(\rho c / \delta_c)^k \, \delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \qquad (5.25)
P0 can now be computed as
P_0 = \left[ 1 + \sum_{k=1}^{c} \frac{(\rho c)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} + \sum_{k=c+1}^{\infty} \frac{(\rho c / \delta_c)^k \, \delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \right]^{-1} \qquad (5.26)
If we move \delta_c^{\,c} / \prod_{i=0}^{c-1} \delta_{i+1} out of the infinite summation in the above expression, we are left with the following geometric series, which converges for \rho c / \delta_c < 1, i.e., \lambda < \mu \, \delta_c:

\sum_{k=c+1}^{\infty} (\rho c / \delta_c)^k = \frac{(\rho c / \delta_c)^{c+1}}{1 - \rho c / \delta_c} \qquad (5.27)
Hence, P0 can be easily computed as follows:

P_0 = \left[ 1 + \sum_{k=1}^{c} \frac{(\rho c)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} + \frac{\delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \cdot \frac{(\rho c / \delta_c)^{c+1}}{1 - \rho c / \delta_c} \right]^{-1} \qquad (5.28)
Note that Eqs. 5.24, 5.25, and 5.28 simplify to the well-known equations for the M/M/c
queue [83] when c∗ = 0. The average number Ns of requests in the system can be computed
as
N_s = \sum_{k=1}^{\infty} k \cdot P_k = \sum_{k=1}^{c} k \cdot P_k + \sum_{k=c+1}^{\infty} k \cdot P_k \qquad (5.29)
The first summation in the expression above is an easy-to-compute finite summation:
P_0 \sum_{k=1}^{c} \frac{k \, (\rho c)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} \qquad (5.30)
The infinite summation in Eq. 5.29 can be written as
P_0 \, \frac{\delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \sum_{k=c+1}^{\infty} k \cdot (\rho c / \delta_c)^k \qquad (5.31)
which can be computed as
P_0 \, \frac{\delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \left[ \sigma \frac{\partial}{\partial \sigma} \sum_{k=c+1}^{\infty} \sigma^k \right], \quad \text{where } \sigma = \rho c / \delta_c \qquad (5.32)
and is equal to
P_0 \, \frac{\delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \cdot \frac{\sigma^{c+1}}{1 - \sigma} \left[ \frac{\sigma}{1 - \sigma} + 1 + c \right], \quad \text{where } \sigma = \rho c / \delta_c \qquad (5.33)
Thus, Eqs. 5.30 and 5.33 allow us to compute Ns. Finally, using Little’s Law we can
compute the average response time R as R = Ns/λ.
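The derivation of this section (Eqs. 5.22–5.33) can be sketched end-to-end in code. The function name is ours, and the sketch assumes the stability condition λ/(µδc) < 1:

```python
def response_time(c, c_star, lam, T, p_bar):
    """Average response time R from the response-time model.

    lam: arrival rate, T: mean service time, p_bar[j]: probability that
    j resources are being reconfigured (j = 0, ..., c_star)."""
    mu = 1.0 / T
    ca = c - c_star
    # Eq. 5.22: effective service-rate multipliers.
    delta = [0.0] * (c + 1)
    for k in range(1, c + 1):
        if k <= ca:
            delta[k] = float(k)
        else:
            delta[k] = (ca
                        + sum(j * p_bar[c_star - j]
                              for j in range(1, k - ca))
                        + (k - ca) * sum(p_bar[:c - k + 1]))
    rho_c = lam / mu                    # = rho * c = lambda / mu
    sigma = lam / (mu * delta[c])       # tail ratio; must be < 1
    prods = [1.0]                       # cumulative products of delta
    for k in range(1, c + 1):
        prods.append(prods[-1] * delta[k])
    geom = sigma ** (c + 1) / (1.0 - sigma)
    # Eq. 5.28: P0.
    P0 = 1.0 / (1.0
                + sum(rho_c ** k / prods[k] for k in range(1, c + 1))
                + delta[c] ** c / prods[c] * geom)
    # Eqs. 5.30 and 5.33: average number of requests in the system.
    Ns = P0 * sum(k * rho_c ** k / prods[k] for k in range(1, c + 1))
    Ns += (P0 * delta[c] ** c / prods[c] * geom
           * (sigma / (1.0 - sigma) + 1 + c))
    return Ns / lam                     # Little's Law: R = Ns / lambda
```

With c∗ = 0 (no reconfigurations) this reduces to the M/M/c queue; for example, c = 2, λ = 1, T = 1 gives R = 4/3.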
5.5 Combined Model
We now consider the fact that when a reconfiguration request arrives to a resource, it may
be busy serving a service request. In this case, the reconfiguration has to wait until the
resource becomes idle. This affects the rate at which reconfigurations occur. Let α′ be the
effective reconfiguration rate, i.e., the rate at which reconfigurations occur. This effective
reconfiguration rate should be used to compute the reconfiguration probabilities pk. The
reconfiguration rate is equal to the inverse of the average time between reconfigurations.
Thus,
1/\alpha' = (1/\alpha) \cdot \Pr[\text{idle}] + (1/\alpha + T_{res}) \cdot (1 - \Pr[\text{idle}]) \qquad (5.34)
where Pr[idle] is the probability that a resource is idle when it is time to reconfigure and
Tres is the average residual service time when the resource is busy. From renewal theory,
T_{res} = E[T^2] / (2 \, E[T]) \qquad (5.35)
where T is the random variable that represents the service time [83]. If that variable
is exponentially distributed, Tres = E[T ] = T due to the memoryless property of the
exponential distribution. Let Φ = 1 − Pr[idle]. We can then compute Φ as a function
of the probabilities Pk using the law of total probability as indicated by the equation below
that shows values of Φ and their corresponding probabilities.
\Phi = \begin{cases}
0 & \text{with probability } P_0 \\
k/c & \text{with probability } P_k, \; k = 1, \dots, c-1 \\
1 & \text{with probability } 1 - \sum_{k=0}^{c-1} P_k
\end{cases} \qquad (5.36)
The explanation behind Eq. 5.36 is the following. When there are no service requests
in the system, and this happens with probability P0, the probability that a reconfiguration
request finds the resource busy is zero. When all resources are busy, and this happens
with probability 1 −∑c−1k=0 Pk, the probability that a reconfiguration request for a specific
resource finds the resource busy is 1. When k (k = 1, . . . , c − 1) resources are busy, the
probability that a reconfiguration request finds a specific resource busy is equal to 1 minus
the probability that it finds the resource idle. Thus, the probability that the resource is
busy is equal to
1 - \binom{c-1}{k} \Big/ \binom{c}{k} = \frac{k}{c} \qquad (5.37)
because the probability that the resource is idle is equal to the number of ways one can choose k resources to be busy out of the remaining c − 1 resources divided by the total number of ways one can select k resources to be busy out of c resources. Thus, using the
Law of Total Probability we get,
\Phi = 0 \times P_0 + \left[ \frac{1}{c} \sum_{k=1}^{c-1} k \cdot P_k \right] + 1 \times \left( 1 - \sum_{k=0}^{c-1} P_k \right) = \frac{1}{c} \sum_{k=1}^{c-1} k \cdot P_k + \left( 1 - \sum_{k=0}^{c-1} P_k \right) \qquad (5.38)
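Eq. 5.38 translates directly into code (a small sketch; truncating the Pk distribution at some finite K ≥ c is our assumption):

```python
def busy_probability(P, c):
    """Phi of Eq. 5.38: probability that a reconfiguration request finds
    its target resource busy. P[k]: probability of k service requests in
    the system, given as a list truncated at some K >= c."""
    return sum(k * P[k] for k in range(1, c)) / c + (1.0 - sum(P[:c]))
```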
We can now rewrite Eq. 5.34 as

\alpha' = \frac{\alpha}{1 + \alpha \Phi T} \qquad (5.39)
The probabilities Pk depend on the probabilities pk that depend on α′, which depends
on the probabilities Pk. This is a fixed point problem that can be solved iteratively. Let
S(pk, c, c∗, λ, T ) be the service response time model (see Section 5.4) that computes the
probabilities Pk and let R(c, c∗, α, S) be the reconfiguration model (see Section 5.3) that
computes the probabilities pk. The following iterative algorithm, shown in the steps below
and in Algorithm 3, can be used to solve this fixed point problem. The busy probability Φ
is initially set to zero and it is recomputed at Step 5. The difference between the values of
Φ in successive iterations is checked at Step 6 against a given tolerance ξ.
• Step 1. Initialize: i← 0; Φi ← 0;
• Step 2. Compute α′: α′ ← α/(1 + αΦiT );
• Step 3. Compute the reconfiguration probabilities pk: pk ← R(c, c∗, α′, S);
• Step 4. Compute the service request probabilities Pk: Pk ← S(pk, c, c∗, λ, T );
• Step 5. Increment iteration count and compute new value of the busy probability: i ← i + 1; Φi ← (1/c) Σ_{k=1}^{c−1} k · Pk + (1 − Σ_{k=0}^{c−1} Pk);
• Step 6. Check tolerance: if |(Φi − Φi−1)/Φi| > ξ go to Step 2;
• Step 7. Compute the average response time R as a function of the probabilities Pk.
Algorithm 3 AdjustedResponseTime(c, c∗, λ, α, T, S, ξ)
Input: Resource count c, resource limit c∗, arrival rate λ, reconfiguration rate α, service time T, reconfiguration time S, tolerance ξ
Output: Adjusted response time R
1: i ← 0; Φi ← 0
2: repeat
3:   α′ ← α/(1 + αΦiT)
4:   pk ← R(c, c∗, α′, S)
5:   Pk ← S(pk, c, c∗, λ, T)
6:   i ← i + 1; Φi ← (1/c) Σ_{k=1}^{c−1} k · Pk + (1 − Σ_{k=0}^{c−1} Pk)
7: until |(Φi − Φi−1)/Φi| ≤ ξ
8: R ← S(pk, c, c∗, λ, T)
9: return R
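Algorithm 3's fixed-point loop can be sketched generically, with the two models passed in as callables (the names and signatures below are illustrative, not the dissertation's):

```python
def adjusted_response_time(c, c_star, lam, alpha, T, S, xi,
                           reconf_model, resp_model):
    """Fixed-point iteration of Algorithm 3.

    reconf_model(c, c_star, alpha_eff, S) -> reconfiguration distribution;
    resp_model(p_bar, c, c_star, lam, T) -> service-request distribution
    P (a truncated list of P_k)."""
    phi = 0.0                                            # Step 1
    while True:
        alpha_eff = alpha / (1.0 + alpha * phi * T)      # Step 2 (Eq. 5.39)
        p_bar = reconf_model(c, c_star, alpha_eff, S)    # Step 3
        P = resp_model(p_bar, c, c_star, lam, T)         # Step 4
        new_phi = (sum(k * P[k] for k in range(1, c)) / c
                   + (1.0 - sum(P[:c])))                 # Step 5 (Eq. 5.38)
        converged = (new_phi == 0.0
                     or abs((new_phi - phi) / new_phi) <= xi)  # Step 6
        phi = new_phi
        if converged:
            break
    # Step 7: average response time via Little's Law, R = Ns / lambda.
    return sum(k * P[k] for k in range(len(P))) / lam
```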
5.6 Simulation and Experimental Testbed
The simulation and experiments were implemented using SimPy and Citrix XenServer, as described in Section 4.3. The additional reconfiguration policies were implemented using the logic described in the flowcharts of Figure 5.3 and Figure 5.5. It should be noted that when c∗ = c, no reconfiguration limits are in place and the system behaves exactly as in the previous chapter.
The values of all input parameters used in the simulations and experiments are given in Table 5.3. The value of c∗ is chosen based on the values of c, λ, and T. We choose c∗ such that c − c∗ > λ · T; with λ = 10 requests/sec and T = 0.5 sec, this requires c − c∗ > 5, and therefore c∗ = 14. This ensures that at any given point in time, there are at least 6 resources available to serve requests and the resource utilization ρ is always less than 1. In fact, the maximum value of ρ in this configuration is (λ · T)/6 = 0.833.
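The threshold choice c − c∗ > λ · T can be captured in a one-line helper (illustrative, not part of the dissertation's tooling):

```python
import math

def max_c_star(c, lam, T):
    """Largest threshold c* satisfying c - c* > lam * T (Section 5.6),
    so that enough resources always remain to keep utilization below 1."""
    return c - math.floor(lam * T) - 1
```

With c = 20, λ = 10, and T = 0.5 this returns 14, matching the chapter's choice.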
5.7 Numerical Results and Validation
This section presents a variety of numerical results starting with results obtained with the
analytic model. Then, simulation is used to validate the analytic results. In what follows,
Table 5.3: Values of Variables Used in Simulation Results

Variable   Value
c          20
c∗         14
α          from 0.001 to 0.050 rec/sec
S          120 sec
λ          10 requests/sec
T          0.5 sec
Ts         300 sec
simulation and experimental results are compared. Finally, we show how the analytic model
derived here can be used to find an optimal value of the reconfiguration rate that considers
tradeoffs between response times and the chance of defending from attacks.
5.7.1 Analytic Model Results
There are some important tradeoffs illustrated by the equations derived in Sections 5.3
and 5.4. First, as the reconfiguration rate α increases, less time is given for an attacker to
succeed, but the resource availability decreases and both the probability that a request for
a resource has to queue and the response time increase. We illustrate these tradeoffs by
using the equations above in a variety of numerical examples for varying values of α and c∗
for the two policies.
Figures 5.8a and 5.8b show the effect of increasing α on availability and resource utilization ρ. As α increases, availability decreases and the average utilization increases, converging to a limit defined by c∗. Figures 5.9a and 5.9b show the response time and average age of the resources for both policies. As we would expect, a larger value of α results in an increased response time due to busier resources on average, but also increased protection in the form of a reduced average age of the resources.
While both policies behave similarly, there are still some differences observed. For very
low reconfiguration rates (α → 0) all c resources are available and both policies behave
the same way as illustrated in the figures. As α increases, the wait policy ensures that a
generated reconfiguration request will be immediately honored as soon as k < c∗ whereas
the drop policy will drop requests generated when k ≥ c∗. Therefore, the wait policy will
reconfigure more often than the drop policy and therefore its response time will be higher
than that of the drop policy (see Figure 5.9a) and the average age of resources is lower than
that of the drop policy (see Figure 5.9b). For example, for α = 0.05 rec/sec, the average
response time of service requests of the wait policy is 17.5% higher than that of the drop
policy and the average age of a resource for the drop policy is 57% higher than that of
the wait policy. This illustrates the tradeoff we discussed above in the sense that the wait
policy always exhibits a worse response time than the drop policy but it exhibits a lower
resource age, which diminishes the probability of success from an attacker. Note that with
a policy that limits the maximum number of resources reconfiguring at a time, the response
time and resource age are also limited as a function of the average reconfiguration rate α.
[Figure omitted: availability and ρ on the y-axes, α (rec/sec) on the x-axis; series: Drop, Wait. Panels: (a) Availability; (b) Resource Utilization.]

Figure 5.8: Average Availability and Resource Utilization for Drop and Wait Policies
Figures 5.10a and 5.10b show availability for various values of c∗. With c∗ = 14, we observe that availability converges towards a value of 0.3, which represents 6 of the 20 available resources. For larger values of c∗, availability is even lower, but we cannot tell by the number of available resources alone whether the system will be able to keep up without also knowing the demand for those resources.
To determine if resources can keep up with the demand from our incoming service
[Figure omitted: response time (sec) and age (sec) on the y-axes, α (rec/sec) on the x-axis; series: Drop, Wait. Panels: (a) Response Time; (b) Average Age.]

Figure 5.9: Average Response Time and Resource Age for Drop and Wait Policies
[Figure omitted: availability vs. α (rec/sec); series: c∗ = 14, 17, 20. Panels: (a) Drop Policy; (b) Wait Policy.]

Figure 5.10: Availability for Varying Levels of c∗
requests, we calculate the resource utilization ρ = λ/(µ · c), where µ = 1/T , T = 0.5 sec
and incoming service request rate λ = 10 requests/sec. Figures 5.11a and 5.11b show the
resource utilization for the previously selected values of c∗. Note that any state where ρ ≥ 1 denotes an unstable state in which, on average, the resources cannot handle the incoming requests and the queue grows without bound. Therefore, we must choose a value of c∗ such that c − c∗ > λ · T, leading to our choice of c∗ = 14. For this value of c∗, as α → ∞, ρ → λ · T/(c − c∗) = (10 · 0.5)/(20 − 14) = 0.833, which ensures the system remains stable for all values of α.
[Figure omitted: ρ vs. α (rec/sec); series: c∗ = 14, 17, 20. Panels: (a) Drop Policy; (b) Wait Policy.]

Figure 5.11: Average Resource Utilization for Varying Levels of c∗
Figures 5.12a and 5.12b show average response times with other settings of c∗. With
c∗ > 14, we observe that for values of α above a certain point, the system becomes unable
to handle all incoming service requests because of a scarcity of resources and the response
time grows towards infinity, therefore validating our choice of c∗ = 14 for this number of
resources and our predicted demand for service.
Figure 5.13a shows the distribution of the number k of resources being reconfigured for
several values of the reconfiguration rate α out of a total of 20 resources, with the drop policy
and for c∗ = 14. The graphs show that the distribution is bell-shaped for low values of α,
as we might expect. As α increases towards 0.04 rec/sec, the probability distribution moves
[Figure omitted: response time (sec) vs. α (rec/sec); series: c∗ = 14, 17, 20. Panels: (a) Drop Policy; (b) Wait Policy.]

Figure 5.12: Average Response Time for Varying Levels of c∗
to the right until it reaches the maximum number of requests that can be reconfigured. At
that point of saturation, the system will tend to spend the majority of its time in that state.
The average number of resources being reconfigured is 7.49 for α = 0.005 rec/sec, growing to 13.47 for α = 0.04 rec/sec. When α = 0.04 rec/sec, the system spends 62.2% of the time with the maximum possible number of resources being reconfigured.
Similarly, Figure 5.13b shows the same phenomenon for the wait policy. Because, as
explained earlier, with the wait policy all reconfiguration requests will eventually be served,
on average there are more such active requests in the system. The average number of
resources being reconfigured is 7.50 for α = 0.005 rec/sec but rises to 13.96 for α = 0.04
rec/sec. When α = 0.04 rec/sec, the system spends 97.2% of the time with the maximum
number of resources being reconfigured.
5.7.2 Validation with Simulation Results
This section shows validation results between the analytic model and the simulation de-
scribed in Section 4.3. All simulation results show error bars representing 95% confidence
intervals. As seen in the figures described in this subsection, the analytic model results closely match the simulation results.
Figures 5.14a through 5.15b show the probability distributions of the number of resources being reconfigured.
[Figure omitted: probability vs. # of resources reconfiguring (0–14); series: α = 0.005, 0.01, 0.02, 0.04. Panels: (a) Drop Policy; (b) Wait Policy.]

Figure 5.13: Probability Distributions of pk and p̄k for Varying Levels of α
Here we can again see the shape of the distribution changing from a bell-shaped distribution to one weighted towards k = c∗. In nearly all cases, the analytic value is within the 95% confidence interval, with a relative error below 10%.
[Figure omitted: probability vs. # of resources reconfiguring; series: Simulation, Analytic. Panels: (a) α = 0.005; (b) α = 0.020.]

Figure 5.14: Comparison of pk Between Simulation and Analytical Model for Drop Policy
The probability distributions obtained from the analytic model and the simulation are used to calculate overall availability, which is shown in Figure 5.16a. Here we see the simulation results very nearly match the analytic results, with less than 1% relative error for nearly every value of α. Figure 5.16b shows simulation and analytic results for the average response time of requests under the drop policy for a range of values of α. The
[Figure omitted: probability vs. # of resources reconfiguring; series: Simulation, Analytic. Panels: (a) α = 0.005; (b) α = 0.020.]

Figure 5.15: Comparison of pk Between Simulation and Analytical Model for Wait Policy
[Figure omitted: availability and response time (sec) vs. α (rec/sec); series: Analytic, Simulation. Panels: (a) Availability; (b) Response Time.]

Figure 5.16: Comparison of Availability and Response Time Between Simulation and Analytical Model for Drop Policy
maximum absolute percent relative error is 7.7% and occurs at about the midrange of the
values of α.
Figure 5.17a compares the average age of a resource under the drop policy computed
by the analytic model and the simulation. The maximum absolute percent relative error
is below 10% for all values of α. Figure 5.17b compares the percentage of dropped recon-
figuration requests for the drop policy obtained with the analytic model with the results
obtained with the simulation. For most values of α, the maximum absolute percent relative
error is below 5%.
[Figure omitted: age (sec) and P(drop) vs. α (rec/sec); series: Analytic, Simulation. Panels: (a) Average Age; (b) Drop Percentage.]

Figure 5.17: Comparison of Average Resource Age and Drop Percentage Between Simulation and Analytical Model for Drop Policy
Based on the drop percentage, we can also determine α′, the effective value of α, which is computed in the simulation from the average time between reconfigurations for each resource, as opposed to the average age, which is obtained by periodically sampling the age of each resource and averaging the results. We graph α′ in Figure 5.18 for both the simulation and the analytic model, next to a line denoting the actual value of α. We observe that for low values of α, α′ = α, but as α increases, the drop policy begins to take effect and limits α′.
Similar validations were made for the wait policy and show the same degree of agreement between the analytic and simulation results across the entire range of α values.
[Figure omitted: α′ vs. α (rec/sec); series: Simulation, Analytic, Alpha.]

Figure 5.18: Comparison of Effective Reconfiguration Rate Between Simulation and Analytical Models for Drop Policy
[Figure omitted: availability and response time (sec) vs. α (rec/sec); series: Simulation, Analytic. Panels: (a) Availability; (b) Response Time.]

Figure 5.19: Comparison of Availability and Response Time Between Simulation and Analytical Model for Wait Policy
[Figure omitted: age (sec) and delay (sec) vs. α (rec/sec); series: Simulation, Analytic. Panels: (a) Average Age; (b) Reconfiguration Delay.]

Figure 5.20: Comparison of Average Resource Age and Reconfiguration Delay Between Simulation and Analytical Model for Wait Policy
[Figure omitted: α′ vs. α (rec/sec); series: Simulation, Analytic, Alpha.]

Figure 5.21: Comparison of Effective Reconfiguration Rate Between Simulation and Analytical Models for Wait Policy
5.7.3 Validation of the Simulation with Experimental Results
We describe here a validation of the simulation model with experimental results obtained with the setup described in Section 4.3. Tables 5.4 and 5.5 compare simulation and experimental results for availability and response time for the drop and wait policies, respectively, for values of α ranging from 0.005 to 0.050 reconfigurations/sec. The average values and corresponding 95% confidence intervals are shown in the tables. The last column in each table shows the absolute percent relative error computed as 100 × |simulation − experiment| / simulation. As we can see from the tables, the errors are below 5% in all cases except for one case in which the error was 7.8%. This finding, along with the findings in the above subsection, validates the analytic results with simulation and experimentation.
Table 5.4: Comparison of Simulation and Experimental Results for Availability

       |             Drop Policy              |             Wait Policy
α      | Simulation     Experimental   Error  | Simulation     Experimental   Error
0.005  | 0.706 ± 0.013  0.701 ± 0.013  0.71%  | 0.703 ± 0.014  0.701 ± 0.015  0.28%
0.010  | 0.487 ± 0.012  0.495 ± 0.020  1.64%  | 0.470 ± 0.015  0.473 ± 0.018  0.64%
0.015  | 0.391 ± 0.006  0.381 ± 0.007  2.56%  | 0.357 ± 0.009  0.359 ± 0.011  0.56%
0.020  | 0.360 ± 0.003  0.350 ± 0.004  2.78%  | 0.319 ± 0.005  0.311 ± 0.003  2.51%
0.025  | 0.342 ± 0.003  0.335 ± 0.002  2.05%  | 0.308 ± 0.002  0.305 ± 0.002  0.97%
0.030  | 0.335 ± 0.003  0.331 ± 0.003  1.19%  | 0.303 ± 0.001  0.301 ± 0.000  0.66%
0.040  | 0.322 ± 0.001  0.321 ± 0.001  0.31%  | 0.301 ± 0.001  0.300 ± 0.000  0.33%
0.050  | 0.317 ± 0.001  0.316 ± 0.001  0.32%  | 0.301 ± 0.000  0.300 ± 0.000  0.33%
5.7.4 Determining the Optimal Reconfiguration Rate
The model presented in Section 4.2 allows one to predict the response time and the average
age of each resource in the system. We can then use these results to answer questions such as
“Given objective values for response time and level of protection, what is the reconfiguration
rate that maximizes overall utility?”
We can solve this by estimating the attacker’s chance of success using the method
Table 5.5: Comparison of Simulation and Experimental Results for Response Time

       |             Drop Policy              |             Wait Policy
α      | Simulation     Experimental   Error  | Simulation     Experimental   Error
0.005  | 0.503 ± 0.002  0.506 ± 0.003  0.60%  | 0.507 ± 0.004  0.508 ± 0.002  0.20%
0.010  | 0.533 ± 0.008  0.532 ± 0.010  0.19%  | 0.544 ± 0.010  0.558 ± 0.017  2.57%
0.015  | 0.605 ± 0.014  0.594 ± 0.014  1.82%  | 0.664 ± 0.025  0.673 ± 0.029  1.36%
0.020  | 0.636 ± 0.013  0.651 ± 0.017  2.36%  | 0.731 ± 0.023  0.788 ± 0.030  7.80%
0.025  | 0.667 ± 0.016  0.683 ± 0.017  2.40%  | 0.801 ± 0.027  0.793 ± 0.024  1.00%
0.030  | 0.679 ± 0.012  0.687 ± 0.021  1.18%  | 0.791 ± 0.025  0.805 ± 0.020  1.77%
0.040  | 0.718 ± 0.019  0.713 ± 0.021  0.70%  | 0.806 ± 0.022  0.816 ± 0.029  1.24%
0.050  | 0.725 ± 0.019  0.758 ± 0.025  4.55%  | 0.798 ± 0.029  0.819 ± 0.031  2.63%
described in Section 4.2.3 in the previous chapter. As an example, we estimate the attacker’s
chance of success using the linear method with Ts = 300 sec, with Ps(t) = 1 when t ≥ Ts.
Next, we assign utility values to the response time and attacker’s chance of success using
the following sigmoid functions:
U_R(t_r) = \frac{e^{\sigma(-t_r + \beta_R)}}{1 + e^{\sigma(-t_r + \beta_R)}} \qquad (5.40)

U_S(P_s) = \frac{e^{\sigma(-P_s + \beta_S)}}{1 + e^{\sigma(-P_s + \beta_S)}} \qquad (5.41)
where tr is the response time, βR is the objective response time, Ps is the attacker’s
chance of success, βS is the objective attacker’s success rate, and σ is a steepness parameter
for the sigmoid. We can now compute a global utility function Ug as:
Ug = wR · UR(tr) + wS · US(Ps) (5.42)
where wR and wS are weight factors chosen such that wR +wS = 1. Different values of
wR and wS influence the optimal reconfiguration rate. For example, Figure 5.22 shows the overall utility values for the drop policy where Ts = 300 sec, βR = 55 sec, βS = 0.2, and σ = 10. When wR = wS, the optimal value is found at α = 0.018 rec/sec. When wR = 0.75,
denoting an emphasis on response times at the cost of protection, the optimal value can be
found at α = 0.018 rec/sec, and when wS = 0.75, denoting an emphasis on protection at
the cost of response times, the optimal value is 0.041 rec/sec.
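The sigmoid utilities of Eqs. 5.40–5.42 are straightforward to compute; the sketch below uses the identity e^x/(1 + e^x) = 1/(1 + e^{−x}) (the function name is ours):

```python
import math

def global_utility(t_r, P_s, beta_R, beta_S, sigma, w_R, w_S):
    """Global utility Ug of Eq. 5.42; w_R + w_S is expected to be 1."""
    U_R = 1.0 / (1.0 + math.exp(-sigma * (beta_R - t_r)))   # Eq. 5.40
    U_S = 1.0 / (1.0 + math.exp(-sigma * (beta_S - P_s)))   # Eq. 5.41
    return w_R * U_R + w_S * U_S                            # Eq. 5.42
```

Sweeping α, computing the pair (t_r, P_s) for each value, and taking the arg max of Ug yields the optimal reconfiguration rate.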
[Figure omitted: utility vs. α (rec/sec); series: wR = 0.5/wS = 0.5, wR = 0.75/wS = 0.25, wR = 0.25/wS = 0.75.]

Figure 5.22: Utility Values of Various Weight Combinations for Drop Policy
Chapter 6: Conclusions and Future Work
6.1 Conclusions
Moving Target Defense (MTD) has recently emerged as one of the potentially game-changing themes in cyber security. While the typical asymmetry of the security landscape tends to favor the attacker, MTD holds promise to change the game in favor of the defender. Thus, MTD has received significant attention in the last decade, prompting researchers and practitioners to develop a myriad of different MTD techniques. Unfortunately, most such techniques are designed to address a very narrow set of attack vectors. Additionally, despite the significant progress made in this area, the problem of studying and quantifying the costs and benefits associated with the deployment of MTD techniques has not received sufficient attention, and shared metrics to assess the performance of MTD techniques are still lacking.
This dissertation has introduced a framework for quantifying moving target defenses.
Our approach to quantifying the benefits of MTDs yields a single, probability-based utility measure that can accommodate any existing or future MTD, regardless of its nature. Our
multi-layered approach captures the relationship between MTDs and the knowledge blocks
they are designed to protect and the relationship between knowledge blocks and generic
classes of weaknesses that can be exploited using that knowledge.
Furthermore, we have also proposed a quantitative analytic model for assessing the
resource availability and performance of MTDs, and a method for determining the recon-
figuration rate that minimizes the attack success probability subject to availability and
performance constraints.
To demonstrate the usefulness of this framework, we have shown through case studies that we can compute the joint effectiveness of multiple MTDs as a function of their individual effectiveness and, by doing so, we can make informed decisions about which MTD or set of MTDs provides better protection based on the security requirements or cost constraints.
We have also carried out simulations and experiments validating the formulations of the
analytical model that allows us to predict the security and response time of an MTD based
on its configuration.
6.2 Future Work
Although the work presented in this dissertation represents a significant step towards effective MTD quantification, this line of research can continue to be expanded in a number of directions:
• Application to multiple cyber attack phases: The model currently focuses on
disrupting the attacker's knowledge in the reconnaissance phase of the cyber attack chain.
While this may be the most cost-effective way to approach cyber-security, no defense
is perfect. MTDs can also prevent an attacker from gaining a foothold by periodi-
cally refreshing systems with a fresh VM instance. We can model this by treating
persistence as an additional block for exploiting weaknesses, with a probability value
that can also be disrupted by an MTD. However, when calculating probabilities using the
model, we must ensure that prevention steps taken during the reconnaissance phase
are evaluated first. This might be realized by using recursion or adding more layers
to the graphical model.
• Application to multiple (dependent) services: The quantification framework
currently only calculates the probability of exploit for a single service or multiple
independent services. Similar to network attack graphs, an attacker may have to
follow a path of exploits to reach a final goal state. We may be able to model this
by treating each step in the attack as a service to be exploited, encapsulating each
service into its own MTD graph, then probabilistically determining the likelihood of
the final goal state being reached by exploiting all the services, similar to how attack
graphs already work. An MTD may apply to the knowledge blocks of a single service
or to all services across the network, depending on its nature and individual settings.
• Choice of utility function: The way utility functions are currently combined
in the quantification framework requires enough MTDs to cover at least all
weaknesses. If not, we assume that a weakness not covered by an MTD will
eventually be compromised, which reduces the utility to 0. If the risk of leaving
a weakness unprotected by an MTD can be accepted, then some other utility function,
such as one based on a weighted average of the probabilities of each weakness being
exploited, may suffice.
• Autonomic Controllers: With regard to the analytical model, autonomic con-
trollers can be designed that dynamically control the trade-off between availability
and security by automatically adjusting the reconfiguration rate α
or the maximum reconfiguration value c∗ to achieve target response times. Under
peak load conditions, the system should ensure that a sufficient number of resources
are available to serve requests, and the need to guarantee baseline availability in these
circumstances may, at the user’s discretion, override the need to achieve a target re-
configuration rate. Our work in this direction is motivated by the observation that
system overload due to low resource availability can generate large peaks in response
times.
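The first direction above, treating persistence as an additional disruptable block, might be evaluated phase by phase, with reconnaissance computed first; a toy sketch in which all probabilities are hypothetical:

```python
# Illustrative only: probabilities are assumptions, not results.
p_recon = 0.3                 # P(recon succeeds despite MTDs)
p_persist_given_recon = 0.4   # P(foothold survives periodic VM refresh,
                              #   given reconnaissance succeeded)

# Phases are evaluated in order: reconnaissance, then persistence.
p_full_compromise = p_recon * p_persist_given_recon
```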
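The multi-service direction could, as suggested above, wrap each service in its own MTD graph and chain the resulting exploit probabilities along an attack path; a sketch under an independence assumption, with illustrative service names and values:

```python
# Per-service exploit probabilities, e.g. the output of a per-service
# MTD quantification (illustrative values).
p_exploit = {
    "web_frontend": 0.3,
    "app_server": 0.5,
    "database": 0.2,
}

def p_goal(path):
    """P(attacker reaches the final goal state), assuming every step
    on the path must be exploited and steps are independent."""
    p = 1.0
    for service in path:
        p *= p_exploit[service]
    return p

attack_path = ["web_frontend", "app_server", "database"]
p_final = p_goal(attack_path)   # ~0.03
```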
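The alternative utility function mentioned above, a weighted average rather than a hard zero for uncovered weaknesses, could look like this (weights and probabilities are illustrative):

```python
def utility_weighted(p_exploited, weights):
    """1 minus the weighted mean probability that each weakness is
    exploited; weights reflect relative importance and sum to 1."""
    return 1.0 - sum(w * p for w, p in zip(weights, p_exploited))

# A weakness left fully uncovered (p = 1.0) lowers utility but no
# longer drives it to 0:
u = utility_weighted([0.2, 0.4, 1.0], [0.5, 0.3, 0.2])   # 0.58
```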
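Finally, the autonomic-controller direction could start from something as simple as a proportional rule that lowers the reconfiguration rate α when measured response time exceeds the target and raises it when there is headroom; the names, gain, and bounds below are illustrative assumptions, not a designed controller:

```python
def control_step(alpha, r_measured, r_target, alpha_max, gain=0.1):
    """One proportional adjustment of the reconfiguration rate."""
    error = r_target - r_measured            # positive => headroom
    alpha = alpha + gain * error
    return min(max(alpha, 0.0), alpha_max)   # keep the rate feasible

# Under peak load (response time above target) the rate backs off,
# trading some security for baseline availability:
a = control_step(alpha=2.0, r_measured=1.5, r_target=1.0, alpha_max=4.0)
```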
Curriculum Vitae
Warren Connell is a member of the U.S. Air Force's Civilian Institution Program. He has had a variety of assignments in his 20-year career, including installing network intrusion detection systems at Air Force network operation centers and managing information assurance activities for all Air Force Mission Planning software. He received his Bachelor of Science in Computer Engineering from the University of Nebraska in 2007 and went on to receive his Master of Science in Computer Engineering at Wright State University in 2011. After finishing his Doctor of Philosophy in Information Technology at George Mason in 2017, he is slated for assignment to the F-35 Joint Program Office, followed by a position on the faculty at the Air Force Institute of Technology.