mason.gmu.edu/~wconnel2/dissertationfinal.pdf
A QUANTITATIVE FRAMEWORK FOR CYBER MOVING TARGET DEFENSES
by
Warren J. Connell
A Dissertation
Submitted to the Graduate Faculty
of
George Mason University
In Partial Fulfillment of
The Requirements for the Degree
of
Doctor of Philosophy
Information Technology
Committee:
Dr. Massimiliano Albanese, Dissertation Co-Director
Dr. Daniel A. Menasce, Co-Director
Dr. Sushil Jajodia, Committee Member
Dr. Rajesh Ganesan, Committee Member
Dr. Stephen Nash, Department Chair
Dr. Kenneth S. Ball, Dean, Volgenau School of Engineering
Date:
Fall Semester 2017
George Mason University
Fairfax, VA
A Quantitative Framework for Cyber Moving Target Defenses
A dissertation submitted in partial fulfillment of the requirements for the degree of
Doctor of Philosophy at George Mason University
By
Warren J. Connell
Master of Science
Wright State University, 2011
Bachelor of Science
University of Nebraska, 2007
Director: Dr. Massimiliano Albanese, Professor
Department of Information Science and Technology
Co-director: Dr. Daniel A. Menasce, Professor
Department of Computer Science
Fall Semester 2017
George Mason University
Fairfax, VA
Copyright © 2017 by Warren J. Connell
All Rights Reserved
Dedication
To all the leaders and mentors I’ve had in the Air Force who have guided me and given me
opportunities over the last 20 years.
Acknowledgments
I would like to thank my dissertation directors: Dr. Albanese for his time and patience
preparing me for the world of academia and Dr. Menasce for his invaluable guidance and
direction. I would also like to thank the rest of my dissertation committee and my fellow
PhD students for their comments and sharing their experiences with me. Thanks to my
friends for their fellowship and helping keep me sane–I wouldn’t be here without you. And
finally, thanks to my wife Kayleen, who is no stranger to the life of an academic widow.
Table of Contents
Page
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Background and Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Research Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Background and Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Moving Target Defense . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Dynamic Runtime Environments . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Dynamic Platforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.3 Dynamic Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.4 Dynamic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.5 Dynamic Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.1.6 MTD Quantification . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2 Attack Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3 Self-Protecting Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3 MTD Quantification Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Threat Model and Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Quantification Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2.1 Mathematical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.2 4-Layer Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.3 Computing MTD effectiveness . . . . . . . . . . . . . . . . . . . . . 25
3.3 Experimental Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Comparing MTDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Selecting Optimal Defenses . . . . . . . . . . . . . . . . . . . . . . . 32
3.5 Combining MTDs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5.1 Experimental Testbed . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5.2 Experimental Results and Observations . . . . . . . . . . . . . . . . 37
3.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4 Performance Modeling of Moving Target Defenses . . . . . . . . . . . . . . . . . 42
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Quantitative Analysis of MTDs . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2.1 Reconfiguration Model . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.2.2 Response Time Model . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.3 Analysis of Attack Success Probability . . . . . . . . . . . . . . . . . 53
4.3 Simulation and Experimental Testbed . . . . . . . . . . . . . . . . . . . . . 55
4.4 Numerical Results and Validation . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4.1 Reconfiguration Model . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4.2 Response Time Model . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.3 Optimal Reconfiguration Rate . . . . . . . . . . . . . . . . . . . . . 63
4.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5 Performance Modeling of Moving Target Defenses With Reconfiguration Limits . 67
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Updated Analytic Model Overview . . . . . . . . . . . . . . . . . . . . . . . 67
5.3 Reconfiguration Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3.1 Core Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.3.2 Drop Reconfiguration Requests Policy . . . . . . . . . . . . . . . . . 71
5.3.3 Wait Reconfiguration Requests Policy . . . . . . . . . . . . . . . . . 74
5.4 Response Time Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.5 Combined Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.6 Simulation and Experimental Testbed . . . . . . . . . . . . . . . . . . . . . 85
5.7 Numerical Results and Validation . . . . . . . . . . . . . . . . . . . . . . . . 85
5.7.1 Analytic Model Results . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.7.2 Validation with Simulation Results . . . . . . . . . . . . . . . . . . . 90
5.7.3 Validation of the Simulation with Experimental Results . . . . . . . 96
5.7.4 Determining the Optimal Reconfiguration Rate . . . . . . . . . . . . 96
6 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
List of Tables
Table Page
2.1 Encryption Settings with Fitness Values and Overhead . . . . . . . . . . . . 16
2.2 Authentication Settings with Fitness Values and Overhead . . . . . . . . . . 16
2.3 Response Times and Fitness Values . . . . . . . . . . . . . . . . . . . . . . . 16
3.1 Sample Case Study Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2 Improvement from Adding MTDs . . . . . . . . . . . . . . . . . . . . . . . . 31
3.3 Case Study Optimal Configuration . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Attacker Success Rates for all Combinations of Interarrival Rates . . . . . . 40
3.5 Availability for all Combinations of Interarrival Rates . . . . . . . . . . . . 41
4.1 Summary of Variable Names and Descriptions . . . . . . . . . . . . . . . . . 45
4.2 Values of Variables used in Numerical Results . . . . . . . . . . . . . . . . . 58
4.3 Comparison of Availability Results. . . . . . . . . . . . . . . . . . . . . . . . 61
5.1 Summary of Variable Names and Descriptions . . . . . . . . . . . . . . . . . 69
5.2 Example of the Aggregate Departure Rate for c = 10 and c∗ = 4 . . . . . . 79
5.3 Values of Variables Used in Simulation Results . . . . . . . . . . . . . . . . 86
5.4 Comparison of Simulation and Experimental Results for Availability . . . . 96
5.5 Comparison of Simulation and Experimental Results for Response Time . . 97
List of Figures
Figure Page
2.1 Suggested Methods for MTD Quantification . . . . . . . . . . . . . . . . . . 11
2.2 Probabilistic Attack Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Connection Loss and Attacker’s Success as a Function of Shuffle Rate . . . 15
2.4 Sample Sigmoid Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1 Quantification Framework Layers . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Computing MTD Effectiveness . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Case Study Quantification Framework . . . . . . . . . . . . . . . . . . . . . 29
3.4 Case Study Optimal Configuration . . . . . . . . . . . . . . . . . . . . . . . 33
3.5 Experimental Setup for Combined MTD Experiments . . . . . . . . . . . . 36
3.6 Histogram of Number of VMs Compromised for Service Rotation . . . . . . 38
3.7 Monitor Results for Service Rotation . . . . . . . . . . . . . . . . . . . . . . 38
3.8 Attacker Success Rate and Availability for Service and IP Rotation . . . . . 39
3.9 Comparison of Service and IP Rotation on Attacker Success Rate and Avail-
ability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1 Queuing Representation of the Reference Scenario . . . . . . . . . . . . . . 44
4.2 Analytic Model Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.3 Reconfiguration Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 CTMC for the Reconfiguration Model . . . . . . . . . . . . . . . . . . . . . 48
4.5 CTMC for the Response Time Model . . . . . . . . . . . . . . . . . . . . . . 51
4.6 Probability of Success Ps vs. Time for Ts = 10 . . . . . . . . . . . . . . . . 54
4.7 Experimental Setup for Quantitative Analysis . . . . . . . . . . . . . . . . . 56
4.8 Control Flow and Movement for Quantitative Analysis . . . . . . . . . . . . 57
4.9 Distribution of the Number of Resources Being Reconfigured for c = 20 . . 59
4.10 Availability vs. Reconfiguration Rate α . . . . . . . . . . . . . . . . . . . . 60
4.11 Comparison of Number of Resources being Reconfigured (α = 0.02 rec/sec) 60
4.12 Number of Available Resources and Response Time for Two Trials with Dif-
fering Values of α . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.13 Response Time: Simulation vs. Analytical Model with Stability . . . . . . . 64
4.14 Optimization Analysis to Find the Maximum Feasible Reconfiguration Rate
(α) for c = 20 and S = 60 sec . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.1 Analytic Model Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2 Target and Effective Reconfiguration Rate . . . . . . . . . . . . . . . . . . . 71
5.3 Flowchart of the Reconfiguration Cycle under the Drop Policy . . . . . . . 72
5.4 State Transition Diagram of the Markov Chain for the Reconfiguration Model
under the Drop policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.5 Flowchart of the Reconfiguration Cycle under the Wait Policy . . . . . . . . 75
5.6 State Transition Diagram of the Markov Chain for the Reconfiguration Model
under the Wait Policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
5.7 State Transition Diagram for the Response Time Model . . . . . . . . . . . 78
5.8 Average Availability and Resource Utilization for Drop and Wait Policies . 87
5.9 Average Response Time and Resource Age for Drop and Wait Policies . . . 88
5.10 Availability for Varying Levels of c∗ . . . . . . . . . . . . . . . . . . . . . . 88
5.11 Average Resource Utilization for Varying Levels of c∗ . . . . . . . . . . . . . 89
5.12 Average Response Time for Varying Levels of c∗ . . . . . . . . . . . . . . . 90
5.13 Probability Distributions of pk and pk for Varying Levels of α . . . . . . . . 91
5.14 Comparison of pk Between Simulation and Analytical Model for Drop Policy 91
5.15 Comparison of pk Between Simulation and Analytical Model for Wait Policy 92
5.16 Comparison of Availability and Response Time Between Simulation and An-
alytical Model for Drop Policy . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.17 Comparison of Average Resource Age and Drop Percentage Between Simu-
lation and Analytical Model for Drop Policy . . . . . . . . . . . . . . . . . . 93
5.18 Comparison of Effective Reconfiguration Rate Between Simulation and An-
alytical Models for Drop Policy . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.19 Comparison of Availability and Response Time Between Simulation and An-
alytical Model for Wait Policy . . . . . . . . . . . . . . . . . . . . . . . . . . 94
5.20 Comparison of Average Resource Age and Reconfiguration Delay Between
Simulation and Analytical Model for Wait Policy . . . . . . . . . . . . . . . 95
5.21 Comparison of Effective Reconfiguration Rate Between Simulation and An-
alytical Models for Wait Policy . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.22 Utility Values of Various Weight Combinations for Drop Policy . . . . . . . 98
Abstract
A QUANTITATIVE FRAMEWORK FOR CYBER MOVING TARGET DEFENSES
Warren J. Connell, PhD
George Mason University, 2017
Dissertation Directors: Dr. Massimiliano Albanese / Dr. Daniel A. Menasce
Moving Target Defenses (MTDs) are techniques used to defend computer networks that
seek to delay or prevent attacks during any phase of the cyber kill chain by dynamically
changing the makeup of the systems or network such that an effective attack cannot be
planned or executed. There are a variety of methods available to implement MTDs, such
as dynamically changing network addresses, memory addresses, user-level services, or even
operating systems or data. These changes can take the form of changing signatures or
outward appearance, or actual changes in network configuration or software.
Although many schemes are described in the literature, there is no universal method to
measure their effectiveness. Likewise, there is very little uniformity in how the overhead of
these techniques is measured, if it is even mentioned at all. These factors make it difficult,
if not impossible, to effectively compare MTDs. Therefore, a quantification framework for
MTDs is needed to properly compare MTDs or optimize their performance.
Additionally, many MTDs have a limited scope that usually only covers a subset of
potential attack vectors with no single solution that offers protection in every scenario.
Ideally, several techniques could be combined to provide defense-in-depth, but integration
is often lacking and the lack of universal metrics for evaluating performance prevents us
from assessing the combined impact of multiple techniques.
This work presents a framework for comparing different MTDs or the combined effects
of a set of MTDs by calculating a utility value as a function of the impact the MTD has
on the attacker’s success rate or level of additional effort required. It also calculates a
utility value as a function of the overhead. The weighted average of these utility values can
then be used to compute an aggregate utility value. This model is then tested by several
experiments that compare a variety of MTDs, observing their combined effect, and finding
optimal settings for each MTD.
The proposed framework fulfills the need for a systematic approach to compare MTDs
with one another despite their diversity and make an optimal selection of techniques for a
given scenario. The framework may also be used to find an optimal combination of settings
for those MTDs and adapt their settings for changing external conditions. The model is
not only designed to accommodate existing MTD techniques, but can be extended to work
with any future techniques that may appear. It may also guide future research efforts
by identifying commonly-used MTDs for integration or potentially identify focus areas for
MTD development to address common gaps in coverage.
To further support this concept, we also propose a quantitative analytic model for
assessing the resource availability and performance of MTDs, and a method for determining
the reconfiguration rate that maximizes a utility function that incorporates the tradeoffs
between the attacker’s success probability and response time. This model may be used to
evaluate an individual MTD or used in conjunction with the MTD quantification framework.
The analytic results are validated by simulation and experimentation.
Chapter 1: Introduction
1.1 Background and Motivation
Moving Target Defenses (MTDs) are cyber defenses that seek to dynamically change some
aspect of the system being defended, thus removing the adversary’s advantage of being
able to study the target system to find vulnerabilities and plan their attack. By working
proactively to disrupt or delay an attacker, MTDs can offer some measure of protection
against unknown (“zero-day”) or even exposed vulnerabilities. As a result, MTD offers a
great potential in turning the asymmetry typical of a cyber security landscape in favor of the
defender and has been heralded as a “game changer” in this field of research [1]. Since the
term first surfaced in the literature, a myriad of techniques has been developed, each
targeting different aspects of a system.
However, too often each of the proposed techniques only addresses a narrow subset of
potential attack vectors and different techniques tend to measure their effectiveness in differ-
ent and often incompatible ways. Additionally, in order to provide a comprehensive security
solution, using multiple techniques in conjunction with each other should be considered, but
this raises new issues in terms of optimal selection of a subset of available techniques.
Although some survey papers note where certain MTDs might not work well together [2],
or give a qualitative estimate of their effectiveness and cost [3], a quantitative framework
that can accommodate any existing or future MTDs is essential if this area of research is
to progress past specialized, isolated solutions. The primary research problem this work
addresses is the design and validation of a model to quantify the performance of diverse
MTDs as well as their costs. This work also explores other methods for calculating overall
utility and finding the optimal choice and settings for these MTDs.
1.2 Thesis
It is possible to quantify the performance of MTDs by analytically predicting their effective-
ness and response time, and to use this quantification to determine the optimal configuration
for any combination of varying MTDs.
To do this, we must find a way to map MTDs and their settings to a utility value
that captures their effectiveness. We do this by noting that in the reconnaissance phase
of the cyber kill chain, MTDs primarily act by disrupting some portion of the attacker’s
knowledge. Thus, we can map MTDs to knowledge of various aspects of the system. From
there, that knowledge is then leveraged to exploit software weaknesses and we can map that
knowledge to the classes of exploits they enable. Finally, based on the overall probability of
each exploit occurring at a given service, we can arrive at a value that captures the overall
effectiveness of the MTD.
To determine the level of attacker disruption we can analyze an individual MTD to pre-
dict resource age or other measures. For a shuffling or rotation-based MTD, the disruption
is reflected in the average age of a resource, because a smaller window decreases the chance
of success. To determine cost, we can analytically determine response time, which captures
the effects of increased memory, runtime, or bandwidth requirements. We can also use
these values to determine a utility value that represents the tradeoffs of a particular MTD
configuration.
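A minimal sketch of such a utility computation follows, assuming both component utilities are normalized to [0, 1] and the weights are chosen by the analyst (the equal weights and the specific inputs are illustrative, not the dissertation's calibrated values):

```python
def aggregate_utility(u_effectiveness, u_overhead, w_e=0.5, w_o=0.5):
    """Weighted average of an effectiveness utility (how strongly the MTD
    disrupts the attacker) and an overhead utility (how little it costs in
    response time). Both utilities are assumed to lie in [0, 1]."""
    assert abs(w_e + w_o - 1.0) < 1e-9  # weights must sum to one
    return w_e * u_effectiveness + w_o * u_overhead

# Example: strong attacker disruption (0.8) but only moderate overhead utility (0.6)
print(aggregate_utility(0.8, 0.6))  # ≈ 0.7 with equal weights
```

Changing the weights shifts the optimum toward security or performance, which is how a single configuration can be tuned for different operating environments.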
1.3 Research Approach
To address these issues, I present a 4-layer model for MTD quantification that captures
the relationships between MTDs, knowledge, software weaknesses, and individual services.
By expressing the effects that MTDs have on required knowledge as a probability, we can
propagate those values to also calculate the chances of a software weakness being exploited
and determine an overall value for the effectiveness of the MTD.
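The propagation described above can be sketched roughly as follows. The knowledge labels, probabilities, and independence assumptions here are illustrative placeholders, not values from the framework's case study:

```python
def p_weakness(required_knowledge, p_known):
    """Probability a software weakness is exploitable, assuming the attacker
    must hold every required piece of knowledge and knowledge items are
    independent. Knowledge not protected by any MTD is assumed known."""
    p = 1.0
    for k in required_knowledge:
        p *= p_known.get(k, 1.0)
    return p

def p_service(weaknesses, p_known):
    """Probability at least one of a service's weaknesses is exploitable
    (independence across weaknesses assumed)."""
    p_none = 1.0
    for required in weaknesses:
        p_none *= 1.0 - p_weakness(required, p_known)
    return 1.0 - p_none

# Hypothetical example: an MTD leaves the attacker's network-address knowledge
# valid only 30% of the time, while memory layout is unprotected.
p_known = {"ip_address": 0.3, "memory_layout": 1.0}
weaknesses = [["ip_address"], ["ip_address", "memory_layout"]]
print(p_service(weaknesses, p_known))  # ≈ 0.51
```

One minus this value can then serve as the effectiveness score for the MTD (or set of MTDs) protecting that service.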
I also present a method to determine the characteristics of an MTD by using Continuous
Time Markov Chains to model the effects of the reconfiguration rate on a system’s security
and response time. This model is validated by the use of simulations and experiments.
From there, I formulate a utility function that takes effectiveness and cost into account,
which can then be used to find an optimal selection and configuration of MTDs for a given
scenario.
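The flavor of this analysis can be sketched with a deliberately simplified birth-death chain (my own simplification; the chains developed in Chapters 4 and 5 are richer): each of c resources independently enters reconfiguration at rate α, and each reconfiguration completes at rate μ = 1/S.

```python
import math

def availability(c, alpha, mu):
    """Steady-state availability of a pool of c resources where each available
    resource enters reconfiguration at rate alpha and each reconfiguration
    completes at rate mu (mean duration 1/mu). With state k = number of
    resources under reconfiguration, the chain is birth-death with birth rate
    (c - k) * alpha and death rate k * mu, so the stationary distribution is
    binomial with success probability alpha / (alpha + mu)."""
    r = alpha / mu
    weights = [math.comb(c, k) * r**k for k in range(c + 1)]
    total = sum(weights)
    p = [w / total for w in weights]
    return sum((c - k) * pk for k, pk in enumerate(p)) / c

# c = 20 resources, alpha = 0.02 rec/sec, mean reconfiguration time S = 60 sec
print(availability(20, 0.02, 1 / 60))  # ≈ 0.4545, i.e. mu / (alpha + mu)
```

For this independent-resource chain the closed form μ/(α + μ) matches the numerical solution, a useful sanity check before moving to models with reconfiguration limits.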
1.4 Contributions
This work fulfills the pressing need for a unified framework to comparatively measure
MTDs. We present a novel framework that captures the relationships between available
MTDs and the information such MTDs may affect through probabilistic measures. It also
captures the relationships between services, their software weaknesses, and the knowledge
required to exploit such weaknesses to probabilistically determine the effectiveness of any
given technique or set of techniques, regardless of how they operate.
Likewise, the choice of response time as the primary measure of system overhead cap-
tures the cost of MTDs in a straightforward manner. The model allows for evaluation and
comparison of multiple concurrent MTDs that are required to protect against varied threats
and determine where they might synergize or conflict. Accounting for multiple settings and
measuring their effect on effectiveness and cost is also possible using the model.
While the model is simple in concept, it also lends itself to several extensions. If a
threat model tends to prioritize certain classes of attacks or a service is specifically more
vulnerable to certain attacks, this can be accounted for by using a weighting factor. Likewise,
since effectiveness is based on probability values, they can also be used to make informed
calculations of risk.
Furthermore, the framework has the following desirable attributes:
• Generality: Any existing MTD should be able to fit within the framework. The
relationship between an MTD and the knowledge it protects serves as the interface
that enables plugging that MTD into the framework.
• Extensibility: Any future MTD must also be able to fit within the framework,
regardless of how it operates. New MTDs, areas of knowledge to disrupt, or even
classes of software weaknesses can be added to the framework.
• Resilience: Because the framework covers general classes of software weaknesses
rather than specific vulnerabilities, it is less vulnerable to unknown threats and 0-day
attacks.
• Flexibility: The framework is simple and intuitive and can be used in many possible
ways. It may be used for rough estimates of utility values or for more fine-grained
estimates when more fidelity is required.
• Practicality: The framework does not ignore the issue of cost or overhead when
determining the utility of a technique. This can be either incorporated as a simple
constraint or into the overall utility of a proposed solution.
The analytic model for determining MTD effectiveness and cost also makes several im-
portant contributions: (i) The use of Continuous Time Markov Chains to measure MTD
security and performance. Although Markov Chains have been widely used in computer
science since their introduction, their application in capturing the performance of MTDs is
novel. (ii) A method for determining the reconfiguration rate that maximizes a utility func-
tion that incorporates the tradeoffs between the attacker’s success probability and response
time. (iii) The findings for effectiveness and response time values from this model can serve
as inputs to the quantification framework previously described.
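Contribution (ii) can be illustrated with a toy grid search over the reconfiguration rate. Every formula below — the availability approximation, the M/M/1-style response time, and the exponential attacker-success model with an assumed attack duration — is an illustrative stand-in for the dissertation's actual models, and all parameter values are hypothetical:

```python
import math

def optimal_alpha(alphas, w=0.5, mu_rec=1/60, c=20, lam=15.0, mu_srv=1.0, t_attack=1000.0):
    """Toy search for the reconfiguration rate alpha that maximizes
    U = w*(1 - Ps) + (1 - w)*U_perf: attacker success probability Ps falls
    as alpha grows, while response time rises. Returns (best_alpha, utility)."""
    best = None
    for a in alphas:
        avail = mu_rec / (a + mu_rec)        # fraction of resources available
        c_eff = c * avail                    # effective number of servers
        if c_eff * mu_srv <= lam:            # unstable: infinite response time
            continue
        resp = 1.0 / (c_eff * mu_srv - lam)  # M/M/1-style response time
        ps = math.exp(-a * t_attack)         # attack fails if resource rotates first
        u = w * (1.0 - ps) + (1.0 - w) / (1.0 + resp)
        if best is None or u > best[1]:
            best = (a, u)
    return best

print(optimal_alpha([0.001, 0.005, 0.01, 0.02, 0.05]))
```

Note how the search discards rates that would starve the service of capacity, mirroring the stability constraint that motivates the reconfiguration limits of Chapter 5.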
1.5 Organization
This dissertation is organized as follows. Chapter 2 covers background and various forms
of moving target defenses, as well as some background on autonomic systems and attack
graphs. Chapter 3 introduces the problem statement and covers the development of the
quantification framework and experiments showing how it might be applied. Chapter 4 de-
scribes an analytical model for MTDs using Markov Chains to determine effectiveness and
response time, the simulations and experiments used to validate the model, and findings
regarding the stability of the model. Chapter 5 further improves the model, introducing
policies to limit reconfigurations and preserve response time, further simulations and ex-
periments to validate the analysis, and formulation of a utility function to determine an
optimal reconfiguration rate. The dissertation concludes in Chapter 6 with a summary of
findings and discussion of future work.
Chapter 2: Background and Related Work
2.1 Moving Target Defense
Moving Target Defense was first introduced in a series of papers that modeled a system’s
security as a function of its exposed attack surface and showed how MTDs increased diversity
based on software and network transformations [1]. Later papers expanded on this concept,
incorporating aspects of game theory, where an attacker or defender may adopt different
strategies based on the actions of the other [4], or introducing machine learning into MTD
behavior [5].
Since its introduction, a myriad of MTD techniques have been developed in the litera-
ture, each targeting different aspects of a system. Today, they are generally organized by
type according to a taxonomy published by Lincoln Labs [2][6] into the following categories:
• Dynamic Runtime Environments
• Dynamic Platforms
• Dynamic Software
• Dynamic Data
• Dynamic Networks
Although the MTD taxonomy described covers most MTDs as they apply to conven-
tional computer systems, MTD techniques have also been applied on several other plat-
forms that don’t fall neatly into those categories. For example, MTDs have been studied
in resource-constrained environments such as tactical network devices or FPGAs [7], cyber-
physical systems [8], and wireless sensor networks [9][10].
2.1.1 Dynamic Runtime Environments
Dynamic Runtime Environments involve changing the environment presented to an appli-
cation dynamically. This is typically done at a very low level and consists of two major cat-
egories: Address Space Layout Randomization (ASLR) and Instruction Set Randomization
(ISR). ASLR protects against buffer overflow attacks by randomizing key locations of mem-
ory [11] and are some of most mature and widely-adopted forms of MTD in use today. Since
its introduction, many improvements have been proposed, such as changing the focus
of the MTD from preventing invalid memory accesses to offering unpredictable results [12]
or by randomizing instructions on the fly to improve entropy [13]. Another technique that
incorporates aspects of address randomization in its protection is DieHard [14][15], which
also protects against heap buffer overflows by increasing the space between elements,
maintaining multiple replicas of the heap, and using voting to ensure control is not subverted.
ISR works to mitigate Return-Oriented Programming (ROP) and code injection attacks
that ASLR does not protect against [16] by ensuring injected code is not immediately
compatible with the target, often by performing simple encryption or adding some additional
required label to each opcode. This can be done at compile time [17] or performed at runtime
in an emulator [18][19]. It is noted that ISR techniques can often be used in conjunction
with ASLR techniques to supplement each other [2].
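To make the protection concrete, here is a back-of-envelope sketch (my own illustration, not from the cited works) of a brute-force attacker guessing an ASLR-randomized base address:

```python
def aslr_success_probability(entropy_bits, guesses):
    """Probability that at least one of `guesses` independent uniform guesses
    hits the randomized base address when ASLR provides `entropy_bits` bits
    of entropy. Back-of-envelope model: real attackers may do better (e.g.,
    via information leaks) or worse (crash-on-miss detection)."""
    p_single = 2.0 ** -entropy_bits
    return 1.0 - (1.0 - p_single) ** guesses

# With 28 bits of entropy (the default mmap randomization on 64-bit Linux),
# a million blind guesses still succeed well under 1% of the time.
print(aslr_success_probability(28, 1_000_000))  # ≈ 0.0037
```

This kind of entropy argument is one reason randomization-based MTDs are usually evaluated by attacker work factor rather than binary success or failure.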
2.1.2 Dynamic Platforms
Dynamic Platform MTDs operate at a slightly higher level of abstraction than Dynamic
Runtime Environments by changing platforms such as OS version, OS instance, or CPU
architecture dynamically. Virtualization is relied upon heavily to implement these techniques.
One method would be to operate using multiple distributions of the Linux OS and ro-
tate between them [20], or by designating roles for each VM and shuffling them between
hosts [21].
Another way to realize Dynamic Platforms would be to use multivariant systems, a setup
where multiple variations of an OS are run at the same time and monitored for any diver-
gence [22]. The variants are specifically crafted so that a malicious attacker attempting to
divert control would only do so on one of the variants, which would then be easily detected
and reverted to a known good state.
Making OS changes on a regular interval can be disruptive to running applications, but an
MTD can accomplish this by first taking a snapshot of the current state, execution state,
open files, and network sockets [23]. Other MTDs use similar methods of snapshotting
system images and replacing them with known good copies if tampering is detected or to
disrupt an attacker’s persistence on a system [24][25][26].
2.1.3 Dynamic Software
MTDs classified as Dynamic Software often operate similarly to Dynamic Platforms, except
that the focus is more on the application level than the OS level. The grouping, order, format,
or the actual instructions within an application’s code can be changed dynamically. One
software approach would be to generalize DieHard for individual applications instead of just
OS use [27]. Other approaches in this category include multivariant approaches that run
several different versions of software to prevent all machines being compromised by the same
exploit [28]. A simpler implementation of this approach uses a single replica compiled with
the stack working in the opposite direction so that an exploit cannot work on both [29].
Another Dynamic Software method would be to implement some sort of shuffling or rotation
among the software currently being executed [30].
2.1.4 Dynamic Data
Dynamic Data MTDs tend to be even more application-specific, focusing primarily on
making some sort of continuous transformation to the format, syntax, encoding, or repre-
sentation of an application’s data. This might take the form of altering HTML tags from a
web server to thwart bots (but allowing legitimate users to render them correctly) [31] or
adding additional required keywords to SQL commands and table names to prevent SQL
Injection attacks [32].
2.1.5 Dynamic Networks
Dynamic Network MTDs involve changing network addresses or other properties dynamically.
Dynamic Networks are one of the most widely studied areas of Moving Target Defense, as
most cyber attackers use computer networks as an attack vector and network MTDs can be
implemented at a level of abstraction above individual systems or applications. This sort
of protection is attractive because if an attacker cannot even find the system they want to
target on the network, then the defense would be considered effective.
Perhaps the earliest and most oft-cited example of a Dynamic Network MTD would be IP
hopping [33][34], but many other variants exist. For example, a scheme could include decoy
nodes and shuffle them regularly along with actual nodes to further delay attackers [35].
Instead of changing the target system’s IP addresses directly, an MTD can be implemented
as a series of rotating proxies that know the actual address of the target [36].
An improvement on the IP-hopping scheme is Random Host Mutation [37][38] which is
implemented at the DNS server and maps ephemeral IP addresses (eIP) to real IP addresses
rIP). This technique randomizes host-to IP bindings based on source identity and time [39]
and is able to maintain connection states. The technique also has the ability to adapt to an
attacker by moving hosts to addresses with a lower probability of being scanned or moving
nodes to addresses that have already been scanned [40].
Instead of centralizing operation of the MTD, it is possible to implement it across an en-
tire network by using a hypervisor to rewrite packets at each node to make each network hop
dynamic. The Self-Shielding Dynamic Network (SDNA) protocol also allows for encryption,
authentication, and redirection to a honeypot for unauthenticated users [41][42][43].
Besides actually changing IP addresses, a network MTD can also take other actions
to virtually affect the network and disrupt attackers. For example, an MTD might only
manipulate an attacker’s view of the network, using some sort of protocol scrubber [44] or the
dynamic defense could come in the form of lightweight sensors that are able to move around
the network and swarm around any areas where there are potential discrepancies [45].
It is worth noting that network MTDs also take advantage of evolving technology. IPv6
offers a vastly larger address space and therefore greater entropy to techniques that use
it. MT6D uses the IPv6 address space to create an encrypted tunnel that uses a range of
addresses and ensures protection as well as privacy [46][47]. This technique is also applicable
to embedded systems on the smart grid using IPv6 [48] or as part of a hybrid approach
with a mix of static and dynamic IP addresses [49] to protect mobile-enabled systems.
2.1.6 MTD Quantification
With the great amount of variety in MTD techniques, it is not surprising to find that
they are often quantified in completely different ways. One paper suggests dividing MTD
techniques into “low-level” and “high-level” methods, with low-level methods (such as those
dealing with the runtime environment or OS) tending to have their effectiveness measured via
attack experiments, while high-level methods look at the system as a whole and compute
effectiveness via simulation and/or probability models [50].
that several different methods be used to measure effectiveness and cost of MTDs, as seen
in Figure 2.1 [3].
The analytic method provides a precise measure of effectiveness if the attack model
allows for it. For example, an MTD that dynamically re-maps the association between
systems and their addresses to avoid probes looking for a vulnerable system can use a
probabilistic urn model to calculate its effectiveness [51]. In the static case, the probability
of at least one successful probe given k probes and v vulnerable machines out of n machines
is:
\[ P(X_k > 0) = 1 - P(X_k = 0) = 1 - \frac{\binom{n-v}{k}}{\binom{n}{k}} \tag{2.1} \]
And if all the systems and addresses are completely shuffled between probes, this
probability becomes:
[Figure content: a matrix from the expert survey [3] rating quantification methods (analytics, math- or data-based models, testbed networks, simulations, red teaming, expert surveys/elicitations, and operational networks) as good, bad, or sometimes appropriate for measuring effectiveness, implementation costs, performance costs, usability, and security priority.]
Figure 2.1: Suggested Methods for MTD Quantification
\[ P(X_k > 0) = 1 - P(X_k = 0) = 1 - \left(1 - \frac{v}{n}\right)^{k} \tag{2.2} \]
However, most MTDs do not easily fit into such a mathematical model and must have
their effectiveness assessed by simulation. This is usually depicted as some form of chart
showing the attacker's success rate. This success rate may be interpreted and displayed in a
number of ways and usually contains some reference to the static case and multiple settings
for the dynamic case. For example:
• Attacker success rate for various settings [21]
• Attacker success rate over time [52]
• Asset survival rate over time [53]
• Number of completed attacks over the number of attacks attempted [54]
• Ratio of infected hosts over time [38][40]
Instead of using a single metric of attacker’s success rate to measure MTD effectiveness,
several other metrics can be derived from an MTD’s effects. The authors of Random Host
Mutation introduce the metrics of Deterrence, Deception, and Detectability to MTDs in
their work [39]:
Deterrence (Π) measures the cost to the attacker in terms of additional time taken to
carry out an attack and is the ratio of the time Tm required with MTD active to the time
Ts required in the static case.
Π = Tm/Ts (2.3)
Deception (Ω) is the ratio of targets an attacker misses due to the effects of an MTD, where
N is the number of targets discovered out of M total targets:
Ω = N/M (2.4)
Detectability (Ψ) is the ratio of the number of probes Rm required with an MTD active
to the number of probes Rs required in the static case. This represents the case where
presence of an MTD may require the attacker to make more probes or other illegitimate
actions that could be detected.
Ψ = Rm/Rs (2.5)
These metrics provide a different point of view of MTD effectiveness than strict preven-
tion of attacks and show their effectiveness in disrupting and delaying attackers and can be
useful in this quantification work.
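As a sketch of how these ratios would be computed from measurements (the numbers below are illustrative, not taken from [39]):

```python
def deterrence(t_mtd: float, t_static: float) -> float:
    """Pi = Tm / Ts (Eq. 2.3): factor by which the MTD inflates attack time."""
    return t_mtd / t_static

def deception(n_discovered: int, m_total: int) -> float:
    """Omega = N / M (Eq. 2.4): fraction of the M targets the attacker discovers."""
    return n_discovered / m_total

def detectability(r_mtd: int, r_static: int) -> float:
    """Psi = Rm / Rs (Eq. 2.5): factor by which the MTD inflates the probe
    count, giving the defender more chances to detect the attacker."""
    return r_mtd / r_static

# Hypothetical measurements: an attack that took 2 hours now takes 9,
# only 3 of 10 targets are discovered, and 500 probes replace 100.
assert deterrence(9.0, 2.0) == 4.5
assert deception(3, 10) == 0.3
assert detectability(500, 100) == 5.0
```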
Another multi-dimensional metric, introduced by Siege Technologies’ Cyber Quantifica-
tion Framework, measures MTDs based on 4 metrics: Productivity, Success, Confidentiality,
and Integrity, for both the defender and the attacker, for a total of eight different metrics [55].
While several authors quantify their MTD effectiveness and do so in similar ways, far
fewer papers on MTDs report their costs in a uniform manner. Examples of reported costs
include varying execution, memory, or network overhead, or additional hardware costs [2].
Other effects might also be reported. For example, a technique designed to proactively
defend against Distributed Denial of Service (DDoS) attacks measures its cost in terms
of packet loss and additional storage resources required to run [56]. A network shuffling
technique may have a trade-off of dropped connections depending on its shuffle rate [51] or
additional latency or overhead on throughput [36].
An ideal quantification method must be able to take effectiveness and cost into account.
One paper characterizes the power of MTD both with and without cost as a factor and
optimizes utility with regard to cost but does not tie the effectiveness and cost functions to
any specific measure [57].
Finally, it is worth noting that in nearly every case where a new metric is introduced, it is
only ever applied by its authors to a single technique. One expert survey does provide a thorough assessment
of the effectiveness and cost of many techniques across the spectrum of existing MTDs [3].
However, the survey is qualitative in nature and potentially subject to reviewer bias.
2.2 Attack Graphs
This work is also inspired by much of the existing work on attack graphs, which define
the possible preconditions, state transitions, and post-conditions for all possible attacks
on a network [58][59]. Attack graphs are a well-researched area of computer security, with
several automated tools available to generate them for a given network and find all possible
attack paths [60][61].
Several extensions to the attack graph model have been proposed. Of particular interest
to this work are probabilistic attack graphs that label each potential transition state with
the probability of success as seen in Figure 2.2. In this way, we can calculate the overall
probability of attack success by propagating each probability through each possible attack
path [62].
Probabilistic attack graphs have been incorporated in the design of MTDs. One method
assigns roles (e.g., Authorizer, Planner, TargetDB) to hosts on the network and migrates
the roles between hosts. Knowledge of the attack paths between different roles and of the
probabilities of attacks succeeding informs decisions to migrate roles and aids in validating
the results [21].
Figure 2.2: Probabilistic Attack Graph
However, it should be noted that attack graphs have several disadvantages that must
be taken into consideration: they are often tied to specific vulnerabilities, and certain
MTDs have the potential to drastically change the attack surface such that it would require
generating an entirely new attack graph for that particular state.
2.3 Self-Protecting Systems
This work in Moving Target Defense is inspired by related work in autonomic computing,
particularly in self-protecting systems. Autonomic computing systems are self-configuring,
self-optimizing, self-healing, and self-protecting [63]. As autonomic systems change their
security mechanisms in response to their environment, this concept can be seen as a form
of moving target defense [64]. This could be realized by automating command and control
of network defenses and MTDs to react to attackers [65][66], finding ways to combine
MTDs [67][68][69], or incorporating game theory concepts into defender actions [70][71][72].
With regard to quantification, in order to change their settings effectively, self-protecting
systems must be able to quantify both their effectiveness and their cost or overhead so as
to provide an accurate measure of their utility. For example, increasing the shuffle rate of
an MTD might decrease the attacker's success but also impose a cost of connection loss, as
seen in Figure 2.3 [51].
Figure 2.3: Connection Loss and Attacker’s Success as a Function of Shuffle Rate
One such application is described in an autonomic system that assigns utility values
for various security settings in a streaming media application and changes those settings
to optimize security. For example, the type and strength of encryption or authentication
method may produce different fitness values F and differing amounts of overhead, with the
overall response time also having a fitness value assigned, as seen in Tables 2.1-2.3 [73]:
Table 2.1: Encryption Settings with Fitness Values and Overhead

Configuration  Encryption Algorithm  Key Length  F(conf)  Performance Overhead
(A)            DES                   56          0.2      0.2
(B)            AES                   128         0.3      0.3
(C)            Blowfish              128         0.4      0.4
(D)            Blowfish              448         0.5      0.5
Table 2.2: Authentication Settings with Fitness Values and Overhead

Configuration  Authentication Method  Strength        F(auth)  Performance Overhead
(A)            Password               8               0.2      0.1
(B)            Password               16              0.3      0.2
(C)            SIM-based (EAP)        1 (COMP 128-1)  0.5      0.4
(D)            SIM-based (EAP)        3 (COMP 128-3)  0.6      0.9
Table 2.3: Response Times and Fitness Values

Configuration  Response Time  F(lat)
(A)            t < 100 ms     1
(B)            t > 100 ms     0.75
(C)            t > 1 s        0.5
(D)            t > 4 s        0.25
(E)            t > 10 s       0
Similar techniques have also been used in conjunction with Intrusion Detection Systems
(IDS), either to adjust their thresholds based on a Receiver Operating Characteristic (ROC)
curve [74], or to find the optimal configuration of multiple available IDSes that balances
security with quality of service (QoS). To find this optimal configuration, we must separately
determine utility values based on security and QoS and combine them into a global utility
value.
To determine the utility obtained from the addition of security, each security mechanism
has a particular detection rate, and if multiple IDSes are used, an exponential average
is used to generate a lower bound for the actual combined detection rate. This detection
rate is used to generate the security utility for a role r.

\[ U^S_r(\vec{\rho}_r) = \sum_{j=1}^{A} a_{r,j} \left( \ln \sum_{i=1}^{N} e^{d_{i,j}} \times \varepsilon_{r,i} \right) \tag{2.6} \]
Likewise, QoS also contributes to the global utility function and is derived from a sigmoid
function, which is calculated as a function of the estimated response time Tr, a service level
objective (SLO) that must be met (σr), and a parameter that determines the shape and
steepness of the sigmoid (δ). The parameter κr may also be used and is chosen such that
when Tr = 0, U^T_r = 1.

\[ U^T_r(T_r) = \kappa_r \, \frac{e^{\delta(\sigma_r - T_r)}}{1 + e^{\delta(\sigma_r - T_r)}} \tag{2.7} \]
The sigmoid function roughly approximates the unit step function and gives a value
between 0 and 1 based on whether the input met or did not meet the target value and by
how much. In the absence of the parameter κr, when σr = Tr, UTr = 0.5. Several sigmoids
are shown in Figure 2.4 to illustrate how the parameters can be tuned with input from
stakeholders to best meet their requirements.
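A minimal sketch of the sigmoid in Eq. 2.7 (parameter values are illustrative), with κr chosen so that a zero response time yields utility 1:

```python
from math import exp

def qos_utility(t_r: float, slo: float, delta: float,
                normalize: bool = True) -> float:
    """Sigmoid QoS utility of Eq. 2.7.  When `normalize` is set,
    kappa is chosen so that U(0) = 1; otherwise kappa = 1 and the
    utility is exactly 0.5 when the response time equals the SLO."""
    kappa = (1.0 + exp(delta * slo)) / exp(delta * slo) if normalize else 1.0
    s = exp(delta * (slo - t_r))
    return kappa * s / (1.0 + s)

# With a 1-second SLO and steepness delta = 4: beating the SLO gives a
# utility near 1, meeting it exactly gives 0.5 (unnormalized), and
# badly missing it drives the utility toward 0.
```

Larger values of δ make the step between "SLO met" and "SLO missed" sharper, which is how stakeholder input can be encoded.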
The total utility for both the combination of security mechanisms and the QoS is the
sum of the individual utility values, weighted by the values wsr and wtr.
Figure 2.4: Sample Sigmoid Functions
\[ U^S_{\text{total}}(\vec{\rho}) = \sum_{\forall r} w^s_r \, U^S_r(\vec{\rho}) \qquad U^T_{\text{total}}(\vec{T}) = \sum_{\forall r} w^t_r \, U^T_r(T_r) \tag{2.8} \]
Likewise, the utility functions are then combined using a weighted sum to determine
a global utility function. The weights α and β are chosen such that α ≥ 0, β ≥ 0, and
α+ β = 1.
\[ U_g(\vec{\rho}, \vec{T}) = \alpha \cdot U^T_{\text{total}}(\vec{T}) + \beta \cdot U^S_{\text{total}}(\vec{\rho}) \tag{2.9} \]
Once the global utility is calculated, it is sent to an autonomic security manager compo-
nent that changes the security policies to maximize utility given the current environment.
For example, a spike in workload might cause the system to lower security requirements to
maintain a target response time, giving it more flexibility than a static defense [75][76].
Chapter 3: MTD Quantification Framework
This chapter covers our threat model, underlying assumptions, and an overview of the quan-
tification framework, including the mathematical model. Two case studies with applications
that exemplify how the model might be used and how MTD effectiveness is computed are
also included.
3.1 Threat Model and Assumptions
The general nature of the model lets us make very broad, worst-case assumptions about the
cyber threats we are trying to protect against. These assumptions drive much of the design
of the model and will also be noted again later in this chapter where applicable.
We assume that attackers can exploit any possible attack vector against the defender.
Most techniques described in the literature only protect against a narrow subset of possible
attacks, and no single MTD can protect against all possible attack vectors. The model
handles this by incorporating the notion of combining multiple MTDs to provide a defense-in-depth
solution against any potential attack vector.
We also make the worst-case assumption that no static defense will ever succeed in
stopping attackers: an attacker has virtually unlimited time to plan and execute an attack,
and unknown zero-day vulnerabilities that can evade static defenses will always exist.
Only MTDs are considered to have an effect on the attacker’s success rate, and even then,
an MTD may not be perfect in its defense.
Finally, we must assume that attackers can be stopped or at least slowed down by pre-
venting them from acquiring accurate knowledge about the target system. The primary
focus here is on the reconnaissance phase, when that knowledge is gathered in order to plan
and execute attacks. The defender’s goal can be achieved by either preventing attackers
from accessing that knowledge altogether or by delaying them from acquiring the knowl-
edge until it is no longer useful. This is one of the primary strengths of MTDs as proactive
defenses that can shift the balance of power back to the defender.
We also make several additional simplifying assumptions throughout this chapter that
are summarized here. Future work will allow for revision of many of these assumptions in
order to further generalize the approach.
We assume that services and weaknesses as we define them are time-invariant. We also
assume that services and knowledge blocks as we define them are independent, but multiple
services with dependencies could be modeled. We currently assume that each MTD has
a predefined optimal configuration of its parameters, and that if multiple MTDs affect a
knowledge block, they do not interact and only the most effective one is considered.
3.2 Quantification Framework
It is the goal of this work to develop a unified framework to evaluate the joint effect of
multiple techniques with respect to both effectiveness and cost/overhead. By developing
the capability to quantify MTD techniques, we can also compare any two techniques or sets
of techniques and determine an optimal deployment.
As shown in Figure 3.1, the MTD quantification framework consists of four layers: (i) a
time-invariant service layer that represents the set S of services to be protected; (ii) a
weakness layer that represents the set W of general classes of weaknesses that may be
exploited; (iii) a knowledge layer that represents the set K of all possible blocks of knowledge
required to exploit those weaknesses; and (iv) an MTD layer that represents the set M of
available MTD techniques.
As a motivating example, we consider a SQL service running with an overly simplified
set of weaknesses, required knowledge blocks, and three MTDs available to protect it, as
seen in Figure 3.1.
[Figure content: service S1 (SQL DB); weaknesses W1 (SQL Injection) and W2 (Buffer Overflow); knowledge blocks K1 Knows(service), K2 Knows(IP), K3 Knows(memory); MTDs M1 (Service Rotation), M2 (IP Rotation), M3 (ASLR), arranged in the four layers.]
Figure 3.1: Quantification Framework Layers
3.2.1 Mathematical Model
The proposed MTD quantification framework can be formally defined as a 7-tuple
(S,RSW ,W,RWK ,K,RKM ,M) to capture the relationships between the different layers,
where:
• S, W, K, M are the sets of services, weaknesses, knowledge blocks, and MTD tech-
niques, respectively,
• RSW ⊆ S ×W represents relationships between services and the common weaknesses
they are vulnerable to,
• RWK ⊆ W×K represents relationships between weaknesses and the knowledge blocks
required for an attacker to exploit them, and
• RKM ⊆ K × M represents relationships between knowledge blocks and the MTD
techniques that affect them.
The proposed model induces a k-partite graph (with k = 4): G = (S ∪ W ∪ K ∪ M, RSW ∪ RWK ∪ RKM).
3.2.2 4-Layer Model
This k-partite graph can be represented as a 4-layer model, which is described in greater
detail here.
The first layer represents the set S of services we wish to protect. From the attackers’
point of view, it could also represent a goal state they wish to reach by exploiting a weakness.
We assume that the services are time-invariant, i.e., the nature of the services does not
change over time, and they cannot be taken down to prevent attacks, as this action would
result in a denial-of-service.
For the sake of presentation, we only consider one service in all of the case studies in
this work, but the model could be extended to consider multiple services running with
dependencies between them, similar to how an exploit chain might occur within attack
graphs.
The second layer represents the set of weaknesses W that services are vulnerable to.
We choose general classes of weaknesses rather than specific vulnerabilities because there
are too many vulnerabilities to enumerate and, depending on the MTD used, the specific
vulnerabilities may change over time. Using general weaknesses when building the model
makes them time-invariant.
The examples used in this work draw these weaknesses primarily from MITRE’s Com-
mon Weakness Enumeration (CWE) project [77], particularly from those known as the
“Top 25 Most Dangerous Software Errors.” Although many of these errors are primarily
the result of bad coding practices and are better solved at development time, errors such
as SQL Injection, OS Injection, and Classic Buffer Overflow are often addressed at runtime
by MTDs and make for good general categories of weaknesses.
The Microsoft STRIDE Threat Model [78] has also been used as a source of general
threats to draw from in MTD research [79] and can fill in areas where CWE may be
lacking. For example, Information Disclosure (eavesdropping) and Denial of Service are
not specifically addressed by CWE.
This example shows two weaknesses, SQL Injection and Buffer Overflow. More weak-
nesses such as OS Injection might also be included in a more complex example, while other
weaknesses, such as Cross-Site Scripting, would not be applicable to this service.
The third layer represents the specific knowledge blocks K required to effectively exploit
a weakness. This knowledge might be required to plan an attack even when no MTD is
deployed (such as a victim’s IP address) or it may be an additional piece of information
required due to the use of an MTD. For example, SQLRand adds a keyword to SQL com-
mands, which must be known for an illegitimate user to perform a SQL injection [32]. We
assume that each knowledge block at this layer is independent, and that they must be ac-
quired using different methods. For example, IP address and port number should not both
be chosen as knowledge blocks, as a method to determine one would also reveal the other.
The relationship between the knowledge layer and weakness layer is many-to-many. A
weakness could have several required pieces of knowledge to exploit it, or a knowledge block
may be key to exploiting several weaknesses. This layer of the model may also be extended
over time, as new MTDs are developed which disrupt new and different areas of an attacker’s
knowledge.
In this example, we assume that, in order to execute a SQL Injection attack, the attacker
must know something about the service being run (e.g., name and version of the specific
database software) and the victim’s IP address. In order to execute a Buffer Overflow attack,
an attacker must know the IP address and some information about the vulnerable memory
locations of the system. A higher-fidelity version of this model may take a knowledge block
and break it into smaller, more specific items that are specifically targeted by available
MTDs.
The fourth and last layer of the model represents the set M of available MTDs that
can be implemented to disrupt the attacker’s knowledge required to exploit weaknesses. We
assume that, when using static defenses – i.e., no MTD deployed – an attacker will acquire
all of the knowledge necessary to exploit a weakness with probability P = 1, and we label
the edge (Kj ,Mi) ∈ RKM between an MTD technique Mi and a knowledge block Kj with
the probability Pi,j that an attacker will succeed in acquiring knowledge block Kj despite
the deployment of technique Mi.
For example, if technique M1 in Figure 3.1 (Service Rotation) reduces an attacker’s
likelihood of acquiring knowledge block K1 (i.e., correct version of the service) by 60%, we
would label that edge as P1,1 = 0.4. If an MTD delays an attacker by some factor, we can
also express that as a probability that the attacker will not find the correct information in
a timely manner. For example, an MTD that expands addressable memory by a factor of
10 might reduce the attacker’s probability of success to 0.1, so Pi,j = 0.1.
The exact methodology for determining the value of Pi,j may vary from MTD to MTD
and is a separate line of research. For the purposes of this chapter, we assume that such
optimal configuration has already been identified for each available MTD technique, along
with the corresponding value of Pi,j and the corresponding cost.
Part of the future work involves developing a general approach to modeling the relation-
ship between cost and effectiveness of MTD techniques, as we vary the values of a technique’s
tunable parameters and other aspects of the attacker/defender interaction. Ultimately, this
approach will enable us to identify the optimal configuration for each technique.
Expressing MTD effectiveness in terms of the probability an attacker will succeed in
acquiring required knowledge normalizes the values across multiple diverse techniques in the
[0,1] range, with a theoretically perfect MTD yielding Pi,j = 0, and a completely ineffective
MTD yielding Pi,j = 1. These edge weights are the first values used when computing the
overall effectiveness of an MTD or set of MTDs.
In this example, we apply a service rotation MTD scheme to disrupt knowledge of what
version of service is actually running at any given time, and naïvely assume that rotating
between 4 services reduces the attacker's probability of correctly knowing which service is
running to P1,1 = 0.25. We apply an IP address rotation scheme to mask the victim's
IP address. We know from the literature that perfect shuffling limits the attacker's
likelihood of guessing the correct IP address to at most 0.63 [51], so we use a conservative
estimate for effectiveness and estimate P2,2 = 0.75. Finally, to protect the knowledge of the
memory layout, we use an ASLR scheme that expands the addressable memory such that
the attacker has only a P3,3 = 0.1 probability of having the correct information.
3.2.3 Computing MTD effectiveness
We measure an MTD’s effectiveness starting from the top layer of the model and work our
way down to find the overall probability of attacker’s success. First, we define P (Kj) as the
probability that the attacker has the correct information about knowledge block Kj . Then,
we calculate values of P (Kj) for each knowledge block in layer 3, based on the MTDs that
affect them. If there is no MTD or that MTD is not active, we assume that the attacker is
guaranteed to obtain that information, i.e., P (Kj) = 1.
In this example, each knowledge block has only one MTD that affects it. If multiple
MTDs affect a knowledge block, we can make the simplifying assumption that the resulting
effect is equal to the effect of the best-performing MTD. Therefore:
\[ P(K_j) = \begin{cases} 1, & \text{if } \nexists\, M_i \in M \text{ s.t. } (K_j, M_i) \in R_{KM} \\ \min\limits_{M_i \in M \,\text{s.t.}\, (K_j, M_i) \in R_{KM}} P_{i,j}, & \text{otherwise} \end{cases} \tag{3.1} \]
A possible improvement to the model would be to capture the effect of multiple MTDs
acting on the same knowledge block by using some function that would show either di-
minishing returns or other interactions of multiple MTDs acting on the same knowledge
blocks.
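Eq. 3.1 can be sketched directly; the edge map below encodes the running example (MTD names and probability estimates as given above):

```python
# Edges of R_KM with their P_ij labels, keyed by knowledge block.
# Only MTDs that are currently deployed should appear here.
R_KM = {
    "Knows(service)": {"M1 Service Rotation": 0.25},
    "Knows(IP)":      {"M2 IP Rotation": 0.75},
    "Knows(memory)":  {"M3 ASLR": 0.10},
}

def p_knowledge(block: str, edges=R_KM) -> float:
    """Eq. 3.1: P(K_j) is 1 when no MTD disrupts the block, and the
    best (minimum) P_ij among the covering MTDs otherwise."""
    covering = edges.get(block, {})
    return min(covering.values()) if covering else 1.0

assert p_knowledge("Knows(service)") == 0.25
assert p_knowledge("Knows(port)") == 1.0   # hypothetical uncovered block
```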
Next, we determine the probability P (Wk) that an attacker has gained all the knowledge
required to exploit a given weakness Wk. Since each knowledge block is independent, this
is simply the product of the probabilities associated with all knowledge blocks leading to it.
\[ P(W_k) = \prod_{K_j \in K \,\text{s.t.}\, (W_k, K_j) \in R_{WK}} P(K_j) \tag{3.2} \]
In this example, when calculating P (W1) and P (W2) for SQL Injection and Buffer
Overflow, respectively, we obtain:
P (W1) = 0.25 · 0.75 = 0.1875
P (W2) = 0.75 · 0.10 = 0.075
Finally, we must determine the defender’s utility U gained by deploying MTD tech-
niques, based on the reduced probability of exploit for each class of weaknesses. One
potential utility measure could be a function of the probability P (Sl) that an attacker
can compromise a service Sl by exploiting any of the weaknesses leading to it. P (Sl) can
be computed as the probability of the union of non-mutually exclusive events, using the
Inclusion-Exclusion principle. With respect to our running example, P (S1) can be com-
puted as follows:
P(S1) = P(W1 ∪ W2) = P(W1) + P(W2) − P(W1 ∩ W2) (3.3)
Because W1 and W2 are not necessarily independent (as we see in this example), we
cannot assume P (W1 ∩W2) = P (W1) · P (W2). Instead, we must express each P (W ) in
terms of its corresponding independent knowledge blocks Kj ,
P (W1) = P (K1) · P (K2)
P (W2) = P (K2) · P (K3)
P (W1 ∩W2) = P (K1) · P (K2) · P (K3)
and then express P (S1) as a function of probabilities P (Kj):
P (S1) = P (K1) · P (K2) + P (K2) · P (K3)− P (K1) · P (K2) · P (K3)
which results in
P (S1) = 0.25 · 0.75 + 0.75 · 0.1− 0.25 · 0.75 · 0.1 = 0.244
For graphs with 3 or more weaknesses, we can expand Eq. 3.3 to the generalized form
of the Inclusion-Exclusion Principle [80]:
\[ P\left( \bigcup_{W_k \in W} W_k \right) = \sum_{i=1}^{|W|} (-1)^{i-1} \cdot \sum_{W^* \in 2^{W} \,\text{s.t.}\, |W^*| = i} P\left( \bigcap_{W_j \in W^*} W_j \right) \]
We can then compute P(Sl) programmatically based on the graph model using the following
algorithm:

Algorithm 1 ComputeGoalProbability(W, K, RWK)
Input: A set of weaknesses W, a set of knowledge blocks K, and the set RWK of edges between them
Output: P(Sl), the probability of at least one weakness being exploited
 1: P(Sl) ← 0
 2: for i = 1 to |W| do
 3:   for all W* ∈ 2^W s.t. |W*| = i do
 4:     K* ← {Kj ∈ K | ∃ Wk ∈ W* s.t. (Wk, Kj) ∈ RWK}
 5:     prod ← ∏ Kj∈K* P(Kj)
 6:     P(Sl) ← P(Sl) + (−1)^(i−1) · prod
 7:   end for
 8: end for
 9: return P(Sl)
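A direct transcription of Algorithm 1 in Python (a sketch; the data structures are illustrative). Taking the union K* of the knowledge blocks required by each subset of weaknesses is what correctly handles shared, and therefore dependent, knowledge blocks:

```python
from itertools import combinations
from math import prod

def compute_goal_probability(weaknesses: dict, p_k: dict) -> float:
    """Algorithm 1: inclusion-exclusion over all non-empty subsets of
    weaknesses.  `weaknesses` maps each weakness W_k to the set of
    knowledge blocks it requires; `p_k` maps each block K_j to P(K_j)."""
    names = list(weaknesses)
    p_goal = 0.0
    for i in range(1, len(names) + 1):
        for subset in combinations(names, i):
            # K*: union of required blocks; shared blocks counted once.
            k_star = set().union(*(weaknesses[w] for w in subset))
            p_goal += (-1) ** (i - 1) * prod(p_k[k] for k in k_star)
    return p_goal

# Running example: P(S1) = 0.1875 + 0.075 - 0.01875 = 0.24375
p_k = {"K1": 0.25, "K2": 0.75, "K3": 0.10}
weaknesses = {"SQL Injection": {"K1", "K2"}, "Buffer Overflow": {"K2", "K3"}}
```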
Finding the probability of the union of multiple events is an NP-hard problem that
cannot be solved in better than O(2^n) time [80]. However, the general nature of the
weaknesses represented in layer 2 of the model should naturally limit their number and keep
the running time of the algorithm manageable, as opposed to vulnerabilities, which may
number in the thousands.
Once we obtain P (Sl), we can easily calculate the utility function U = 1 − P (Sl) or
use P (Sl) as the input to another utility function, such as a sigmoid with an inflection
[Figure content: the layered graph of Figure 3.1 annotated with P1,1 = 0.25, P2,2 = 0.75, P3,3 = 0.1; P(K1) = 0.25, P(K2) = 0.75, P(K3) = 0.1; P(W1) = 0.188, P(W2) = 0.075; P(S1) = 0.244 and U = 0.756.]
Figure 3.2: Computing MTD Effectiveness
point centered around a desired effectiveness, such as those commonly used in autonomic
computing [81]. The complete computations for each of the values in this example are
shown in Figure 3.2.
Note that this choice of utility function relies on the expectation that at least one
knowledge block for each weakness receives some measure of protection; otherwise, the
attacker is guaranteed to exploit that weakness, reducing the utility to 0. A utility
function that may avoid this issue is a weighted average of the probabilities of exploiting
each weakness, similar to a measure of risk.
3.3 Experimental Evaluation
We now present a more complex example which demonstrates the capabilities of the model.
As seen in Figure 3.3, we keep the same basic service but protect it against two additional
classes of weaknesses, OS Injection (from the CWE [77]) and Eavesdropping (related to In-
formation Disclosure from the STRIDE model [78]). Because there are now four weaknesses
that contribute to the utility function, we must perform the additional calculations under
the Inclusion-Exclusion Principle to compute the union of events that lead the attacker to
compromise the service.
[Figure content: the expanded four-layer graph. Eleven MTDs (M1 Service Rotation, M2 Intrusion-Tolerant Systems, M3 SQLRand, M4 IP Rotation/MOTAG, M5 OS Rotation, M6 Mutable Networks, M7 Multivariant Systems, M8 ASLR, M9 TALENT, M10 Reverse Stack Execution, M11 Distraction Cluster) disrupt ten knowledge blocks, which feed the four weaknesses (SQL Injection, OS Injection, Buffer Overflow, Eavesdropping) leading to service S1 (SQL DB). Each MTD carries a cost C, and the objective is to maximize U within a given budget.]
Figure 3.3: Case Study Quantification Framework
In several cases, the knowledge blocks required to exploit a weakness have been ex-
panded to provide more detail or to fit the specific MTDs selected for the case study. For
example, knowledge block Knows(Memory) has been broken into separate blocks related to
system call mapping, memory address, and stack direction, and SQL Injection now requires
knowledge of keywords appended to SQL commands and some knowledge of the database
schema, both of which are disrupted by SQLRand.
Most importantly, we can now observe the many-to-many relationships between weak-
nesses, knowledge blocks, and MTDs and conclude that finding the optimal solution is no
longer trivial. However, as long as we have accurate values of Pi,j for each MTD and some
cost constraint, we can determine the final utility as a function of the selected MTDs using
the steps previously shown, and find an optimal solution using a problem-solving method of
our choice, such as stochastic hill climbing or evolutionary methods.
Table 3.1: Sample Case Study Evaluation
MTD                              Pi,j             Cost  Active?  Pi,j (effective)  Cost (effective)
M1 (Service Rotation)            P1,1 = 0.500     15    No       1.000             0
M2 (Intrusion Tolerant Systems)  P2,1 = 0.900     25    No       1.000             0
                                 P2,4 = 0.900                    1.000
                                 P2,5 = 0.900                    1.000
M3 (SQLRand)                     P3,2 = 0.300     20    No       1.000             0
                                 P3,3 = 0.300                    1.000
M4 (IP Rotation/MOTAG)           P4,4 = 0.900     25    No       1.000             0
M5 (OS Rotation)                 P5,5 = 0.700     15    No       1.000             0
M6 (Mutable Networks)            P6,4 = 0.500     20    Yes      0.500             20
                                 P6,10 = 0.500                   0.500
M7 (Multivariant Systems)        P7,6 = 0.500     20    No       1.000             0
                                 P7,8 = 0.500                    1.000
M8 (ASLR)                        P8,7 = 0.500     10    Yes      0.500             10
M9 (TALENT)                      P9,5 = 0.500     20    No       1.000             0
                                 P9,9 = 0.500                    1.000
M10 (Reverse Stack Execution)    P10,8 = 0.500    20    No       1.000             0
M11 (Distraction Cluster)        P11,10 = 0.500   20    No       1.000             0

Knowledge probabilities: Knows(application) 1.000; Knows(keyword) 1.000; Knows(DBschema) 1.000; Knows(IP) 0.500; Knows(OS) 1.000; Knows(syscall_mapping) 1.000; Knows(mem_address) 0.500; Knows(stack_dir) 1.000; Knows(instr_set) 1.000; Knows(path) 0.500.
Chance of attack success: SQL Injection 0.500; OS Injection 0.250; Buffer Overflow 0.250; Eavesdropping 0.250.
Total cost: 30 (budget: 120). Cost levels: High = 25, Medium = 15, Low = 5. Effectiveness levels: High = 0.3, Medium = 0.5, Low = 0.9.
Chance of attacker success: 0.500. Utility: 0.500.
As a proof of concept, we can take the model in Figure 3.3 and perform all the necessary
computations programmatically. For the purpose of this example, we use qualitative values
for Pi,j and cost from an expert survey [3], which estimates the relative effectiveness and
cost of several MTD techniques by grouping them into categories of Low, Medium, or High.
Whether or not an MTD is active can be treated as a Boolean variable, with an inactive MTD
implying an attacker's probability of success of 1 and a cost of 0.
The values from a sample MTD setup are shown in Table 3.1. The interim calculations
for the probabilities of each knowledge block being acquired and each weakness being able
to be exploited are also shown.
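A minimal sketch of these interim calculations (hypothetical helper names; the values mirror the structure of Table 3.1, not its exact wiring): an inactive MTD contributes an effective probability of 1, and knowledge blocks are assumed to be acquired independently.

```python
def knowledge_prob(disruptors, active):
    """P(Kj): probability the attacker acquires knowledge block j.
    `disruptors` maps an MTD name to its P_i,j for this block; an inactive
    MTD contributes an effective probability of 1 (no disruption)."""
    p = 1.0
    for mtd, p_ij in disruptors.items():
        if mtd in active:
            p *= p_ij
    return p

def weakness_prob(required, p_k):
    """P(Wk): probability the attacker holds all knowledge blocks the
    weakness requires, assuming blocks are acquired independently."""
    p = 1.0
    for block in required:
        p *= p_k[block]
    return p
```

With only M6 active, `knowledge_prob({'M6': 0.5, 'M4': 0.9}, {'M6'})` yields 0.5, matching the effective-value convention for inactive MTDs.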
3.4 Applications
Now that we have a method to measure the effectiveness of an MTD deployment, we can
compare MTDs even if they affect vastly different aspects of a system by comparing their
overall results on the general classes of exploitable weaknesses. We show examples of two
different ways the framework may be applied to observe how MTDs might affect the overall
security of a service.
3.4.1 Comparing MTDs
Given a set M of MTD techniques, we would like to identify the one which adds the highest
overall utility.
With respect to the example of Figure 3.3, we start from a baseline deployment including
M6 (Mutable Networks) and M8 (ASLR) to ensure we have a starting utility value to
compare with. The starting deployment is the same one shown earlier in Table 3.1. We
then measure the updated utility value after individually adding each of the other MTDs
to the baseline deployment, and examine the results reported in Table 3.2.
Table 3.2: Improvement from Adding MTDs
MTD                              Utility  Delta
M1 (Service Rotation)            0.5625   0.0625
M2 (Intrusion Tolerant Systems)  0.513    0.013
M3 (SQLRand)                     0.614    0.114
All Others                       0.500    0.000
From the results, we find that, given the preexisting condition of M6 and M8 being
active, M3 (SQLRand) offers the greatest increase in utility, with M1, M2, and M3 being
the only ones offering any increase at all. To explain these results, we observe that there is
a lower bound on P(S1) that translates into an upper bound on U:
$$P(S_1) \geq \max\big(P(W_1), P(W_2), P(W_3), P(W_4)\big) \tag{3.4}$$
In other words, the overall defense can only be as strong as the protection against
exploitation of its most vulnerable weakness, which in turn benefits from the deployment
of multiple MTDs. Therefore, given the baseline conditions, only an MTD that affects the
most vulnerable weakness will yield any improvement in the utility value. This procedure
could be used iteratively in an attempt to find an optimal solution in a greedy manner, but
there would have to be some way to handle cases where no MTD adds any utility (such as
random selection).
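The greedy procedure just described might be sketched as follows (a hypothetical illustration; the utility function is supplied by the framework, and random selection breaks plateaus where no single MTD adds utility):

```python
import random

def greedy_select(mtds, cost, utility, budget):
    """Iteratively add the affordable MTD with the largest utility gain;
    when no addition improves utility, fall back to a random affordable
    choice so the procedure can continue exploring."""
    deployed = set()
    remaining = set(mtds)
    while remaining:
        spent = sum(cost[m] for m in deployed)
        affordable = [m for m in remaining if spent + cost[m] <= budget]
        if not affordable:
            break
        base = utility(deployed)
        gains = {m: utility(deployed | {m}) - base for m in affordable}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            best = random.choice(affordable)  # plateau: pick at random
        deployed.add(best)
        remaining.remove(best)
    return deployed
```

Because it only ever evaluates one-step additions, this procedure is cheap but, as noted above, is not guaranteed to find the global optimum.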
3.4.2 Selecting Optimal Defenses
Given that we have a tool that can evaluate the utility of any configuration, we can also
solve for the optimal selection of MTDs, given the constraints that the presence of each
MTD is a Boolean variable (either present or not) and that the sum of the costs of selected
MTDs be under a given budget. Formally, we can express this as:
$$\text{Maximize } U(m_1, m_2, \cdots, m_n) \quad \text{s.t.} \quad \sum_{i=1}^{n} cost(M_i) \cdot m_i \leq budget, \qquad m_i \in \{0, 1\} \;\; \forall i \tag{3.5}$$
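Since each mi is Boolean and the example has only eleven MTDs, Eq. 3.5 can even be solved by exhaustive search (a sketch with hypothetical names; the dissertation itself uses a GRG nonlinear solver for this step):

```python
from itertools import product

def optimal_selection(cost, utility, budget):
    """Brute-force Eq. 3.5: maximize U over all Boolean deployment vectors
    whose total cost stays within the budget (2^n vectors; fine for n ~ 11)."""
    names = sorted(cost)
    best_set, best_u = set(), float("-inf")
    for bits in product((0, 1), repeat=len(names)):
        chosen = {n for n, b in zip(names, bits) if b}
        if sum(cost[n] for n in chosen) > budget:
            continue  # violates the budget constraint
        u = utility(chosen)
        if u > best_u:
            best_set, best_u = chosen, u
    return best_set, best_u
```

For larger MTD portfolios the 2^n enumeration becomes infeasible, which is why solver-based or stochastic search methods are used instead.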
For the purpose of evaluating the framework and making the problem interesting, we
select a value for the budget (120) approximately halfway between 0 and the total cost of
deploying all available MTDs (i.e., 210). This choice ensured that a solution with utility
greater than 0 would be found and that approximately half the MTDs would be chosen
as part of the optimal solution. This example was solved using the Generalized Reduced
Gradient Nonlinear algorithm [82] with random restarts to avoid converging to local maxima.
After solving, we obtain an optimal solution with the selected MTDs highlighted graphically
in Figure 3.4 and full results shown in Table 3.3.
[Figure: the model of Figure 3.3 with the selected MTDs highlighted, yielding P(W1) = 0.023, P(W2) = 0.063, P(W3) = 0.063, P(W4) = 0.25, P(S1) = 0.313, and U = 0.687.]
Figure 3.4: Case Study Optimal Configuration
From here we can observe that our choice of utility function forces the selection of
a variety of MTDs such that each weakness has at least one MTD affecting one of its
knowledge blocks, spreading protection evenly over the four weaknesses.
Visually, we can also observe that an MTD with the ability to affect multiple knowledge
blocks is inherently more powerful than one that only affects one. However, if their cost is
too high or effectiveness too low, it will still not be chosen as part of an optimal solution.
Similarly, an MTD that only affects one knowledge block may be chosen if it is effective,
low-cost, or affects a knowledge block that still receives relatively weak protection from
other MTDs.
Table 3.3: Case Study Optimal Configuration
MTD                              Pi,j             Cost  Active?  Pi,j (effective)  Cost (effective)
M1 (Service Rotation)            P1,1 = 0.500     15    Yes      0.500             15
M2 (Intrusion Tolerant Systems)  P2,1 = 0.900     25    No       1.000             0
                                 P2,4 = 0.900                    1.000
                                 P2,5 = 0.900                    1.000
M3 (SQLRand)                     P3,2 = 0.300     20    Yes      0.300             20
                                 P3,3 = 0.300                    0.300
M4 (IP Rotation/MOTAG)           P4,4 = 0.900     25    No       1.000             0
M5 (OS Rotation)                 P5,5 = 0.700     15    No       1.000             0
M6 (Mutable Networks)            P6,4 = 0.500     20    Yes      0.500             20
                                 P6,10 = 0.500                   0.500
M7 (Multivariant Systems)        P7,6 = 0.500     20    Yes      0.500             20
                                 P7,8 = 0.500                    0.500
M8 (ASLR)                        P8,7 = 0.500     10    Yes      0.500             10
M9 (TALENT)                      P9,5 = 0.500     20    Yes      0.500             20
                                 P9,9 = 0.500                    0.500
M10 (Reverse Stack Execution)    P10,8 = 0.500    20    No       1.000             0
M11 (Distraction Cluster)        P11,10 = 0.500   20    No       1.000             0

Knowledge probabilities: Knows(application) 0.500; Knows(keyword) 0.300; Knows(DBschema) 0.300; Knows(IP) 0.500; Knows(OS) 0.500; Knows(syscall_mapping) 0.500; Knows(mem_address) 0.500; Knows(stack_dir) 0.500; Knows(instr_set) 0.500; Knows(path) 0.500.
Chance of attack success: SQL Injection 0.023; OS Injection 0.063; Buffer Overflow 0.063; Eavesdropping 0.250.
Total cost: 105 (budget: 120). Cost levels: High = 25, Medium = 15, Low = 5. Effectiveness levels: High = 0.3, Medium = 0.5, Low = 0.9.
Chance of attacker success: 0.313. Utility: 0.687.
3.5 Combining MTDs
One of the challenges in implementing the framework is determining the values of
Pi,j for each MTD and capturing the interactions between MTDs.
We present here experiments performed as a proof of concept to show how multiple
MTDs might be combined and their effects measured, along with results. Based on this
testbed, we can measure the attacker’s success rate in a scenario with actual attacks being
made against an MTD to observe if there are any interactions between MTDs and validate
future analytical results.
3.5.1 Experimental Testbed
Our experimental testbed uses the open-source Citrix XenServer1 environment to create and
manage all of our virtual machines. We created six instances of the Metasploitable VM2 for
our targets which contain a number of open vulnerabilities to test against. For our attack
platform, we used a separate server using Kali Linux3 which comes with the Metasploit
Framework, a popular penetration testing platform. Metasploit contains a variety of exploits
that can be scripted from the command-line interface, several of which are effective against
the unpatched version of the Apache web service that comes on Metasploitable. We also
created an independent process to monitor the web server from the point of view of a
legitimate user to see the MTD’s effect on system availability. A block diagram of the setup
is shown in Figure 3.5.
The scripted attack against the web server follows a straightforward attack pattern.
First, the attacker scans the network to obtain a list of all reachable IP addresses and open
ports on the network. If a VM appears with more than four open ports, it is determined to
be a candidate for attack. A more detailed scan is then performed against port 80 on each
candidate VM to identify the web service running on it. If the service is determined to be
the vulnerable Apache service, the script then configures and launches an attack against the
service. If the attack is successfully able to achieve a shell session, the attack is considered
a success. In each trial, we record the number of successes (out of six possible attempts).
We implemented service rotation by installing two additional web services on each VM:
1. Available at https://xenserver.org/
2. Available at https://sourceforge.net/projects/metasploitable/
3. Available at https://www.kali.org/downloads/
[Figure: block diagram showing the attacker machine and the monitor connected to a VM host running six target VMs.]
Figure 3.5: Experimental Setup for Combined MTD Experiments
lighttpd4 and NGINX5. Only one of the three web services runs at a time, but all are
capable of serving the same PHP content. A script on each VM independently
reconfigures the VM by stopping the current web service and starting a new one at an
exponentially distributed random interval. The downtime for changing services is relatively
short, and we tested settings with average interarrival rates of 120, 60, 30, and 10 seconds,
in addition to the static case.
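The rotation script's core logic might look like the following dry-run sketch (hypothetical function name; the actual start/stop commands are left as comments):

```python
import random

SERVICES = ("apache2", "nginx", "lighttpd")

def next_rotation(current, mean_interval, rng=random):
    """Sample an exponentially distributed delay and pick a replacement
    web service different from the one currently running."""
    delay = rng.expovariate(1.0 / mean_interval)
    new_service = rng.choice([s for s in SERVICES if s != current])
    # On the VM itself, the script would then run something like:
    #   service <current> stop && service <new_service> start
    return delay, new_service
```

Sampling from an exponential distribution makes reconfiguration times memoryless, so an attacker cannot predict the next rotation from the time elapsed since the last one.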
We implemented IP rotation by making use of a feature in DHCP to assign new IP
addresses. The IP rotation script on each VM takes the interface down, uses a utility to
randomly change the MAC address, and brings the interface back up again. The DHCP
server, seeing a new MAC address, assigns a new IP address. DHCP still maintains all
address lease records, preventing duplicate IP addresses and re-using abandoned IP leases.
The process of rehoming an IP address is much longer than for a service, so we tested
4. https://www.lighttpd.net/
5. https://www.nginx.com/resources/wiki/
settings with average interarrival rates of 120, 60, and 30 seconds, in addition to the static
case.
To prevent the MTDs from interfering with each other, both MTDs ran in separate
threads with a locking mechanism to prevent one MTD from starting a reconfiguration
while the other was still performing one. For each combination of settings, we performed
100 trials and measured attacker success using two different metrics: the average number
of successful exploits (out of six possible), and the overall chance of attacker success, defined
as the number of times out of 100 the attacker was able to compromise at least one VM. The
monitoring node checks the web server every 0.1 sec and calculates availability as the number of
successful requests divided by the total number of requests. The monitor also records the
service running on the node to verify that each service accounts for approximately 33.3%
of the uptime.
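The locking arrangement can be sketched as two threads sharing one lock (hypothetical names; the `reconfigure` callback stands in for the actual service or IP change):

```python
import random
import threading
import time

reconfig_lock = threading.Lock()

def mtd_loop(name, mean_interval, reconfigure, n_rotations, rng):
    """One MTD thread: sleep an exponentially distributed interval, then
    reconfigure while holding the shared lock, so the other MTD cannot
    start a reconfiguration until this one finishes."""
    for _ in range(n_rotations):
        time.sleep(rng.expovariate(1.0 / mean_interval))
        with reconfig_lock:
            reconfigure(name)
```

As the results below suggest, this serialization means the two MTDs are not truly independent: a long reconfiguration by one MTD delays the other.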
3.5.2 Experimental Results and Observations
For service rotations, we measured the number of VMs (out of six) successfully compromised per trial and charted their distributions on a histogram, shown in Figure 3.6. As
we can see, in the case where the interarrival rate = 120 sec, the distribution centers around
two VMs, which is similar to what would be expected of a static case with two vulnerable
VMs out of six available. However, as the reconfiguration rate increases, the distribution
moves to the left as more trials result in fewer VMs compromised.
Two charts showing the overall results from the monitor process are shown in Fig-
ures 3.7a and 3.7b. We observe that each chart shows each service had on average the same
uptime as the others, and that the trials with an average interarrival rate of 60 seconds
had a higher overall availability than those with an average interarrival rate of 10 seconds.
Other results from the monitor process for other interarrival rates showed similar results in
the average share of each web service.
Figures 3.8a and 3.8b show the overall attacker success rate (defined as the probability
that one or more VMs are compromised) and availability for each setting of the individual
[Figure: histogram of the number of successful attacks per trial for interarrival rates of 10, 30, 60, and 120 seconds.]
Figure 3.6: Histogram of Number of VMs Compromised for Service Rotation
[Figure: pie charts of monitored uptime share. (a) Interarrival rate = 60 sec: Apache2 31.36%, Nginx 33.31%, Lighttpd 34.44%, Unavailable 0.89%. (b) Interarrival rate = 10 sec: Apache2 31.48%, Nginx 30.95%, Lighttpd 32.97%, Unavailable 4.60%.]
Figure 3.7: Monitor Results for Service Rotation
MTDs. We note that in both cases, the MTDs are effective at reducing the attacker’s success
rate. Service rotation reduces attacker’s success rate to 54%, while IP rotation reduces it
to 29%. However, for each increase in reconfiguration rate, there is also a corresponding
decrease in availability. Service rotation reduces availability as low as 95.4%, while IP
rotation reduces it as low as 74.3%.
[Figure: bar charts of attacker success rate and availability for each interarrival-rate setting of (a) service rotation and (b) IP rotation.]
Figure 3.8: Attacker Success Rate and Availability for Service and IP Rotation
Figures 3.9a and 3.9b show the same data while directly comparing values for the two
MTDs side by side. Here we also observe that while IP rotation is much more effective than
service rotation, it also has the drawback of a far greater reduction in availability.
Complete data for each combination of MTD settings is shown in Tables 3.4 and 3.5.
Here we observe that in most cases, the earlier trends of attacker success rate and availability
both dropping as reconfiguration rates increase hold true as well. However, we also
observe that when both MTDs are active, the attacker success rate and availability may
not change, or may even rise. This may be due to large confidence intervals or to the fact that the
locking mechanism that prevents both MTDs from reconfiguring at once means that the
two MTDs are not truly independent. For example, IP rotation is observed to be the more
[Figure: side-by-side comparison of service rotation and IP rotation across interarrival-rate settings for (a) attacker success rate and (b) availability.]

Figure 3.9: Comparison of Service and IP Rotation on Attacker Success Rate and Availability
expensive operation, costing several seconds of downtime for each reconfiguration and impacting availability accordingly. But if more frequent service rotations are occurring,
the costly IP rotation operations might be delayed, reducing the overall effective reconfiguration rate and lessening the effect on attacker success rate and availability. An area for
future research would be to predict these delays and compute the effective reconfiguration
rate, along with more experimental trials to validate that analysis and reduce the observed
error.
Table 3.4: Attacker Success Rates for all Combinations of Interarrival Rates
                    IP Rotation
Service Rotation    Static        120s          60s           30s
Static              1.000±0.000   0.880±0.064   0.610±0.096   0.290±0.089
120s                0.860±0.068   0.640±0.094   0.550±0.098   0.220±0.081
60s                 0.900±0.059   0.550±0.098   0.430±0.097   0.140±0.068
30s                 0.750±0.085   0.550±0.098   0.490±0.098   0.150±0.070
10s                 0.540±0.085   0.280±0.088   0.210±0.080   0.110±0.061
Table 3.5: Availability for all Combinations of Interarrival Rates
                    IP Rotation
Service Rotation    Static        120s          60s           30s
Static              1.000±0.000   0.915±0.003   0.819±0.005   0.743±0.005
120s                0.997±0.000   0.909±0.002   0.838±0.003   0.692±0.003
60s                 0.991±0.001   0.793±0.003   0.782±0.003   0.647±0.003
30s                 0.981±0.001   0.855±0.002   0.832±0.003   0.711±0.003
10s                 0.954±0.001   0.807±0.003   0.761±0.003   0.710±0.003
3.6 Conclusions
We have introduced an MTD quantification framework that has the potential to model
numerous classes of MTDs. However, the challenges of fully realizing this framework are
twofold: for each MTD, we must determine the probability of knowledge disruption Pi,j
individually, and we must also model the MTD cost. Cost could be defined as actual cost
of acquisition or in terms of overhead or performance cost.
We have already shown as an experimental proof of concept how the effectiveness of
multiple MTDs might be measured, along with their effect on system availability. This is
explored further in the next chapters where we present an analytical model that can be
used to predict the effectiveness of MTDs that perform periodic reconfigurations, as well as
their impact on system availability and response time, as response time is a frequently used
metric when determining performance cost [73,81].
Chapter 4: Performance Modeling of Moving Target Defenses
4.1 Introduction
To determine the individual effectiveness and cost of an MTD in our quantification framework, we perform a mathematical analysis of the MTD in question to predict its performance. While this may vary from technique to technique, in this chapter we present a model
that can accurately predict the effects of MTDs that are based on periodic reconfigurations
and that represents a wide class of the techniques available in the literature.
We consider here a reconfiguration scheme with multiple identical resources serving
requests. Occasionally, at random intervals, we reconfigure those resources in some way.
We use Continuous Time Markov Chains (CTMC) to determine the probability distribution
of the number of resources that are being reconfigured and then use this distribution to
determine the probability distribution of the number of service requests in the system. This
distribution is then used to compute the average number of service requests in the system
and the average response time. The distribution of resources being reconfigured can also
be used to compute the average age of a resource (i.e., the average time between the last
reconfiguration event and the next). This interval is used to determine the probability that
an attacker succeeds during that time.
In our simulations and experiments, we also observe a major difference between predicted
steady-state response times and actual response times, caused by periodic states of instability.
We introduce a metric that quantifies this phenomenon, as well as an optimization
method that finds the optimal reconfiguration rate subject to minimum stability and security
requirements.
4.2 Quantitative Analysis of MTDs
The computing environment we consider in this chapter consists of c similar resources (e.g.,
VMs) available to serve incoming service requests that arrive at an average rate λ, join a
single queue, and are served by any of the available resources, with an average service time
S. A generic MTD technique consists in each resource occasionally, at random intervals,
reconfiguring itself independently of the other resources. Thus, each resource handles service
requests as well as reconfiguration requests. While a resource is being reconfigured, it is not
available to handle service requests. Without reconfigurations, the system behaves exactly
like an G/G/c queue [83].
Now, assume that each resource is reconfigured at an average rate of α. As an example,
a reconfiguration could entail swapping out a VM with a clean instance, similar to how
SCIT [84] operates: the VM then comes back online with a new IP address, implementing
a form of IP hopping. These reconfigurations make it more difficult for an attacker to learn
about the VMs and disrupt the attacker's persistence in the system. The attacker's success
probability is a function of the average reconfiguration rate. The reconfiguration rate α
also affects the average number of resources available to serve requests (see Figure 4.1).
Reconfigurations reduce resource availability and ultimately increase queuing and response
time.
While these qualitative tradeoffs are intuitive and not surprising, there is a need for
quantitative models for determining the impact of the reconfiguration rate on resource
availability, response time of service requests, and attacker’s success probability. We use
Continuous Time Markov Chains (CTMC) to compute the probability distribution of the
number of resources being reconfigured as a function of α and other parameters and then use
that distribution to determine resource availability and response time, among other metrics.
Markov chains have been used for many decades to study various aspects of computer and
communication systems. The novelty in each case is in how the state of a CTMC should
be defined to represent the system to be analyzed.
Figure 4.2, the framework for our analytic models, shows the reconfiguration model R
[Figure: resources cycle among three states: available for use, in use by a service request, and being reconfigured; requests arrive at rate λ.]
Figure 4.1: Queuing Representation of the Reference Scenario
at the top and, at the bottom, the performance model S. The reconfiguration model takes
as inputs the rate α at which resources are reconfigured, the average reconfiguration time
S, and the number of resources c, and produces as outputs the availability of resources, the
average number of resources available, and the probability distribution pk of the number
of available resources. This distribution, along with the number of resources, the average
arrival rate of service requests, and the average service time of requests are inputs to the
performance model, which produces the probability distribution Pk of the number of
service requests in the system and the average response time of requests.
We analyze this generic MTD in three steps: (i) analysis of the effect of the reconfiguration rate α on the probability distribution of available resources; (ii) analysis of the effect
of that availability on response time; and (iii) determination of the attacker's probability of
success based on the reconfiguration rate.
4.2.1 Reconfiguration Model
Figure 4.3 helps explain the basic equations that govern an MTD process. Table 4.1 summarizes the names and descriptions of all variables defined here.
[Figure: the reconfiguration model (R) takes the reconfiguration rate, reconfiguration time, and number of resources as inputs, and outputs the availability, the average number of available resources, and the probability distribution of the number of available resources. That distribution, together with the request arrival rate and the request average service time, feeds the performance model (S), which outputs the average response time.]
Figure 4.2: Analytic Model Framework
Table 4.1: Summary of Variable Names and Descriptions
Variable  Description
Ps(t)     Probability that t time units are needed for a successful attack
Ts        Time needed for an attacker to succeed; Ps(Ts) = 1
c         Number of resources
c̄         Average number of resources not being reconfigured
N̄         Average number of resources being reconfigured
α         Reconfiguration rate (in reconfigurations/sec)
S̄         Average time to reconfigure a resource
X         Reconfiguration throughput, i.e., the aggregate rate at which resources start a reconfiguration operation
λ         Average arrival rate of requests to use a resource
T         Average time a request spends using a resource
R         Average response time of requests
Consider that there are c resources (e.g., VMs) that are reconfigured at regular time
intervals. Each resource cycles through a period in which it is available for use and a period
in which it is being reconfigured (see Figure 4.3). Resources are reconfigured independently
of one another at a rate of α reconfigurations per time unit. Thus, the average time a
resource is available for use between the end of a reconfiguration operation and the start
of the next is 1/α. We refer to this as the average age of a resource, which is our primary
security metric used in determining the likelihood of attacker’s success, described later in
Section 4.2.3.
Let $\bar{c}$ be the average number of resources available for use (i.e., not being reconfigured)
and $\bar{N}$ be the average number of resources being reconfigured. Thus,

$$c = \bar{c} + \bar{N} \tag{4.1}$$
Applying Little's Law [83] to the set of available resources, we obtain

$$\bar{c} = X \times (1/\alpha) \tag{4.2}$$
where X is the system's reconfiguration throughput (or throughput for short), i.e., the
aggregate rate at which resources complete their reconfiguration.
Let $\bar{S}$ be the average time it takes for a resource to complete the reconfiguration process. For example, a reconfiguration process could include the time to complete all transactions in progress at a server, change its configuration file, shut down the server,
and reboot it. Applying Little's Law [83] to the set of resources being reconfigured, we
obtain:

$$\bar{N} = X \times \bar{S} \tag{4.3}$$
Adding Eqs. 4.2 and 4.3 and combining the result with Eq. 4.1, we obtain:

$$c = \bar{c} + \bar{N} = X(\bar{S} + 1/\alpha) \tag{4.4}$$

We can rewrite Eq. 4.4 in order to express the reconfiguration rate α as a function of
the number of resources c, the time to reconfigure $\bar{S}$, and the throughput X:

$$\alpha = X/(c - \bar{S} \times X) \tag{4.5}$$
[Figure: the c resources cycle between being available for use and passing through the reconfiguration process; each resource starts a reconfiguration at rate α, reconfigurations complete at aggregate throughput X, and $\bar{N}$ resources are in the reconfiguration box at any time.]
Figure 4.3: Reconfiguration Cycle
We use the CTMC of Figure 4.4 to compute X. The state k (k = 0, · · · , c) in this
CTMC represents the number of resources in the reconfiguration box of Figure 4.3. Thus,
the number of available resources is c− k.
[Figure: birth-death CTMC with states 0, 1, ..., c; the transition rate from state k to k+1 is α(c−k), and from state k to k−1 is k/S̄.]

Figure 4.4: CTMC for the Reconfiguration Model

An expression for $p_k$ ($k = 0, \cdots, c$) is obtained by using the general birth-death equation
for Markov Chains [83]:

$$p_k = p_0 \prod_{i=0}^{k-1} \frac{\gamma_i}{\mu_{i+1}} \qquad k = 1, \cdots, c \tag{4.6}$$

$$p_0 = \left[1 + \sum_{k=1}^{c} \prod_{i=0}^{k-1} \frac{\gamma_i}{\mu_{i+1}}\right]^{-1} \tag{4.7}$$

where $\gamma_k = \alpha \cdot (c - k)$ for $k = 0, \cdots, c-1$ is the aggregate rate at which resources are
reconfigured when there are k resources being reconfigured, and $\mu_k = k/\bar{S}$ for $k = 1, \cdots, c$
is the aggregate rate at which resources complete their reconfiguration when there are k
resources being reconfigured. Using the expressions for $\gamma_k$ and $\mu_k$ in Eqs. 4.6 and 4.7, we
obtain

$$p_k = p_0 \prod_{i=0}^{k-1} \frac{\alpha \cdot (c-i)}{(i+1)/\bar{S}} = p_0 \,(\alpha\bar{S})^k \binom{c}{k} \qquad k = 1, \cdots, c \tag{4.8}$$

An expression for $p_0$ is obtained by noting that the sum of all probabilities is equal to 1.
Thus,

$$p_0 = \left[1 + \sum_{k=1}^{c} (\alpha\bar{S})^k \binom{c}{k}\right]^{-1} \tag{4.9}$$
The values of $p_k$ can be easily computed because the summation needed to compute $p_0$
is finite. Given $p_k$ and $p_0$, one can then compute the average throughput X as

$$X = \sum_{k=1}^{c} (k/\bar{S}) \cdot p_k = \frac{1}{\bar{S}} \sum_{k=1}^{c} k \cdot p_k \tag{4.10}$$

The average number of available resources can now be computed by combining Eqs. 4.2
and 4.10:

$$\bar{c} = \frac{X}{\alpha} = \frac{1}{\alpha \cdot \bar{S}} \sum_{k=1}^{c} k \cdot p_k \tag{4.11}$$

The availability A of the set of resources is then given by the fraction of resources
available for use, i.e.,

$$A = \bar{c}/c = X/(\alpha \cdot c) \tag{4.12}$$

It turns out that the availability does not depend on the number of resources but only
on the product of the reconfiguration rate and the reconfiguration time. This can be seen
by combining Eqs. 4.2, 4.4, and 4.12:

$$A = \bar{c}/c = (X/\alpha)/c = (1 + \alpha \cdot \bar{S})^{-1} \tag{4.13}$$

When there is no reconfiguration (i.e., α = 0), the availability is 1, as expected. Eq. 4.13
can be used to determine the values of the product $\alpha \cdot \bar{S}$ necessary to guarantee an availability
greater than or equal to some value $A_{min}$:

$$\alpha \cdot \bar{S} \leq \frac{1}{A_{min}} - 1 \tag{4.14}$$
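The reconfiguration model is straightforward to evaluate numerically; the sketch below (hypothetical function name) implements Eqs. 4.8 to 4.12 and can be checked against the closed form of Eq. 4.13:

```python
from math import comb

def reconfiguration_model(c, alpha, S_bar):
    """Evaluate the reconfiguration model: returns (p, X, c_bar, A), where
    p[k] is the probability that k resources are being reconfigured."""
    a = alpha * S_bar
    p0 = 1.0 / sum(comb(c, k) * a**k for k in range(c + 1))  # Eq. 4.9
    p = [p0 * comb(c, k) * a**k for k in range(c + 1)]       # Eq. 4.8
    X = sum(k * pk for k, pk in enumerate(p)) / S_bar        # Eq. 4.10
    c_bar = X / alpha                                        # Eq. 4.11
    A = c_bar / c                                            # Eq. 4.12
    return p, X, c_bar, A
```

For any c, the resulting availability matches Eq. 4.13, A = (1 + αS̄)^(-1), which provides a quick sanity check on the implementation.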
4.2.2 Response Time Model
The c resources are used for some computational purpose and requests to use any of them
arrive at a rate of λ requests per unit of time and are served by any of the available
resources. If no resources are available, a request has to wait in a queue. The number of
resources available for service varies from 0 to c due to reconfiguration (see Figure 4.1). The
probability that k resources are available for service is given by the probability pc−k that
c− k resources are being reconfigured (see Eq. 4.8).
We use the CTMC of Figure 4.5 with an infinite number of states, where a state k =
0, 1, 2, · · · represents the number of service requests in the system, either using one of the
available resources or waiting for one. Note that the system of Figure 4.1 is similar to an
M/M/c queuing system with an important difference. In an M/M/c model, the rate at
which transactions complete is kµ for k = 1, · · · , c and cµ for k > c where µ is the average
rate at which a request completes from a resource. In our case, as explained above, the
transaction completion rate has to consider that resources may be in the process of being
reconfigured. Thus, we follow an approach similar to the development of the results for
the M/M/c queue [83], with a modification in the average transaction completion rate.
Consider the following additional notation:
• $P_k$: probability that there are k requests in the system (either being serviced or in the waiting line).

• $\pi_j$: probability that j resources are available for use (i.e., not being reconfigured); thus $\pi_j = p_{c-j}$.

• $\mu\delta_k$: average request departure rate at state k. The value of $\delta_k$ is

$$\delta_k = \sum_{j=1}^{k} j\,\pi_j \qquad k = 1, \cdots, c \tag{4.15}$$

because the departure rate is µ if only one resource is available (which happens with probability $\pi_1$), 2µ if only two resources are available (which happens with probability $\pi_2$), $\cdots$, and kµ if k resources are available (which happens with probability $\pi_k$). Note that $\delta_c = \sum_{j=1}^{c} j\,\pi_j = \bar{c}$.

• $\mu = 1/T$: average service rate of each resource.

• $\rho = \lambda/(\mu\,\bar{c})$: average utilization of the resources.
[Figure: birth-death CTMC with states 0, 1, 2, ...; the arrival rate from state k to k+1 is λ, and the departure rate from state k to k−1 is µδk for k ≤ c and µδc for k > c.]

Figure 4.5: CTMC for the Response Time Model
As Figure 4.5 shows, the transition rate from state k to k + 1 is λ, the average request
arrival rate, and the transition rate βk from a state k to state k − 1 is given by
$$\beta_k = \begin{cases} \mu\,\delta_k & k < c \\ \mu\,\delta_c & k \geq c \end{cases} \tag{4.16}$$

We can now use the generalized birth-death equations (see Eqs. 4.6 and 4.7) to solve for
$P_k$ and $P_0$. We have to break down the expression for $P_k$ into two parts (for $k = 1, \cdots, c$
and $k > c$) because $\beta_k$ has two expressions. Hence,

$$P_k = P_0 \prod_{i=0}^{k-1} \frac{\lambda}{\mu\,\delta_{i+1}} = P_0\, \frac{(\lambda/\mu)^k}{\prod_{i=0}^{k-1}\delta_{i+1}} \qquad k = 0, \cdots, c \tag{4.17}$$
and
Pk = P0 Πc−1i=0
λ
µ δi+1Πk−1i=c
λ
µ δc= P0
ρk cc
Πc−1i=0δi+1
k = c+ 1, · · · (4.18)
P0 can now be computed as

    P_0 = \left[ 1 + \sum_{k=1}^{c} \frac{(\lambda/\mu)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} + \sum_{k=c+1}^{\infty} \frac{\rho^k\,\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \right]^{-1} \qquad (4.19)

If we move \bar{c}^{\,c} / \prod_{i=0}^{c-1} \delta_{i+1} out of the infinite summation in the above expression, we are left with the following geometric series, which converges for ρ < 1:

    \sum_{k=c+1}^{\infty} \rho^k = \frac{\rho^{c+1}}{1-\rho} \qquad (4.20)
Note that ρ < 1 is a necessary but not sufficient condition for the system to be stable, as discussed in Section 4.4.2. P0 can now be written as the following easily computable expression:
    P_0 = \left[ 1 + \sum_{k=1}^{c} \frac{(\lambda/\mu)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} + \frac{\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \cdot \frac{\rho^{c+1}}{1-\rho} \right]^{-1} \qquad (4.21)
The average number of requests in the system can be computed as

    N_s = \sum_{k=1}^{\infty} k\,P_k = \sum_{k=1}^{c} k\,P_k + \sum_{k=c+1}^{\infty} k\,P_k \qquad (4.22)
The infinite summation in Eq. 4.22 can be written as

    P_0\,\frac{\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \left[ \sum_{k=c+1}^{\infty} k\,\rho^k \right] = P_0\,\frac{\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \left[ \rho\,\frac{\partial}{\partial\rho} \sum_{k=c+1}^{\infty} \rho^k \right]

and is equal to

    P_0\,\frac{\bar{c}^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \cdot \frac{\rho^{c+1}}{1-\rho} \left[ \frac{\rho}{1-\rho} + 1 + c \right] \qquad (4.23)
Thus, Eqs. 4.22 and 4.23 allow us to compute Ns. Finally, using Little's Law, we compute the average response time R as R = Ns/λ.
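The computation implied by Eqs. 4.15-4.23 can be sketched in a few lines of Python. This is an illustrative sketch with our own naming, not code from the dissertation; the availability probabilities πj are assumed to have already been obtained from the reconfiguration model, and all δk must be positive.

```python
def response_time(lam, T, pi):
    """Average response time R via Eqs. 4.15-4.23 and Little's Law.
    lam: arrival rate; T: average service time; pi[j]: probability that
    j of the c resources are available (j = 0, ..., c)."""
    c = len(pi) - 1
    mu = 1.0 / T                          # average service rate per resource
    # Eq. 4.15: delta[k] = sum_{j=1..k} j*pi[j]; delta[c] = c_bar
    delta = [sum(j * pi[j] for j in range(1, k + 1)) for k in range(c + 1)]
    c_bar = delta[c]                      # average number of available resources
    rho = lam / (mu * c_bar)              # average utilization
    assert 0 < rho < 1, "requires rho < 1"
    prod = [1.0] * (c + 1)                # prod[k] = delta_1 * ... * delta_k
    for k in range(1, c + 1):
        prod[k] = prod[k - 1] * delta[k]
    finite = sum((lam / mu) ** k / prod[k] for k in range(1, c + 1))
    coef = c_bar ** c / prod[c]           # c_bar^c / (delta_1 * ... * delta_c)
    P0 = 1.0 / (1.0 + finite + coef * rho ** (c + 1) / (1.0 - rho))   # Eq. 4.21
    Ns = sum(k * P0 * (lam / mu) ** k / prod[k] for k in range(1, c + 1))
    Ns += P0 * coef * rho ** (c + 1) / (1.0 - rho) * (rho / (1.0 - rho) + 1 + c)  # Eq. 4.23
    return Ns / lam                       # Little's Law: R = Ns / lambda
```

For c = 1 this reduces to an M/M/1 queue with service rate µπ1, which provides a quick sanity check of the implementation.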
4.2.3 Analysis of Attack Success Probability
The time required for an attacker to acquire sufficient knowledge during the reconnaissance phase is a key factor in determining whether an attack succeeds. We define the probability that an attacker succeeds as a function of the time available to complete the reconnaissance phase. The probability Ps(t) that an attacker succeeds in attacking a resource within t time units is important in determining the required reconfiguration rate, i.e., the rate at which resources need to be reconfigured.
Figure 4.6 shows two examples of Ps(t): linear and exponential functions. The linear
function, Ps(t) = t/Ts, indicates that the probability of attack success increases linearly with
time and reaches 1 (i.e., success) at time Ts. The exponential function (see for instance
Eq. 4.24) indicates a situation in which the attacker initially accumulates knowledge at a
low rate and then becomes exponentially more knowledgeable over time and succeeds at
time Ts.
    P_s(t) = 1 - \frac{1 - e^{t - T_s}}{1 - e^{-T_s}} \qquad (4.24)
As an example, consider an IP sweep combined with a port scan, where the attacker’s
Figure 4.6: Probability of Success Ps vs. Time for Ts = 10
goal is to discover the IP address of the machine running a specific service within the target network. The attack consists of sequentially scanning all IP addresses in a given range. Assuming an IP space of n addresses and that t∗ time units are required to scan a single IP, we obtain Ts = n · t∗ and

    P_s(t) = \frac{t}{T_s} = \frac{t}{n \cdot t^*}

As another example, consider the following DoS attack. The attacker initially compromises n hosts, which takes t∗ time units. Then, each of the newly compromised hosts compromises n additional hosts, which takes an additional t∗ time units. At any given time t, the total number of compromised hosts, including the attacker's machine, is

    N(t) = 1 + n + n^2 + \ldots + n^k = \frac{1 - n^{k+1}}{1 - n}, \qquad k = \lfloor t/t^* \rfloor

We can assume that the attacker's success probability is proportional to the aggregate amount of flood traffic that compromised hosts can send to the victim, compared to the victim's capacity to handle incoming traffic. Let V denote the volume of traffic the victim can handle per time unit and let v denote the amount of traffic each compromised node can send per time unit. Then,

    P_s(t) = \min\left(1, \frac{N(t) \cdot v}{V}\right) = \min\left(1, \frac{v}{V} \cdot \frac{1 - n^{\lfloor t/t^* \rfloor + 1}}{1 - n}\right)
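The three success-probability profiles discussed above can be written as simple functions. This is an illustrative sketch; the function and parameter names are ours, not from the dissertation.

```python
import math

def ps_linear(t, Ts):
    """Linear knowledge accumulation: Ps(t) = t/Ts, reaching 1 at t = Ts."""
    return min(1.0, t / Ts)

def ps_exponential(t, Ts):
    """Eq. 4.24: knowledge accumulates slowly at first; Ps(Ts) = 1."""
    return 1.0 - (1.0 - math.exp(t - Ts)) / (1.0 - math.exp(-Ts))

def ps_dos(t, t_star, n, v, V):
    """DoS example: success proportional to aggregate flood traffic of the
    N(t) compromised hosts (v per host) vs. the victim's capacity V."""
    k = int(t // t_star)
    N = (n ** (k + 1) - 1) // (n - 1)    # N(t) = 1 + n + ... + n^k
    return min(1.0, N * v / V)
```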
4.3 Simulation and Experimental Testbed
Our analytical results were validated by simulation and by experiments that implemented a
generic MTD. We implemented the simulation using SimPy1, a process-based discrete-event
simulation framework based on standard Python. SimPy supports multiple processes that
contend for access to a resource and automatically handles queuing of events if a resource
is busy, making it ideal for our purposes. Additionally, we used SimPy as a real-time event
generator to control VM reconfigurations and implement a fully operational MTD. For our
VM environment, we used Citrix’s open-source XenServer platform2, which offers pooling of
resources, the ability to quickly clone VMs for reconfiguration, and a command-line interface
that is compatible with our simulation framework.
Our MTD controller runs on a separate server and starts an independent process for
each VM – either a simulated VM or an actual VM in the XenServer pool – that generates
reconfiguration requests. Our experiments show that S is normally distributed, so we used a normal distribution for S with the same mean and standard deviation as observed in the experiments.
Reconfigurations may consist of a number of possible actions, including changing the
IP address or software. In our experiments, we remove a VM instance from the virtual
network and replace it with a fresh copy, similar to how SCIT operates [84]. The fresh copy
also has a new IP address obtained from DHCP, enabling a basic IP-hopping scheme. The
reconfiguration process also collects statistics such as any possible sources of internal delays
within the implementation.
The MTD controller also serves as a traffic generator that creates service requests. Each
service request is an independent process with exponentially distributed interarrival times
with an average arrival rate equal to λ and average service time T in the simulations.
For the experiments, an HTTP request is sent to an idle VM, which has a scripted delay
on the HTTP response with average time T to simulate the time to process a generic
1 Available at https://simpy.readthedocs.io/en/3.0/
2 Available at https://xenserver.org/
service request. Each process records the time at which it was generated, began service,
and completed according to the environment’s internal clock. These records are used to
compute queue time, service time, and response time and are maintained for each request.
We also collect statistics from a separate monitor process that operates at set intervals
to gather information about the number of resources idle, in use, and being reconfigured,
as well as the current queue length. An overview of the system and processes is shown in
Figure 4.7.
Figure 4.7: Experimental Setup: a) c independent processes to generate reconfiguration requests (arrival rate α), b) process to generate independent service requests (arrival rate λ), c) monitor process (every 0.01 sec)
The pool of VMs is tracked using three separate states for the VMs: idle, in use (i.e.,
serving a request), or being reconfigured. All requests for a VM must first acquire a shared
resource that gives them access to the pool of idle VMs. A priority queue is used, giving
priority to reconfiguration requests so that reconfiguration is not unnecessarily delayed;
however, reconfiguration requests for a specific VM will not preempt a request currently
being served. Instead, the reconfiguration request flags that VM for reconfiguration and
then releases its lock on the idle pool before waiting for that resource to appear in the pool
of VMs to reconfigure. When service requests receive access to the idle pool, they remove
a random VM from the pool and place it in the pool of VMs in use. Once the request completes, if that VM is flagged for reconfiguration, it is placed in the reconfiguration pool, where the pending reconfiguration request will pick it up; otherwise, it is placed back in the idle pool. In the event that a service request finds no VMs in the idle pool, it
waits for one to appear. This additional wait is included in the overall queue time. The
overall flow of control and VM state transitions is shown in Figure 4.8.
Figure 4.8: Control Flow and VM Movement: a) incoming requests, b) priority queue, c) resource lock on idle pool, d) idle VM pool, e) VMs in use, f) reconfiguring VMs
Each iteration of the simulation lasted 6,000 seconds, with no statistics recorded in the
first 1,000 seconds to allow the system to achieve steady-state. Thirty runs were performed
for values of α from 0.001 to 0.050 to obtain the mean, standard deviation, and 95%
confidence intervals for the mean for each statistic. For the experiments, each run is limited
to 600 seconds with statistics recorded after the first 60 seconds for select values of α. The
values of the other input parameters used in the simulations and experiments are given in
Table 4.2.
Table 4.2: Values of Variables used in Numerical Results
Variable | Value
Ts       | 300 sec
c        | 20
α        | from 0.001 to 0.050 rec/sec
S        | 120 sec
λ        | 10 requests/sec
T        | 0.5 sec
4.4 Numerical Results and Validation
This section presents several numerical results starting with those obtained from the re-
configuration model along with validation of the model. We then cover the performance
model, including some interesting findings from our implementation of the model. Finally,
we show how the two models can be used to find an optimal value of the reconfiguration
rate that considers tradeoffs between response time and security.
4.4.1 Reconfiguration Model
Figure 4.9 shows the distribution of the number k of resources being reconfigured for four
values of the reconfiguration rate α, out of a total of 20 resources. The graphs show that
as α increases from 0.005 to 0.04 rec/sec, the probability distribution moves to the right.
The average number of resources being reconfigured is 7.50 for α = 0.005 rec/sec, going up
to 16.55 for α = 0.04 rec/sec.
The reconfiguration probabilities pk are used to compute the availability. Figure 4.10
shows three availability curves as a function of the reconfiguration rate α for values of the
reconfiguration time S equal to 60, 90, and 120 seconds, respectively. As the reconfiguration
rate increases, the availability decreases in a non-linear fashion and, as the reconfiguration
time increases, the availability decreases for the same value of α. As the reconfiguration
rate tends to zero, the availability tends to 1 because all resources are available for use.
Figure 4.9: Distribution of the Number of Resources Being Reconfigured for c = 20
Note that, as we indicated in Section 4.2, the availability does not depend on the number
of resources c.
We validated the analytic model against the simulation and experiments, with the mean and standard deviation of the reconfiguration times measured in the experiments. We find
that the probability distribution generated by the simulation closely matches that of the
analytical model, as seen in Figure 4.11.
The theoretical availability results match very well the results obtained by the simulations and experiments, as shown in Table 4.3: for the same range of values of α used in Figure 4.10, the absolute percentage relative error between the model and the simulation does not exceed 2.29%, and the error between the model and the experimental results does not exceed 9.62%.
Figure 4.10: Availability vs. Reconfiguration Rate α
Figure 4.11: Comparison of Number of Resources being Reconfigured (α = 0.02 rec/sec)
Table 4.3: Comparison of Availability Results.
α     | Model | Simulation ± ½ 95% CI | Abs. % Error | Experimental ± ½ 95% CI | Abs. % Error
0.005 | 0.696 | 0.694 ± 0.004         | 0.29         | 0.686 ± 0.015           | 1.46
0.010 | 0.467 | 0.466 ± 0.004         | 0.21         | 0.465 ± 0.017           | 0.43
0.015 | 0.329 | 0.330 ± 0.003         | 0.30         | 0.318 ± 0.017           | 3.46
0.020 | 0.253 | 0.255 ± 0.003         | 0.78         | 0.247 ± 0.011           | 2.43
0.030 | 0.171 | 0.175 ± 0.002         | 2.29         | 0.156 ± 0.007           | 9.62
0.040 | 0.132 | 0.133 ± 0.002         | 0.75         | 0.121 ± 0.007           | 9.09
4.4.2 Response Time Model
We now summarize our results about the response time and then explain them in detail.
The key conclusions are: (i) the response time model closely matches the simulation results
for a range of values of α for which the system is stable most of the time (we formally
define stability later); (ii) for larger values of α there is a high variation of the utilization around its average ρ = λT/c̄, which causes the system to become unstable (i.e., ρ > 1) for non-negligible fractions of time; as a consequence, the queue and the response time grow without bound.

As indicated in Section 4.2, ρ must be less than 1. As ρ tends to 1, the response time tends to infinity because of the term (1 − ρ) that appears in the denominator of the expression for the average number of requests in the system. Because λ and T are assumed constant in this section (λ = 10 requests/sec and T = 0.5 sec), the variation of ρ depends on c̄, which decreases with the availability, which in turn decreases as α increases (see Table 4.3). Thus, ρ < 1 ⇒ c̄ > λT ⇒ c̄ > 5.
Before we present the variation of the response time as a function of α, it is instructive
to compare graphs of run-time data captured for α = 0.005 and α = 0.015 in Figures 4.12a
and 4.12b, respectively. From these data, we observe that the coefficient of variation (COV) of both the number of available resources and the response time increases with α. For α = 0.005, the COV for available resources is 0.123 and for response time is 1.007; for α = 0.015, these values go up to 0.287 and 1.676, respectively. More importantly, we also notice that in both
Figure 4.12: Number of Available Resources and Response Time for Two Trials with Differing Values of α: (a) α = 0.005; (b) α = 0.015
cases, c̄ > 5, as denoted by the dashed line in the graphs, and thus ρ < 1. However, for α = 0.015, there are periods of time where there are 5 or fewer resources available, causing
a spike in response time. During these periods, ρ ≥ 1 and the queue of service requests
grows infinitely. Furthermore, even as the number of available resources returns above the
minimum required, there is a lagging effect on response times returning to normal as there
are built-up service requests in the queue. Thus, a metric such as ρ alone does not capture
well the effect of episodic instability. To better quantify this effect, we introduce a metric
ω, which we call stability, defined as the fraction of time the system is in a stable state (i.e.,
ρ < 1):
    \omega = \sum_{k \in N} p_k \qquad (4.25)

where N = \{ k \in \{0, 1, \ldots, c\} : \lambda T/(c - k) < 1 \}. Because ω depends on the probabilities
pk, it is a function of α, and we use ω(α) to denote that relationship. Then, a system is
stable for a given set of parameters if ω(α) ≈ 1 because it is almost never in a situation
where ρ > 1. The algorithm to compute ω(α) is listed below.
Algorithm 2 ComputeStability(c, λ, T, pk)
Input: Resource count c, arrival rate λ, service time T, probability distribution pk
Output: Stability ω
1: ω ← 0
2: for k = 0 → c do
3:     if λT/(c − k) < 1 then
4:         ω ← ω + pk
5:     end if
6: end for
7: return ω
Figure 4.13 shows ω superimposed over response time results obtained through simula-
tion and from the analytic model. As we can see, for low values of α, ω is very close to 1 and
the simulation matches the analytic results. As α increases, we observe that the response
time is very sensitive to small decreases in stability. When α = 0.015 rec/sec, ω = 0.775,
which means that the system is unstable 22.5% of the time with requests rapidly building
up in the queue, causing a higher than expected value and variance in the response time.
A possible solution is to limit the number of resources reconfiguring at any one time, to ensure that there are sufficient resources available to handle the expected workload; this approach is discussed in the next chapter.
4.4.3 Optimal Reconfiguration Rate
The model presented in Section 4.2 allows one to answer a variety of “what-if” questions
such as “How does the resource availability vary with the time needed to reconfigure a re-
source?” or “How does the average response time of service requests vary with the average
reconfiguration rate?” Additionally, one can solve optimization problems such as maximiz-
ing the reconfiguration rate subject to the following constraints: (i) the stability must be
greater than or equal to a threshold ωmin, and (ii) the average response time must be less
than or equal to a threshold Rmax. More precisely,
Maximize α
s.t. ω ≥ ωmin and R ≤ Rmax
Figure 4.13: Response Time: Simulation vs. Analytical Model with Stability
Because the stability decreases monotonically with α and the response time R increases
monotonically with α (see Figure 4.13), the maximum feasible value αmax of α is
αmax = min(αω, αR) (4.26)
where
    \alpha_\omega = \arg\max_\alpha \{\omega(\alpha) \ge \omega_{min}\}
    \alpha_R = \arg\max_\alpha \{R(\alpha) \le R_{max}\} \qquad (4.27)
Consider Figure 4.14 and S = 60 sec, c = 20, ωmin = 0.9, and Rmax = 0.75 sec.
Then, αω = 0.023 rec/sec to satisfy the stability constraint. However, in order to satisfy
the response time constraint, αR = 0.036 rec/sec as illustrated in Figure 4.14. Therefore,
α ≤ min (0.023, 0.036) = 0.023 rec/sec. This means that each resource will be available, on
average, for 1/α = 1/0.023 = 43.5 seconds before it is reconfigured.
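Because ω(α) decreases and R(α) increases monotonically, αmax can be found by a simple search over a grid of candidate rates. In the sketch below, omega_of and r_of are stand-ins for evaluations of the stability and response time models; the names are ours.

```python
def max_reconfig_rate(omega_of, r_of, omega_min, r_max, alphas):
    """Eq. 4.26: alpha_max = min(alpha_omega, alpha_R), by grid search.
    Returns None when no candidate rate satisfies both constraints."""
    feasible_w = [a for a in alphas if omega_of(a) >= omega_min]  # stability
    feasible_r = [a for a in alphas if r_of(a) <= r_max]          # response time
    if not feasible_w or not feasible_r:
        return None
    return min(max(feasible_w), max(feasible_r))
```

With the actual model curves of Figure 4.14, the stability constraint binds first, yielding αmax = 0.023 rec/sec as in the example above.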
Figure 4.14: Optimization Analysis to Find the Maximum Feasible Reconfiguration Rate (α) for c = 20 and S = 60 sec
We now consider the interplay of the maximum reconfiguration rate α = 0.023 rec/sec
obtained in the optimization example above and the probability of a successful attack.
Consider a linear function for the probability Ps(t) with Ts = 100 sec. In that case, the
probability that an attacker succeeds after 43.5 sec is 43.5/100 = 0.435. However, if the
knowledge accumulation rate has the exponential form of Eq. 4.24, the probability of a
successful attack at t = 43.5 is
    P_s(43.5) = 1 - \frac{1 - e^{43.5 - 100}}{1 - e^{-100}} \approx 0 \qquad (4.28)
In other words, the optimal reconfiguration rate yields a relatively large probability that the attacker is successful if knowledge can be accumulated linearly, but a close-to-zero probability of success when knowledge is accumulated at a very slow pace initially.
4.5 Conclusions
The results presented in this chapter are encouraging and indicate that this is a promising research direction. However, as mentioned earlier, a possible solution to address the instability
issue is to limit the number of resources reconfiguring at any one time to ensure that there
are sufficient resources available to handle the expected workload.
Two potential policies to enforce a minimum number of available resources are described
in the following chapter, followed by a mathematical analysis of both, with simulations and
experiments to validate the updated model. We also define a utility function that combines
the trade-off between response time and attack success probability and use it to find the
optimal reconfiguration rate that maximizes the system’s utility.
Chapter 5: Performance Modeling of Moving Target
Defenses With Reconfiguration Limits
5.1 Introduction
In the previous chapter, we used Continuous Time Markov Chains (CTMCs) to predict the distribution of the number of available resources, and the corresponding response time, for MTDs that periodically reconfigure resources, removing them from use. Our simulations and experiments validated part of this analysis, but we also observed response times trending upward far sooner than initially anticipated because the system was periodically overloaded, despite having enough resources on average to meet the expected workload.
In this chapter, we introduce two possible policies that enforce a minimum number of
available resources, followed by a mathematical analysis of both, again using CTMCs, and
results of simulations and experiments that validate the updated model. We also introduce a
utility-based method to allow users to control the trade-off between availability and security
and determine the optimal reconfiguration rate.
5.2 Updated Analytic Model Overview
To ensure that there is always a minimum number of resources available to handle service
requests, we consider policies that limit the maximum number c∗ of resources being re-
configured. If c∗ resources are being reconfigured, additional reconfiguration requests may
either be dropped (drop policy) or queued (wait policy). We analyze this generic MTD in
three steps: (i) analysis of the effect of the reconfiguration rate α on the probability distri-
bution of available resources; (ii) analysis of the effect of that availability on response time;
and (iii) calculation of the effective reconfiguration rate and determination of the attacker’s
probability of success.
Our analytic model is derived from the queuing representation shown in previous chapter
in Figure 4.1. As seen in Figure 5.1, the model is similar to the model shown previously.
However, the reconfiguration model and performance model also now take as an additional
input the maximum number of resources c∗ that can be reconfigured at the same time.
The reconfiguration and performance models are solved using CTMCs, as explained in the next two sections. In addition, a new component of the model iteratively combines the results of the reconfiguration and performance models, as explained in Section 5.5. Table 5.1 contains the names and descriptions of all variables, including new variables used in our analysis of the two new reconfiguration policies.
Figure 5.1: Analytic Model Framework
68
Table 5.1: Summary of Variable Names and Descriptions
Variable | Description
Ps(t)    | Probability that an attacker needs t time units to launch a successful attack
Ts       | Time needed for an attacker to succeed; Ps(Ts) = 1
c        | Number of resources
c∗       | Maximum number of resources that can be in the process of being reconfigured
ca       | Minimum number of resources that are available for use; ca = c − c∗
c̄        | Average number of resources not being reconfigured
nr       | Average number of resources being reconfigured
nqr      | Average number of waiting reconfiguration requests (this number is zero for the drop policy)
α        | Target reconfiguration rate (measured in rec/sec)
α′       | Effective reconfiguration rate (measured in rec/sec)
S        | Average time to reconfigure a resource
pk       | Probability that there are k reconfiguration requests in the system
p̄k       | Probability that k resources are being reconfigured
Pk       | Probability that there are k service requests in the system (being served or waiting to be served)
λ        | Average arrival rate of service requests
T        | Average time a service request spends using a resource
R        | Average response time of service requests
5.3 Reconfiguration Models
This section presents the models for the drop and wait reconfiguration policies. The policies
are both closely based on the CTMCs of the previous reconfiguration model, with slight
modifications necessary to account for the new policies. The following subsection describes
core results that apply to both reconfiguration policies, while in later subsections we provide
specific results for each policy.
5.3.1 Core Results
For both policies, each resource cycles through periods in which it is available for use or is being reconfigured. We use k (k = 0, . . . , c) to denote the number of resources being reconfigured. Several useful results can be obtained from the probabilities p̄k that k resources are being reconfigured. These probabilities are a function of the reconfiguration rate α, the average time S to reconfigure a resource, the number of resources c, and the maximum number c∗ of resources that can be reconfigured at the same time. Note that c∗
is a parameter set by the system administrators to control the tradeoff between performance and availability, as we discuss later. Thus,

    \bar{p}_k = f(k, \alpha, S, c, c^*) \qquad (5.1)
To derive the core results, we assume we know the values of p̄k, and then show in subsequent sections how these probabilities can be obtained for each reconfiguration policy. Let c̄ be the average number of resources available for use (i.e., not being reconfigured) and nr the average number of resources being reconfigured. Thus,

    c = \bar{c} + n_r \qquad (5.2)

But nr can be obtained from the probabilities p̄k as

    n_r = \sum_{k=1}^{c^*} k\,\bar{p}_k \qquad (5.3)

The availability A of the set of resources is given by the fraction of resources available for use, i.e.,

    A = \frac{\bar{c}}{c} = 1 - \frac{\sum_{k=1}^{c^*} k\,\bar{p}_k}{c} \qquad (5.4)
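Given the distribution p̄k, the quantities in Eqs. 5.2-5.4 follow in a few lines (a sketch with our own naming):

```python
def availability(p_bar, c):
    """Eqs. 5.2-5.4: returns (n_r, c_bar, A), given p_bar[k], the probability
    that k of the c resources are being reconfigured (k = 0, ..., c_star)."""
    n_r = sum(k * pk for k, pk in enumerate(p_bar))   # Eq. 5.3
    c_bar = c - n_r                                   # Eq. 5.2: c = c_bar + n_r
    return n_r, c_bar, c_bar / c                      # Eq. 5.4: A = c_bar / c
```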
While α is the target reconfiguration rate, it cannot always be achieved because the
start of a reconfiguration may be delayed for some time d due to a reconfiguration request
being dropped or queued. This is illustrated in Figure 5.2, which shows the effective time
between reconfigurations. This time is also the average age of each of the resources and
defines the effectiveness of the MTD.
1/α′ = 1/α+ d (5.5)
Therefore, the effective reconfiguration rate is α′ = α/(1 + α · d). It turns out that the
value of the delay d depends primarily on the results of the reconfiguration model R, but
is also influenced by whether the resource is idle or not when a reconfiguration is scheduled
to start, which depends on the results of the performance model S. The cyclic dependency
between the two models is addressed in detail in Section 5.5. The next section derives the
equations for the reconfiguration model assuming that the effective reconfiguration rate is
equal to the target rate (i.e., d = 0). Then, in Section 5.4, we derive the performance model
for service requests as a function of the probability distribution of the number of resources
being reconfigured.
Figure 5.2: Target and Effective Reconfiguration Rate
5.3.2 Drop Reconfiguration Requests Policy
Figure 5.3 illustrates the flowchart of the reconfiguration process for the drop policy. If
the number of resources k being reconfigured is equal to c∗, a request to reconfigure is
dropped and a new reconfiguration request is generated after 1/α time units on average.
If the threshold c∗ has not been reached and the resource to be reconfigured is idle (i.e.,
not handling a service request), k is incremented by 1, the resource is reconfigured, k is
decremented by 1, and a new reconfiguration request is generated after 1/α time units on
average. If the resource to be reconfigured is not idle, the reconfiguration has to wait for
the resource to become available.
Figure 5.3: Flowchart of the Reconfiguration Cycle under the Drop Policy
We now use the CTMC of Figure 5.4 to compute p̄k (k = 0, . . . , c∗), the probability that k resources are being reconfigured. The state k in the CTMC of Figure 5.4 represents the number of reconfiguration requests in the system, which in the case of the drop policy is also the number of resources being reconfigured, so p̄k = pk, k = 0, . . . , c∗.
An expression for pk (k = 0, . . . , c∗) is obtained by using the general birth-death equation for CTMCs [83]:

    p_k = p_0 \prod_{i=0}^{k-1} \frac{\gamma_i}{\mu_{i+1}}, \qquad k = 1, \ldots, c^* \qquad (5.6)

    p_0 = \left[ 1 + \sum_{k=1}^{c^*} \prod_{i=0}^{k-1} \frac{\gamma_i}{\mu_{i+1}} \right]^{-1} \qquad (5.7)
where γk = α · (c − k), for k = 0, . . . , c∗ − 1, is the aggregate rate at which resources are
Figure 5.4: State Transition Diagram of the Markov Chain for the Reconfiguration Model under the Drop Policy
reconfigured when there are k resources being reconfigured, and µk = k/S, for k = 1, . . . , c∗, is the aggregate rate at which resources complete reconfiguration when there are k resources being reconfigured. Using the expressions for γk and µk in Eqs. 5.6 and 5.7, we obtain

    p_k = p_0 \prod_{i=0}^{k-1} \frac{\alpha(c-i)}{(i+1)/S} = p_0\,(\alpha S)^k \binom{c}{k}, \qquad k = 1, \ldots, c^* \qquad (5.8)
An expression for p0 is obtained by noting that the sum of all probabilities is equal to
1. Thus,
    p_0 = \left[ 1 + \sum_{k=1}^{c^*} (\alpha S)^k \binom{c}{k} \right]^{-1} \qquad (5.9)
The values of pk can be easily computed because the summation needed to compute p0
is finite.
In the drop policy, a reconfiguration request is dropped if it arrives when the number of resources being reconfigured is equal to c∗. Thus, the drop probability pd can be computed as the ratio of the rate of reconfiguration requests arriving at state k = c∗, weighted by the probability of being in that state, to the sum of the aggregate rates γk = α(c − k) of reconfiguration requests across all states k = 0, . . . , c∗. Thus,

    p_d = \frac{p_{c^*}\,\alpha\,(c - c^*)}{\sum_{k=0}^{c^*} p_k\,\alpha\,(c-k)} = \frac{p_{c^*}\,(c - c^*)}{\sum_{k=0}^{c^*} p_k\,(c-k)} \qquad (5.10)
We can now compute the average age of a resource, i.e., the average time it takes for a resource to be reconfigured after its last reconfiguration. The probability that a reconfiguration request is dropped exactly j times is p_d^j · (1 − pd). If a reconfiguration request is dropped exactly j times, the average age of the resource will be (j + 1) · 1/α because 1/α is the average time between successive reconfiguration requests. Thus, the average age of a resource under the drop policy is

    age_d = \sum_{j=0}^{\infty} \frac{j+1}{\alpha}\,p_d^{\,j}\,(1 - p_d) = \frac{1}{\alpha\,(1 - p_d)} \qquad (5.11)
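Eqs. 5.8-5.11 can be evaluated directly. The sketch below (our naming) returns the distribution pk, the drop probability pd, and the average resource age:

```python
from math import comb

def drop_policy(alpha, S, c, c_star):
    """Drop policy: p_k (Eqs. 5.8-5.9), drop probability p_d (Eq. 5.10),
    and average resource age (Eq. 5.11)."""
    w = [(alpha * S) ** k * comb(c, k) for k in range(c_star + 1)]   # Eq. 5.8
    p0 = 1.0 / sum(w)                                                # Eq. 5.9
    p = [x * p0 for x in w]
    pd = (p[c_star] * (c - c_star)
          / sum(p[k] * (c - k) for k in range(c_star + 1)))          # Eq. 5.10
    age = 1.0 / (alpha * (1.0 - pd))                                 # Eq. 5.11
    return p, pd, age
```

Since pd > 0 whenever c∗ < c, the average age under the drop policy is always larger than the target 1/α.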
5.3.3 Wait Reconfiguration Requests Policy
The flowchart of the reconfiguration cycle under the wait policy is depicted in Figure 5.5.
This flowchart is very similar to that of Figure 5.3, with the difference that when the threshold c∗ has been reached, the reconfiguration request is not dropped. Instead, it waits until the number k of resources being reconfigured drops below c∗.
To analyze the wait policy, we consider the CTMC of Figure 5.6 in which the state
k (k = 0, . . . , c) represents the number of reconfiguration requests in the system, either
being processed or waiting to be processed. As before, pk is the probability that there are
k reconfiguration requests in the system.
An expression for pk (k = 0, . . . , c) is obtained by using the general birth-death equation
for Markov Chains given by Eqs 5.6 and 5.7, where γk = α(c − k), for k = 0, . . . , c − 1,
is the aggregate rate at which reconfiguration requests are generated when there are k
reconfiguration requests in the system and the aggregate reconfiguration completion rate
µk for k = 1, . . . , c is given by
Figure 5.5: Flowchart of the Reconfiguration Cycle under the Wait Policy
Figure 5.6: State Transition Diagram of the Markov Chain for the Reconfiguration Modelunder the Wait Policy
75
\mu_k = \begin{cases} k/S & k = 1, \dots, c^* \\ c^*/S & k = c^* + 1, \dots, c \end{cases} \qquad (5.12)
Using the expressions for γk and µk in Eqs. 5.6 and 5.7 we obtain
p_k = p_0 \cdot \prod_{i=0}^{k-1} \frac{\alpha (c - i)}{(i+1)/S} = p_0 \, (\alpha S)^k \binom{c}{k}, \qquad k = 1, \dots, c^* \qquad (5.13)
and
p_k = p_0 \cdot \prod_{i=0}^{c^*-1} \frac{\alpha (c - i)}{(i+1)/S} \cdot \prod_{i=c^*}^{k-1} \frac{\alpha (c - i)}{c^*/S} = p_0 \, \frac{(\alpha S)^k \, c!}{c^*! \, (c^*)^{k-c^*} \, (c-k)!}, \qquad k = c^*+1, \dots, c \qquad (5.14)
An expression for p0 is obtained by noting that the sum of all probabilities is equal to
1. Thus,
p_0 = (1 + S_1 + S_2)^{-1} \qquad (5.15)
where
S_1 = \sum_{k=1}^{c^*} (\alpha S)^k \binom{c}{k} \qquad (5.16)
and
S_2 = \frac{c!}{c^*!} \sum_{k=c^*+1}^{c} \frac{(\alpha S)^k}{(c^*)^{k-c^*} \, (c-k)!} \qquad (5.17)
The values of pk can be easily computed because the summations needed to compute p0 are finite. The values of \bar{p}_k, the probability that k resources are being reconfigured, can be computed as a function of pk as \bar{p}_k = p_k for k = 0, \dots, c^* - 1 and \bar{p}_{c^*} = \sum_{k=c^*}^{c} p_k. In fact, when the number of reconfiguration requests in the system is smaller than c∗, all reconfiguration requests cause a resource to be reconfigured. When a reconfiguration request finds the number of resources being reconfigured equal to the threshold, and this happens with probability \sum_{k=c^*}^{c} p_k, the request has to wait.
One can compute the throughput X_r of reconfiguration requests as a function of \bar{p}_k as

X_r = \frac{1}{S} \sum_{k=1}^{c^*} k \cdot \bar{p}_k \qquad (5.18)
and the average number N_r of reconfiguration requests in the system as

N_r = \sum_{k=1}^{c} k \cdot p_k \qquad (5.19)
Using Little's law, we can then determine the average time in the system for reconfiguration requests as R_r = N_r / X_r. This corresponds to the sum of the average reconfiguration time S and the average reconfiguration delay d. Thus,

d = \frac{N_r}{X_r} - S \qquad (5.20)
We can now determine the average age of each resource under the wait policy as follows.
After a reconfiguration request completes, it takes 1/α time units on average for the next
reconfiguration request to arrive. But, the next request may have to wait. The arrival of a
reconfiguration request can occur anytime within the reconfiguration delay d. On average,
that arrival will have to wait d/2 time units. Thus, the average age of a resource agew is
given by
age_w = 1/\alpha + d/2 \qquad (5.21)
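Putting Eqs. 5.13–5.21 together, the wait-policy metrics can be sketched as follows (a minimal illustration; the function name and return structure are ours, not the dissertation's):

```python
import math

def wait_policy_metrics(c, c_star, alpha, S):
    """State probabilities and derived metrics for the wait policy.

    c: number of resources, c_star: reconfiguration threshold,
    alpha: per-resource reconfiguration request rate,
    S: mean reconfiguration time."""
    # Unnormalized state probabilities q[k] (q[0] = 1), Eqs. 5.13-5.14.
    q = [1.0]
    for k in range(1, c + 1):
        if k <= c_star:
            q.append((alpha * S) ** k * math.comb(c, k))
        else:
            q.append((alpha * S) ** k * math.factorial(c)
                     / (math.factorial(c_star)
                        * c_star ** (k - c_star)
                        * math.factorial(c - k)))
    p0 = 1.0 / sum(q)                  # Eq. 5.15 (with Eqs. 5.16-5.17)
    p = [p0 * v for v in q]
    # Probability that k resources are being reconfigured (p-bar).
    p_bar = p[:c_star] + [sum(p[c_star:])]
    Xr = sum(k * p_bar[k] for k in range(1, c_star + 1)) / S   # Eq. 5.18
    Nr = sum(k * p[k] for k in range(1, c + 1))                # Eq. 5.19
    d = Nr / Xr - S                                            # Eq. 5.20
    age_w = 1.0 / alpha + d / 2.0                              # Eq. 5.21
    return p_bar, Xr, d, age_w
```

With the chapter's parameters (c = 20, c∗ = 14, S = 120 sec), α = 0.005 yields an average of about 7.5 resources being reconfigured, consistent with the numbers reported in Section 5.7.1.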
5.4 Response Time Model
For the performance model, we use the CTMC of Figure 5.7 with an infinite number of
states where a state k = 0, 1, 2, . . . represents the number of service requests in the system,
either using one of the available resources or waiting for one. Service requests are assumed
to come from a Poisson process at an average rate λ and complete at a rate µδk, where
µ = 1/T (the request completion rate at a resource) and δk is derived from the probability
distribution obtained from the reconfiguration model. Note that the queue of Figure 4.1 is
similar to an M/M/c queuing system with an important difference. In an M/M/c model,
the rate at which transactions complete is kµ for k = 1, . . . , c and cµ for k > c. In our case
we need to adapt the transaction completion rate to take into account the resources that
may be in the process of being reconfigured. Thus, we follow an approach similar to the
derivation of the M/M/c queue results [83], with a modification in the average transaction
completion rate.
[Markov chain: states 0, 1, 2, ..., c, k, k+1, ...; arrival rate λ between successive states; departure rates µδ1, µδ2, ..., µδc, then µδc for all states beyond c.]

Figure 5.7: State Transition Diagram for the Response Time Model
Consider the following additional notation: (i) Pk, the probability that there are k
requests in the system, either being processed or waiting for an available resource; (ii)
µ = 1/T, the average service rate of each resource; and (iii) ρ = λ/(µ c), the average utilization of the resources. We now provide and explain an expression for δk, the multiplier of the resource service rate µ in the CTMC of Figure 5.7. Before providing a general expression, we discuss a numerical example. Let c = 10 and c∗ = 4. Therefore, there
are ca = c − c∗ = 6 resources always available for service requests because at most 4
resources can be reconfigured at the same time. Thus, when the number of service requests
in the system is at most ca, the average aggregate departure rate is equal to the number of
requests multiplied by µ (i.e., δk = k for k = 1, . . . , 6). Consider, for example, that there
are 8 service requests in the system, thus ca < k < c. If 0, 1, or 2 resources are being
reconfigured, and this happens with probability p0 + p1 + p2, there are enough resources for
all service requests in the system, and the aggregate departure rate is 8µ. If three resources
are being reconfigured, and this happens with probability p3, there is only one resource,
beyond the six, that can be used for service requests. So, the aggregate departure rate is
(6 + 1)µ = 7µ. For the same reason, if four resources are being reconfigured, there are only
6 available resources and the aggregate departure rate is 6µ. Table 5.2 shows the departure
rates for all states in the example considered here.
Table 5.2: Example of the Aggregate Departure Rate for c = 10 and c∗ = 4

State          Departure rate
k = 1, ..., 6  kµ
7              6µ + µ(p0 + p1 + p2 + p3)
8              6µ + µp3 + 2µ(p0 + p1 + p2)
9              6µ + µp3 + 2µp2 + 3µ(p0 + p1)
10             6µ + µp3 + 2µp2 + 3µp1 + 4µp0
k = 11, ...    6µ + µp3 + 2µp2 + 3µp1 + 4µp0
The expression for δk can be generalized as shown below. Note that δk = δc for k =
c + 1, . . ., and that δc is the average number of resources that are not being reconfigured
(e.g., see the expression for state 10 in Table 5.2), and can be used to serve service requests.
The ratio ρ · c/δc = λ/(µ · δc) can be interpreted as the effective utilization of the resources that are not being reconfigured; the system is stable only when this ratio is below 1.
\delta_k = \begin{cases}
k & k = 1, \dots, c_a \\
c_a + \sum_{j=1}^{k-c_a-1} j \cdot p_{c^*-j} + (k - c_a) \sum_{j=0}^{c-k} p_j & k = c_a+1, \dots, c \\
c_a + \sum_{j=1}^{c^*} j \cdot p_{c^*-j} & k = c+1, \dots
\end{cases} \qquad (5.22)
As Figure 5.7 shows, the transition rate from state k to k + 1 is λ, the average arrival rate of requests to the system, and the transition rate βk from a state k to state k − 1 is
given by
\beta_k = \begin{cases} \mu \, \delta_k & k < c \\ \mu \, \delta_c & k \ge c \end{cases} \qquad (5.23)
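The piecewise definition of δk (Eq. 5.22) translates directly into code, and checking it against the numerical example of Table 5.2 (c = 10, c∗ = 4) makes a useful sanity test. The function name and the uniform test distribution below are illustrative:

```python
def delta(k, c, c_star, p):
    """Effective service-rate multiplier delta_k of Eq. 5.22.

    p[j]: probability that j resources are being reconfigured
    (j = 0, ..., c_star)."""
    ca = c - c_star                  # resources always available
    if k <= ca:
        return float(k)
    if k <= c:
        return (ca
                + sum(j * p[c_star - j] for j in range(1, k - ca))
                + (k - ca) * sum(p[j] for j in range(0, c - k + 1)))
    # k > c: average number of resources not being reconfigured
    return ca + sum(j * p[c_star - j] for j in range(1, c_star + 1))
```

For instance, with a uniform distribution p = [0.2]*5, state 7 gives δ7 = 6 + (p0 + p1 + p2 + p3) = 6.8, matching the second row of Table 5.2.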
We can now use the generalized birth-death equations (see Eqs. 5.6 and 5.7) to solve for
Pk and P0. We have to break down the expression for Pk into two parts (for k = 1, . . . , c
and k > c) because βk has two expressions. Hence, for k = 1, . . . , c
P_k = P_0 \prod_{i=0}^{k-1} \frac{\lambda}{\mu \, \delta_{i+1}} = P_0 \, \frac{(\lambda/\mu)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} = P_0 \, \frac{(\rho c)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} \qquad (5.24)
and, for k = c+ 1, . . .
P_k = P_0 \prod_{i=0}^{c-1} \frac{\lambda}{\mu \, \delta_{i+1}} \prod_{i=c}^{k-1} \frac{\lambda}{\mu \, \delta_c} = P_0 \, \frac{(\rho c)^k}{\delta_c^{\,k-c} \prod_{i=0}^{c-1} \delta_{i+1}} = P_0 \, \frac{(\rho c / \delta_c)^k \, \delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \qquad (5.25)
P0 can now be computed as
P_0 = \left[ 1 + \sum_{k=1}^{c} \frac{(\rho c)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} + \sum_{k=c+1}^{\infty} \frac{(\rho c / \delta_c)^k \, \delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \right]^{-1} \qquad (5.26)
If we move \delta_c^{\,c} / \prod_{i=0}^{c-1} \delta_{i+1} out of the infinite summation in the above expression, we are left with the following geometric series, which converges for \rho c / \delta_c < 1, i.e., \lambda < \mu \, \delta_c:

\sum_{k=c+1}^{\infty} (\rho c / \delta_c)^k = \frac{(\rho c / \delta_c)^{c+1}}{1 - \rho c / \delta_c} \qquad (5.27)
Hence, P0 can be easily computed as follows:

P_0 = \left[ 1 + \sum_{k=1}^{c} \frac{(\rho c)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} + \frac{\delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \cdot \frac{(\rho c / \delta_c)^{c+1}}{1 - \rho c / \delta_c} \right]^{-1} \qquad (5.28)
Note that Eqs. 5.24, 5.25, and 5.28 simplify to the well-known equations for the M/M/c
queue [83] when c∗ = 0. The average number Ns of requests in the system can be computed
as
N_s = \sum_{k=1}^{\infty} k \cdot P_k = \sum_{k=1}^{c} k \cdot P_k + \sum_{k=c+1}^{\infty} k \cdot P_k \qquad (5.29)
The first summation in the expression above is an easy-to-compute finite summation:
P_0 \sum_{k=1}^{c} \frac{k \, (\rho c)^k}{\prod_{i=0}^{k-1} \delta_{i+1}} \qquad (5.30)
The infinite summation in Eq. 5.29 can be written as
P_0 \, \frac{\delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \sum_{k=c+1}^{\infty} k \cdot (\rho c / \delta_c)^k \qquad (5.31)
which can be computed as
P_0 \, \frac{\delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \left[ \sigma \frac{\partial}{\partial \sigma} \sum_{k=c+1}^{\infty} \sigma^k \right], \quad \text{where } \sigma = \rho c / \delta_c \qquad (5.32)
and is equal to
P_0 \, \frac{\delta_c^{\,c}}{\prod_{i=0}^{c-1} \delta_{i+1}} \cdot \frac{\sigma^{c+1}}{1 - \sigma} \left[ \frac{\sigma}{1 - \sigma} + 1 + c \right], \quad \text{where } \sigma = \rho c / \delta_c \qquad (5.33)
Thus, Eqs. 5.30 and 5.33 allow us to compute Ns. Finally, using Little’s Law we can
compute the average response time R as R = Ns/λ.
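The derivation of this section (Eqs. 5.22–5.33) can be sketched end-to-end in code. The function name is ours, and the sketch assumes the stability condition λ/(µδc) < 1:

```python
def response_time(c, c_star, lam, T, p_bar):
    """Average response time R from the response-time model.

    lam: arrival rate, T: mean service time, p_bar[j]: probability that
    j resources are being reconfigured (j = 0, ..., c_star)."""
    mu = 1.0 / T
    ca = c - c_star
    # Eq. 5.22: effective service-rate multipliers.
    delta = [0.0] * (c + 1)
    for k in range(1, c + 1):
        if k <= ca:
            delta[k] = float(k)
        else:
            delta[k] = (ca
                        + sum(j * p_bar[c_star - j]
                              for j in range(1, k - ca))
                        + (k - ca) * sum(p_bar[:c - k + 1]))
    rho_c = lam / mu                    # = rho * c = lambda / mu
    sigma = lam / (mu * delta[c])       # tail ratio; must be < 1
    prods = [1.0]                       # cumulative products of delta
    for k in range(1, c + 1):
        prods.append(prods[-1] * delta[k])
    geom = sigma ** (c + 1) / (1.0 - sigma)
    # Eq. 5.28: P0.
    P0 = 1.0 / (1.0
                + sum(rho_c ** k / prods[k] for k in range(1, c + 1))
                + delta[c] ** c / prods[c] * geom)
    # Eqs. 5.30 and 5.33: average number of requests in the system.
    Ns = P0 * sum(k * rho_c ** k / prods[k] for k in range(1, c + 1))
    Ns += (P0 * delta[c] ** c / prods[c] * geom
           * (sigma / (1.0 - sigma) + 1 + c))
    return Ns / lam                     # Little's Law: R = Ns / lambda
```

With c∗ = 0 (no reconfigurations) this reduces to the M/M/c queue; for example, c = 2, λ = 1, T = 1 gives R = 4/3.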
5.5 Combined Model
We now consider the fact that when a reconfiguration request arrives to a resource, it may
be busy serving a service request. In this case, the reconfiguration has to wait until the
resource becomes idle. This affects the rate at which reconfigurations occur. Let α′ be the
effective reconfiguration rate, i.e., the rate at which reconfigurations occur. This effective
reconfiguration rate should be used to compute the reconfiguration probabilities pk. The
reconfiguration rate is equal to the inverse of the average time between reconfigurations.
Thus,
1/\alpha' = (1/\alpha) \cdot \Pr[\text{idle}] + (1/\alpha + T_{res}) \cdot (1 - \Pr[\text{idle}]) \qquad (5.34)
where Pr[idle] is the probability that a resource is idle when it is time to reconfigure and
Tres is the average residual service time when the resource is busy. From renewal theory,
T_{res} = E[T^2] / (2 \, E[T]) \qquad (5.35)
where T is the random variable that represents the service time [83]. If that variable
is exponentially distributed, Tres = E[T ] = T due to the memoryless property of the
exponential distribution. Let Φ = 1 − Pr[idle]. We can then compute Φ as a function
of the probabilities Pk using the law of total probability as indicated by the equation below
that shows values of Φ and their corresponding probabilities.
\Phi = \begin{cases}
0 & \text{with probability } P_0 \\
k/c & \text{with probability } P_k, \; k = 1, \dots, c-1 \\
1 & \text{with probability } 1 - \sum_{k=0}^{c-1} P_k
\end{cases} \qquad (5.36)
The explanation behind Eq. 5.36 is the following. When there are no service requests
in the system, and this happens with probability P0, the probability that a reconfiguration
request finds the resource busy is zero. When all resources are busy, and this happens
with probability 1 −∑c−1k=0 Pk, the probability that a reconfiguration request for a specific
resource finds the resource busy is 1. When k (k = 1, . . . , c − 1) resources are busy, the
probability that a reconfiguration request finds a specific resource busy is equal to 1 minus
the probability that it finds the resource idle. Thus, the probability that the resource is
busy is equal to
1 - \binom{c-1}{k} \Big/ \binom{c}{k} = \frac{k}{c} \qquad (5.37)
because the probability that the resource is idle is equal to the number of ways one can choose k resources to be busy out of the remaining c − 1 resources divided by the total number of ways one can select k resources to be busy out of c resources. Thus, using the
Law of Total Probability we get,
\Phi = 0 \times P_0 + \left[ \frac{1}{c} \sum_{k=1}^{c-1} k \cdot P_k \right] + 1 \times \left( 1 - \sum_{k=0}^{c-1} P_k \right) = \frac{1}{c} \sum_{k=1}^{c-1} k \cdot P_k + \left( 1 - \sum_{k=0}^{c-1} P_k \right) \qquad (5.38)
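Eq. 5.38 translates directly into code (a small sketch; truncating the Pk distribution at some finite K ≥ c is our assumption):

```python
def busy_probability(P, c):
    """Phi of Eq. 5.38: probability that a reconfiguration request finds
    its target resource busy. P[k]: probability of k service requests in
    the system, given as a list truncated at some K >= c."""
    return sum(k * P[k] for k in range(1, c)) / c + (1.0 - sum(P[:c]))
```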
We can now rewrite Eq. 5.34 as

\alpha' = \frac{\alpha}{1 + \alpha \Phi T} \qquad (5.39)
The probabilities Pk depend on the probabilities pk that depend on α′, which depends
on the probabilities Pk. This is a fixed point problem that can be solved iteratively. Let
S(pk, c, c∗, λ, T ) be the service response time model (see Section 5.4) that computes the
probabilities Pk and let R(c, c∗, α, S) be the reconfiguration model (see Section 5.3) that
computes the probabilities pk. The following iterative algorithm, shown in the steps below
and in Algorithm 3, can be used to solve this fixed point problem. The busy probability Φ
is initially set to zero and it is recomputed at Step 5. The difference between the values of
Φ in successive iterations is checked at Step 6 against a given tolerance ξ.
• Step 1. Initialize: i← 0; Φi ← 0;
• Step 2. Compute α′: α′ ← α/(1 + αΦiT );
• Step 3. Compute the reconfiguration probabilities pk: pk ← R(c, c∗, α′, S);
• Step 4. Compute the service request probabilities Pk: Pk ← S(pk, c, c∗, λ, T );
• Step 5. Increment iteration count and compute new value of the busy probability: i ← i + 1; Φi ← (1/c) Σ_{k=1}^{c−1} k · Pk + (1 − Σ_{k=0}^{c−1} Pk);
• Step 6. Check tolerance: if |(Φi − Φi−1)/Φi| > ξ go to Step 2;
• Step 7. Compute the average response time R as a function of the probabilities Pk.
Algorithm 3 AdjustedResponseTime(c, c∗, λ, α, T, S, ξ)
Input: Resource count c, resource limit c∗, arrival rate λ, reconfiguration rate α, service time T, reconfiguration time S, tolerance ξ
Output: Adjusted response time R
1: i ← 0; Φi ← 0
2: repeat
3:   α′ ← α/(1 + αΦiT)
4:   pk ← R(c, c∗, α′, S)
5:   Pk ← S(pk, c, c∗, λ, T)
6:   i ← i + 1; Φi ← (1/c) Σ_{k=1}^{c−1} k · Pk + (1 − Σ_{k=0}^{c−1} Pk)
7: until |(Φi − Φi−1)/Φi| ≤ ξ
8: R ← S(pk, c, c∗, λ, T)
9: return R
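Algorithm 3's fixed-point loop can be sketched generically, with the two models passed in as callables (the names and signatures below are illustrative, not the dissertation's):

```python
def adjusted_response_time(c, c_star, lam, alpha, T, S, xi,
                           reconf_model, resp_model):
    """Fixed-point iteration of Algorithm 3.

    reconf_model(c, c_star, alpha_eff, S) -> reconfiguration distribution;
    resp_model(p_bar, c, c_star, lam, T) -> service-request distribution
    P (a truncated list of P_k)."""
    phi = 0.0                                            # Step 1
    while True:
        alpha_eff = alpha / (1.0 + alpha * phi * T)      # Step 2 (Eq. 5.39)
        p_bar = reconf_model(c, c_star, alpha_eff, S)    # Step 3
        P = resp_model(p_bar, c, c_star, lam, T)         # Step 4
        new_phi = (sum(k * P[k] for k in range(1, c)) / c
                   + (1.0 - sum(P[:c])))                 # Step 5 (Eq. 5.38)
        converged = (new_phi == 0.0
                     or abs((new_phi - phi) / new_phi) <= xi)  # Step 6
        phi = new_phi
        if converged:
            break
    # Step 7: average response time via Little's Law, R = Ns / lambda.
    return sum(k * P[k] for k in range(len(P))) / lam
```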
5.6 Simulation and Experimental Testbed
The simulation and experiments were implemented using SimPy and Citrix XenServer, as described in Section 4.3. The additional reconfiguration policies were implemented using the logic described in the flowcharts of Figure 5.3 and Figure 5.5. It should be noted that when c∗ = c, no reconfiguration limits are in place and the system behaves exactly as in the previous chapter.
The values of all input parameters used in the simulations and experiments are given in Table 5.3. The value of c∗ is chosen based on the values of c, λ, and T. We choose c∗ such that c − c∗ > λ · T; with λ = 10 requests/sec and T = 0.5 sec, this requires c − c∗ > 5, and therefore c∗ = 14. This ensures that at any given point in time, there are at least 6 resources available to serve requests and the resource utilization ρ is always less than 1. In fact, the maximum value of ρ in this configuration is (λ · T)/6 = 0.833.
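The threshold choice c − c∗ > λ · T can be captured in a one-line helper (illustrative, not part of the dissertation's tooling):

```python
import math

def max_c_star(c, lam, T):
    """Largest threshold c* satisfying c - c* > lam * T (Section 5.6),
    so that enough resources always remain to keep utilization below 1."""
    return c - math.floor(lam * T) - 1
```

With c = 20, λ = 10, and T = 0.5 this returns 14, matching the chapter's choice.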
5.7 Numerical Results and Validation
This section presents a variety of numerical results starting with results obtained with the
analytic model. Then, simulation is used to validate the analytic results. In what follows,
Table 5.3: Values of Variables Used in Simulation Results

Variable   Value
c          20
c∗         14
α          from 0.001 to 0.050 rec/sec
S          120 sec
λ          10 requests/sec
T          0.5 sec
Ts         300 sec
simulation and experimental results are compared. Finally, we show how the analytic model
derived here can be used to find an optimal value of the reconfiguration rate that considers
tradeoffs between response times and the chance of defending from attacks.
5.7.1 Analytic Model Results
There are some important tradeoffs illustrated by the equations derived in Sections 5.3
and 5.4. First, as the reconfiguration rate α increases, less time is given for an attacker to
succeed, but the resource availability decreases and both the probability that a request for
a resource has to queue and the response time increase. We illustrate these tradeoffs by
using the equations above in a variety of numerical examples for varying values of α and c∗
for the two policies.
Figures 5.8a and 5.8b show the effect of increasing α on availability and resource utilization ρ. As α increases, availability decreases and the average utilization increases, converging to a limit defined by c∗. Figures 5.9a and 5.9b show the response time and average age of the resources for both policies. As we would expect, a larger value of α results in an increased response time due to busier resources on average, but also increased protection in the form of a reduced average age of the resources.
While both policies behave similarly, there are still some differences observed. For very
low reconfiguration rates (α → 0) all c resources are available and both policies behave
the same way as illustrated in the figures. As α increases, the wait policy ensures that a
generated reconfiguration request will be immediately honored as soon as k < c∗ whereas
the drop policy will drop requests generated when k ≥ c∗. Therefore, the wait policy will
reconfigure more often than the drop policy and therefore its response time will be higher
than that of the drop policy (see Figure 5.9a) and the average age of resources is lower than
that of the drop policy (see Figure 5.9b). For example, for α = 0.05 rec/sec, the average
response time of service requests of the wait policy is 17.5% higher than that of the drop
policy and the average age of a resource for the drop policy is 57% higher than that of
the wait policy. This illustrates the tradeoff we discussed above in the sense that the wait
policy always exhibits a worse response time than the drop policy but it exhibits a lower
resource age, which diminishes the probability of success from an attacker. Note that with
a policy that limits the maximum number of resources reconfiguring at a time, the response
time and resource age are also limited as a function of the average reconfiguration rate α.
[Figure omitted: availability and ρ on the y-axes, α (rec/sec) on the x-axis; series: Drop, Wait. Panels: (a) Availability; (b) Resource Utilization.]

Figure 5.8: Average Availability and Resource Utilization for Drop and Wait Policies
Figures 5.10a and 5.10b show availability for various values of c∗. With c∗ = 14, we observe that availability converges towards a value of 0.3, which represents 6 of the 20 available resources. For larger values of c∗, availability is even lower, but we cannot tell by the number of available resources alone whether the system will be able to keep up without also knowing the demand for those resources.
To determine if resources can keep up with the demand from our incoming service
[Figure omitted: response time (sec) and age (sec) on the y-axes, α (rec/sec) on the x-axis; series: Drop, Wait. Panels: (a) Response Time; (b) Average Age.]

Figure 5.9: Average Response Time and Resource Age for Drop and Wait Policies
[Figure omitted: availability vs. α (rec/sec); series: c∗ = 14, 17, 20. Panels: (a) Drop Policy; (b) Wait Policy.]

Figure 5.10: Availability for Varying Levels of c∗
requests, we calculate the resource utilization ρ = λ/(µ · c), where µ = 1/T , T = 0.5 sec
and incoming service request rate λ = 10 requests/sec. Figures 5.11a and 5.11b show the
resource utilization for the previously selected values of c∗. Note that any state where ρ ≥ 1 denotes an unstable state in which, on average, the resources cannot handle the incoming requests and the queue grows without bound. Therefore, we must choose a value of c∗ such that c − c∗ > λ · T, leading to our choice of c∗ = 14. For this value of c∗, as α → ∞, ρ → λ · T/(c − c∗) = (10 · 0.5)/(20 − 14) = 0.833, which ensures the system remains stable for all values of α.
[Figure omitted: ρ vs. α (rec/sec); series: c∗ = 14, 17, 20. Panels: (a) Drop Policy; (b) Wait Policy.]

Figure 5.11: Average Resource Utilization for Varying Levels of c∗
Figures 5.12a and 5.12b show average response times with other settings of c∗. With
c∗ > 14, we observe that for values of α above a certain point, the system becomes unable
to handle all incoming service requests because of a scarcity of resources and the response
time grows towards infinity, therefore validating our choice of c∗ = 14 for this number of
resources and our predicted demand for service.
Figure 5.13a shows the distribution of the number k of resources being reconfigured for
several values of the reconfiguration rate α out of a total of 20 resources, with the drop policy
and for c∗ = 14. The graphs show that the distribution is bell-shaped for low values of α,
as we might expect. As α increases towards 0.04 rec/sec, the probability distribution moves
[Figure omitted: response time (sec) vs. α (rec/sec); series: c∗ = 14, 17, 20. Panels: (a) Drop Policy; (b) Wait Policy.]

Figure 5.12: Average Response Time for Varying Levels of c∗
to the right until it reaches the maximum number of requests that can be reconfigured. At
that point of saturation, the system will tend to spend the majority of its time in that state.
The average number of resources being reconfigured is 7.49 for α = 0.005 rec/sec, growing to 13.47 for α = 0.04 rec/sec. When α = 0.04 rec/sec, the system spends 62.2% of the time with the maximum possible number of resources being reconfigured.
Similarly, Figure 5.13b shows the same phenomenon for the wait policy. Because, as
explained earlier, with the wait policy all reconfiguration requests will eventually be served,
on average there are more such active requests in the system. The average number of
resources being reconfigured is 7.50 for α = 0.005 rec/sec but rises to 13.96 for α = 0.04
rec/sec. When α = 0.04 rec/sec, the system spends 97.2% of the time with the maximum
number of resources being reconfigured.
5.7.2 Validation with Simulation Results
This section shows validation results between the analytic model and the simulation de-
scribed in Section 4.3. All simulation results show error bars representing 95% confidence
intervals. As seen in the figures described in this subsection, the analytic model results closely match the simulation results.
Figures 5.14a through 5.15b show the probability distributions of the number of resources being reconfigured.
[Figure omitted: probability vs. # of resources reconfiguring (0–14); series: α = 0.005, 0.01, 0.02, 0.04. Panels: (a) Drop Policy; (b) Wait Policy.]

Figure 5.13: Probability Distributions of pk and p̄k for Varying Levels of α
Here we can again see the shape of the distribution changing from a bell-shaped distribution to one weighted towards k = c∗. In nearly all cases, the analytic value is within the 95% confidence interval, with a relative error below 10%.
[Figure omitted: probability vs. # of resources reconfiguring; series: Simulation, Analytic. Panels: (a) α = 0.005; (b) α = 0.020.]

Figure 5.14: Comparison of pk Between Simulation and Analytical Model for Drop Policy
The probability distributions obtained from the analytic model and the simulation are used to calculate overall availability, which is shown in Figure 5.16a. Here we see the simulation results very nearly match the analytic results, with less than 1% relative error for nearly every value of α. Figure 5.16b shows simulation and analytic results for the average response time of requests under the drop policy for a range of values of α. The
[Figure omitted: probability vs. # of resources reconfiguring; series: Simulation, Analytic. Panels: (a) α = 0.005; (b) α = 0.020.]

Figure 5.15: Comparison of pk Between Simulation and Analytical Model for Wait Policy
[Figure omitted: availability and response time (sec) vs. α (rec/sec); series: Analytic, Simulation. Panels: (a) Availability; (b) Response Time.]

Figure 5.16: Comparison of Availability and Response Time Between Simulation and Analytical Model for Drop Policy
maximum absolute percent relative error is 7.7% and occurs at about the midrange of the
values of α.
Figure 5.17a compares the average age of a resource under the drop policy computed
by the analytic model and the simulation. The maximum absolute percent relative error
is below 10% for all values of α. Figure 5.17b compares the percentage of dropped recon-
figuration requests for the drop policy obtained with the analytic model with the results
obtained with the simulation. For most values of α, the maximum absolute percent relative
error is below 5%.
[Figure omitted: age (sec) and P(drop) vs. α (rec/sec); series: Analytic, Simulation. Panels: (a) Average Age; (b) Drop Percentage.]

Figure 5.17: Comparison of Average Resource Age and Drop Percentage Between Simulation and Analytical Model for Drop Policy
Based on the drop percentage, we can also determine α′, the effective value of α, which is computed in the simulation from the average time between reconfigurations for each resource, as opposed to the average age, which is obtained by periodically sampling the age of each resource and averaging the results. We graph α′ in Figure 5.18 for both the simulation and the analytic model, next to a line denoting the actual value of α. We observe that for low values of α, α′ = α, but as α increases, the drop policy begins to take effect and limits α′.
Similar validations were made for the wait policy and show the same degree of agreement between the analytic and simulation results across the entire range of α values.
[Figure omitted: α′ vs. α (rec/sec); series: Simulation, Analytic, Alpha.]

Figure 5.18: Comparison of Effective Reconfiguration Rate Between Simulation and Analytical Models for Drop Policy
[Figure omitted: availability and response time (sec) vs. α (rec/sec); series: Simulation, Analytic. Panels: (a) Availability; (b) Response Time.]

Figure 5.19: Comparison of Availability and Response Time Between Simulation and Analytical Model for Wait Policy
[Figure omitted: age (sec) and delay (sec) vs. α (rec/sec); series: Simulation, Analytic. Panels: (a) Average Age; (b) Reconfiguration Delay.]

Figure 5.20: Comparison of Average Resource Age and Reconfiguration Delay Between Simulation and Analytical Model for Wait Policy
[Figure omitted: α′ vs. α (rec/sec); series: Simulation, Analytic, Alpha.]

Figure 5.21: Comparison of Effective Reconfiguration Rate Between Simulation and Analytical Models for Wait Policy
5.7.3 Validation of the Simulation with Experimental Results
We describe here a validation of the simulation model with experimental results obtained with the setup described in Section 4.3. Tables 5.4 and 5.5 compare simulation and experimental results for availability and response time for the drop and wait policies, respectively, for values of α ranging from 0.005 to 0.050 reconfigurations/sec. The average values and corresponding 95% confidence intervals are shown in the tables. The last column in each table shows the absolute percent relative error computed as 100 × |simulation − experiment| / simulation. As we can see from the tables, the errors are below 5% in all cases except for one case in which the error was 7.8%. This finding, along with the findings in the above subsection, validates the analytic results with simulation and experimentation.
Table 5.4: Comparison of Simulation and Experimental Results for Availability

       |             Drop Policy              |             Wait Policy
α      | Simulation     Experimental   Error  | Simulation     Experimental   Error
0.005  | 0.706 ± 0.013  0.701 ± 0.013  0.71%  | 0.703 ± 0.014  0.701 ± 0.015  0.28%
0.010  | 0.487 ± 0.012  0.495 ± 0.020  1.64%  | 0.470 ± 0.015  0.473 ± 0.018  0.64%
0.015  | 0.391 ± 0.006  0.381 ± 0.007  2.56%  | 0.357 ± 0.009  0.359 ± 0.011  0.56%
0.020  | 0.360 ± 0.003  0.350 ± 0.004  2.78%  | 0.319 ± 0.005  0.311 ± 0.003  2.51%
0.025  | 0.342 ± 0.003  0.335 ± 0.002  2.05%  | 0.308 ± 0.002  0.305 ± 0.002  0.97%
0.030  | 0.335 ± 0.003  0.331 ± 0.003  1.19%  | 0.303 ± 0.001  0.301 ± 0.000  0.66%
0.040  | 0.322 ± 0.001  0.321 ± 0.001  0.31%  | 0.301 ± 0.001  0.300 ± 0.000  0.33%
0.050  | 0.317 ± 0.001  0.316 ± 0.001  0.32%  | 0.301 ± 0.000  0.300 ± 0.000  0.33%
5.7.4 Determining the Optimal Reconfiguration Rate
The model presented in Section 4.2 allows one to predict the response time and the average
age of each resource in the system. We can then use these results to answer questions such as
“Given objective values for response time and level of protection, what is the reconfiguration
rate that maximizes overall utility?”
We can solve this by estimating the attacker’s chance of success using the method
Table 5.5: Comparison of Simulation and Experimental Results for Response Time

       |             Drop Policy              |             Wait Policy
α      | Simulation     Experimental   Error  | Simulation     Experimental   Error
0.005  | 0.503 ± 0.002  0.506 ± 0.003  0.60%  | 0.507 ± 0.004  0.508 ± 0.002  0.20%
0.010  | 0.533 ± 0.008  0.532 ± 0.010  0.19%  | 0.544 ± 0.010  0.558 ± 0.017  2.57%
0.015  | 0.605 ± 0.014  0.594 ± 0.014  1.82%  | 0.664 ± 0.025  0.673 ± 0.029  1.36%
0.020  | 0.636 ± 0.013  0.651 ± 0.017  2.36%  | 0.731 ± 0.023  0.788 ± 0.030  7.80%
0.025  | 0.667 ± 0.016  0.683 ± 0.017  2.40%  | 0.801 ± 0.027  0.793 ± 0.024  1.00%
0.030  | 0.679 ± 0.012  0.687 ± 0.021  1.18%  | 0.791 ± 0.025  0.805 ± 0.020  1.77%
0.040  | 0.718 ± 0.019  0.713 ± 0.021  0.70%  | 0.806 ± 0.022  0.816 ± 0.029  1.24%
0.050  | 0.725 ± 0.019  0.758 ± 0.025  4.55%  | 0.798 ± 0.029  0.819 ± 0.031  2.63%
described in Section 4.2.3 in the previous chapter. As an example, we estimate the attacker’s
chance of success using the linear method with Ts = 300 sec, with Ps(t) = 1 when t ≥ Ts.
Next, we assign utility values to the response time and attacker’s chance of success using
the following sigmoid functions:
U_R(t_r) = \frac{e^{\sigma(-t_r + \beta_R)}}{1 + e^{\sigma(-t_r + \beta_R)}} \qquad (5.40)

U_S(P_s) = \frac{e^{\sigma(-P_s + \beta_S)}}{1 + e^{\sigma(-P_s + \beta_S)}} \qquad (5.41)
where tr is the response time, βR is the objective response time, Ps is the attacker’s
chance of success, βS is the objective attacker’s success rate, and σ is a steepness parameter
for the sigmoid. We can now compute a global utility function Ug as:
Ug = wR · UR(tr) + wS · US(Ps) (5.42)
where wR and wS are weight factors chosen such that wR +wS = 1. Different values of
wR and wS influence the optimal reconfiguration rate. For example, Figure 5.22 shows the overall utility values for the drop policy where Ts = 300 sec, βR = 55 sec, βS = 0.2, and σ = 10. When wR = wS, the optimal value is found at α = 0.018 rec/sec. When wR = 0.75,
denoting an emphasis on response times at the cost of protection, the optimal value can be
found at α = 0.018 rec/sec, and when wS = 0.75, denoting an emphasis on protection at
the cost of response times, the optimal value is 0.041 rec/sec.
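The sigmoid utilities of Eqs. 5.40–5.42 are straightforward to compute; the sketch below uses the identity e^x/(1 + e^x) = 1/(1 + e^{−x}) (the function name is ours):

```python
import math

def global_utility(t_r, P_s, beta_R, beta_S, sigma, w_R, w_S):
    """Global utility Ug of Eq. 5.42; w_R + w_S is expected to be 1."""
    U_R = 1.0 / (1.0 + math.exp(-sigma * (beta_R - t_r)))   # Eq. 5.40
    U_S = 1.0 / (1.0 + math.exp(-sigma * (beta_S - P_s)))   # Eq. 5.41
    return w_R * U_R + w_S * U_S                            # Eq. 5.42
```

Sweeping α, computing the pair (t_r, P_s) for each value, and taking the arg max of Ug yields the optimal reconfiguration rate.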
[Figure omitted: utility vs. α (rec/sec); series: wR = 0.5/wS = 0.5, wR = 0.75/wS = 0.25, wR = 0.25/wS = 0.75.]

Figure 5.22: Utility Values of Various Weight Combinations for Drop Policy
Chapter 6: Conclusions and Future Work
6.1 Conclusions
Moving Target Defense (MTD) has recently emerged as one of the potentially game-changing themes in cyber security. While the typical asymmetry of the security landscape tends to favor the attacker, MTD holds promise to change the game in favor of the defender. Thus, MTD has received significant attention in the last decade, prompting researchers and practitioners to develop a myriad of different MTD techniques. Unfortunately, most such techniques are designed to address a very narrow set of attack vectors. Additionally, despite the significant progress made in this area, the problem of studying and quantifying the costs and benefits associated with the deployment of MTD techniques has not received sufficient attention, and shared metrics to assess the performance of MTD techniques are still lacking.
This dissertation has introduced a framework for quantifying moving target defenses.
Our approach to quantifying the benefits of MTDs yields a single, probability-based utility measure that can accommodate any existing or future MTD, regardless of its nature. Our
multi-layered approach captures the relationship between MTDs and the knowledge blocks
they are designed to protect and the relationship between knowledge blocks and generic
classes of weaknesses that can be exploited using that knowledge.
Furthermore, we have also proposed a quantitative analytic model for assessing the
resource availability and performance of MTDs, and a method for determining the recon-
figuration rate that minimizes the attack success probability subject to availability and
performance constraints.
To demonstrate the usefulness of this framework, we have shown through case studies that we can compute the joint effectiveness of multiple MTDs as a function of their individual effectiveness and, by doing so, we can make informed decisions about which MTD or set of MTDs provides better protection based on the security requirements or cost constraints.
We have also carried out simulations and experiments validating the formulations of the
analytical model that allows us to predict the security and response time of an MTD based
on its configuration.
6.2 Future Work
Although the work presented in this dissertation represents a significant step towards effective MTD quantification, this line of research can continue to be expanded in a number of directions:
• Application to multiple cyber attack phases: The model currently focuses on
disrupting the attacker's knowledge in the reconnaissance phase of the cyber attack chain.
While this may be the most cost-effective way to approach cyber-security, no defense
is perfect. MTDs can also prevent an attacker from gaining a foothold by periodi-
cally refreshing systems with a fresh VM instance. We can model this by treating
persistence as an additional block for exploiting weaknesses, with a probability value
that can also be disrupted by an MTD. However, when calculating probabilities using the
model, we must ensure that prevention steps taken during the reconnaissance phase
are evaluated first. This might be realized by using recursion or adding more layers
to the graphical model.
• Application to multiple (dependent) services: The quantification framework
currently only calculates the probability of exploit for a single service or multiple
independent services. Similar to network attack graphs, an attacker may have to
follow a path of exploits to reach a final goal state. We may be able to model this
by treating each step in the attack as a service to be exploited, encapsulating each
service into its own MTD graph, then probabilistically determining the likelihood of
the final goal state being reached by exploiting all the services, similar to how attack
graphs already work. An MTD may apply to the knowledge blocks of a single service
or to all services across the network, depending on its nature and individual settings.
• Choice of utility function: The way utility functions are currently combined
in the quantification framework requires enough MTDs to cover at least all
weaknesses. If not, we assume that a weakness not covered by an MTD will
eventually be compromised, which reduces the utility to 0. If the risk of leaving
a weakness unprotected by an MTD can be accepted, then some other utility function,
such as one based on a weighted average of the probabilities of each weakness being
exploited, may suffice.
• Autonomic Controllers: With regard to the analytical model, autonomic con-
trollers can be designed that dynamically control the trade-off between availability
and security by automatically adjusting the reconfiguration rate α
or the maximum reconfiguration value c∗ to achieve target response times. Under
peak load conditions, the system should ensure that a sufficient number of resources
are available to serve requests, and the need to guarantee baseline availability in these
circumstances may, at the user’s discretion, override the need to achieve a target re-
configuration rate. Our work in this direction is motivated by the observation that
system overload due to low resource availability can generate large peaks in response
times.
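The first direction above, treating persistence as an additional disruptable block, might be evaluated phase by phase, with reconnaissance computed first; a toy sketch in which all probabilities are hypothetical:

```python
# Illustrative only: probabilities are assumptions, not results.
p_recon = 0.3                 # P(recon succeeds despite MTDs)
p_persist_given_recon = 0.4   # P(foothold survives periodic VM refresh,
                              #   given reconnaissance succeeded)

# Phases are evaluated in order: reconnaissance, then persistence.
p_full_compromise = p_recon * p_persist_given_recon
```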
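The multi-service direction could, as suggested above, wrap each service in its own MTD graph and chain the resulting exploit probabilities along an attack path; a sketch under an independence assumption, with illustrative service names and values:

```python
# Per-service exploit probabilities, e.g. the output of a per-service
# MTD quantification (illustrative values).
p_exploit = {
    "web_frontend": 0.3,
    "app_server": 0.5,
    "database": 0.2,
}

def p_goal(path):
    """P(attacker reaches the final goal state), assuming every step
    on the path must be exploited and steps are independent."""
    p = 1.0
    for service in path:
        p *= p_exploit[service]
    return p

attack_path = ["web_frontend", "app_server", "database"]
p_final = p_goal(attack_path)   # ~0.03
```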
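The alternative utility function mentioned above, a weighted average rather than a hard zero for uncovered weaknesses, could look like this (weights and probabilities are illustrative):

```python
def utility_weighted(p_exploited, weights):
    """1 minus the weighted mean probability that each weakness is
    exploited; weights reflect relative importance and sum to 1."""
    return 1.0 - sum(w * p for w, p in zip(weights, p_exploited))

# A weakness left fully uncovered (p = 1.0) lowers utility but no
# longer drives it to 0:
u = utility_weighted([0.2, 0.4, 1.0], [0.5, 0.3, 0.2])   # 0.58
```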
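Finally, the autonomic-controller direction could start from something as simple as a proportional rule that lowers the reconfiguration rate α when measured response time exceeds the target and raises it when there is headroom; the names, gain, and bounds below are illustrative assumptions, not a designed controller:

```python
def control_step(alpha, r_measured, r_target, alpha_max, gain=0.1):
    """One proportional adjustment of the reconfiguration rate."""
    error = r_target - r_measured            # positive => headroom
    alpha = alpha + gain * error
    return min(max(alpha, 0.0), alpha_max)   # keep the rate feasible

# Under peak load (response time above target) the rate backs off,
# trading some security for baseline availability:
a = control_step(alpha=2.0, r_measured=1.5, r_target=1.0, alpha_max=4.0)
```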
Curriculum Vitae
Warren Connell is a member of the U.S. Air Force's Civilian Institution Program. He has had a variety of assignments in his 20-year career, including installing network intrusion detection systems at Air Force network operation centers and managing information assurance activities for all Air Force Mission Planning software. He received his Bachelor of Science in Computer Engineering from the University of Nebraska in 2007 and went on to receive his Master of Science in Computer Engineering at Wright State University in 2011. After finishing his Doctor of Philosophy in Information Technology at George Mason in 2017, he is slated for assignment to the F-35 Joint Program Office, followed by a position on the faculty at the Air Force Institute of Technology.