Challenges in managing uncertainty during cyber events:
Lessons from the staged-world study of a
large-scale adversarial cyber security exercise
Matthieu Branlat, Alexander Morison, David Woods
Cognitive Systems Engineering Laboratory, The Ohio State University
ABSTRACT
In spite of the recognized challenge and
importance of developing knowledge of the
domain of cyber security, human-centered
research to uncover and address the difficulties
experienced by network defenders is to a large
extent lacking. Such research is needed to
influence the design of the joint systems facing
cyber attacks.
Network defense depends on the capacity to
successfully identify, investigate and respond
to suspicious events on the network. Such
events correspond to fragmented micro
phenomena that occur in a background of
overwhelming amounts of very similar
network activity. One key question in order to
understand and better support cyber defense is:
how can cyber defense systems manage the
high level of uncertainty they face?
Human operators are essential in network
analysis. Network analysts operate within
teams, and within larger organizations. These
are essential dimensions of cyber security that
remain under-researched. They are also at the
heart of challenges typical of joint activity in
complex work systems: tasks conducted by
distinct teams put them at risk of working at
cross-purposes, and security goals conflict with
the organization's production goals. In
addition, cyber events are fundamentally
adversarial events, a dimension of cyber
security also under-researched.
This paper presents findings from a staged-
world study of a large-scale adversarial cyber
security exercise. It describes the challenges
and management of uncertainty in this context
and discusses their implications for the design
and development of better defense systems.
INTRODUCTION
The continuously growing connectivity of
systems creates increasingly complex digital
infrastructures that enable critical and valued
services. This source of performance also
constitutes a source of vulnerability to cyber
threats, a growing concern expressed in
military, financial and industrial domains. In
particular, the potential impact of cyber attacks
on critical infrastructures and services that
societies depend on daily is worrying. Industrial
control systems, seldom designed with cyber
security in mind, also exist in a competitive economic
context in which proprietary information
becomes decisive. These characteristics make
industries high-value targets for cyber
terrorism (Finco, Lee, Miller, Tebbe and
Wells, 2007). Importantly, cyber security
experts observe that, at the same time, the
knowledge cost for hackers is getting
considerably lower (Goodall, Lutters and
Komlodi, 2004), especially because of the
large availability of information,
documentation and even ready-to-use software.
On the other hand, cyber defense remains a
highly demanding task. Numerous efforts exist
to improve cyber defense, typically focused on
the search for technological solutions. But in
spite of the recognized challenge and
importance of developing knowledge of this
critical domain, human-centered research to
uncover and address the difficulties
experienced by network defenders is recent
and still limited. Moreover, understanding
cyber security, a fundamentally adversarial
domain, requires investigations of the
interrelated defense and attack processes, but
such studies are rare. While research has
produced models of cyber attack or defense
processes, simultaneous investigations of both
processes do not appear to exist (studies
usually rely more or less explicitly on
hypothesized attacker or defender behavior).
Such research is needed to influence the design
of the joint systems facing cyber attacks.
Common publications about cyber defense are
how-to resources that focus on technological
dimensions of the domain and associated
knowledge and skills (e.g., firewalls and their
management). In this type of literature,
network analysts are expected to follow good
practices in order to ensure network security.
However, other authors recognize that, in spite
of significant technological progress, human
analysts continue to be key elements of
network security. Based in part on cognitive
task analysis methods, detailed accounts of
network defense analysts’ work do exist, but
are largely focused on this single perspective
within the larger context of cybersecurity
(Goodall et al., 2004; D'Amico and Whitley,
2008). More recently, publications from a
group of researchers at the University of
British Columbia have described the
collaborative nature of cyber defense and its
processes within the larger organizational
framework (Werlinger, Muldner, Hawkey and
Beznosov, 2010; Hawkey, Muldner and
Beznosov, 2008).
Cyber attacks have been described based on
after-the-fact investigations or expert
interviews. These accounts are informed
interpretations at best, since available data
often are scarce and highly ambiguous. Most
studies have focused on defense relying more
or less explicitly on hypothesized attacker
behavior. A notable exception is Jonsson and
Olovsson’s study (1997) of cyber attack
dynamics (but this study made assumptions
that limited its realism).
The focus of this paper will be primarily on
cyber defense. However, understanding cyber
defense requires considering the dynamics of
cyber attack and of the interplay between
attack and defense. These dynamics will,
therefore, be presented here; they are described
in greater detail elsewhere (Branlat et al.,
2011; Branlat, 2011). Insights from these
processes of cyber security result in directions
for the improvement of cyber defense.
STUDY CONDUCTED
The research described in this document stems
from an on-going collaboration between the
Cognitive Systems Engineering Laboratory
(CSEL) at the Ohio State University and the
Idaho National Laboratory (INL). It consists of
a large-scale staged-world study of an
adversarial cyber security exercise. The
exercise was part of a weeklong training
organized by INL for the energy sector and
aimed at raising awareness about cyber
security. Forty people coming from this
industrial domain participated in the exercise,
among which a majority were IT specialists
(with various competences such as database or
network management). Some participants had
pre-existing knowledge in cyber security, but
no participant was an expert in this domain.
The environment of the exercise consisted of
the simulation of a typical industrial facility in
charge of producing some product and relying
on a large network in order to control the
production of the physical process. During the
12-hour competitive Red vs. Blue exercise, the
Red team attempted to take advantage of the
openness of the network in order to attack it
from the outside and, ultimately, perturb the
process. On the other hand, the Blue team
operated within the organization and was in
charge of protecting the network and,
ultimately, of maintaining the production.
Figure 1 – General exercise environment
The study conducted is a staged-world study
(Woods and Hollnagel, 2006, Chap. 5) based
on a found scenario designed by domain
experts from INL. It actually does not rely on a
typical scripted scenario to provide interesting
learning situations for participants. It rather
consists of a high-validity exercise
environment (configuration of network and
assets, production and organizational
environment) in which the activity of the Red
team serves as the main pacer and source of
perturbations for the Blue team.
Data capture involved 4 observers with
relevant knowledge background (2 Cognitive
System Engineers with computer science
background and 2 Human Factors specialists
with energy sector expertise). Observers were
distributed in the various physical spaces to
capture teams' activities through hand notes.
They were supplemented by various fixed and
targeted audio/video recording devices.
Analyses implemented a process-tracing
methodology (Woods, 1993) based on the
transcription of physical behavior and verbal
communications recorded on each side.
Preparatory work based on domain literature
initially informed the study and its
methodology by identifying domain
characteristics. These anticipated
characteristics also allowed for the
identification of relevant literature, such as that
related to distributed anomaly response or
adversarial interplay. The relevant literature
provided a theoretical framework for the
various phases of the study.
CYBER ATTACK
The general goal of the Red team was to
defame, perform reconnaissance, invade, and
eventually break or destroy the physical
process defended. The essential goal of attack
is to connect their network to the Blue team's
network with sufficient permission (e.g., root
access) with the end result being the ability to
modify the internal network, processes, and
external connections.
The recurring pattern of attack observed during
the exercise is the following:
- through some intelligence phase, the team
discovers a vulnerability on the network,
- members attempt to exploit it,
- one member gets access,
- access can be used to conduct further
intelligence, get further access, and
compromise the host to do damage or secure
presence (e.g., modify the system's settings),
- eventually, access is lost, often
unintentionally (e.g., a bad operation kills
the connection) or as a result of the Blue team's
actions (e.g., the host is rebooted).
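The recurring pattern above can be sketched as a simple state machine. This is an illustrative reconstruction only: the state names and transition conditions are assumptions introduced here, not elements of the exercise's actual tooling or of the authors' analysis.

```python
# Hypothetical sketch of the recurring attack cycle described above.
# States and transition flags are illustrative assumptions.
from enum import Enum, auto

class AttackState(Enum):
    INTELLIGENCE = auto()   # probing the network for vulnerabilities
    EXPLOIT = auto()        # attempting to exploit a discovered vulnerability
    ACCESS = auto()         # foothold gained on a host
    LOST = auto()           # access lost (bad operation or Blue team action)

def step(state, vulnerability_found, exploit_succeeded, access_lost):
    """One transition of the attack cycle."""
    if state is AttackState.INTELLIGENCE:
        # intelligence continues until a vulnerability is discovered
        return AttackState.EXPLOIT if vulnerability_found else state
    if state is AttackState.EXPLOIT:
        # failed exploits send the team back to intelligence gathering
        return AttackState.ACCESS if exploit_succeeded else AttackState.INTELLIGENCE
    if state is AttackState.ACCESS:
        # from a foothold: further intelligence, further access, or compromise,
        # until access is eventually lost
        return AttackState.LOST if access_lost else state
    # after losing access, the cycle restarts
    return AttackState.INTELLIGENCE
```

The loop structure makes the path dependency discussed next visible: each pass through the cycle starts from whatever access the previous pass left behind.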
Cyber attack is characterized by path
dependency: potential actions and access
depend on what a location affords due to the
configuration of the networks and their assets
(e.g., access to machine A will afford different
actions and further access than access to
machine B). The pattern of attack presented
above risks giving a false sense of incremental
advances towards a well-defined goal and
through a pre-identified general path. The
general process is more akin to a laborious
exploration in the dark. This corresponds to a
type of behavior that is opportunistic and based
on trial-and-error, not entirely planful. This
pattern of attack behavior relates to the
difficulty of maintaining strategic planning
while focusing on tactical or technical
challenges. Also, the adversarial environment
is the source of trade-offs between efficiency
and exposure to detection while choosing
courses of actions. Core goals of the attack,
therefore, include avoiding detection.
Disturbances to their progress (e.g., machine
they had compromised is rebooted) are
perceived as indications of detection and
actions by the defense. They create an urge to
“do something” before all access is
completely lost. This type of pressure creates
further threats to their strategic goals as
compromising actions are typically more
detectable and risk revealing their presence.
Progressing in the network through various
actions, therefore, implies the risk of being
detected and denied opportunities to conduct
the attacking plan. On the other hand, because
of the nature of networks and network assets,
and because of the need for networks to be
partially opened in order to provide valuable
services, there are always opportunities for
attackers to get in somewhere.
CYBER DEFENSE
Defending a valuable digital infrastructure
requires pursuing two interrelated goals:
maintaining production while preventing
hackers from gaining access and from acting
on the network (e.g., stealing or corrupting
data, interfering with process production).
First, despite the potential disruptions, the
team needs to maintain the process that
supports the organization's production and
service activities. Second, the defense team
needs to provide security on their local
network. This involves preventing illegitimate
activity by removing potential vulnerabilities
and stopping illegitimate access when
discovered.
While monitoring network activity, the central
question for the defense team is: Is the activity
observed legitimate or illegitimate? Faced with
high amounts of traces of activity on the
network, the defense team needs to distinguish
valid traffic from traces that are indicative of
attack processes. Answering this deceptively
simple question requires the ability to detect
suspicious traces of activity, as well as to
implement investigation processes that aim at
validating their nature. Characteristics of
network traffic (type, source, target, or other
aspects such as volume of exchanges) define
what might be considered suspicious activity.
More elaborate investigations consider the
relationships between the characteristics of
activity observed, i.e., about the type of
activity in relation to its source and/or target.
These investigations aim at addressing whether
the activity is to be expected in the context of
the network and its assets. Unfortunately,
characteristics of the cyber security domain
make processes of sensemaking (Klein, Moon
and Hoffman, 2006) particularly challenging.
These characteristics are various forms of
uncertainty associated with network activity:
- Defenders cannot observe attacking actions
directly, but only through traces available
on the network.
- Elementary traces of activity are often
ambiguous, i.e., the same traces can mean
different things.
- Furthermore, these traces are incomplete
accounts of Red's behavior, especially
since Blue mostly has access to data
presented by technological systems that
filter and interpret basic network activity.
- Meaningful units of attack activity are also
scattered: they are based on multiple micro-
events that require different perspectives to
be analyzed as a whole (a process labeled
correlation in the literature).
- Finally, these challenges exist in an
incompletely known and evolving
environment, where knowledge is key to
understanding traces observed.
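The correlation process mentioned above can be illustrated with a minimal sketch: grouping scattered micro-events into candidate incidents. The field names and the grouping rule (events from the same source within a time window) are assumptions made here for illustration; real correlation engines use far richer criteria.

```python
# Illustrative sketch of correlating scattered micro-events into
# candidate incidents; fields and grouping rule are assumptions.
from collections import defaultdict

def correlate(events, window=300):
    """Group (timestamp_s, source_ip, event_type) tuples into candidate
    incidents: same source, each event within `window` seconds of the
    previous event from that source."""
    by_source = defaultdict(list)
    for ts, src, kind in sorted(events):
        incidents = by_source[src]
        # extend the current incident if this event is close enough in time
        if incidents and ts - incidents[-1][-1][0] <= window:
            incidents[-1].append((ts, kind))
        else:
            incidents.append([(ts, kind)])
    return dict(by_source)

# Hypothetical traces: two bursts from one source, one from another.
events = [
    (0, "10.0.0.5", "port_scan"),
    (100, "10.0.0.5", "login_fail"),
    (1000, "10.0.0.5", "login_fail"),   # > 300 s gap: a new incident
    (50, "10.0.0.9", "dns_query"),
]
incidents = correlate(events)
```

Even this toy version shows why correlation is hard: the meaning of an incident only emerges from the grouping, and any single trace in isolation remains ambiguous.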
Once events are identified with sufficient
understanding and certainty, response actions
aim at correcting vulnerabilities revealed by the
attacks and, ultimately, at impairing their
progression. However, this phase of anomaly
response (Woods and Hollnagel, 2006, Chap.
8) is where the defense team is confronted with
the difficulty of providing security within a
production environment. Providing network
security while maintaining production is a
fundamental trade-off that defenders have to
manage. Essentially, this trade-off is about
network access. From the perspective of
providing security, characteristics of network
configuration such as open ports, weak
external firewall rules, susceptible services,
and unprotected network paths between
internal sub-networks, represent vulnerabilities
that need to be eliminated. However, from the
perspective of maintaining production, these
same network properties facilitate network-
based activities. In the context of this trade-off,
network defense is faced with difficulties in
providing evidence of attack. The production of
evidence suffers from the same early detection
(“early warning”) problems as other domains
(Woods, 2009): evidence is often ambiguous
and uncertain when sought early and
proactively; it is usually clearer after the fact,
once an adverse event has occurred and can be
traced back to earlier events. What constitutes
evidence becomes a subject of debate and
negotiation because of the impact of
corresponding measures on production goals.
Defense processes require ample knowledge
about the network configuration and assets.
They are also highly collaborative and, therefore,
require knowledge and understanding of
actions conducted by other team members. For
instance, patching one machine (to reduce
vulnerability) generates temporary unusual
(i.e., suspicious) traffic while services are
restarted.
INTERPLAY
No single perspective on the events can
capture both sides simultaneously. A cyber
event is nonetheless fundamentally adversarial
and therefore needs to be analyzed through the
interplay of both sides' decisions and actions.
The interplay plays out in the network itself,
which can be seen generically as a highly
organized medium that connects clients and
providers of services. Network activity
becomes the central frame of reference to
study actors' decisions and actions.
The following figures adopt a network-centric
view to contrast the attack and defense
perspectives. This contrast aims at highlighting
important similarities and differences.
Figure 2 – Attack perspective
The figures are simplified representations of
the processes the Red and Blue teams engaged
in throughout the exercise observed. For each
perspective, a number of their actions represent
sources of network activity (represented by the
arrows pointing to the types of network
activity). Also, each side operates with the
knowledge that some other actors exist on the
network, which are an important part of their
own activity but of which they have limited
knowledge or observability (these other actors
are represented by the gray elements in the
figures).
Figure 3 – Defense perspective
Cyber attackers and defenders share the same
technological environment – the network and
its applications – and, to a large extent, require
very similar competences and frames of mind.
In addition, some of the primary tasks that both
sides conduct are the same, for instance:
identifying and understanding network
vulnerabilities and vulnerable machines;
developing and maintaining sufficient
knowledge of the network; understanding
network activity. Through similar tools and
actions, both sides generate similar-looking
activity traces on the network.
Interestingly, both sides are conscious of the
other side's presence, and act accordingly. The
way in which they conduct their activity
integrates the threat that the other side's
actions represent to their own mission: the
attacking team knows the network is
monitored, and the defending team knows the
types of actions attackers are trying to
implement. However, neither side is in a
position to actually observe the other directly.
Each side therefore relies on what can be
observed or experienced on the network. Such
information allows them to infer the other's
behavior and take corresponding measures
(e.g., adapt their plan). Data from the exercise
show that, on both sides, inferences can be
correct or not. Inferences are in fact often
incorrect. The tendency on both sides seems to
be to interpret unexplained adverse events as
results of the adversary's actions.
Analysts on the defensive side monitor
network activity to detect anomalous traffic.
They are essentially concerned with being able
to distinguish between legitimate and
illegitimate activity. However, due to scale and
low observability, they have necessarily
limited and/or fragmented knowledge of the
potential sources of activity. To develop
knowledge of their network, they use both
active ways to probe the network and passive
ways to monitor the traffic. Active probing
constitutes another source of activity, one that,
in addition, appears quite similar to hacking –
a source of interaction and goal conflicts. As
attackers probe the network with similar
traffic, using similar tools in the same space, it
becomes difficult for network analysts to
efficiently sort the traces generated, and their
own activity risks providing a mask to the
traffic they want to detect. The main tactic
used during the exercise observed to recognize
suspicious exploratory traffic is the
identification of the source IP addresses. IP-
based recognition nonetheless represents a
brittle mechanism: it assumes an extensive
low-level knowledge of the network as well as
relative network stability; and it relies heavily
on human memory and string recognition for a
type of information that is probably not the
most conducive to these cognitive processes.
Defenders experience a challenging situation
of data overload where few mechanisms
efficiently support the necessary organization
of network traffic, context sensitivity, and
control of attention (i.e., focusing and
reorienting).
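The brittleness of the IP-based recognition tactic described above can be made concrete with a toy sketch. The host inventory and network range here are hypothetical; the point is that the mechanism works only as well as the inventory it rests on.

```python
# Minimal sketch of IP-based recognition of suspicious sources.
# The inventory and internal range are hypothetical assumptions;
# any host missing from the inventory, or any renumbering, produces
# false alarms or misses -- the brittleness noted in the text.
import ipaddress

KNOWN_INTERNAL_HOSTS = {   # assumed inventory of legitimate sources
    "192.168.1.10",        # e.g., a historian server
    "192.168.1.20",        # e.g., an operator workstation
}

def classify_source(src_ip, internal_net="192.168.1.0/24"):
    """Label a source IP roughly the way an analyst relying on
    memorized addresses would."""
    addr = ipaddress.ip_address(src_ip)
    if str(addr) in KNOWN_INTERNAL_HOSTS:
        return "known-internal"
    if addr in ipaddress.ip_network(internal_net):
        # inside the network but not inventoried: suspicious
        return "unknown-internal"
    # external sources were suspicious by default in this exercise
    return "external"
```

A mechanized version like this removes the human-memory burden but inherits the same assumptions: extensive low-level knowledge of the network and relative network stability.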
Attack and defense participate in a
fundamentally asymmetric relationship. From
the perspective of the defense, the main task is
to make sense of the attacking team's
behavior. From the perspective of the attacking
team, this is an important but secondary
objective. Attackers are indeed more focused
on understanding the unknown environment so
that they can make progress. In a sense, the
other side's perception underlies their activity
as well, but more as a potential for hindering
their progress. In addition, the two teams are not
equally positioned when faced with inadequate knowledge
or actions. The defending team cannot afford
to have an approximate understanding of the
situation or an inadequate response
implementation. On the other hand, the
attacking team will have numerous
opportunities to do damage. The
implementation of a pre-defined strategy is
what can be difficult for them, but their
process is more opportunistic by nature.
One of the characteristics that emerge from the
observations is the frequent delay between the
initial events and their detection. If the delays
correspond to actual latencies in the detection
process, this suggests that an attacking team
commonly has a window of opportunity during
which they can implement various actions
before being detected and disconnected. On the
other hand, attackers' progression is hindered
by the fact that they are 'fumbling in the dark',
i.e., access gained to a particular machine does
not translate immediately into further access to
more sensitive data or assets. The intelligence
or compromising actions attackers are required
to perform to build further knowledge or
secure access risk uncovering them. This
means that defenders also have a window of
opportunity to act, during which more signs of
attackers' presence become available. The
situation therefore resembles a chasing game
between the teams, attackers being most often
in a position to set the pace.
TOWARDS RESILIENT CYBER DEFENSE
A control problem
Since disturbances will occur that challenge
the way systems normally operate, it becomes
necessary to think of organizations seeking
network security as adaptive systems. Because
of the scale and high degree of functional
interdependencies (in the network
configuration or in the distribution of tasks),
such organizations are also complex systems.
As complex adaptive systems, they need to
manage trade-offs in the face of uncertainty,
complexity and production pressures.
Difficulties in the management of these trade-
offs risk exposing the organizations to the
three basic patterns identified in domains
sharing similar core characteristics (Woods
and Branlat, 2011):
- They need to adapt so as to keep up with
the pace of events.
- They need to adapt while managing
interdependencies and avoiding working at
cross-purposes.
- They need to modify their response
strategies when these prove ineffective.
These basic patterns define high-level goals
that represent what it means to “be in control”
(Woods and Branlat, 2010) in the domain of
cyber defense. Based on the description of the
activities on the attacking and defensive sides
and of their interplay, and in order to avoid the
three basic patterns of adaptive failure, being
in control means:
- anticipating how adverse events may
evolve in order to take advantage of the
window of opportunity on the defensive
side,
- understanding and managing the impact of
adverse events on the system (network,
production), and
- understanding and managing the impact of
the response on production goals.
From the understanding of what it means to be
in control, it is possible to discuss ways to
amplify control for cyber defense. Resilience
Engineering emphasizes how resilient control
is related to adaptive capacity and its
management (Woods and Branlat, 2010,
2011). Amplifying control essentially means
transforming systems so as to help them avoid
“failure[s] to adapt or adaptations that fail”
(Dekker, 2003), i.e., situations where
adaptations are not successful, either because
systems fail to recognize the need for
adaptation, or because the adaptive processes
themselves produce undesired consequences.
This section will explore potential directions of
investigation and development to support
cyber defense in avoiding maladaptive patterns
based on principles underlying resilience
(Hollnagel, Woods and Leveson, 2006; Woods
and Branlat, 2011). These directions are
ultimately related to the general problem of
managing uncertainty.
Sensemaking, anticipation, adversarial
interplay: adapting in time
In the observations conducted, the detection of
elementary and potentially suspicious traces of
activity does not seem to be the main problem,
apart from the latency mentioned above. The
bigger issues are determining what actually
happens, i.e., what it means in terms of
purposeful actions perpetrated by the attacking
team. Detecting anomalies from its dispersed
symptoms does not equate correctly adding up
elements gathered separately (Klein, Pliske,
Crandall and Woods, 2005). Each trace of
activity in itself, taken in isolation, is
ambiguous or insufficient in order to infer the
general problem. Isolated traces nonetheless
raise suspicion. A set of traces corresponds to a
pattern relevant to the domain of work and
recognized by the expert practitioner (it is
more difficult for the novice). Anomaly
detection results from the construction of
meaning through a “mental model” that builds
on initial cues, guides further actions and
evolves as more indications become available.
A computer network and its activity are prime
examples of complex phenomena that cannot
be observed from a single (all-encompassing)
perspective without risking committing
oversimplifications (Smith, Branlat, Stephens
and Woods, 2008). Multiple perspectives are
necessary to provide the diverse conceptual
views (structural and functional properties,
representations) of the multifaceted work
situation. In other words, the different roles
engaged in cyber defense are not all interested
in and focused on the same aspects of the
situation. The different perspectives on the
network represent different ways to
characterize the information that a defending
team might need to acquire during an event to
conduct their operations. Since multiple
perspectives cannot simply add to one another
and can even conflict (e.g., physical location
vs. logical organization of network assets,
highly detailed view vs. global picture), it
becomes necessary to have multiple
representations for the various ways to
visualize and seek information in the network.
These representations can then constitute a
space that needs to be organized in order to
facilitate a meaningful navigation, i.e.,
coherent transitions between perspectives
(ibid.).
Supporting cyber defense: Multiple
perspectives are required to efficiently make
sense of situations at hand. These perspectives
are defined by the type of competences
required to understand particular aspects of the
situation (e.g., network or database activity),
but also by their more tactical or strategic
focus. Supporting cyber defense then means
supporting each of these perspectives as well
as their interaction.
Anticipation is the fundamental projective
dimension of human cognition, and a
characteristic of expert behavior. For
anticipation to be accurate, it requires a
sufficient understanding of the situation at
hand. As a form of feed-forward control,
anticipation is especially critical to keep up with
the pace of events by avoiding failures to
adapt in time, before perturbations cascade. In
the adversarial context of cyber security,
anticipation means understanding and making
projections regarding what the other side is
targeting. In the context of cyber defense, this
includes:
- vulnerabilities, especially described in
terms of targets and paths,
- patterns of attack, in order to foresee
particular elements of network activity (and
validate the current mental model of the
situation), and
- the attackers' intent, in order to identify
their plan and specific targets of interest.
Anticipation helps focus attention on the
elements associated with the vulnerability
path, especially for operators in charge of
monitoring network activity.
Supporting cyber defense: The notion of path,
whether a path actually taken by attackers or a
potential path based on their current access, as
well as the network's connectivity and
vulnerabilities, constitutes an interesting
leverage point to support anticipation in cyber
defense. In order to support anticipatory
processes, cyber defenders benefit from means
to navigate the network in order to identify
paths and capture those relevant to the
situations at hand.
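The notion of path can be illustrated with a small graph traversal: given a foothold and the known connectivity between assets, enumerate the hosts an attacker could reach and one path to each. The network graph below is a toy assumption, not data from the exercise.

```python
# Hedged sketch of path-based anticipation: breadth-first search over
# a (hypothetical) connectivity graph from an assumed attacker foothold.
from collections import deque

def reachable_paths(edges, foothold):
    """Return one shortest path per host reachable from `foothold`,
    i.e., the candidate attack paths a defender might watch."""
    paths = {foothold: [foothold]}
    queue = deque([foothold])
    while queue:
        host = queue.popleft()
        for nxt in edges.get(host, ()):
            if nxt not in paths:
                # record the first (shortest) path found to this host
                paths[nxt] = paths[host] + [nxt]
                queue.append(nxt)
    return paths

# Hypothetical network: DMZ web server through to a control device.
edges = {
    "dmz-web": ["app-server"],
    "app-server": ["db", "plc-gateway"],
    "plc-gateway": ["plc"],
}
paths = reachable_paths(edges, "dmz-web")
```

A defender maintaining such a graph can read off which assets lie on the vulnerability path from a compromised host, focusing monitoring attention accordingly.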
Cyber security is an adversarial domain, which
implies that participants are engaged in co-
adaptive processes based on their perception of
one another. Several aspects of the perceptual
processes are particularly noteworthy:
- Adaptations will be based on the inferences
made, whether they are accurate or not (see
Trent, Smith, Zelik, Grossman, and Woods,
2009 in intelligence analysis).
- Especially when there is uncertainty about
the other actors, their behavior will be
interpreted in the light of stereotypes
associated with the group(s) they seem to
belong to (a result long described in social
psychology through attribution theory).
- The understanding constructed is transient
and dynamic, and orients expectations and
information-seeking mechanisms.
The situation can be described from a defense
perspective as a control problem where a
primary goal is to avoid being outpaced by
events. If the attacking team is given ample
time without being hindered in their progress,
it will be able to conduct a variety of actions,
multiplying its opportunities to establish
connections or compromise assets. Falling
behind the curve therefore means that adverse
events can grow exponentially into a cascade
of disturbances that will spread thin defensive
efforts and resources. Observations support the
idea that the perception of the attack's
intention, because of its anticipatory nature, is
a central element of network defense. The
asymmetry described above between attack
and defense appears as a source of greater
challenges for the defensive team. That being
said, it might be due to the fact that defense was
primarily reactive during the exercise
observed. If a team adopts a more elaborate
adversarial defense strategy, it seems likely
that the attacking team would face equivalent
challenges that may hinder its progression.
Supporting cyber defense: The attacking team
is “fumbling about in the dark”, a
characteristic of their process that impairs their
pursuit of strategic goals. An adversarial
defensive strategy could consist of actively
probing sources of suspicious activity. By
giving the attacking team the impression that
they have been detected, network defense
would put them under pressure to act, thereby
forcing them to reveal more of their presence
(or sacrifice it entirely). From the perspective
of Signal Detection Theory, this corresponds to
making weak signals stand out more relative to
the 'noise' of the environment (separating the
Noise and Signal+Noise curves further), thereby
reducing uncertainty. Such a strategy takes
advantage of the understanding of typical
challenges of attack and toughens their trade-
offs. One difficulty is to ensure that such a
strategy does not compromise legitimate
network usage, including network defense
itself (e.g., creating confusion from ambiguous
traces for the network analysts).
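The Signal Detection Theory point above can be made concrete with a toy calculation: pressuring attackers into more detectable actions shifts the Signal+Noise distribution away from Noise, raising the detectability index d'. All distributions and numbers here are made up for illustration.

```python
# Toy illustration of the SDT argument: an adversarial defensive
# strategy that forces more detectable attacker actions increases the
# separation (d') between Noise and Signal+Noise. Numbers are invented.
from statistics import NormalDist

def d_prime(mu_noise, mu_signal, sigma):
    """Detectability index for equal-variance Gaussian Noise and
    Signal+Noise distributions."""
    return (mu_signal - mu_noise) / sigma

def hit_rate(mu_signal, sigma, criterion):
    """P(observation exceeds the decision criterion | signal present)."""
    return 1.0 - NormalDist(mu_signal, sigma).cdf(criterion)

# Passive defense: attacker actions barely stand out from normal traffic.
passive = d_prime(mu_noise=0.0, mu_signal=1.0, sigma=1.0)
# Adversarial defense: pressured attackers act more detectably.
adversarial = d_prime(mu_noise=0.0, mu_signal=2.5, sigma=1.0)
```

With a fixed decision criterion, the larger d' yields a higher hit rate without a higher false-alarm rate, which is the sense in which the strategy "reduces uncertainty."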
Impact of event: adapting in a complex
environment
An important aspect of cyber defense is what
exactly a cyber event (real or perceived as
such) means in terms of threats to the mission
and requirements for the response. The
complexity with which the defensive team
needs to cope is two-fold: it exists in the
network itself, i.e., in the operational
environment, but also in the response system,
i.e., within the defensive team and organization
it belongs to.
The general progression process of the
attacking team was described previously:
access gained to a machine is a source of new
opportunities to conduct intelligence or
disruptive actions and to obtain further access.
The numerous relationships between assets on
the network, therefore, risk creating the
conditions for a cascade of disturbances. On
the other hand, these relationships are defined
by network configuration and are part of the
knowledge the defensive team possesses and
maintains of the network. They correspond to
path dependencies that can be utilized by
defense to understand, anticipate and/or hinder
the attack's progress.
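As an illustration of how defense might use these path dependencies, the sketch below enumerates the assets an attacker could reach from a compromised machine with a breadth-first traversal. The asset names and graph structure are hypothetical, not drawn from the exercise:

```python
from collections import deque

# Hypothetical asset-dependency graph: edges point from an asset to the
# assets it grants access or trust to (shared credentials, open services...).
DEPENDENCIES = {
    "workstation-12": ["file-server", "print-server"],
    "file-server": ["backup-server", "domain-controller"],
    "domain-controller": ["mail-server", "workstation-12"],
}

def reachable_assets(compromised: str) -> set:
    """Assets an attacker could reach from `compromised`, following
    the known dependency relationships (breadth-first traversal)."""
    seen = {compromised}
    queue = deque([compromised])
    while queue:
        asset = queue.popleft()
        for neighbor in DEPENDENCIES.get(asset, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {compromised}

print(sorted(reachable_assets("workstation-12")))
# ['backup-server', 'domain-controller', 'file-server',
#  'mail-server', 'print-server']
```

The same traversal run over the defenders' maintained knowledge of the network could support both anticipation (which assets are now at risk?) and hindrance (which dependency, if cut, best contains the cascade?).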
A previous section describes how cyber
defense is distributed and how this creates
challenges for successfully coordinating
operations among the team. However, researchers
note that neither guidelines nor technology
appropriately supports the highly collaborative
nature of cyber defense (Werlinger et al.,
2010). Part of the response to these challenges
traditionally lies in the expertise of
practitioners and such capacity is expected
(often implicitly) by the organizations to which
they belong. The management of functional
interdependencies is a central issue identified
by the framework of Resilience Engineering.
One of the issues highlighted by the exercise
concerns the scale of response to events, where
'response' is understood broadly, spanning from
the detection phase to the actual response. The
central question is: what constitutes an
appropriate unit of adaptive behavior, i.e.,
what role(s) should be implicated in the
response when an event occurs? Team
members might be involved because:
- they need to participate in the response
  because of their particular role and form of
  competence: to understand the nature of the
  event (e.g., correlation) or to act upon it;
- functional interdependencies require that
  they participate in a coordinated response,
  e.g., to devise a common plan; or
- they simply need to be informed: the event
  is of interest for their role, or their tasks
  relate to the planned actions and will
  therefore be affected by the response.
Different types of events place different
demands on the scale of response; for each
event there is an appropriately matched scale.
The appropriateness of this match can
be discussed in relation to the maladaptive
patterns identified by Woods and Branlat
(2011).
- If the scale is too small, i.e., members of
  the team who should have been involved do
  not participate in the response:
  uncoordinated parts of the system risk
  working at cross-purposes, and larger
  phenomena might be missed, risking
  inappropriate adaptation or a failure to
  recognize the need to adapt (risk of stale
  adaptive processes).
- If the scale is too large, i.e., members
  participated in the response who did not
  need to be involved: resources are
  unnecessarily committed and the system is
  slowed down by higher costs of coordination
  (risk of falling behind the tempo if new
  disturbances arise).
Supporting cyber defense: The scale of
response is an important determinant of the
activity, and needs to be managed. It is an
indicator of a resilient or brittle adaptive process.
Systems built around sharing all information
with every single role commit a fallacy relative
to this issue: they assume that, since people
have (technical) access to information, they
will see it and recognize that they are
concerned by an event (even when it is
occurring outside of the regular boundaries of
their role). Rather than relying on the agents
directly experiencing the event, this approach
places the burden of managing scale on
external agents and risks putting them in a
situation of data overload. A more supportive
approach would consist of highlighting
interdependencies between roles, thus making
them more visible. This would help agents
understand the impact of their actions on
functionally related roles.
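One way to picture such support is an explicit, machine-readable map of which roles a given role's actions affect, from which the scope of a response can be derived. The role names and dependency map below are hypothetical, invented purely for illustration:

```python
# Hypothetical map of functional interdependencies: each role lists
# the roles whose tasks are affected when it acts on an event.
INTERDEPENDENCIES = {
    "host-analyst": ["network-analyst", "supervisor"],
    "network-analyst": ["firewall-admin", "supervisor"],
    "firewall-admin": ["supervisor"],
    "supervisor": [],
}

def response_scope(participants):
    """Given the roles actively responding to an event, derive the
    roles that, at minimum, need to be informed of the response."""
    informed = set()
    for role in participants:
        informed.update(INTERDEPENDENCIES.get(role, []))
    return {
        "participate": sorted(participants),
        "inform": sorted(informed - set(participants)),
    }

scope = response_scope({"host-analyst", "network-analyst"})
# participate: ['host-analyst', 'network-analyst']
# inform:      ['firewall-admin', 'supervisor']
```

A tool built on such a map would notify only the functionally related roles, rather than broadcasting everything to everyone and hoping each agent recognizes their own involvement.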
Throughout the exercise, the Blue team tried to
implement changes they thought were needed
in order to respond to events they perceived.
Due to the design of the exercise, they needed
to produce 'request for change' forms that
were transmitted, along with the evidence they
had gathered, to a White cell, which
represented the network owners. Almost
systematically, the requested changes were not
implemented, with rejections justified by a lack of
sufficient evidence. In addition, such responses
typically arrived after significant amounts of
time had passed. During the process, mid-level
management in the Blue team was busy
transmitting requests and responses; when this
became their primary task, they quickly
experienced a workload bottleneck. This
situation led them to abandon their roles as
supervisors, who were supposed to keep track
of the team's progress and difficulties. The
bottleneck illustrates issues associated with
purely hierarchical control structures:
operators more directly in contact with the
controlled process lack authority and
autonomy, and the required transmission of
information between layers of the system is
inefficient and is a source of bottlenecks.
Because of the limited window of opportunity
the defensive team has in order to act upon
detected adverse events, it is important that the
decision making process occur without delay.
As in other domains related to emergency
response, cyber defense would benefit from
implementing polycentric control
architectures (Ostrom, 1999; Woods and
Branlat, 2010). This research emphasizes that
lower echelons, with more specific competences
and more direct contact with the controlled
process than remote managers have, develop
much finer knowledge of the process's
behavior. This knowledge allows them to
detect anomalies early, thereby making them
more able to adjust their actions to meet
security or safety goals. That being said, both
purely centralized and decentralized
approaches are likely to fail (Andersson and
Ostrom, 2008); they simply fall victim to
different forms of adaptive challenges. In the
domain of cyber security, in particular,
systematically reducing identified
vulnerabilities is not a viable strategy, since it
threatens other, production-related goals. Such
decisions therefore need to result from
negotiations that confront security and
production goals.
Discussions with cyber security experts
revealed how the management of the security
vs. production trade-off can be complicated by
factors that are outside of the sole context of
the event, and even counterintuitive. In some
situations, security goals are purposefully
abandoned to meet larger objectives. For
instance, from the perspective of the CERT
(Computer Emergency Readiness Team),
situations exist where organizations maintain
open access in spite of their knowledge of on-
going attacks. When attacks are unusual,
sacrifices are made to allow more elaborate
forensics and investigations to be conducted
(e.g., by FBI or others) in order to learn from
the events (e.g., about the attackers or about an
innovative strategy). The knowledge produced
serves the longer-term security goals of a
larger community rather than an effective
response to the unique events experienced.
More commonly, the idea of systematically
patching systems in the face of threats is far
from obvious or convenient in actual
production settings. Organizations typically
use custom-designed applications to fulfill
their particular needs. And often, these tailored
applications have been developed on specific
platforms at a given time and have hardly
evolved since. Patching or updating the
underlying platforms would risk preventing the
applications from working correctly. For
service purposes, organizations knowingly
accept vulnerabilities associated with older
platforms for which fixes exist. Since the time
to actually address these vulnerabilities is
typically long (months or years are required to
develop new versions of the custom
applications), the cost of disruptions due to
improper security is perceived as smaller than
the cost associated with the disruption of
services.
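The trade-off just described can be framed as a simple expected-cost comparison. The probabilities and costs below are purely illustrative, invented to make the arithmetic concrete:

```python
def expected_cost(prob_disruption: float, cost_disruption: float) -> float:
    """Expected cost of a course of action: probability times impact."""
    return prob_disruption * cost_disruption

# Leaving the old platform unpatched: small chance of a security incident.
cost_unpatched = expected_cost(prob_disruption=0.05, cost_disruption=500_000)

# Patching now: near-certain breakage of the custom application built on it.
cost_patched = expected_cost(prob_disruption=0.95, cost_disruption=200_000)

# Under these invented numbers, accepting the known vulnerability is the
# cheaper option -- the rationale organizations in the text act on.
assert cost_unpatched < cost_patched
```

The point is not the specific numbers but the structure of the decision: as long as service disruption is near-certain and costly while exploitation is merely possible, knowingly accepting the vulnerability can be the locally rational choice.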
Supporting cyber defense: Cyber defense
requires the implementation of a polycentric
decision architecture. Such an architecture
empowers layers of the system in direct
contact with the controlled process while more
distant layers are in charge of both monitoring
the evolution of the situation and the
coordination of operations. In particular, the
management of trade-offs between security
and production goals needs to be the result of
negotiation between these perspectives.
Because of the complex nature of these trade-
offs, negotiation processes need to be more
direct and better supported, not simply through
exchanges of information along the
management line.
CONCLUSION
Domains involving adversarial dynamics can
be essentially competitive (e.g., military
operations, games like chess, or cyber security)
or occur in mixed cooperative-competitive
environments (e.g., driving, board or card
games). In all cases, activities exist in the
context of their interplay, and the interplay
cannot be understood by focusing on its
isolated parts (e.g., solely considering the
'anomaly response' side of cyber defense). The
interplay corresponds to continuous processes
of co-adaptation transforming the system in
ways that create both challenges and
opportunities for the adversary. The study
described here is an exploration of the domain
of cyber security from a human-centered
perspective. Research studying cyber security
as a whole is lacking. In spite of its limitations
(see Branlat, 2011), the study emphasizes core
characteristics of the domain that are under-
represented in the literature. Cyber security is
adversarial, highly collaborative, and occurs in
an operational environment where it is not the
first priority, but a highly desired feature of a
larger system pursuing production goals. These
core characteristics are especially important on
the defensive side. Consideration of these
dimensions is needed in order to develop
further knowledge of the domain, and design
and conduct future studies. Overlooking or
oversimplifying them risks undermining
results obtained.
REFERENCES
Andersson, K., & Ostrom, E. (2008). Analyzing
decentralized resource regimes from a
polycentric perspective. Policy Sciences, 41(1),
71-93.
Branlat, M. (2011). Challenges to Adversarial
Interplay Under High Uncertainty: Staged-
World Study of a Cyber Security Event (PhD
Dissertation). Ohio State University, Columbus,
OH.
Branlat, M., Morison, A. M., Finco, G. J., Gertman,
D. I., Le Blanc, K., & Woods, D. D. (2011). A
study of adversarial interplay in a cybersecurity
event. In S. M. Fiore & M. Harper-Sciarini
(Eds.), Proceedings of the 10th International
Conference on Naturalistic Decision Making
(NDM 2011). May 31st to June 3rd, 2011,
Orlando, FL. Orlando, FL: University of Central
Florida.
D'Amico, A., & Whitley, K. (2008). The Real
Work of Computer Network Defense Analysts.
In VizSEC 2007: Proceedings of the Workshop
on Visualization for Computer Security.
Springer-Verlag, Sacramento, CA.
Dekker, S. (2003). Failure to adapt or adaptations
that fail: contrasting models on procedures and
safety. Applied Ergonomics, 34(3), 233-238.
Finco, G., Lee, K., Miller, G., Tebbe, J., & Wells,
R. (2007). Cyber Security Procurement
Language for Control Systems Version 1.6. INL
Critical Infrastructure Protection/Resilience
Center, Idaho Falls, USA.
Goodall, J. R., Lutters, W. G., & Komlodi, A.
(2004). I know my network. In Proceedings of
the 2004 ACM conference on Computer
supported cooperative work - CSCW '04 (p.
342). Presented at the 2004 ACM conference,
Chicago, Illinois, USA.
Hawkey, K., Muldner, K., & Beznosov, K. (2008).
Searching for the Right Fit: Balancing IT
Security Management Model Trade-Offs. IEEE
Internet Computing, 12(3), 22-30.
Hollnagel, E., Woods, D. D., & Leveson, N. (Eds.).
(2006). Resilience Engineering: Concepts and
Precepts. Aldershot, UK: Ashgate.
Jonsson, E., & Olovsson, T. (1997). A Quantitative
Model of the Security Intrusion Process Based
on Attacker Behavior. IEEE Transactions on
Software Engineering, 23, 235–245.
Klein, G., Moon, B., & Hoffman, R. R. (2006).
Making Sense of Sensemaking 2: A
Macrocognitive Model. Intelligent Systems,
IEEE, 21(5), 88-92.
Klein, G., Pliske, R., Crandall, B., & Woods, D. D.
(2005). Problem detection. Cognition,
Technology & Work, 7(1), 14-28.
Ostrom, E. (1999). Coping with Tragedies of the
Commons. Annual Reviews in Political Science,
2(1), 493-535.
Smith, M. W., Branlat, M., Stephens, R. J., &
Woods, D. D. (2008). Collaboration Support Via
Analysis of Factions. NATO RTO HFM-142
Symposium on Adaptability in Coalition
Teamwork, Copenhagen, Denmark, 21-23 April
2008.
Trent, S. A., Smith, M. W., Zelik, D., Grossman, J.,
& Woods, D. D. (2009). Reading Intent and
Other Cognitive Challenges in Intelligence
Analysis. In R. McDermott & L. Allender (Eds.),
Advanced Decision Architectures for the
Warfighter: Foundations and Technology (pp.
307-321). Partners of the Army Research
Laboratory Advanced Decision Architectures
Collaborative Technology Alliance.
Werlinger, R., Muldner, K., Hawkey, K., &
Beznosov, K. (2010). Preparation, detection, and
analysis: the diagnostic work of IT security
incident response. Information Management &
Computer Security, 18(1), 26-42.
Woods, D. D. (1993). Process-tracing methods for
the study of cognition outside of the
experimental psychology laboratory. In G. A.
Klein, J. Orasanu, R. Calderwood, & C. E.
Zsambok (Eds.), Decision making in action:
Models and methods (pp. 228-251). Norwood,
N.J.: Ablex Publishing Corporation.
Woods, D. D. (2009). Escaping failures of
foresight. Safety Science, 47(4), 498-501.
Woods, D. D., & Branlat, M. (2010). Hollnagel's
test: being 'in control' of highly interdependent
multi-layered networked systems. Cognition,
Technology & Work, 12(2), 95-101.
Woods, D. D., & Branlat, M. (2011). Basic Patterns
in How Adaptive Systems Fail. In E. Hollnagel,
J. Pariès, D. D. Woods, & J. Wreathall (Eds.),
Resilience Engineering in Practice (pp. 127-
144). Farnham, UK: Ashgate.
Woods, D. D., & Hollnagel, E. (2006). Joint
Cognitive Systems: Patterns in Cognitive
Systems Engineering. Boca Raton, FL: Taylor &
Francis/CRC Press.
Matthieu Branlat is a Research Assistant at
the Ohio State University, in the Cognitive
Systems Engineering Lab. Through the study
of socio-technical work environments, his
research interests include resilience
engineering, system safety, decision making
and collaborative work. Recent projects are
conducted in domains such as cyber security
and intelligence analysis, urban firefighting
and disaster management, medical care and
patient safety.
Alexander Morison is a Research Scientist in
the Integrated Systems Engineering
Department at the Ohio State University
studying the growing challenge of coupling
human observers to remote sensor systems.
Inspired by models of human perception and
attention, he has invented solutions to the
image overload, keyhole effect, and multiple
feeds problems associated with layered sensing
systems and mobile sensor platforms.
David Woods is a professor at the Ohio State
University, and the co-director of the
Cognitive Systems Engineering Lab. From his
initial work following the Three Mile Island
accident in nuclear power, to studies of
coordination breakdowns between people and
automation in aviation accidents, to his role in
founding and developing the Resilience
Engineering field, he has studied how human
and team cognition contributes to success and
failure in complex, high risk systems.