Association and Centrality in Criminal Networks
Rasmus Rosenqvist Petersen
University of Southern Denmark
The Maersk Mc-Kinney Moeller Institute
Email: [email protected]
Abstract—Network-based techniques are widely used in criminal investigations because patterns of association are actionable and understandable. Existing network models with nodes as first class entities and their related measures (e.g., social networks and centrality measures) are unable to capture and analyze the structural richness required to model and investigate criminal network entities and their associations. We demonstrate a need to rethink entity associations with one specific case (inspired by The Wire, a tv series about organized crime in Baltimore, United States) and corroborated by similar evidence from other cases. Our goal is to develop centrality measures for fragmented and non-navigational states of criminal network investigations. A network model with three basic first class entities is presented together with a topology of associations between network entities. We implement three of these associations and extend and test two centrality measures using CrimeFighter Investigator, a novel tool for criminal network investigation. Our findings show that the extended centrality measures offer new insights into criminal networks.
I. INTRODUCTION
Network-based techniques are widely used in crime inves-
tigations because patterns of association are actionable and
understandable. Target-centric investigation where a group
of people shares and restructures information in a common
information space in order to coordinate or reach consensus is
a special type of investigation. Criminal network information
structures are by nature emergent and evolving, and a target-
centric and iterative approach to tool support of this informa-
tion domain is therefore suitable. Existing criminal network
models with nodes as first class objects and their related
measures (e.g., social networks and centrality measures) are
unable to capture the structural richness required to model
and investigate criminal network entities and their associations.
Our target-centric model for criminal network investigation is
based on a model for intelligence analysis [1] and involves five
processes: acquisition, synthesis, sense-making, dissemination,
and cooperation (see [2] for a detailed description of the
model). All individuals in the target-centric model are stake-
holders: from information collectors (e.g., undercover agents
and automated web crawlers) over information analysts (inves-
tigators) to decision-makers (intelligence customer). We found
that a target-centric approach is best for the investigations
we have analyzed. The traditional alternative is a sequential
approach where investigative processes guide the investigation.
This sequential model is appealing to intelligence agencies and
law enforcement since the exchange of information between
individuals responsible for different processes can be con-
trolled. However, such compartmentalization has been found
to cause intelligence failures for a number of high-profile
investigations. Examples include the interrogations of the Iraqi
defector Curveball who sought asylum in Germany and the
subsequent invasion of Iraq in 2003 [3], [4], the investigation
of links between Operation Crevice and the July 7th 2005
bombings in the United Kingdom [5], [6], and the investigation
into the al-Qaeda organization prior to the September 11th
2001 attacks on the United States [7], [8].
In this paper, we present a criminal network model with
three first class entities (node, link, and group) that supports
emerging and evolving information structures. Based on a
study of criminal network investigations we present a topology
of entity associations that occur in these networks. We argue
that relevant entity associations are not only direct (relationship)
links, but could also be based on more semantic associations
such as the spatial co-location of entities. Together, the
network model and the topology of associations have guided
our development of support for dealing with the uncertainty
present in fragmented and partial networks. In this paper we
use that ability to dynamically extend two measures of entity
centrality in a network, degree and betweenness, and our
results show that our approach provides investigators with new
insights into criminal networks.
The CrimeFighter Investigator tool supports a target-centric
and iterative approach to criminal network investigation.
CrimeFighter Investigator is part of the CrimeFighter Toolbox
for counterterrorism [9]. Besides the Investigator tool, Crime-
Fighter consists of the Explorer tool targeted at open source
collection and processing and the Assistant tool targeted at
advanced structural analysis and visualization. The remainder
of this paper is organized as follows: Section II discusses
and defines the concepts on which our work is based. First a
conceptual model defining three first class entities is presented.
Then, we review a criminal network investigation from The Wire, followed by a review of entity association and centrality.
The section is concluded with a topology of criminal network
entity associations. In Section III we describe how Crime-
Fighter Investigator supports dynamic extension of centrality
algorithms with associations from our topology. Section IV
tests and evaluates extensions of degree and betweenness
centrality measures. Section V concludes the paper.
II. ENTITY ASSOCIATION AND CENTRALITY
The building blocks of criminal networks are information
entities. Our network model (Figure 1) defines three such en-
tities, namely information elements (nodes), relations (links),
2012 European Intelligence and Security Informatics Conference
978-0-7695-4782-4/12 $26.00 © 2012 IEEE
DOI 10.1109/EISIC.2012.63
and composites (groups). Nodes hold information about real-
world objects. Investigators basically think in terms of people,
places, things, and their relationships. We use rectangles as
visual abstractions here for simplicity, but any symbol (circles,
triangles, etc.) could have been used to illustrate different types
of real-world objects. Links of different types and weights
can associate information entities directly. Links have two
endpoints, they can be both directed and undirected, and
they have different visual abstractions (see Figure 1, middle).
Composites are used to associate entities in sub groups. We
work with two types of composites [2]: Reference composites
are used to group entities in the common information space.
Inclusion composites can collapse and expand information to
let investigators work with subspaces. The circles in Figure 1
indicate connection points for direct association of entities.
Fig. 1. Our network model’s three first class entities: Information elements (left), relations (middle), and composites (right). Points of direct association are indicated using circles.
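The three first class entities can be sketched as a small data model. The following is only an illustration under our own assumptions: the class and field names (`Node`, `Link`, `Group`, `kind`, `collapsed`) are hypothetical and are not taken from CrimeFighter Investigator.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class Node:
    """Information element holding data about a real-world object."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Group:
    """Composite that associates entities in a subgroup."""
    name: str
    kind: str = "reference"      # assumed values: "reference" or "inclusion"
    members: list = field(default_factory=list)
    collapsed: bool = False      # only meaningful for inclusion composites


@dataclass
class Link:
    """Relation with two endpoints; an endpoint may be a node or a group."""
    source: Union[Node, Group]
    target: Union[Node, Group]
    link_type: str = "unspecified"
    weight: float = 1.0
    directed: bool = False


# Illustrative entities only; the link below is not evidence from the case.
avon = Node("Avon Barksdale", {"role": "leader"})
stringer = Node("Stringer Bell", {"role": "second-in-command"})
leadership = Group("Leadership", kind="reference", members=[avon, stringer])
pit_crew = Group("Pit crew", kind="inclusion")
commands = Link(stringer, pit_crew, link_type="commands", directed=True)
```

Because links and groups are ordinary objects here, a link can point at a group just as easily as at a node, which is the property the model relies on.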
Information entities are normally synthesized in a classic
nodes-and-links way before visualization. Typical network
structures that form include hierarchical structures,
cellular structures comprised of cohesive subgroups (cliques)
connected by bridges, and flat (or fluid) structures where
individual entities are distributed in some (more or less)
random manner, maybe based on factions or their relationship
with nearby nodes, or simply because of a more desirable
visual layout.
But criminal network structures are emergent and evolving
and the networks go through many iterations after a target
is selected until the structure types mentioned above emerge.
A large organization like al-Qaeda has evolved many entity
structures. Sageman depicts al-Qaeda as four clusters with
one leadership cluster, the Central Staff. “After 1996, the
Central Staff was no longer directly involved in terrorist
operations, but the other three major clusters were connected
to their Central Staff contacts by their lieutenants in the field”
[10]. Two of the al-Qaeda clusters are comprised of several
cohesive subgroups, while the southeast Asian cluster is more
hierarchically structured, with a leader and a consultative
council at the top. When the cluster was created it was
divided into four geographical regions, and each region had
several branches. All the network information was gathered
from public domain sources: “documents and transcripts of
legal proceedings [. . . ], government documents, press and
scholarly articles, and Internet articles” [10]. The synthesis of
the elaborate list of data set attributes alone must have been
quite a tedious and time consuming task.
After 10 years of investigative journalism the Pearl Project
published a report on the kidnapping and murder of Daniel
Pearl depicting five cells responsible for various tasks, with
all cells connecting to the mastermind behind the kidnapping
[11]. However, from the account of the official investigation
we know how fragmented and inconsistent information about
the kidnappers initially was [12], and from another account
we get a vivid description of how investigations faced “the
eternal problem of any investigation into Islamist groups or Al-
Qaida in particular: the extreme difficulty of identifying, just
identifying, these masters of disguise, one of whose techniques
is to multiply names, false identities, and faces” [13]. Krebs’s
almost iconic network of 9/11 hijackers has been referenced
widely [14]. It was aggregated based on open sources, but we
don’t know the intermediate states of the network prior to the
published version. And we don’t know the exact evidence that
formed the links between the hijackers.
When investigations start, criminal network entities are
often associated in other ways than through well established
relationships to other entities. First, the entities are randomly
positioned in the information space and maybe only a few
are directly linked (e.g., the known accomplices of the
target). Later, more entities are linked, groups are created,
and structures emerge. During the first iterations, spatial
associations like entity co-location play an important role.
A spatial association with certain semantics could be entities
placed in close proximity of each other to indicate a subgroup
in the network or snippets of information about a certain
individual. Or entities might be placed above and below each
other to indicate hierarchical importance. And it may take
many synthesis-sense-making iterations before it is clear what
attributes (node meta data) are relevant as input for sense-
making algorithms. In other words, “semantics happen” [15].
The network visualizations we see in magazines, newspapers,
and scientific journals and proceedings are often created
specifically for dissemination purposes. They tell very little about
the investigative effort required to synthesize and make
sense of the respective networks. The networks therefore
convey limited information to the reader about what processes,
tasks, and techniques a tool for criminal network investigation
should support.
A. The Wire: investigating organized crime
The Wire is a tv series, renowned for its authentic depiction
of urban life on each side of the law1. In the first season it is
drug dealers on one side and law enforcement officers on the
other [17]. The Wire is interesting as a security informatics
case study for a number of reasons. First of all, the target-
centric, board-based approach2 chosen by the investigative
1The primary writers are David Simon and Ed Burns. Burns has worked as a Baltimore police detective for the homicide and narcotics divisions. Simon is an author and journalist who worked for the Baltimore Sun city desk for twelve years. He authored Homicide: A Year on the Killing Streets and co-authored The Corner: A Year in the Life of an Inner-City Neighborhood with Burns [16]. We have previously focused on policing and investigative journalism as two investigation types that could benefit from the concepts we develop and implement in CrimeFighter Investigator [2].
2We have previously described the advantages of a board-based approach for the planning domain, where information structures are also emergent and evolving (see [18]).
team maps well onto our criminal network investigation model
[2]. Secondly, Analyst’s Notebook [19], a commercial software
tool for visualization and analysis of criminal networks, is
used to narrow down a list of suspects, based on a large
number of intercepted phone calls. Finally, the show’s ability
to describe investigative context is exceptional. By context,
we mean factors such as power, law enforcement culture,
resources, and politics that ultimately can decide the success
or failure of investigations [20].
The Barksdale organization is a hierarchical and somewhat
flat structure that maintains a top-down chain of command (see
[16], [21]). The top consists of the leader Avon Barksdale,
his second-in-command Stringer Bell, who administers and
manages the organization, and Avon’s sister Briana Barksdale,
who is responsible for the financial side together with
Stringer. Maurice Levy is the organization’s lawyer who offers
legal advice and acts as defense lawyer for members of the
organization. At the bottom of the organization are the drug
selling crews: typically a crew is responsible for a high-rise,
an area in the low-rises, or a street corner (so-called open-air
drug markets [22]). Each crew has a chief and one or more
high-ranking lieutenants who control a number of dealers and
runners, who are responsible for arranging a buy, getting the money,
retrieving the drugs from a nearby location, and handing them over
to the buyer. For communicating strategies and commands to
the crews, the leadership (primarily Stringer) has lieutenants to
enforce his commands (in season one Anton Artis and Roland
Brice work as the lieutenants), and they in turn have their
enforcers who they forward tasks to. But Stringer Bell also
shows up in person in the pit (nickname for the low-rises) to
ask the crew chief to solve a specific task or follow a new
strategy.
The first season begins with narcotics lieutenant Cedric
Daniels being ordered “to organize a detail of narcotics and
homicide cops to take down Avon Barksdale’s drug crew
which runs the distribution of heroin in several of Baltimore’s
projects. Realizing that low-level buy-and-busts are getting
them nowhere3, the detail of cops [. . . ] add visual and audio
surveillance to their law enforcement tools” [20]. The team is
provided with office space in the basement, from where they
can work the case and monitor the many wires they set up in an
attempt to map out the network of individuals in the Barksdale
organization. A senior police officer, recognizing that “all the
pieces matter”, is put in charge of information collection and
processing, and he starts adding snippets of information
onto the investigation board shown in Figure 2a, which functions as
the team’s common information space. Figure 2b shows some
of the information entities used on the investigation board.
3“After years of random buy-and-bust interventions, law-enforcement controls of serious crime networks have gradually come to follow the key player strategy” [23]. Morselli follows up by stating that “a more accurate appraisal of the social organization of drug-trafficking [. . . ] would follow a resource-sharing model in which collaboration among resourceful individuals would be at the base of coordination in such operations” [23]. We find that this is also the approach taken by the investigators in The Wire by targeting not only Avon Barksdale but a range of important individuals in and around the decision-making body of the organization.
There are polaroid close-ups of individuals, and two types of
text cards: one with meta information about entities and one
functioning as headers. In the middle there is a surveillance
photo and at the bottom a newspaper clipping.
We have defined the following four information entities
used on the investigation board and use colored rectangles to
represent them in Figure 2c: portrait pictures are blue, large
surveillance photos are orange, text cards with meta data about
individuals are green, and header text cards with red text are
dark red. Based on this augmentation of the investigation board
we observe a number of semantics. Most obviously all portrait
polaroid pictures are placed below a meta data text card.
Sometimes a surveillance photo is placed next to the portraits.
Finally, the investigation board is divided horizontally into
areas by the header text cards placed at the top.
Based on The Wire and other reviewed cases4, we define
three tool requirements describing investigative needs that we
aim to support:
1) When node-link-node associations are not dominant, semantic associations will reduce investigation uncertainty through the computation of extended centrality measures.
2) Centrality measures for criminal network entities must support empty endpoint associations for more accurate results.
3) Computing centrality measures for criminal network entities may require support for a combination of several direct and semantic associations.
B. Entity association
During target-centric criminal network investigations, the
investigative team adds information pieces as they are discov-
ered and step-by-step information structures emerge as entities
are associated. We have observed that initially the information
entities are placed randomly in an information space. If a new
entity is somehow associated with an entity already in the
shared information space, then it is positioned next to that
entity (co-located). Later, some co-located entities are directly
associated using link entities, because the investigators have
learned the nature of the relationship between the entities.
Depending on the level of time criticality (e.g., high security
risk), a decision has to be made at some point. When the
network is fragmented and incomplete such decision-making
can be a challenging task due to the uncertainty. Sense-making
4Several criminal network investigations have inspired our work. The investigation of Daniel Pearl’s kidnapping and murder was target-centric and used large pieces of paper on a wall to synthesize information entities as they were discovered [11]–[13]. The Federal Bureau of Investigation conducted the investigation to locate and arrest the 9/11 mastermind Khalid Sheikh Mohammed (both before and after the attacks) in a target-centric manner, always with a focus on gathering evidence both for later potential trials and to map and understand the emerging network of individuals, events, and places [7]. Researchers and writers Strick van Linschoten and Kuehn have been mapping a network of Afghan Talibans to investigate their associations with the Afghan Arabs from 1970 to 2010 [24]. They use Tinderbox for their mapping efforts [25]. Tinderbox is a software tool that takes a board-based approach to synthesis of networks and supports multiple structures [26].
(a) investigation board (b) information entities (c) augmented investigation board
Fig. 2. The Wire case - a shared information space, in this case a physical board (left), with different types of information entities (right). Close-up pictures are blue, surveillance photos are orange, text cards with meta information about individuals are green and text cards functioning as headers are dark red.
algorithms are often applied to assist investigators in making
these decisions and we discuss measures of centrality for
individual network entities below.
Information entity associations form information structures
and centralities are computed based on these associations.
Consequently, associations affect the measures of centrality
we want to calculate. Criminal network investigation has to
a large degree so far focused on the direct association of
nodes. Links are seldom first class objects in the terrorism
domain models with the same properties as nodes. This is in
contrast to the fact that the links between the nodes provide
at least as much relevant information about the network as
the nodes themselves [27]. The nodes and links of criminal
networks are often laid out at the same level in the information
space when the network is visualized. Composites (groups)
are first class entities that add depth to the information space.
For investigative purposes navigable structures and entities
(including composites) are useful for synthesis tasks such
as manipulating, re-structuring, and grouping entities. Our
understanding of information links (relations) and groups
(composites) is based on hypertext research [2].
C. Entity centrality
Measures of centrality have been developed for different
types of networks. Most prominent are social network analysis
techniques (see [23], [28], [29]) that can measure the centrality
of entities in criminal networks based on their direct and indi-
rect associations to other entities in the network. But “although
the premise that centrality is an indication of importance,
influence, or control in a network may appear valid, it is also
contestable, particular in criminal contexts. [. . . ] What does
it mean to be central in a criminal network?” [23]. We argue
that centrality is dependent on the specific criminal network
being investigated. It depends on the associations between
entities that investigators deem important, and it depends on
the weights of those associations. Furthermore, the accuracy
of centrality measures depends on the investigator’s ability
to embed their tacit knowledge and novel associations into
centrality algorithms. We review a selection of techniques
below, which we find to be relevant for criminal network
analysis on the above mentioned premises.
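The dependence of centrality on association types and weights can be illustrated with a weighted degree. This is a minimal sketch under our own assumptions: the function name `weighted_degree`, the association type labels, and the weight values are hypothetical and would in practice be chosen by the investigators.

```python
def weighted_degree(associations, type_weights):
    """Sum, per entity, the weights of the associations it takes part in.

    associations: iterable of (entity_a, entity_b, association_type)
    type_weights: investigator-chosen weight per association type;
                  types without a weight are ignored (weight 0).
    """
    score = {}
    for a, b, kind in associations:
        weight = type_weights.get(kind, 0.0)
        for entity in (a, b):
            score[entity] = score.get(entity, 0.0) + weight
    return score


# Illustrative data: one direct link and two co-location associations.
associations = [
    ("A", "B", "direct"),
    ("A", "C", "co-location"),
    ("B", "C", "co-location"),
]
weights = {"direct": 1.0, "co-location": 0.5}
scores = weighted_degree(associations, weights)
```

With these weights, entity C is ranked only slightly below A and B although it has no direct links at all; setting the co-location weight to zero recovers the classic degree.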
An entity is central when it has many associations to other
entities in the network. This kind of centrality is measured
by the degree of the entity and is also known as local
centrality since only entities at a distance of 1 or 2 links
are included. The higher the degree, the more central the
entity. For networks with directed links, both in-degree and
out-degree centrality can be measured, meaning the number
of incoming and outgoing links an entity has. A network with
high degrees of both is a highly cohesive network. Usually,
not all entities are connected to each other in a network.
Therefore, a path from one entity to another may go through
one or more intermediate entities. Betweenness centrality is
measured as the frequency of occurrence of an entity on the
geodesic connecting other pairs of entities. A high frequency
indicates a central entity. These entities bridge networks,
clusters, and subgroups: “betweenness centrality fleshes out
the intermediaries or the brokers within a network” [23].
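As an illustration of the two measures, the sketch below computes degree and betweenness centrality for a small undirected, unweighted network. It uses Brandes' algorithm for betweenness, which is our choice for the example and not necessarily what any particular tool implements; the five-entity network is hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct associations per entity."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical network: A - B - C, with D and E both attached to C.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}
degree = degree_centrality(adj)
betweenness = betweenness_centrality(adj)
```

In this network C has both the highest degree and the highest betweenness, while B has a low degree but still lies on every geodesic between A and the rest of the network.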
Closeness, also known as global centrality, indicates
whether or not an entity has easy access to other entities in
the network. Eigenvector centrality is like a recursive version
of degree centrality where an entity is central to the extent
that the entity is connected to other entities that are central.
Specific techniques for terrorist network analysis often take the
mentioned centrality measures as input to their computations.
Examples include measures of link importance based on
secrecy and efficiency [9], the prediction of covert network
structure, missing links, and missing key players [30], and
custom-made techniques developed by investigators to target
network-specific analysis tasks, such as the node removal
technique described in [31].
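Closeness and eigenvector centrality can be sketched in the same style. The code below is illustrative only: closeness is computed over the entities reachable from each entity, and eigenvector centrality uses power iteration on the adjacency matrix plus the identity, a shift we add so that the iteration also converges on bipartite networks. The function names and the example network are our own assumptions.

```python
from collections import deque
from math import sqrt


def closeness_centrality(adj):
    """Reachable entities divided by the sum of geodesic distances to them."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def eigenvector_centrality(adj, iterations=100):
    """Power iteration on A + I, normalized to unit Euclidean length."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = sqrt(sum(value * value for value in nxt.values()))
        x = {v: value / norm for v, value in nxt.items()}
    return x


# Hypothetical network: A - B - C, with D and E both attached to C.
adj = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}
closeness = closeness_centrality(adj)
eigenvector = eigenvector_centrality(adj)
```

Both measures rank C highest here; B scores above A on eigenvector centrality because it is connected to the central entity C.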
D. Hypertext and semantic web technology
Hypertext systems aim at augmenting human intellect, i.e.,
increasing the capability of man to approach a complex
problem situation, to gain comprehension to suit particular
needs, and to derive solutions to problems [32]. CrimeFighter
Investigator supports a range of domain-independent hypertext
structures that are used to support synthesis of information
entities: navigational structures allow arbitrary pieces of in-
formation (entities) to be linked (associated, see discussion
above); spatial structures were designed to deal with emergent
and evolving structures of information which is a central
task in information analysis; taxonomic structures can support
various classification tasks.
In the context of criminal network investigation, spatial
structures are useful in various synthesis, sense-making, and
dissemination tasks such as re-structuring, brainstorming, re-
tracing the steps, creating alternative interpretations, and story-
telling. Taxonomic structures are in essence hierarchical (tree)
structures. Hierarchical structures are also known from other
structuring domains (such as composites from the associative
domain and collections from the spatial domain). In the context
of investigation, taxonomic structures can provide a different
visual (hierarchical) perspective of associative and spatial
structures hence supporting the exploring perspectives task
of sense-making. See [2] for further details on the application
of hypertext structures to criminal network investigation.
Semantic web concepts have many characteristics in com-
mon with our understanding of criminal network entities and
their associations. Similar to centrality measures for criminal
networks, semantic web concepts have been developed to
measure the centrality of entities in online social networks. We
are interested in analysis of complex systems in which nodes
could be any object, relations (links) could be of any nature,
and structures are generated by the users (investigators). Se-
mantic web technology can explicitly model the interactions
between individuals, places and things in complex systems
of information entities, but classical social network analysis
methods are typically applied to “these semantic representa-
tions without fully exploiting their rich expressiveness” [33].
A short summary of semantic web technology and a social
network analysis example is given in [34]:
Semantic web [technologies] provide a graph model, a query language and type and definition systems to represent and exchange knowledge online. These [technologies] provide a [. . . ] way of capturing social networks in much richer structures than raw graphs. Several ontologies can be used to represent social networks. The most popular is FOAF5, used for describing people, their relationships and their activity. A large set of properties is dedicated to the definition of a user profile: “family name”, “nick”, “interest”, etc. The “knows” property is used to connect people and to build a social network. [. . . ] The properties in the RELATIONSHIP6 ontology specialize the “knows” property of FOAF to type relationships in a social network more precisely (familial, friendship, or professional relationships). For instance the relation “livesWith” specializes the relation “knows”.
We believe that the outlined approach can be adopted
and extended to support other association types such as the
5http://www.foaf-project.org/
6http://vocab.org/relationship/
Fig. 3. “Queries that extract the degree centrality of [individuals] linked by the property foaf:knows and its specialization relationship:worksWith” [34].
semantic associations described below.
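The idea of computing degree over a property and its specializations can be sketched without a triple store. The example below is not the query from [34]; it is a plain Python approximation in which the triples, the individuals, the `rel:` prefix, and the `SUB_PROPERTY_OF` table are all hypothetical, and associations are treated as undirected.

```python
from collections import Counter

# Hypothetical specialization hierarchy: child property -> parent property.
SUB_PROPERTY_OF = {
    "rel:worksWith": "foaf:knows",
    "rel:livesWith": "foaf:knows",
}


def specializes(prop, ancestor):
    """True if prop equals ancestor or is a (transitive) specialization of it."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def degree_by_property(triples, prop):
    """Degree of each individual, counting only associations typed by prop
    or by one of its specializations."""
    degree = Counter()
    for subject, predicate, obj in triples:
        if specializes(predicate, prop):
            degree[subject] += 1
            degree[obj] += 1
    return dict(degree)


triples = [
    ("alice", "foaf:knows", "bob"),
    ("bob", "rel:worksWith", "carol"),
    ("carol", "rel:livesWith", "dave"),
]
knows_degree = degree_by_property(triples, "foaf:knows")
works_with_degree = degree_by_property(triples, "rel:worksWith")
```

Querying on the general property includes all three associations, whereas querying on the specialization counts only the one typed association, so the same network yields different centralities depending on which association types are selected.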
E. Topology of associations
Based on the concepts of centrality and association, we
outline a topology of associations between criminal network
entities which impact the centrality of individual entities
to varying degrees. Our topology is divided into direct and
semantic associations (see Figures 4 and 5). Direct associations
are expressed using link entities. The link may be weak
by weight (low), by type (rumor, acquaintance, one-visit-to,
etc.), or by evidence (uncorroborated, questionable newspaper,
etc.), but it is nonetheless interpreted as a direct association
by sense-making algorithms and in visualizations. Semantic
associations between criminal network entities are built
incrementally based on the tacit knowledge of investigators and
the investigation domain their target operates within. Initially,
investigators express information “via visual or textual means
and later formalize that [information] in the form of attributes,
values, types, and relations” [15].
The visual symbol for direct associations is a thick solid
line, and thin solid circles indicate entity connection points.
The visual symbol for semantic associations is a dashed line
and dashed circles indicate connection points. We realize that
some of these associations are more relevant than others, and it
is exactly this relevance of alternative associations that we are
investigating in this paper. In Figure 4a to 4c, we show three
classic associations: the node-link-node association is the most
frequently used (4a), together with the less frequently used
node-link-group (4b) and group-link-group (4c) associations.
Figure 4d to 4g shows four examples of direct associations
that occur in criminal network investigations, but are not in-
cluded when entity centrality is computed. A link could be the
target of an investigation, e.g., Daniel Pearl was investigating
whether or not there was a link between Richard Reid (the shoe
bomber) and the leader of a local radical Islamist group [12].
Other examples include knowledge about the money transfer
between two individuals or that one individual had seen them
talk at the same location on numerous occasions (Figure 4d).
The empty endpoint is another example of a direct association
that occurs in criminal network investigations, but is not (di-
rectly) addressed by traditional centrality algorithms. The need
to include empty endpoints in centrality is straightforward: if
investigators know that someone is distributing drugs to three
individuals, e.g., based on wire taps, but they don’t know who
those individuals are, then an empty endpoint can be used until
it is clear. This could be the case for both nodes and groups
(see Figure 4e and 4g). Finally, direct associations between
entities outside groups to entities inside groups are needed
(a) node-node (b) node-group (c) group-group (d) link-link (e) empty endpoint I (f) node-sub node (g) empty endpoint II
Fig. 4. Direct associations in our topology include classic associations (a-c) and novel associations in terms of centrality measures (d-g).
(a) clique I (b) clique II (c) meta data (d) sequential (e) group-subgroup (f) node-subnode (g) node below
Fig. 5. Semantic associations in our topology include spatial associations (a-d) and hierarchical associations (e-g).
(both for reference and inclusion composites, see Figure 4f).
When criminal network investigators start grouping entities,
structures where entities outside the group are linked to entities
inside the group might emerge. But the relation still has an
association to that entity in the subgroup.
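The effect of empty endpoints on degree centrality can be shown with a small sketch. It assumes, purely for illustration, that a link is a pair of endpoints and that an unknown endpoint is represented by `None`; the function name and the example links are hypothetical.

```python
def degree_with_empty_endpoints(links, include_empty=True):
    """Degree per known entity.

    links: iterable of (endpoint_a, endpoint_b); None marks an empty endpoint.
    include_empty: when False, links with an empty endpoint are skipped,
                   which mimics a traditional degree computation.
    """
    degree = {}
    for a, b in links:
        if not include_empty and (a is None or b is None):
            continue
        for entity in (a, b):
            if entity is not None:
                degree[entity] = degree.get(entity, 0) + 1
    return degree


# A distributor with one identified buyer and three unidentified ones.
links = [
    ("distributor", "known buyer"),
    ("distributor", None),
    ("distributor", None),
    ("distributor", None),
]
extended = degree_with_empty_endpoints(links)
traditional = degree_with_empty_endpoints(links, include_empty=False)
```

The traditional computation ranks the distributor no higher than the single known buyer, while the extended computation reflects the three additional associations that the investigators already know exist.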
The semantic co-location association should be used care-
fully by investigators. If the investigators position entities
near each other spatially because they are assumed to be
related somehow, then it will make sense to use spatially based
associations. But if not, then it will simply clutter the network
with non-relevant relations. If entities are placed near each
other or as overlapping entities it could mean that they are
forming a sort of clique (Figure 5a and 5b). Also, as is the
case on the analyzed investigation board from The Wire, positioning
entities next to or around a (centered) entity could mean that
the information entities are meta data about the centered entity
(Figure 5c). Entities positioned next to each other horizontally
or vertically, could mean that the entities represent a sequence
(Figure 5d).
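A co-location association can be derived from entity positions in the information space. The sketch below is one possible reading, assuming each entity has a single (x, y) position and that investigators choose a distance threshold; the function name, coordinates, and threshold are hypothetical.

```python
from itertools import combinations
from math import dist


def co_location_associations(positions, max_distance):
    """Return pairs of entities placed within max_distance of each other.

    positions: mapping of entity name -> (x, y) in the information space.
    """
    pairs = []
    for (a, pos_a), (b, pos_b) in combinations(sorted(positions.items()), 2):
        if dist(pos_a, pos_b) <= max_distance:
            pairs.append((a, b))
    return pairs


# Illustrative layout: a meta data card directly above a portrait,
# and a header card far away at the top of the board.
positions = {
    "portrait": (100, 120),
    "meta card": (100, 90),
    "header": (400, 10),
}
co_located = co_location_associations(positions, max_distance=50)
```

The resulting pairs can be fed into a centrality computation alongside direct links. A threshold that is too generous produces exactly the clutter of non-relevant relations warned about above, so the threshold is best left under the investigators' control.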
Semantic hierarchical associations can occur either when
composites are used or when information entities are posi-
tioned spatially in a manner that resembles that of a hierarchy.
If a group contains single information entities and subgroups,
the single entities must have some sort of relationship to the
entities in the subgroups since their overall classification is
the same (Figure 5e). Also it could be that a single entity
is associated with a composite (group) and therefore might
have some sort of relation with entities within that composite
(Figure 5f). Finally, positioning entities in spatial hierarchies
as shown in Figure 5g indicates that entities below other entities
represent sub entities.
The topology of associations can be seen as a wish list
of requirements for what an investigative tool should support
in this regard. The topology is not exhaustive; we expect
to uncover additional associations over time, especially new
semantic associations based on temporal distance (when
individuals appear on an investigation time line together with
other individuals, events, etc.), distance between entities in
the real world, distance in family ties, and so on.
III. CRIMEFIGHTER INVESTIGATOR
CrimeFighter Investigator [2], [35] is based on a number of
concepts (see Figure 6). At the center is a shared information
space. Spatial hypertext research has inspired the features of
the shared information space including the support of inves-
tigation history [2]. The view concept provides investigators
with different perspectives on the information in the space
and provides alternative interaction options with information
(hierarchical view to the left (top); satellite view to the left
(bottom); spatial view at the center; algorithm output view
to the right). Finally, a structural parser assists the investiga-
tors by relating otherwise unrelated information in different
ways, either based on the entities themselves or by applying
algorithms to analyze them (see the algorithm output view to
the right). In the following, central CrimeFighter Investigator
features supporting measures of centrality are presented.
A. Extending centrality algorithms with new associations
The classic centrality algorithms have been extended by
adding some analysis prior to the existing steps. Our imple-
mented betweenness algorithm (described in [31]) with the
extra step for the selected centrality extension(s) works as
follows:
1) Pre-analysis; In this step the algorithm analyzes
whether or not the included association types appear
in the criminal network. If they do then changes are
temporarily made to the network accordingly.
2) List all entity pairs; This step creates a list of all
entity pairs that exist in the network, again based on
the included associations. This means that if the direct
node-group association is included, then all entities
that are directly or indirectly (by association through
intermediary entities) associated to the group with links
are added to the list of entity pairs.
3) List all shortest path(s) for each entity pair; We
calculate the shortest path(s) for all entity pairs without
considering the cost-efficiency of our algorithm: we take
a breadth first, brute-force approach [36], visiting all
nodes at depth d before visiting nodes at depth d + 1,
removing all loops and all paths to the destination node
Fig. 6. CrimeFighter Investigator showing an altered version of the investigation board from The Wire.
longer than the shortest path(s) in the set, until only the
shortest path(s) remain.
4) Node occurrence; We calculate the ratio by which each
node in the network appears in the accumulated set of
shortest path(s).
5) Bubble sort; The results are sorted according to the
user’s choice, usually descending with the highest cen-
trality first.
6) Generate report; If the user requests it, a pdf report
is generated for easy dissemination of the results of
the centrality measure. The user can decide what report
elements to include.
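Steps 2 to 5 above can be sketched in a few lines of Python. This is an illustrative sketch only, not the CrimeFighter Investigator code: the function names and the adjacency-dictionary representation are hypothetical, we assume that only intermediate nodes (not the two endpoints of a path) count toward the node occurrence ratio, and Python's built-in sort stands in for the bubble sort of step 5.

```python
from collections import deque
from itertools import combinations


def all_shortest_paths(adj, src, dst):
    """Breadth-first, brute-force enumeration of all shortest paths src -> dst."""
    paths, best = [], None
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # paths leave the queue in order of length; the rest are longer
        node = path[-1]
        if node == dst:
            best = len(path)
            paths.append(path)
            continue
        for nxt in sorted(adj[node]):
            if nxt not in path:  # remove loops
                queue.append(path + [nxt])
    return paths


def occurrence_betweenness(adj):
    """Ratio of shortest paths in which each node occurs as an intermediate node."""
    paths = []
    for a, b in combinations(sorted(adj), 2):  # step 2: list all entity pairs
        paths.extend(all_shortest_paths(adj, a, b))  # step 3: shortest path(s)
    counts = {node: 0 for node in adj}
    for path in paths:  # step 4: node occurrence
        for node in path[1:-1]:
            counts[node] += 1
    total = len(paths) or 1  # avoid division by zero on an empty network
    scores = {node: count / total for node, count in counts.items()}
    # step 5: highest centrality first, ties broken by name
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical toy network: B links A, C, and D.
    network = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
    for name, score in occurrence_betweenness(network):
        print(name, round(score, 3))
```

Enumerating whole paths keeps the sketch close to the brute-force description in step 3; it is not intended to be efficient on large networks.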
Pre-analysis is the algorithm step of primary interest to the
work presented here. For the direct empty endpoint association,
pre-analysis involves adding temporary information elements
as placeholders of empty endpoints. For the semantic co-location association, we create a temporary relation between
two entities if they are not already related and they are within
the user-defined boundaries of each other (see Figure 7).
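The two pre-analysis extensions can be illustrated with a small Python sketch. It rests on several assumptions that are not part of the tool: entities are identified by name and carry (x, y) positions, an empty endpoint is written as `None`, the user-defined boundary is a circle of a given radius, and the function and placeholder names are hypothetical.

```python
import math


def colocation_links(positions, links, radius):
    """Temporary relations between entities that are not already related
    and lie within `radius` of each other."""
    existing = {frozenset(link) for link in links}
    temporary = set()
    names = sorted(positions)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            pair = frozenset((a, b))
            if pair in existing:
                continue  # already related, no temporary relation needed
            if math.dist(positions[a], positions[b]) <= radius:
                temporary.add(pair)
    return temporary


def fill_empty_endpoints(links):
    """Replace each empty endpoint (None) with a temporary placeholder entity."""
    filled, placeholders = [], []
    for link in links:
        ends = []
        for end in link:
            if end is None:
                end = f"placeholder-{len(placeholders) + 1}"
                placeholders.append(end)
            ends.append(end)
        filled.append(tuple(ends))
    return filled, placeholders


if __name__ == "__main__":
    # Hypothetical positions on the information space.
    positions = {"A": (0, 0), "B": (3, 4), "C": (10, 10)}
    print(colocation_links(positions, [("A", "C")], radius=5.0))
    print(fill_empty_endpoints([("A", None), (None, "B")]))
```

Both functions return new structures instead of modifying their input, which mirrors the temporary nature of the changes made during pre-analysis.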
B. Customizing sense-making and sense-making algorithms
CrimeFighter Investigator algorithms are managed using
a structural parser, where investigators can select different
algorithms to run and control the order in which they are
executed, for example either simultaneously or sequentially.
Figure 8 (left) shows how individual centrality algorithms
can be customized by the user. The user must decide how
to run an algorithm (Figure 8a) and what entities to include
for the respective centrality algorithm (Figure 8b). This is
done using drag and drop between two defined areas as
shown in Figure 8 (right, top frame). For included entities
the user can set a weight (for example, a location may count less
than a person for a measure of betweenness centrality), and for
excluded entities the user decides how the algorithm should deal
with them, e.g., when tracing a shortest path: should it discard the
shortest path, or simply ignore the entity and continue along
the path? Direct and semantic associations are included
or excluded using the same drag and drop approach as for
(a) without (b) with (c) without (d) with
Fig. 7. The two implemented algorithm extensions, the empty endpoint association and the co-location association, are explained. Without the empty endpoint association, the link from the empty endpoint to the connected entity is not included in measures of betweenness centrality, and degree centrality is not calculated for the empty endpoint (a); with that association, the link is included (b). Without the co-location association, entities positioned near each other in the information space are not included in measures of centrality (c), but if entities fall within the boundaries defined by the investigators and the association is included, then those entities are included in measures of centrality (d).
entities (see Figure 8c and 8d). Again, weights can be set up for
included associations, as can the algorithm's action(s) for excluded
associations. Finally, we imagine many settings for how to
format and list results (Figure 8e). Typically, normalization
is important for comparison of results. If an investigation has
many of the included entities, it can be useful to display only,
for example, 10 results selected by some parameter, e.g., highest
centrality.
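As a minimal sketch of these two result settings, the Python fragment below normalizes degree centrality by the maximum possible degree (n - 1, the standard normalization for a network of n entities) and keeps only the highest-ranked results. The function names and the adjacency-dictionary input are hypothetical and do not describe the tool's interface.

```python
def normalized_degree(adj):
    """Degree of each entity divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}


def top_results(scores, limit=10):
    """Keep the `limit` entities with the highest centrality, ties broken by name."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    # Hypothetical toy network: B is linked to A, C, and D.
    network = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
    print(top_results(normalized_degree(network), limit=2))
```

Normalized values lie between 0 and 1, which makes results from investigations of different sizes comparable.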
Fig. 8. Setting up centrality algorithms using structural parser windows: the centrality algorithm settings window is shown on the left, and the window for inclusion and exclusion of entities together with specific settings for each of those entities is shown on the right.
It is currently possible to set the visual symbols for the
information space and the algorithm view (see Figure 8f). For
the information space the user can decide whether or not
to overlay entities with a geometric shape (circle, square, or
rectangle) containing the calculated centrality (instead of just
showing the results in the algorithm view). The color, size
and outline of the shape can be decided together with the font
and font size of the printed centrality. For the algorithm view,
the user can decide how to display the results textually in a
list, for example which attribute should be printed (e.g., person
’name’ or email ’date’). The font (type, size, and color)
can also be set.
IV. EVALUATION
We have tested CrimeFighter Investigator’s support of three
tool requirements on a filtered version of the investigation from
The Wire and a semi-altered version of the same investigation.
We calculate two centrality measures, degree and between-
ness, for two conditions, with and without two designed and
implemented associations.
We test the co-location association on an investigation
inspired by The Wire to evaluate the requirement for support
of semantic associations. The investigation had no direct
associations between entities prior to the test. We have filtered
out all entities except the close-up photos (i.e., the blue
rectangles) and created an investigation using CrimeFighter
Investigator where individuals are positioned with the same
relative distance. All individuals are given numbers or letters
as name, except for the two lieutenants Anton Artis (A.A.)
and Roland Brice (R.B.). The network with the semantic co-
location association included is shown in Figure 9a and the
calculated centralities are shown in Figure 9b.
Prior to testing the empty endpoint association we found
that empty endpoints rarely occurred in the investigation we
analyzed. Links are used to connect two entities, and even if
the contents of one entity are unknown, the entity is still created as a
placeholder. It is unclear whether this is simply because it does
not make sense to work with empty endpoints or if it is because
of a structural bias toward links as simple entity connectors.
To test the influence of the empty endpoint association we
have used some of the links from the previous test to create
a new test case (see Figure 6). We assume that a number of
subgroups have been detected (the four colored composites)
and that the investigators know there is some connection from
the main network to each of these subgroups, but not how the
connection is made; an empty endpoint is therefore positioned next to
each subgroup.
To test the requirement for centrality measures to consider
multiple associations, we use the same network as for the
empty endpoint requirement (see Figure 6). However, this
time we test both the empty endpoint association and the
co-location association together. The with condition therefore
means that the algorithm replaces empty endpoints with actual
nodes (placeholders) and creates links between co-located
nodes that are not already directly associated.
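The two conditions can be compared on a toy example. The Python sketch below is hypothetical throughout: the entity names and coordinates are invented for illustration, an empty endpoint is written as a bare (x, y) position, co-location is decided by a circular radius, and in the without condition links that end in an empty endpoint are simply left out of the degree count.

```python
import math
from itertools import combinations


def apply_both(positions, links, radius):
    """The with condition: placeholders replace empty endpoints, then
    co-located entities that are not already linked are linked."""
    positions = dict(positions)
    resolved = set()
    for link in links:
        ends = []
        for end in link:
            if not isinstance(end, str):  # empty endpoint given as (x, y)
                name = f"placeholder-{len(positions)}"
                positions[name] = end
                end = name
            ends.append(end)
        resolved.add(frozenset(ends))
    for a, b in combinations(sorted(positions), 2):
        if math.dist(positions[a], positions[b]) <= radius:
            resolved.add(frozenset((a, b)))  # no effect if already linked
    return positions, resolved


def degree(entities, links):
    """Number of links attached to each entity."""
    counts = {entity: 0 for entity in entities}
    for link in links:
        for entity in link:
            counts[entity] += 1
    return counts


if __name__ == "__main__":
    positions = {"p": (0.0, 0.0), "q": (1.0, 0.0), "r": (5.0, 0.0)}
    links = [("p", "q"), ("p", (4.5, 0.0))]  # second link ends in an empty endpoint

    # Without: only links between two named entities are counted.
    plain = {frozenset(l) for l in links if all(isinstance(e, str) for e in l)}
    without = degree(positions, plain)

    entities, extended = apply_both(positions, links, radius=1.0)
    with_both = degree(entities, extended)

    for name in sorted(positions):
        print(name, without[name], with_both[name])
```

In this toy case the placeholder both completes the dangling link from p and becomes co-located with r, so the degree of both p and r changes between the two conditions.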
A. Discussion and summary of results
Testing the requirement for semantic associations illustrated
how centrality measures can be applied to spatial network
structures using a co-location association. It is evident that
when no relations exist in an investigation prior to analysis,
there is a need to define associations between entities in a dif-
ferent way if the investigators need to calculate node centrality
to deal with the uncertainty of an ongoing investigation. We
see that degree centrality indicates the individuals on the right
hand side in Figure 9b as central to the network (e.g., 9, 6, 8,
and 10), but they are of little importance. At the same time,
degree does not point to the two lieutenants A.A. and R.B. as key
players as we expected. We therefore find that one should
be careful when using spatial co-location as a basis
for network degree centrality. Betweenness centrality clearly
points to A.A. and R.B. as key players in the network together
with individual 2. Given the results of our two other tests it is
also interesting that individual 5 is placed in the top four in terms
of betweenness.
When we tested the empty endpoints requirement we found
that the measure of degree centrality provides investigators
with no clear tendencies, although it more strongly indicates
individuals F, D, A.A., and 3 as central to the network.
The betweenness results more distinctly point to A.A. and
(a) test scenario 1 (b) co-location results (c) empty endpoint results (d) two associations results
Fig. 9. The Wire investigation with links representing co-location associations (a). The degree and betweenness centralities for each of three tests: co-location association (b), empty endpoints association (c), and both co-location and empty endpoints associations (d).
R.B. when including the empty-endpoints association. We
also observe that individual 2 is ranked fourth instead of
seventh, which is a more realistic depiction of this individual’s
betweenness in the network. Individual 5 has the highest
change in betweenness when including empty endpoints,
making him an interesting subject for further investigation.
As mentioned earlier, it would be possible to model empty
endpoints using information element placeholders until the
content of the empty endpoint is known. This also means that
traditional social network analysis measures of centrality could
be applied. We therefore recommend testing whether empty endpoints
have higher value for restructuring tasks during synthesis than
for centrality algorithms.
Our test of the requirement for support of multiple asso-
ciations was successful in terms of extending two measures
of centrality with more than one association from our topol-
ogy. But for the test investigation the test results did not
add much investigative value. The inclusion of both empty
endpoint and co-location associations connects all entities
in the criminal network through the empty endpoints (indi-
vidual 5 is connected to individual 6 and 12, individual F
to individual H, and individual A.A. to individual M). This
makes the degree and betweenness centrality of key nodes
without the associations less distinctive. The numbers are
flattened because the information elements in the subgroups
achieve higher measures of betweenness centrality with the
associations included. The most interesting result for this
final test was that the degree and betweenness centrality of
individual 5 is increased considerably when the associations
are added. Together, our three requirement tests have shown
that measures of centrality extended with novel types of
associations provided new insights into two organized crime
networks that traditional centrality measures could not provide.
The most important result was that the centrality of individual 5
was increased in all three tests. Individual 5 was not known
to be a central entity in the network before the tests.
V. CONCLUSION
We have presented two novel sense-making algorithms
based on new interpretations of information entity association
and centrality. The algorithms are extensions of classic social
network analysis algorithms where the user can include and
exclude specific entities and associations for analysis to match
the structures they have built when investigating a
criminal network. More specifically, this paper has three main
contributions:
1) A novel network model with nodes, links, and groups
as first class entities.
2) A topology of direct and semantic network entity associ-
ations based on an analysis of various criminal network
investigations following a target-centric approach.
3) An implementation that supports three of these
associations: the traditional node-link-node association, the
novel empty endpoint association, and the semantic
co-location association. Both novel associations have been tested
on a criminal network investigation from The Wire and
an altered version of that same investigation.
We can conclude that target-centric criminal network
investigation creates structures that are not clear from the beginning of
an investigation, and that in order to apply traditional centrality
measures, associations other than node-link-node have to be
supported. We plan to implement support of the other asso-
ciations in our topology in the near future. We would like
to test them on real-world investigations (either post-crime or
ongoing) to learn if and how they could provide useful insights
into the investigated criminal networks.
As an alternative to manually applying a specific radius or
geometric shape to decide co-location association, it could be
interesting to apply a standard machine learning algorithm
that suggests co-location, not in terms of position on the
investigation board, but based on temporal distance, physical
distance in the real world, or distance in family ties.
REFERENCES
[1] R. Clark, Intelligence analysis: a target-centric approach. CQ Press, 2007.
[2] R. R. Petersen and U. K. Wiil, “Hypertext structures for investigative teams,” in proceedings of the 22nd ACM conference on hypertext. ACM Press, 2011, pp. 123–132.
[3] B. Drogin, Curveball. Ebury Press, 2008.
[4] T. Weiner, Legacy of Ashes: The History of the CIA. Anchor Books, 2008.
[5] Could 7/7 have been prevented? Review of the intelligence on the London terrorist attacks on 7 July 2005, Intelligence and Security Committee, United Kingdom, 2009.
[6] R. R. Petersen, “Presentation of crimefighter investigator.” British Home Office, London, United Kingdom: Presented and demonstrated work on prediction of covert network structure and missing links to a group of British intelligence analysts, March 2011.
[7] T. McDermott and J. Meyer, The Hunt for KSM - Inside the Pursuit and Takedown of the Real 9/11 Mastermind, Khalid Sheikh Mohammad. Little, Brown and Company, 2012.
[8] The 9/11 Commission Report (Executive Summary), National commission on terrorist attacks upon the United States, United States, 2004. [Online]. Available: http://www.9-11commission.gov/report/911Report Exec.pdf.
[9] U. K. Wiil, N. Memon, and J. Gniadek, “Crimefighter: A toolbox for counterterrorism,” Lecture notes in communications in computer and information science (Knowledge discovery, knowledge engineering and knowledge management), vol. 128, pp. 337–350, 2011.
[10] M. Sageman, Understanding terrorist networks. Philadelphia, Pennsylvania: University of Pennsylvania Press (PENN), 2004.
[11] B. F. Todd and A. Nomani, The truth left behind: inside the kidnapping and murder of Daniel Pearl, 2011.
[12] M. Pearl, A mighty heart. Virago Press, 2004.
[13] B. H. Levy, Who killed Daniel Pearl? Melville House Publishing, 2003.
[14] V. Krebs, “Mapping networks of terrorist cells,” CONNECTIONS, vol. 24, no. 3, pp. 43–52, 2002.
[15] F. Shipman, J. M. Moore, P. Maloor, H. Hsieh, and R. Akkapeddi, “Semantics happen: knowledge building in spatial hypertext,” in Proceedings of the thirteenth ACM conference on Hypertext and hypermedia, ser. HYPERTEXT ’02. ACM, 2002, pp. 25–34.
[16] R. Alvarez and D. Simon, The Wire: Truth Be Told. Pocket Books, 2004.
[17] R. Penfold-Mounce, D. Beer, and R. Burrows, “The wire as social science-fiction?” Sociology, vol. 45, no. 1, pp. 152–167, Feb. 2011.
[18] R. R. Petersen and U. K. Wiil, “Analysis of emergent and evolving information: the agile planning case,” in Software and data technologies, ser. Communications in computer and information science, J. Cordeiro, A. Ranchordas, and B. Shishkov, Eds. Springer Berlin Heidelberg, 2011, vol. 50, pp. 263–276.
[19] (2012) Ibm i2 analyst’s notebook. [Online]. Available: http://www.i2group.com/us/products/analysis-product-line/ibm-i2-analysts-notebook
[20] B. Capers, “Crime, legitimacy, our criminal network, and the wire,” Ohio state journal of criminal law, vol. 8, pp. 459–471, 2011.
[21] D. Simon and E. Burns, “The wire (the complete first season),” 2002.
[22] T. A. Taniguchi, J. H. Ratcliffe, and R. B. Taylor, “Gang set space, drug markets, and crime around drug corners in camden,” Journal of research in crime and delinquency, vol. 48, pp. 327–363, 2011.
[23] C. Morselli, “The criminal network perspective,” in Inside criminal networks, ser. Studies of organized crime. Springer New York, 2009, vol. 8, pp. 1–21.
[24] A. S. Linschoten and F. Kuehn, An enemy we created: the myth of the Taliban/Al-Qaeda merger in Afghanistan, 1970-2010. Hurst, 2012.
[25] R. R. Petersen, “Interview with alex strick van linschoten.” Trafalgar Square, London, United Kingdom: A discussion of CrimeFighter Investigator, Tinderbox, Gephi, Analyst’s Notebook in relation to Alex’s work with mapping the temporal evolution of Afghan Taliban., March 2011.
[26] M. Bernstein, The Tinderbox way. Eastgate Systems, 2006.
[27] P. A. Gloor and Y. Zhao, “Analyzing actors and their discussion topics by semantic social network analysis,” in Proceedings of information visualization, 2006, pp. 130–135.
[28] J. Scott, Social network analysis, a handbook (second edition). Sage, 2000.
[29] L. R. Irons, “Recent patterns of terrorism prevention in the united kingdom,” homeland security affairs, vol. 4, 2008.
[30] C. J. Rhodes and P. Jones, “Inferring missing links in partially observed social networks,” Journal of the operational research society, vol. 60, no. 10, pp. 1373–1383, 2009.
[31] R. R. Petersen, C. J. Rhodes, and U. K. Wiil, “Node removal in criminal networks,” in Proceedings of european intelligence and security informatics conference. IEEE, 2011, pp. 360–365.
[32] D. C. Engelbart, “A conceptual framework for the augmentation of man’s intellect,” in Computer-supported cooperative work. Kaufmann, 1988, pp. 35–65.
[33] G. Ereteo, F. Limpens, F. Gandon, L., O. Corby, M. Buffa, M. Leitzelman, and P. Sander, “Semantic social network analysis: a concrete case,” in Handbook of Research on Methods and Techniques for Studying Virtual Communities: Paradigms and Phenomena. IGI Global, 2011, pp. 122–156.
[34] G. Ereteo, M. Buffa, F. Gandon, P. Grohan, M. Leitzelman, and P. Sander, “A state of the art on social network analysis and its applications on a semantic web,” 2008.
[35] R. R. Petersen and U. K. Wiil, “Crimefighter investigator: a novel tool for criminal network investigation,” in Proceedings of european intelligence and security informatics conference. IEEE, 2011, pp. 360–365.
[36] M. Sipser, Introduction to the theory of computation. PWS Publishing Company, 1997.