
Association and Centrality in Criminal Networks

Rasmus Rosenqvist Petersen

University of Southern Denmark

The Maersk Mc-Kinney Moeller Institute

Email: [email protected]

Abstract—Network-based techniques are widely used in criminal investigations because patterns of association are actionable and understandable. Existing network models with nodes as first class entities and their related measures (e.g., social networks and centrality measures) are unable to capture and analyze the structural richness required to model and investigate criminal network entities and their associations. We demonstrate a need to rethink entity associations with one specific case (inspired by The Wire, a TV series about organized crime in Baltimore, United States) and corroborated by similar evidence from other cases. Our goal is to develop centrality measures for fragmented and non-navigational states of criminal network investigations. A network model with three basic first class entities is presented together with a topology of associations between network entities. We implement three of these associations and extend and test two centrality measures using CrimeFighter Investigator, a novel tool for criminal network investigation. Our findings show that the extended centrality measures offer new insights into criminal networks.

I. INTRODUCTION

Network-based techniques are widely used in crime inves-

tigations because patterns of association are actionable and

understandable. Target-centric investigation, where a group of people shares and restructures information in a common information space in order to coordinate or reach consensus, is a special type of investigation. Criminal network information

structures are by nature emergent and evolving, and a target-

centric and iterative approach to tool support of this informa-

tion domain is therefore suitable. Existing criminal network

models with nodes as first class objects and their related

measures (e.g., social networks and centrality measures) are

unable to capture the structural richness required to model

and investigate criminal network entities and their associations.

Our target-centric model for criminal network investigation is

based on a model for intelligence analysis [1] and involves five

processes: acquisition, synthesis, sense-making, dissemination,

and cooperation (see [2] for a detailed description of the

model). All individuals in the target-centric model are stakeholders: from information collectors (e.g., undercover agents and automated web crawlers) through information analysts (investigators) to decision-makers (intelligence customers). We found

that a target-centric approach is best for the investigations

we have analyzed. The traditional alternative is a sequential

approach where investigative processes guide the investigation.

This sequential model is appealing to intelligence agencies and

law enforcement since the exchange of information between

individuals responsible for different processes can be con-

trolled. However, such compartmentalization has been found

to cause intelligence failures for a number of high-profile

investigations. Examples include the interrogations of the Iraqi

defector Curveball who sought asylum in Germany and the

subsequent invasion of Iraq in 2003 [3], [4], the investigation

of links between Operation Crevice and the July 7th 2005 bombings in the United Kingdom [5], [6], and the investigation into the al-Qaeda organization prior to the September 11th 2001 attacks on the United States [7], [8].

In this paper, we present a criminal network model with

three first class entities (node, link, and group) that supports

emerging and evolving information structures. Based on a

study of criminal network investigations we present a topology

of entity associations that occur in these networks. We argue

that relevant entity associations are not only direct (relationship) links, but could also be based on more semantic associations such as the spatial co-location of entities. Together, the network model and the topology of associations have guided our development of support for dealing with the uncertainty present in fragmented and partial networks. In this paper we use that ability to dynamically extend two measures of entity centrality in a network, degree and betweenness, and our results show that our approach provides investigators with new insights into criminal networks.

The CrimeFighter Investigator tool supports a target-centric

and iterative approach to criminal network investigation.

CrimeFighter Investigator is part of the CrimeFighter Toolbox

for counterterrorism [9]. Besides the Investigator tool, Crime-

Fighter consists of the Explorer tool targeted at open source

collection and processing and the Assistant tool targeted at

advanced structural analysis and visualization. The remainder

of this paper is organized as follows: Section II discusses

and defines the concepts on which our work is based. First a

conceptual model defining three first class entities is presented.

Then, we review a criminal network investigation from The Wire, followed by a review of entity association and centrality.

The section is concluded with a topology of criminal network

entity associations. In Section III we describe how Crime-

Fighter Investigator supports dynamic extension of centrality

algorithms with associations from our topology. Section IV

tests and evaluates extensions of degree and betweenness

centrality measures. Section V concludes the paper.

II. ENTITY ASSOCIATION AND CENTRALITY

The building blocks of criminal networks are information

entities. Our network model (Figure 1) defines three such en-

tities, namely information elements (nodes), relations (links),

2012 European Intelligence and Security Informatics Conference

978-0-7695-4782-4/12 $26.00 © 2012 IEEE

DOI 10.1109/EISIC.2012.63


and composites (groups). Nodes hold information about real-

world objects. Investigators basically think in terms of people,

places, things, and their relationships. We use rectangles as

visual abstractions here for simplicity, but any symbol (circles,

triangles, etc.) could have been used to illustrate different types

of real-world objects. Links of different types and weights

can associate information entities directly. Links have two

endpoints, they can be both directed and undirected, and

they have different visual abstractions (see Figure 1, middle).

Composites are used to associate entities in subgroups. We

work with two types of composites [2]: Reference composites

are used to group entities in the common information space.

Inclusion composites can collapse and expand information to

let investigators work with subspaces. The circles in Figure 1

indicate connection points for direct association of entities.

Fig. 1. Our network model’s three first class entities: Information elements (left), relations (middle), and composites (right). Points of direct association are indicated using circles.
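As a minimal sketch of the three first class entities described above, the model could be represented in Python as follows. This is not CrimeFighter Investigator's actual data model; all class and field names are assumptions chosen for illustration, and the example entities are taken from The Wire case discussed later.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Node:
    """Information element holding data about a real-world object."""
    label: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Group:
    """Composite that associates entities in a subgroup."""
    label: str
    kind: str = "reference"  # assumed values: "reference" or "inclusion"
    members: List["Entity"] = field(default_factory=list)


@dataclass
class Link:
    """Relation with two endpoints, a type, a weight, and a direction flag."""
    source: Optional["Entity"]  # None models an endpoint that is not yet known
    target: Optional["Entity"]
    link_type: str = "unknown"
    weight: float = 1.0
    directed: bool = False


# Any of the three entities can take part in an association.
Entity = Union[Node, Group, Link]

avon = Node("Avon Barksdale")
stringer = Node("Stringer Bell")
leadership = Group("Leadership", members=[avon, stringer])
command = Link(avon, stringer, link_type="commands", directed=True)
```

Because links and groups are entities in their own right, a link endpoint can be a node, a group, or another link.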

Information entities are normally synthesized in a classic

nodes-and-links way before visualization. Typical network

structures that form during synthesis include hierarchical structures,

cellular structures comprised of cohesive subgroups (cliques)

connected by bridges, and flat (or fluid) structures where

individual entities are distributed in some (more or less)

random manner, maybe based on factions or their relationship

with nearby nodes, or simply because of a more desirable

visual layout.

But criminal network structures are emergent and evolving

and the networks go through many iterations after a target

is selected until the structure types mentioned above emerge.

A large organization like al-Qaeda has evolved many entity

structures. Sageman depicts al-Qaeda as four clusters with

one leadership cluster, the Central Staff. “After 1996, the

Central Staff was no longer directly involved in terrorist

operations, but the other three major clusters were connected

to their Central Staff contacts by their lieutenants in the field”

[10]. Two of the al-Qaeda clusters are comprised of several

cohesive subgroups, while the southeast Asian cluster is more

hierarchically structured, with a leader and a consultative

council at the top. When the cluster was created it was

divided into four geographical regions, and each region had

several branches. All the network information was gathered

from public domain sources: “documents and transcripts of

legal proceedings [. . . ], government documents, press and

scholarly articles, and Internet articles” [10]. The synthesis of

the elaborate list of data set attributes alone must have been

quite a tedious and time-consuming task.

After 10 years of investigative journalism the Pearl Project

published a report on the kidnapping and murder of Daniel

Pearl depicting five cells responsible for various tasks, with

all cells connecting to the mastermind behind the kidnapping

[11]. However, from the account of the official investigation

we know how fragmented and inconsistent information about

the kidnappers initially was [12], and from another account

we get a vivid description of how investigations faced “the

eternal problem of any investigation into Islamist groups or Al-

Qaida in particular: the extreme difficulty of identifying, just

identifying, these masters of disguise, one of whose techniques

is to multiply names, false identities, and faces” [13]. Krebs’s

almost iconic network of 9/11 hijackers has been referenced

widely [14]. It was aggregated based on open sources, but we

don’t know the intermediate states of the network prior to the

published version. And we don’t know the exact evidence that

formed the links between the hijackers.

When investigations start, criminal network entities are

often associated in other ways than through well established

relationships to other entities. First, the entities are randomly

positioned in the information space and maybe only a few

are directly linked (e.g., the known accomplices of the

target). Later, more entities are linked, groups are created,

and structures emerge. During the first iterations, spatial

associations like entity co-location play an important role.

A spatial association with certain semantics could be entities

placed in close proximity of each other to indicate a subgroup

in the network or snippets of information about a certain

individual. Or entities might be placed above and below each

other to indicate hierarchical importance. And it may take

many synthesis-sense-making iterations before it is clear what

attributes (node meta data) are relevant as input for sense-

making algorithms. In other words, “semantics happen” [15].

The network visualizations we see in magazines, newspapers, and scientific journals and proceedings are often created specifically for dissemination purposes. They tell very little about the investigative efforts required to synthesize and make sense of the respective networks. The networks therefore convey limited information to the reader about what processes, tasks, and techniques a tool for criminal network investigation should support.

A. The Wire: investigating organized crime

The Wire is a TV series renowned for its authentic depiction of urban life on each side of the law¹. In the first season it is

drug dealers on one side and law enforcement officers on the

other [17]. The Wire is interesting as a security informatics

case study for a number of reasons. First of all, the target-centric, board-based approach² chosen by the investigative

¹ The primary writers are David Simon and Ed Burns. Burns has worked as a Baltimore police detective for the homicide and narcotics divisions. Simon is an author and journalist who worked for the Baltimore Sun city desk for twelve years. He authored Homicide: A Year on the Killing Streets and co-authored The Corner: A Year in the Life of an Inner-City Neighborhood with Burns [16]. We have previously focused on policing and investigative journalism as two investigation types that could benefit from the concepts we develop and implement in CrimeFighter Investigator [2].

² We have previously described the advantages of a board-based approach for the planning domain, where information structures are also emergent and evolving (see [18]).


team maps well onto our criminal network investigation model

[2]. Secondly, Analyst’s Notebook [19], a commercial software

tool for visualization and analysis of criminal networks, is

used to narrow down a list of suspects, based on a large

number of intercepted phone calls. Finally, the show’s ability

to describe investigative context is exceptional. By context,

we mean factors such as power, law enforcement culture,

resources, and politics that ultimately can decide the success

or failure of investigations [20].

The Barksdale organization is a hierarchical and somewhat

flat structure that maintains a top-down chain of command (see

[16], [21]). The top consists of the leader Avon Barksdale,

his second-in-command Stringer Bell who administrates and

manages the organization, and Avon’s sister Briana Barksdale, who is responsible for the financial side together with Stringer. Maurice Levy is the organization’s lawyer who offers

legal advice and acts as defense lawyer for members of the

organization. At the bottom of the organization are the drug

selling crews: typically a crew is responsible for a high-rise,

an area in the low-rises, or a street corner (so-called open-air drug markets [22]). Each crew has a chief and one or more high-ranking lieutenants who control a number of dealers and runners, who are responsible for arranging a buy, getting the money, retrieving the drugs from a nearby location, and handing them over to the buyer. For communicating strategies and commands to

the crews, the leadership (primarily Stringer) has lieutenants to

enforce his commands (in season one Anton Artis and Roland

Brice work as the lieutenants), and they in turn have their

enforcers who they forward tasks to. But Stringer Bell also

shows up in person in the pit (nickname for the low rises) to

ask the crew chief to solve a specific task or follow a new

strategy.

The first season begins with narcotics lieutenant Cedric

Daniels being ordered “to organize a detail of narcotics and

homicide cops to take down Avon Barksdale’s drug crew

which runs the distribution of heroin in several of Baltimore’s

projects. Realizing that low-level buy-and-busts are getting

them nowhere³, the detail of cops [. . . ] add visual and audio

surveillance to their law enforcement tools” [20]. The team is

provided with office space in the basement, from where they

can work the case and monitor the many wires they set up in an

attempt to map out the network of individuals in the Barksdale

organization. A senior police officer, recognizing that “all the pieces matter”, is put in charge of information collection and processing, and he starts adding snippets of information to the investigation board shown in Figure 2a, which functions as the team’s common information space. Figure 2b shows some

of the information entities used on the investigation board.

³ “After years of random buy-and-bust interventions, law-enforcement controls of serious crime networks have gradually come to follow the key player strategy” [23]. Morselli follows up by stating that “a more accurate appraisal of the social organization of drug-trafficking [. . . ] would follow a resource-sharing model in which collaboration among resourceful individuals would be at the base of coordination in such operations” [23]. We find that this is also the approach taken by the investigators in The Wire by targeting not only Avon Barksdale but a range of important individuals in and around the decision-making body of the organization.

There are polaroid close-ups of individuals, and two types of

text cards: one with meta information about entities and one

functioning as headers. In the middle there is a surveillance

photo and at the bottom a newspaper clipping.

We have defined the following four information entities

used on the investigation board and use colored rectangles to

represent them in Figure 2c: portrait pictures are blue, large

surveillance photos are orange, text cards with meta data about

individuals are green, and header text cards with red text are

dark red. Based on this augmentation of the investigation board

we observe a number of semantics. Most obviously all portrait

polaroid pictures are placed below a meta data text card.

Sometimes a surveillance photo is placed next to the portraits.

Finally, the investigation board is divided horizontally into

areas by the header text cards placed at the top.

Based on The Wire and other reviewed cases⁴, we define

three tool requirements describing investigative needs that we

aim to support:

1) When node-link-node associations are not dominant, semantic associations will reduce investigation uncertainty through the computation of extended centrality measures.

2) Centrality measures for criminal network entities must support empty endpoint associations for more accurate results.

3) Support for combining several direct and semantic associations can be necessary when computing centrality measures for criminal network entities.

B. Entity association

During target-centric criminal network investigations, the

investigative team adds information pieces as they are discov-

ered and step-by-step information structures emerge as entities

are associated. We have observed that initially the information

entities are placed randomly in an information space. If a new

entity is somehow associated with an entity already in the

shared information space, then it is positioned next to that

entity (co-located). Later, some co-located entities are directly

associated using link entities, because the investigators have

learned the nature of the relationship between the entities.

Depending on the level of time criticality (e.g., high security

risk), a decision has to be made at some point. When the

network is fragmented and incomplete such decision-making

can be a challenging task due to the uncertainty. Sense-making

⁴ Several criminal network investigations have inspired our work. The investigation of Daniel Pearl’s kidnapping and murder was target-centric and used large pieces of paper on a wall to synthesize information entities as they were discovered [11]–[13]. The investigation to locate and arrest the 9/11 mastermind Khalid Sheikh Mohammed (both before and after the attacks) was conducted by the Federal Bureau of Investigation in a target-centric manner, always with a focus on gathering evidence both for later potential trials and to map and understand the network of individuals, events, and places that was emerging [7]. Researchers and writers Strick van Linschoten and Kuehn have been mapping a network of Afghan Taliban to investigate their associations with the Afghan Arabs from 1970 to 2010 [24]. They use Tinderbox for their mapping efforts [25]. Tinderbox is a software tool that takes a board-based approach to synthesis of networks and supports multiple structures [26].


(a) investigation board (b) information entities (c) augmented investigation board

Fig. 2. The Wire case - a shared information space, in this case a physical board (left), with different types of information entities (right). Close-up pictures are blue, surveillance photos are orange, text cards with meta information about individuals are green, and text cards functioning as headers are dark red.

algorithms are often applied to assist investigators in making

these decisions and we discuss measures of centrality for

individual network entities below.

Information entity associations form information structures

and centralities are computed based on these associations.

Consequently, associations impact the measures of centrality

we want to calculate. Criminal network investigation has to

a large degree so far focused on the direct association of

nodes. Links are seldom first class objects in the terrorism

domain models with the same properties as nodes. This is in

contrast to the fact that the links between the nodes provide

at least as much relevant information about the network as

the nodes themselves [27]. The nodes and links of criminal

networks are often laid out at the same level in the information

space when the network is visualized. Composites (groups)

are first class entities that add depth to the information space.

For investigative purposes navigable structures and entities

(including composites) are useful for synthesis tasks such

as manipulating, re-structuring, and grouping entities. Our

understanding of information links (relations) and groups

(composites) is based on hypertext research [2].

C. Entity centrality

Measures of centrality have been developed for different

types of networks. Most prominent are social network analysis

techniques (see [23], [28], [29]) that can measure the centrality

of entities in criminal networks based on their direct and indi-

rect associations to other entities in the network. But “although

the premise that centrality is an indication of importance,

influence, or control in a network may appear valid, it is also

contestable, particular in criminal contexts. [. . . ] What does

it mean to be central in a criminal network?” [23]. We argue

that centrality is dependent on the specific criminal network

being investigated. It depends on the associations between

entities that investigators deem important, and it depends on

the weights of those associations. Furthermore, the accuracy

of centrality measures depends on the investigators’ ability

to embed their tacit knowledge and novel associations into

centrality algorithms. We review a selection of techniques

below, which we find to be relevant for criminal network

analysis on the above mentioned premises.

An entity is central when it has many associations to other

entities in the network. This kind of centrality is measured

by the degree of the entity and is also known as local

centrality since only entities at a distance of 1 or 2 links

are included. The higher the degree, the more central the

entity. For networks with directed links, both in-degree and out-degree centrality can be measured, referring to the number of incoming and outgoing links an entity has, respectively. A network with

high degrees of both is a highly cohesive network. Usually,

not all entities are connected to each other in a network.

Therefore, a path from one entity to another may go through

one or more intermediate entities. Betweenness centrality is measured as the frequency of occurrence of an entity on the geodesics (shortest paths) connecting other pairs of entities. A high frequency

indicates a central entity. These entities bridge networks,

clusters, and subgroups: “betweenness centrality fleshes out

the intermediaries or the brokers within a network” [23].
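To make the two measures concrete, the following is a minimal Python sketch that computes degree and betweenness centrality for a small undirected, unweighted network. It uses Brandes' standard algorithm for betweenness, which is our choice for illustration and not necessarily what CrimeFighter Investigator implements. The toy network and its entity names are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Number of direct associations per entity."""
    return {v: len(neighbors) for v, neighbors in adj.items()}


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy network: a chain of command ending in two dealers.
network = {
    "Avon": {"Stringer"},
    "Stringer": {"Avon", "Lieutenant"},
    "Lieutenant": {"Stringer", "Dealer A", "Dealer B"},
    "Dealer A": {"Lieutenant"},
    "Dealer B": {"Lieutenant"},
}
degrees = degree_centrality(network)
betweenness = betweenness_centrality(network)
```

In this toy network the lieutenant has both the highest degree (3) and the highest betweenness (5 shortest paths pass through that entity), while the leader at the end of the chain scores lowest on both measures.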

Closeness, also known as global centrality, indicates

whether or not an entity has easy access to other entities in

the network. Eigenvector centrality is like a recursive version

of degree centrality where an entity is central to the extent

that the entity is connected to other entities that are central.

Specific techniques for terrorist network analysis often take the

mentioned centrality measures as input to their computations.

Examples include measures of link importance based on

secrecy and efficiency [9], the prediction of covert network

structure, missing links, and missing key players [30], and

custom-made techniques developed by investigators to target

network-specific analysis tasks, such as the node removal

technique described in [31].
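Closeness and eigenvector centrality can be sketched in the same style. The code below is an illustration under stated assumptions: the network is connected and undirected, closeness is taken as (n - 1) divided by the sum of shortest-path distances, and eigenvector centrality is approximated by power iteration on the adjacency matrix plus the identity (a shift we add so that the iteration also converges on bipartite networks such as trees; it does not change the eigenvectors). Function names and the star-shaped example are hypothetical.

```python
from collections import deque


def closeness_centrality(adj):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    result = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[s] = (len(dist) - 1) / total if total else 0.0
    return result


def eigenvector_centrality(adj, iterations=100):
    """Power iteration on A + I, scaled so the largest score is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each entity keeps its own score and adds its neighbors' scores.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        top = max(nxt.values())
        x = {v: value / top for v, value in nxt.items()}
    return x


# Hypothetical star network: one hub connected to three peripheral entities.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
closeness = closeness_centrality(star)
eigenvector = eigenvector_centrality(star)
```

For the star, the hub reaches every other entity in one step and therefore has the highest closeness, and the peripheral entities inherit a share of the hub's eigenvector score.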

D. Hypertext and semantic web technology

Hypertext systems aim at augmenting human intellect, i.e.,

increasing the capability of man to approach a complex

problem situation, to gain comprehension to suit particular

needs, and to derive solutions to problems [32]. CrimeFighter

Investigator supports a range of domain-independent hypertext

structures that are used to support synthesis of information

entities: navigational structures allow arbitrary pieces of information (entities) to be linked (associated, see discussion

above); spatial structures were designed to deal with emergent

and evolving structures of information which is a central

task in information analysis; taxonomic structures can support

various classification tasks.

In the context of criminal network investigation, spatial

structures are useful in various synthesis, sense-making, and

dissemination tasks such as re-structuring, brainstorming, re-

tracing the steps, creating alternative interpretations, and story-

telling. Taxonomic structures are in essence hierarchical (tree)

structures. Hierarchical structures are also known from other

structuring domains (such as composites from the associative

domain and collections from the spatial domain). In the context

of investigation, taxonomic structures can provide a different

visual (hierarchical) perspective of associative and spatial

structures hence supporting the exploring perspectives task

of sense-making. See [2] for further details on the application

of hypertext structures to criminal network investigation.

Semantic web concepts have many characteristics in com-

mon with our understanding of criminal network entities and

their associations. Similar to centrality measures for criminal

networks, semantic web concepts have been developed to

measure the centrality of entities in online social networks. We

are interested in analysis of complex systems in which nodes

could be any object, relations (links) could be of any nature,

and structures are generated by the users (investigators). Se-

mantic web technology can explicitly model the interactions

between individuals, places and things in complex systems

of information entities, but classical social network analysis

methods are typically applied to “these semantic representa-

tions without fully exploiting their rich expressiveness” [33].

A short summary of semantic web technology and a social

network analysis example is given in [34]:

Semantic web [technologies] provide a graph model, a query language and type and definition systems to represent and exchange knowledge online. These [technologies] provide a [. . . ] way of capturing social networks in much richer structures than raw graphs. Several ontologies can be used to represent social networks. The most popular is FOAF⁵, used for describing people, their relationships and their activity. A large set of properties is dedicated to the definition of a user profile: “family name”, “nick”, “interest”, etc. The “knows” property is used to connect people and to build a social network. [. . . ] The properties in the RELATIONSHIP⁶ ontology specialize the “knows” property of FOAF to type relationships in a social network more precisely (familial, friendship, or professional relationships). For instance the relation “livesWith” specializes the relation “knows”.

We believe that the outlined approach can be adopted

and extended to support other association types such as the

⁵ http://www.foaf-project.org/

⁶ http://vocab.org/relationship/

Fig. 3. “Queries that extract the degree centrality of [individuals] linked by the property foaf:knows and its specialization relationship:worksWith” [34].

semantic associations described below.
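The idea of computing degree centrality over a typed relation and its specializations can be illustrated without a triple store. The sketch below is plain Python rather than the semantic web queries of [34]; the triples, the individuals, and the function names are hypothetical, and the specialization table only mirrors the relations mentioned above.

```python
# Hypothetical (subject, property, object) triples describing a small network.
TRIPLES = [
    ("alice", "foaf:knows", "bob"),
    ("alice", "relationship:worksWith", "carol"),
    ("bob", "relationship:livesWith", "carol"),
]

# Each specialized property points to the property it specializes.
SUB_PROPERTY_OF = {
    "relationship:worksWith": "foaf:knows",
    "relationship:livesWith": "foaf:knows",
}


def specializes(prop, ancestor):
    """True if prop equals ancestor or specializes it, directly or transitively."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def typed_degree(triples, prop):
    """Undirected degree counting only associations of type prop or a specialization."""
    degree = {}
    for subject, predicate, obj in triples:
        if specializes(predicate, prop):
            degree[subject] = degree.get(subject, 0) + 1
            degree[obj] = degree.get(obj, 0) + 1
    return degree


knows_degree = typed_degree(TRIPLES, "foaf:knows")
works_with_degree = typed_degree(TRIPLES, "relationship:worksWith")
```

Querying for the general property includes all its specializations, while querying for a specialization narrows the measure to that association type only.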

E. Topology of associations

Based on the concepts of centrality and association, we

outline a topology of associations between criminal network

entities which impact the centrality of individual entities to varying degrees. Our topology is divided into direct and semantic associations (see Figures 4 and 5). Direct associations are expressed using link entities. The link may be weak by weight (low), by type (rumor, acquaintance, one-visit-to, etc.), or by evidence (uncorroborated, questionable newspaper, etc.), but it is nonetheless interpreted as a direct association by sense-making algorithms and in visualizations. Semantic associations between criminal network entities are built incrementally based on the tacit knowledge of investigators and

the investigation domain their target operates within. Initially,

investigators express information “via visual or textual means

and later formalize that [information] in the form of attributes,

values, types, and relations” [15].

The visual symbol for direct associations is a thick solid

line, and thin solid circles indicate entity connection points.

The visual symbol for semantic associations is a dashed line

and dashed circles indicate connection points. We realize that

some of these associations are more relevant than others, and it

is exactly this relevance of alternative associations that we are

investigating in this paper. In Figures 4a to 4c, we show three

classic associations: the node-link-node association is the most

frequently used (4a), together with the less frequently used

node-link-group (4b) and group-link-group (4c) associations.

Figures 4d to 4g show four examples of direct associations

that occur in criminal network investigations, but are not in-

cluded when entity centrality is computed. A link could be the

target of an investigation, e.g., Daniel Pearl was investigating

whether or not there was a link between Richard Reid (the shoe

bomber) and the leader of a local radical Islamist group [12].

Other examples include knowledge about the money transfer

between two individuals or that one individual had seen them

talk at the same location on numerous occasions (Figure 4d).

The empty endpoint is another example of a direct association

that occurs in criminal network investigations, but is not (di-

rectly) addressed by traditional centrality algorithms. The need

to include empty endpoints in centrality is straightforward: if

investigators know that someone is distributing drugs to three

individuals, e.g., based on wire taps, but they don’t know who

those individuals are, then an empty endpoint can be used until

it is clear. This could be the case for both nodes and groups

(see Figure 4e and 4g). Finally, direct associations between

entities outside groups to entities inside groups are needed


(a) node-node (b) node-group (c) group-group (d) link-link (e) empty endpoint I (f) node-sub node (g) empty endpoint II

Fig. 4. Direct associations in our topology include classic associations (a-c) and novel associations in terms of centrality measures (d-g).

(a) clique I (b) clique II (c) meta data (d) sequential (e) group-subgroup (f) node-subnode (g) node below

Fig. 5. Semantic associations in our topology include spatial associations (a-d) and hierarchical associations (e-g).

(both for reference and inclusion composites, see Figure 4f).

When criminal network investigators start grouping entities,

structures where entities outside the group are linked to entities

inside the group might emerge. But the relation is still associated with that entity in the subgroup.
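The effect of empty endpoints on degree centrality can be shown with a short Python sketch. It assumes that an unknown endpoint is represented as None and that each such link counts toward the degree of the known endpoint; the function name, the flag, and the drug distribution example data are hypothetical and do not reproduce CrimeFighter Investigator's implementation.

```python
def degree_with_empty_endpoints(links, count_empty=True):
    """Degree per known entity; links to an unknown entity (None) count if enabled."""
    degree = {}
    for source, target in links:
        for entity, other in ((source, target), (target, source)):
            if entity is None:
                continue  # an empty endpoint has no centrality of its own
            if other is None and not count_empty:
                degree.setdefault(entity, 0)  # keep the entity, ignore the link
                continue
            degree[entity] = degree.get(entity, 0) + 1
    return degree


# Wire taps suggest three buyers whose identities are not yet known.
links = [
    ("Distributor", "Known buyer"),
    ("Distributor", None),
    ("Distributor", None),
    ("Distributor", None),
]
classic = degree_with_empty_endpoints(links, count_empty=False)
extended = degree_with_empty_endpoints(links, count_empty=True)
```

With the classic measure the distributor appears no more central than the single known buyer; counting the empty endpoints raises the distributor's degree from 1 to 4.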

The semantic co-location association should be used care-

fully by investigators. If the investigators position entities

near each other spatially because they are assumed to be

related somehow, then it will make sense to use spatially based

associations. But if not, then it will simply clutter the network

with non-relevant relations. If entities are placed near each

other or as overlapping entities it could mean that they are

forming a sort of clique (Figure 5a and 5b). Also, as it is the

case in the analyzed The Wire investigation board, position

entities next to or around a (centered) entity could mean that

the information entities are meta data about the centered entity

(Figure 5c). Entities positioned next to each other horizontally

or vertically could mean that the entities represent a sequence

(Figure 5d).

Semantic hierarchical associations can occur either when

composites are used or when information entities are posi-

tioned spatially in a manner that resembles that of a hierarchy.

If a group contains single information entities and subgroups,

the single entities must have some sort of relationship to the

entities in the subgroups since their overall classification is

the same (Figure 5e). Also it could be that a single entity

is associated with a composite (group) and therefore might

have some sort of relation with entities within that composite

(Figure 5f). Finally, positioning entities in spatial hierarchies

as shown in Figure 5g indicates that entities below other entities

represent sub-entities.

The topology of associations can be seen as a wish list

of requirements for what an investigative tool should support

in this regard. The topology is not exhaustive; we expect

to uncover additional associations over time, especially new

semantic associations based on temporal distance (when individuals

appear on an investigation time line together with

other individuals and events, etc.), distance between entities in

the real world, distance in family ties, and so on.

III. CRIMEFIGHTER INVESTIGATOR

CrimeFighter Investigator [2], [35] is based on a number of

concepts (see Figure 6). At the center is a shared information

space. Spatial hypertext research has inspired the features of

the shared information space including the support of inves-

tigation history [2]. The view concept provides investigators

with different perspectives on the information in the space

and provides alternative interaction options with information

(hierarchical view to the left (top); satellite view to the left

(bottom); spatial view at the center; algorithm output view

to the right). Finally, a structural parser assists the investiga-

tors by relating otherwise unrelated information in different

ways, either based on the entities themselves or by applying

algorithms to analyze them (see the algorithm output view to

the right). In the following, central CrimeFighter Investigator

features supporting measures of centrality are presented.

A. Extending centrality algorithms with new associations

The classic centrality algorithms have been extended by

adding some analysis prior to the existing steps. Our imple-

mented betweenness algorithm (described in [31]) with the

extra step for the selected centrality extension(s) works as

follows:

1) Pre-analysis: In this step the algorithm analyzes whether or not the included association types appear in the criminal network. If they do, then changes are temporarily made to the network accordingly.

2) List all entity pairs: This step creates a list of all entity pairs that exist in the network, again based on the included associations. This means that if the direct node-group association is included, then all entities that are directly or indirectly (by association through intermediary entities) associated to the group with links are added to the list of entity pairs.

3) List all shortest path(s) for each entity pair: We calculate the shortest path(s) for all entity pairs without considering the cost-efficiency of our algorithm: we take a breadth first, brute-force approach [36], visiting all nodes at depth d before visiting nodes at depth d + 1, removing all loops and all paths to the destination node longer than the shortest path(s) in the set, until only the shortest path(s) remain.

4) Node occurrence: We calculate the ratio by which each node in the network appears in the accumulated set of shortest path(s).

5) Bubble sort: The results are sorted according to the user’s choice, usually descending with the highest centrality first.

6) Generate report: If the user requests it, a PDF report is generated for easy dissemination of the results of the centrality measure. The user can decide what report elements to include.

Fig. 6. CrimeFighter Investigator showing an altered version of the investigation board from The Wire.
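Steps 2 to 5 above can be illustrated with a minimal Python sketch. It is not the CrimeFighter Investigator implementation: the function names and the adjacency-map input are hypothetical, only intermediary nodes (not path endpoints) are counted in step 4, which is an assumption the text does not settle, and Python's built-in sort stands in for the bubble sort.

```python
from collections import deque
from itertools import combinations


def all_shortest_paths(adjacency, source, target):
    """Breadth-first, brute-force enumeration of every shortest path."""
    shortest = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        # Paths leave the queue in order of length, so once a path is longer
        # than a shortest path already found, nothing useful remains.
        if shortest and len(path) > len(shortest[0]):
            break
        node = path[-1]
        if node == target:
            shortest.append(path)
            continue
        for neighbour in sorted(adjacency[node]):
            if neighbour not in path:  # never revisit a node: removes loops
                queue.append(path + [neighbour])
    return shortest


def node_occurrence(adjacency):
    """Rank nodes by the share of shortest paths they appear in."""
    paths = []
    for source, target in combinations(sorted(adjacency), 2):  # step 2
        paths.extend(all_shortest_paths(adjacency, source, target))  # step 3
    counts = {node: 0 for node in adjacency}
    for path in paths:
        for node in path[1:-1]:  # assumption: endpoints are not counted
            counts[node] += 1
    total = len(paths) or 1  # avoid division by zero in an empty network
    ratios = {node: count / total for node, count in counts.items()}  # step 4
    # Step 5: descending order, ties broken by name for a stable listing.
    return sorted(ratios.items(), key=lambda item: (-item[1], item[0]))
```

Because the accumulated set pools the paths of all pairs, this ratio differs from the textbook betweenness definition, which averages over the shortest paths of each pair separately.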

Pre-analysis is the algorithm step of primary interest to the

work presented here. For the direct empty endpoint association,

pre-analysis involves adding temporary information elements

as placeholders of empty endpoints. For the semantic co-location association, we create a temporary relation between

two entities if they are not already related and they are within

the user-defined boundaries of each other (see Figure 7).
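The pre-analysis for these two associations can be sketched as follows. This is an illustrative sketch only: the `Link` type, the `pre_analysis` function, and the use of a circular radius as the user-defined boundary are assumptions made for the example, and entity positions are taken to be (x, y) coordinates in the information space.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Link:
    source: Optional[str]  # None marks an empty endpoint
    target: Optional[str]


def pre_analysis(positions, links, radius, empty_endpoints=True, co_location=True):
    """Build a temporary adjacency map; positions and links are not modified."""
    adjacency = {name: set() for name in positions}
    placeholders = 0
    for link in links:
        ends = [link.source, link.target]
        if None in ends:
            # Skip the link if the association is excluded, or if neither
            # end is known (a case this sketch does not handle).
            if not empty_endpoints or ends == [None, None]:
                continue
            placeholders += 1
            name = f"placeholder-{placeholders}"  # temporary information element
            ends = [name if end is None else end for end in ends]
        a, b = ends
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if co_location:
        names = sorted(positions)
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                near = math.dist(positions[a], positions[b]) <= radius
                if near and b not in adjacency[a]:  # only if not already related
                    adjacency[a].add(b)
                    adjacency[b].add(a)
    return adjacency
```

Returning a fresh adjacency map mirrors the temporary nature of the changes: the investigators' own structures stay as they were once the centrality measure has been calculated.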

B. Customizing sense-making and sense-making algorithms

CrimeFighter Investigator algorithms are managed using

a structural parser, where investigators can select different

algorithms to run and control the order in which they are

executed, for example either simultaneously or sequentially.

Figure 8 (left) shows how individual centrality algorithms

can be customized by the user. The user must decide how

to run an algorithm (Figure 8a) and what entities to include

for the respective centrality algorithm (Figure 8b). This is

done using drag and drop between two defined areas as

shown in Figure 8 (right, top frame). For included entities the user can set a weight (maybe a location counts less than a person for a measure of betweenness centrality), and for excluded entities the user decides how the algorithm should deal with them, e.g., when tracing a shortest path: should the algorithm discard the shortest path, or simply ignore the entity and continue along the path? Direct and semantic associations are included or excluded using the same drag and drop approach as for entities (see Figure 8c and 8d). Again, weights can be set up for included associations and the algorithm’s action(s) for excluded associations. Finally, we imagine many settings for how to format and list results (Figure 8e). Typically, normalization is important for comparison of results. If an investigation has many of the included entities, it can be useful to display only, for example, 10 results based on some parameter, e.g., highest centrality.

(a) without (b) with (c) without (d) with

Fig. 7. The two implemented algorithm extensions, the empty endpoint association and the co-location association, are explained. Without the empty endpoint association, the link from the empty endpoint to the connected entity is not included in measures of betweenness centrality and degree centrality is not calculated for the empty endpoint (a), and with that association the link is included (b). Without the co-location association, entities positioned near each other in the information space are not included in measures of centrality (c), but if entities fall within the boundaries defined by the investigators and the association is included, then those entities are included in measures of centrality (d).

Fig. 8. Setting up centrality algorithms using structural parser windows: the centrality algorithm settings window is shown on the left, and the window for inclusion and exclusion of entities together with specific settings for each of those entities is shown on the right.
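The entity settings described above (a weight for included entities, a choice of action for excluded entities, and a normalized, truncated result list) might be sketched as below. The function names, the two policy labels "drop" and "skip", and normalization by the highest score are all assumptions made for illustration; the text does not specify how the tool implements these options.

```python
def apply_exclusions(paths, excluded, policy):
    """Handle excluded entities on shortest paths.

    policy "drop": discard every path that touches an excluded entity.
    policy "skip": keep the path but leave the excluded entities out of it.
    """
    kept = []
    for path in paths:
        if policy == "drop":
            if not any(node in excluded for node in path):
                kept.append(list(path))
        elif policy == "skip":
            kept.append([node for node in path if node not in excluded])
        else:
            raise ValueError(f"unknown policy: {policy}")
    return kept


def weighted_top_results(scores, weights, limit=10):
    """Scale scores by entity weight, normalize to [0, 1], keep the top results."""
    weighted = {name: score * weights.get(name, 1.0) for name, score in scores.items()}
    highest = max(weighted.values(), default=0.0)
    if highest > 0:
        weighted = {name: value / highest for name, value in weighted.items()}
    ordered = sorted(weighted.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:limit]
```

Entities without an explicit weight default to 1.0, so a weight only needs to be set for entity types that should count less (or more) than the rest.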

It is currently possible to set the visual symbols for the

information space and the algorithm view (see Figure 8f). For

the information space the user can decide whether or not

to overlay entities with a geometric shape (circle, square, or

rectangle) containing the calculated centrality (instead of just

showing the results in the algorithm view). The color, size

and outline of the shape can be decided together with the font

and font size of the printed centrality. For the algorithm view

it can be decided how to display the results textually in a list,

for example which attribute should be printed (e.g., person

’name’ or email ’date’), and the font (type, size and color)

can be set.

IV. EVALUATION

We have tested CrimeFighter Investigator’s support of three

tool requirements on a filtered version of the investigation from

The Wire and a semi-altered version of the same investigation.

We calculate two centrality measures, degree and between-

ness, for two conditions, with and without two designed and

implemented associations.

We test the co-location association on an investigation

inspired by The Wire to evaluate the requirement for support

of semantic associations. The investigation had no direct

associations between entities prior to the test. We have filtered

out all entities except the close-up photos (i.e., the blue

rectangles) and created an investigation using CrimeFighter

Investigator where individuals are positioned with the same

relative distance. All individuals are given numbers or letters

as names, except for the two lieutenants Anton Artis (A.A.)

and Roland Brice (R.B.). The network with the semantic co-

location association included is shown in Figure 9a and the

calculated centralities are shown in Figure 9b.

Prior to testing the empty endpoint association we found

that empty endpoints rarely occurred in the investigation we

analyzed. Links are used to connect two entities, and even if

the contents of one entity are unknown it is still created as a

placeholder. It is unclear whether this is simply because it does

not make sense to work with empty endpoints or if it is because

of a structural bias toward links as simple entity connectors.

To test the influence of the empty endpoint association we

have used some of the links from the previous test to create

a new test case (see Figure 6). We assume that a number of

subgroups have been detected (the four colored composites)

and that the investigators know there is some connection from

the main network to each of these subgroups but it is unclear

how and therefore an empty endpoint is positioned next to

each subgroup.

To test the requirement for centrality measures to consider

multiple associations, we use the same network as for the

empty endpoint requirement (see Figure 6). However, this

time we test both the empty endpoint association and the

co-location association together. The with condition therefore

means that the algorithm replaces empty endpoints with actual

nodes (placeholders) and creates links between co-located

nodes that are not already directly associated.

A. Discussion and summary of results

Testing the requirement for semantic associations illustrated

how centrality measures can be applied to spatial network

structures using a co-location association. It is evident that

when no relations exist in an investigation prior to analysis,

there is a need to define associations between entities in a dif-

ferent way if the investigators need to calculate node centrality

to deal with the uncertainty of an ongoing investigation. We

see that degree centrality indicates the individuals on the right

hand side in Figure 9b as central to the network (e.g., 9, 6, 8,

and 10), but they are of little importance. At the same time,

degree does not point to the two lieutenants A.A. or R.B. as key

players, as we expected it would. We therefore find that one should

be careful about treating spatial co-location as a basis

for network degree centrality. Betweenness centrality clearly

points to A.A. and R.B. as key players in the network together

with individual 2. Given the results of our two other tests it is

also interesting that individual 5 is placed in the top four in terms

of betweenness.

When we tested the empty endpoints requirement we found that the measure of degree centrality provides investigators with no clear tendencies, although it more strongly indicates individuals F, D, A.A., and 3 as central to the network. The betweenness results more distinctly point to A.A. and R.B. when including the empty endpoint association. We also observe that individual 2 is ranked fourth instead of seventh, which is a more realistic depiction of this individual’s betweenness in the network. Individual 5 has the largest change in betweenness when including empty endpoints, making him an interesting subject for further investigation. As mentioned earlier, it would be possible to model empty endpoints using information element placeholders until the content of the empty endpoint is known. This also means that traditional social network analysis measures of centrality could be applied. We therefore recommend testing whether empty endpoints have higher value for restructuring tasks during synthesis than for centrality algorithms.

(a) test scenario 1 (b) co-location results (c) empty endpoint results (d) two associations results

Fig. 9. The Wire investigation with links representing co-location associations (a). The degree and betweenness centralities for each of three tests: co-location association (b), empty endpoints association (c), and both co-location and empty endpoints associations (d).
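The comparison between the without and with conditions used in these tests, i.e., how much an entity's centrality and rank change when an association is included, can be sketched as follows. The function names and the dictionary-based input are hypothetical; the sketch carries no data, and any scores passed to it would come from an actual centrality calculation.

```python
def rank(scores):
    """Map each entity to its 1-based rank, highest score first."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def compare_conditions(without, with_):
    """List entities by the size of their score change between two conditions.

    Entities present in only one condition (such as temporary placeholders)
    are left out of the comparison but still take part in the ranking.
    """
    before, after = rank(without), rank(with_)
    rows = []
    for name in without.keys() & with_.keys():
        rows.append({
            "entity": name,
            "change": with_[name] - without[name],
            "rank_without": before[name],
            "rank_with": after[name],
        })
    # Largest absolute change first; ties are broken by entity name.
    return sorted(rows, key=lambda row: (-abs(row["change"]), row["entity"]))
```

Listing both the score change and the two ranks makes it possible to read off observations of the kind reported above, such as an entity moving from seventh to fourth place or showing the largest change in betweenness.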

Our test of the requirement for support of multiple asso-

ciations was successful in terms of extending two measures

of centrality with more than one association from our topology.

But for the test investigation the test results did not

add much investigative value. The inclusion of both empty endpoint

and co-location associations connects all entities

in the criminal network through the empty endpoints (individual

5 is connected to individuals 6 and 12, individual F

to individual H, and individual A.A. to individual M). This

makes the degree and betweenness centrality of key nodes

without the associations less distinctive. The numbers are

flattened because the information elements in the subgroups

achieve higher measures of betweenness centrality with the

associations included. The most interesting result for this

final test was that the degree and betweenness centrality of

individual 5 is increased considerably when the associations

are added. Together, our three requirement tests have shown

that measures of centrality extended with novel types of

associations provided new insights into two organized crime

networks that traditional centrality measures could not provide.

The most important result was that the centrality of individual 5

was increased in all three tests. Individual 5 was not known

to be a central entity in the network before the tests.

V. CONCLUSION

We have presented two novel sense-making algorithms

based on new interpretations of information entity association

and centrality. The algorithms are extensions of classic social

network analysis algorithms where the user can include and

exclude specific entities and associations for analysis to match

the structures they have built when investigating a

criminal network. More specifically, this paper has three main

contributions:

1) A novel network model with nodes, links, and groups

as first class entities.

2) A topology of direct and semantic network entity associ-

ations based on an analysis of various criminal network

investigations following a target-centric approach.

3) An implementation that supports three of these associations:

the traditional node-link-node association, the

novel empty endpoint association, and the semantic co-location

association. The latter two associations have been tested

on a criminal network investigation from The Wire and

an altered version of that same investigation.

We can conclude that target-centric criminal network in-

vestigation creates structures not clear from the beginning of

an investigation and in order to apply traditional centrality

measures, associations other than node-link-node have to be

supported. We plan to implement support of the other asso-

ciations in our topology in the near future. We would like

to test them on real-world investigations (either post-crime or

ongoing) to learn if and how they could provide useful insights

into the investigated criminal networks.

As an alternative to manually applying a specific radius or

geometric shape to decide co-location association, it could be

interesting to apply a standard machine learning algorithm

that suggests co-location, not in terms of position on the

investigation board, but based on temporal distance, physical

distance in the real world, or distance in family ties.

REFERENCES

[1] R. Clark, Intelligence analysis: a target-centric approach. CQ Press, 2007.

[2] R. R. Petersen and U. K. Wiil, “Hypertext structures for investigative teams,” in Proceedings of the 22nd ACM conference on hypertext. ACM Press, 2011, pp. 123–132.

[3] B. Drogin, Curveball. Ebury Press, 2008.

[4] T. Weiner, Legacy of Ashes: The History of the CIA. Anchor Books, 2008.

[5] Could 7/7 have been prevented? Review of the intelligence on the London terrorist attacks on 7 July 2005, Intelligence and Security Committee, United Kingdom, 2009.

[6] R. R. Petersen, “Presentation of crimefighter investigator.” British Home Office, London, United Kingdom: Presented and demonstrated work on prediction of covert network structure and missing links to a group of British intelligence analysts, March 2011.

[7] T. McDermott and J. Meyer, The Hunt for KSM - Inside the Pursuit and Takedown of the Real 9/11 Mastermind, Khalid Sheikh Mohammad. Little, Brown and Company, 2012.

[8] The 9/11 Commission Report (Executive Summary), National commission on terrorist attacks upon the United States, United States, 2004. [Online]. Available: http://www.9-11commission.gov/report/911Report Exec.pdf.

[9] U. K. Wiil, N. Memon, and J. Gniadek, “Crimefighter: A toolbox for counterterrorism,” Lecture notes in communications in computer and information science (Knowledge discovery, knowledge engineering and knowledge management), vol. 128, pp. 337–350, 2011.

[10] M. Sageman, Understanding terrorist networks. Philadelphia, Pennsylvania: University of Pennsylvania Press (PENN), 2004.

[11] B. F. Todd and A. Nomani, The truth left behind: inside the kidnapping and murder of Daniel Pearl, 2011.

[12] M. Pearl, A mighty heart. Virago Press, 2004.

[13] B. H. Levy, Who killed Daniel Pearl? Melville House Publishing, 2003.

[14] V. Krebs, “Mapping networks of terrorist cells,” CONNECTIONS, vol. 24, no. 3, pp. 43–52, 2002.

[15] F. Shipman, J. M. Moore, P. Maloor, H. Hsieh, and R. Akkapeddi, “Semantics happen: knowledge building in spatial hypertext,” in Proceedings of the thirteenth ACM conference on Hypertext and hypermedia, ser. HYPERTEXT ’02. ACM, 2002, pp. 25–34.

[16] R. Alvarez and D. Simon, The Wire: Truth Be Told. Pocket Books, 2004.

[17] R. Penfold-Mounce, D. Beer, and R. Burrows, “The wire as social science-fiction?” Sociology, vol. 45, no. 1, pp. 152–167, Feb. 2011.

[18] R. R. Petersen and U. K. Wiil, “Analysis of emergent and evolving information: the agile planning case,” in Software and data technologies, ser. Communications in computer and information science, J. Cordeiro, A. Ranchordas, and B. Shishkov, Eds. Springer Berlin Heidelberg, 2011, vol. 50, pp. 263–276.

[19] (2012) Ibm i2 analyst’s notebook. [Online]. Available: http://www.i2group.com/us/products/analysis-product-line/ibm-i2-analysts-notebook

[20] B. Capers, “Crime, legitimacy, our criminal network, and the wire,” Ohio state journal of criminal law, vol. 8, pp. 459–471, 2011.

[21] D. Simon and E. Burns, “The wire (the complete first season),” 2002.

[22] T. A. Taniguchi, J. H. Ratcliffe, and R. B. Taylor, “Gang set space, drug markets, and crime around drug corners in camden,” Journal of research in crime and delinquency, vol. 48, pp. 327–363, 2011.

[23] C. Morselli, “The criminal network perspective,” in Inside criminal networks, ser. Studies of organized crime. Springer New York, 2009, vol. 8, pp. 1–21.

[24] A. S. Linschoten and F. Kuehn, An enemy we created: the myth of the Taliban/Al-Qaeda merger in Afghanistan, 1970-2010. Hurst, 2012.

[25] R. R. Petersen, “Interview with alex strick van linschoten.” Trafalgar Square, London, United Kingdom: A discussion of CrimeFighter Investigator, Tinderbox, Gephi, Analyst’s Notebook in relation to Alex’s work with mapping the temporal evolution of Afghan Taliban, March 2011.

[26] M. Bernstein, The Tinderbox way. Eastgate Systems, 2006.

[27] P. A. Gloor and Y. Zhao, “Analyzing actors and their discussion topics by semantic social network analysis,” in Proceedings of information visualization, 2006, pp. 130–135.

[28] J. Scott, Social network analysis, a handbook (second edition). Sage, 2000.

[29] L. R. Irons, “Recent patterns of terrorism prevention in the united kingdom,” Homeland security affairs, vol. 4, 2008.

[30] C. J. Rhodes and P. Jones, “Inferring missing links in partially observed social networks,” Journal of the operational research society, vol. 60, no. 10, pp. 1373–1383, 2009.

[31] R. R. Petersen, C. J. Rhodes, and U. K. Wiil, “Node removal in criminal networks,” in Proceedings of european intelligence and security informatics conference. IEEE, 2011, pp. 360–365.

[32] D. C. Engelbart, “A conceptual framework for the augmentation of man’s intellect,” in Computer-supported cooperative work. Kaufmann, 1988, pp. 35–65.

[33] G. Ereteo, F. Limpens, F. Gandon, L., O. Corby, M. Buffa, M. Leitzelman, and P. Sander, “Semantic social network analysis: a concrete case,” in Handbook of Research on Methods and Techniques for Studying Virtual Communities: Paradigms and Phenomena. IGI Global, 2011, pp. 122–156.

[34] G. Ereteo, M. Buffa, F. Gandon, P. Grohan, M. Leitzelman, and P. Sander, “A state of the art on social network analysis and its applications on a semantic web,” 2008.

[35] R. R. Petersen and U. K. Wiil, “Crimefighter investigator: a novel tool for criminal network investigation,” in Proceedings of european intelligence and security informatics conference. IEEE, 2011, pp. 360–365.

[36] M. Sipser, Introduction to the theory of computation. PWS Publishing Company, 1997.
