WHITE PAPER

Using Advanced Analytics to Facilitate Intelligence Analysis

Make Connections Between Disconnected Data Fragments to Reveal Hidden Threats

SAS White Paper

Table of Contents

Executive Summary

Protecting National Security: New Challenges and Opportunities

Making Analysts More Efficient

A Use Case: Connecting Fragments of Data to Identify Targets

The US Government’s ‘Identity Discovery Challenge’

Using SAS® Software to Find Multiple ‘Paths’ Connecting the Dots

Our Approach to the Challenge

Findings

Broadening the Applications of SAS® Security Intelligence

Learn More

Executive Summary

Intelligence analysts face incredible challenges today, including pressures to detect growing numbers of threats by analyzing overwhelming amounts of “big data.” Using traditional tools and manual approaches, analysts simply can’t keep up with the need to find that elusive “needle in a haystack” – especially when they have only a few fragments of data as the starting point for investigations. Moreover, success often requires answers to a wide range of open-ended questions – for example, how events or people may be linked; if behavior patterns are being established; and whether certain transactions, shipments or other events exhibit any alarming characteristics.

This paper explores how government agencies can use advanced analytics to link disconnected data and provide answers to complex questions. Rather than requiring analysts to know precisely what to look for at any given moment, advanced analytics with built-in alert systems can be set up to proactively identify, prioritize and present information to analysts based on pattern identification and quantification of risk. Although human intervention is inherent in the process of analysis, parts of the intelligence analysis cycle can be automated with analytics software. In this case, we use social network analysis software. Using these advanced analytic methods and approaches for processing and visualizing data, analysts can accomplish objectives faster and smarter – and do more in less time.

Protecting National Security: New Challenges and Opportunities

Accurate, timely, holistic intelligence is the cornerstone for keeping nations and citizens from harm. Quality intelligence in shorter time frames gives those responsible for public security the ammunition they need to deter threats and combat crime to protect national security – whether it’s on the street or in cyberspace. But to make fast, informed critical decisions that affect national security, consumers of intelligence across government agencies need the right information at the right time – for example, when they are:

• Screening passengers and cargo crossing borders.

• Identifying terrorists trying to enter the country or living under cover in cells.

• Pinpointing emerging cyberthreats and assessing risks.

• Detecting financial crimes.

But success in these types of security endeavors is increasingly dependent on the intelligence community being able to answer a wide range of nebulous questions, such as:

• Can the occurrence of one event be linked to another, thereby establishing a pattern?

• Does a particular cargo shipment exhibit any alarming characteristics?

• Has some network collectively engaged in alarming financial transactions?

• Does an airport activity significantly deviate from normal, expected behavior?

• Do changes in the frequency of communications indicate that an event is likely to occur?

• Should I be concerned about some recent activity, even though it seems harmless on its own?

Getting answers to these types of questions is notoriously difficult, as it usually requires advanced analysis of “big data.” As governments collect unprecedented volumes of data, analysts are being overwhelmed by it. If they continue to use manual processes to analyze data, the risk is that important information will escape undetected. The vast variety of data – structured and unstructured – and the speed at which it is being generated and consumed mean agencies need a more automated way to sort through it; uncover meaningful connections, anomalies and insights; and answer complex questions.

At the same time, analysts face higher volumes of potential threats that are both varied and unforeseen. So it’s critical that intelligence analysts accelerate the pace at which they can process big data, as well as more effectively prioritize what data and information to focus on.

As explored in this paper, analysts need a way to work with large amounts of data, analyze it, and then fuse the results back together for a unified, intelligible view. This view is what will enable intelligence agencies to quickly and efficiently find the hidden “needles” in today’s Mount Everest-size data haystacks.

Making Analysts More Efficient

The most efficient, effective way to discover insights and anomalies within massive data sets is to use advanced analytics that can analyze data and data fragments to deliver meaningful, prioritized results. These solutions can sift through enormous volumes of data to identify trends and uncover patterns, giving analysts and decision makers the additional insight they need to act quickly and decisively. The results have tangible, real-world benefits.

A Use Case: Connecting Fragments of Data to Identify Targets

One of the most important uses for advanced analytics is helping analysts connect fragments scattered across massive amounts of data so they can identify potential threats. In a perfect world, analysts working on cases would have fast, easy access to complete data that’s unified and linked logically. But in the real world, this is rarely the case. Often, analysts are given only fragments of data to start with – bits of information that may be incomplete at best, but that they suspect may somehow be related and essential to detecting emergent threats. Moreover, they have had to make connections between these bits of data manually – a time-consuming and nearly impossible task – before they can identify specific threats or targets.

To connect data fragments, agencies need proven, repeatable analytical processes using complex algorithms to help them quickly uncover risks hidden in big data. And these processes need to be user-friendly so they can be adopted and used by analysts with little time to learn new tools.

From time to time, the US government issues analytical challenges to see how they can be addressed and learn about technologies and approaches that may be relevant to their efforts. These challenges enable agencies to “try before they buy,” as well as access fresh, innovative approaches to their most complex challenges.

The US Government’s ‘Identity Discovery Challenge’

In 2012, SAS participated in the US government’s “Identity Discovery Challenge,” which focused on helping the agencies discover non-obvious linkages between fragments of data. The data set was fictitious (in other words, fabricated, randomly generated and not based on any database, repository or system), so any resemblance to real persons, living or dead, would be purely coincidental. The challenge stated:

“An unidentified male visited a medical clinic located at 440 East Madison Avenue in Bethesda, Maryland. The individual signed in with an illegible signature and partially illegible phone number. Before receiving attention from medical staff, the individual exited the facility. While examining other patients of the clinic, it was discovered that one of the patients showed symptoms of an influenza-like illness consistent with a potentially deadly and highly contagious virus.

Staff initiated quarantine procedures to limit close contact until laboratory-confirmed diagnosis of influenza was completed. In reviewing the sign-in log, staff discovered an entry which was unaccounted for. When asked, patients of the clinic could not recall any supporting information about the unidentified individual. Clinic staff contacted disease control authorities to respond to the incident.

An Analytic Response Team is tasked to identify the potentially infected individual using contact tracing. The team is given a single lead – a partial phone number – 212-998-75??. The team uses this available information as a starting point for developing the identity and whereabouts of the individual. The phone number fragment is used to search available data sources through a fuzzy matching, entity resolution process. The team has received the results of the search and is preparing to apply analytic methods to yield contact tracing leads for the field investigators.”

Specifically, challenge participants were asked to identify: 1) the name of the unknown subject, and 2) the location of the unknown subject. Additionally, participants were asked to give details about what they found and how they found it.

Trying to determine answers to these questions by making connections between fragments of data scattered across a large data set in an entirely manual fashion would be extremely difficult, if not impossible. So the goal of the project was to help analysts automate certain parts of the process using an advanced analytics solution that performs some initial analytics processes and prioritizes what data to look at; this would prevent analysts from wasting time looking in the wrong places and help ensure that real targets or threats don’t fall through the cracks.
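As an illustration of the first search step in the challenge text, the sketch below matches the partial phone number 212-998-75?? against a list of candidate numbers. It is a minimal example, not the challenge's actual fuzzy matching or entity resolution process: it only treats each "?" as one unknown digit and ignores punctuation. The function names and the candidate numbers are hypothetical and chosen purely for illustration.

```python
import re


def normalize(phone: str) -> str:
    """Strip everything except digits so formatting differences do not matter."""
    return re.sub(r"\D", "", phone)


def partial_to_regex(partial: str) -> re.Pattern:
    """Turn a fragment such as '212-998-75??' into a digit pattern.

    Assumption: '?' stands for exactly one unknown digit; separators are ignored.
    """
    parts = []
    for ch in partial:
        if ch.isdigit():
            parts.append(ch)
        elif ch == "?":
            parts.append(r"\d")
    return re.compile("".join(parts))


def match_partial(partial: str, candidates: list[str]) -> list[str]:
    """Return the candidates whose digits are consistent with the fragment."""
    pattern = partial_to_regex(partial)
    return [c for c in candidates if pattern.fullmatch(normalize(c))]


# Hypothetical candidate numbers, invented for this example only.
CANDIDATES = ["(212) 998-7512", "212.998.7599", "212-998-7612", "212-998-751"]

if __name__ == "__main__":
    for hit in match_partial("212-998-75??", CANDIDATES):
        print(hit)
```

With two unknown digits the fragment is consistent with up to 100 distinct numbers, which is why the search result is only a starting set for the later analysis rather than an answer in itself.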

Using SAS® Software to Find Multiple ‘Paths’ Connecting the Dots

Using SAS Social Network Analysis – a key component of SAS Security Intelligence – a SAS team was able to deliver the right answers and relevant supporting details for this challenge, and therefore “won” the challenge. Working behind the scenes, SAS Social Network Analysis can connect the dots for analysts so they can find multiple, unique paths of possible connections between data fragments representing people, places or things. Each path of automatically generated connections represents another way of looking at a given set of data fragments within a broad context. The analytic engine also prioritizes what results investigators should focus on.

Our Approach to the Challenge

SAS was provided access to several data stores that included data on credit card records, credit card transaction data, home purchase agreement information, hotel reservations, government ID data, subscriber lookup data, travel reservations and white pages. Specifically, the SAS team was given a set of 345,987 nodes and more than 58,000 links. Each node consists of a person-centric record that contains data such as the person’s first, middle and last name; city; state; zip code; and phone number. SAS was also given two pieces of information known to be true: the partial phone number left by the individual in question and the address of the clinic. Without applying any analytic techniques, the non-prioritized data resembles the “yarnball” shown in Figure 1 below.

The SAS team’s plan was to use SAS Social Network Analysis to find a path through the data that links the two pieces of factual information, similar to the way Google Maps enables people to find the fastest way to reach an end destination. An example of a path shown in SAS Social Network Analysis can be seen in Figure 2.
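The route-finding analogy can be sketched with a breadth-first search, which returns a path with the fewest links between two nodes of an unweighted graph. This is only an illustration of the idea; the paper does not say which search algorithm SAS Social Network Analysis uses. The node labels and edges below are a hypothetical toy network, not data from the challenge.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected graph given as (a, b) pairs.

    Returns a list of nodes from start to goal with the fewest links,
    or None if the two nodes are not connected.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical toy network; labels are illustrative only.
EDGES = [
    ("phone fragment", "subscriber record"),
    ("subscriber record", "person"),
    ("person", "hotel reservation"),
    ("hotel reservation", "clinic address"),
    ("subscriber record", "credit card"),
    ("credit card", "unrelated merchant"),
]

if __name__ == "__main__":
    print(" -> ".join(shortest_path(EDGES, "phone fragment", "clinic address")))
```

A fewest-links path is just one possible notion of "best route"; the steps described later in the paper rank several candidate paths by an alert score instead of returning a single one.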

Figure 1: Shown here is a node without prioritization (the “yarnball”).

Figure 2: Mapping the relationships between the individual entities within nodes using SAS Social Network Analysis.

Figure 2 is an example of how SAS Social Network Analysis can link people, places and things to create a path. In this example, the red and green lines form two separate paths that connect a hotel reservation (6318085343 [HR]) to a government-issued ID (298808448 [ID]).

In using SAS Social Network Analysis, the assumption in this challenge is that the individual of interest would likely fall somewhere along a path that connects the partial telephone number to the address of the clinic. To begin the process of identifying the most likely path that would reveal the name and location of the person being sought, the team executed the following steps:

Step 1: Identify communities – or densely connected entities with multiple connections between them – from the initial set of data. The communities are later used to enrich the information about the entities on the path for the analyst. This gives analysts an initial idea of what the data looks like, but it’s difficult to know where to start with this “yarnball” of unprioritized data (see Figure 1).

Step 2: Break apart the original nodes into constituent entities (such as a person’s name, address and phone number) and generate new links between these entities. With this challenge, the number of links increased from approximately 58,000 to 1.3 million after the nodes were decomposed into multiple entities, as explained below. For example, in the figure below, data for a “node” appears so that each row represents a record and each column represents a field. The original record is read in as:

• Polo Frodo Tunnelly 3961 Keyser Ridge Road Burlington NC 27715 242218430

SAS software separates the record into three entities:

• Polo Frodo Tunnelly

• 3961 Keyser Ridge Road, Burlington, NC, 27715

• 242218430

These entities are delineated in red below, and then are displayed graphically as shown in Figure 3.
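The decomposition in Step 2 can be sketched as follows, using the example record above. This is a simplified illustration under stated assumptions, not the SAS implementation: the field names, the label "number" for the final field, the choice to link every pair of entities within a record, and the second record are all hypothetical.

```python
from itertools import combinations


def decompose(record):
    """Split one person-centric record into name, address and number entities,
    then link every pair of entities that came from the same record."""
    name = " ".join(record[k] for k in ("first", "middle", "last"))
    address = ", ".join(record[k] for k in ("street", "city", "state", "zip"))
    entities = [("name", name), ("address", address), ("number", record["number"])]
    links = list(combinations(entities, 2))
    return entities, links


def build_network(records):
    """Merge the entities and links of many records.

    Identical entities collapse into a single node, so two records that
    share an address become connected through that address.
    """
    all_entities, all_links = set(), set()
    for record in records:
        entities, links = decompose(record)
        all_entities.update(entities)
        all_links.update(links)
    return all_entities, all_links


# The record shown in the text.
RECORD = {
    "first": "Polo", "middle": "Frodo", "last": "Tunnelly",
    "street": "3961 Keyser Ridge Road", "city": "Burlington",
    "state": "NC", "zip": "27715", "number": "242218430",
}

# Hypothetical second record at the same address, invented for illustration.
HOUSEMATE = dict(RECORD, first="Example", middle="Q", last="Housemate", number="000000000")

if __name__ == "__main__":
    entities, links = build_network([RECORD, HOUSEMATE])
    print(len(entities), "entities,", len(links), "links")
```

Because each record contributes several entities and several links, the link count grows much faster than the record count, which is in line with the jump from roughly 58,000 to 1.3 million links reported above.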

Step 3: Determine candidate paths and calculate alert scores (see Figure 4). A candidate path is a path that connects the start and end nodes (partial telephone number and Bethesda clinic address) and therefore is considered a possible solution. Each pair of start and end nodes has multiple routes connecting different nodes that may reveal the name of the person being sought and information about how or where he could be found. The software calculates an alert score for each path that helps analysts identify the paths that have the strongest connections and are most likely to include the target individual. The higher the alert score, the higher the likelihood of finding the target along the path.

Analysts can drill down into any of the candidate paths listed in Figure 4 and review a summary of its details and characteristics (see Figure 5) to quickly determine whether it is viable and worth exploring further. In this case, 180 paths were identified, and the target was identified in the path with the highest alert score. Depending on the situation, analysts might continue to explore down the list as far as they feel necessary.
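Step 3 can be sketched as enumerating every loop-free path between the start and end nodes and ranking the paths by a score. The paper does not disclose how the alert score is calculated, so the scoring rule here (the product of link weights between 0 and 1) is purely an assumption for illustration, as are the node names, the weights and the hop limit.

```python
from collections import defaultdict


def build_adjacency(links):
    """links maps an (a, b) pair to a weight in (0, 1]; the graph is undirected."""
    adjacency = defaultdict(dict)
    for (a, b), weight in links.items():
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency


def candidate_paths(adjacency, start, end, max_hops=6):
    """Depth-first enumeration of simple paths from start to end."""
    found = []

    def walk(node, path):
        if node == end:
            found.append(list(path))
            return
        if len(path) - 1 >= max_hops:  # stop extending overly long paths
            return
        for neighbor in adjacency[node]:
            if neighbor not in path:  # never revisit a node on this path
                path.append(neighbor)
                walk(neighbor, path)
                path.pop()

    walk(start, [start])
    return found


def alert_score(adjacency, path):
    """Assumed scoring rule: multiply the weights of the links on the path."""
    score = 1.0
    for a, b in zip(path, path[1:]):
        score *= adjacency[a][b]
    return score


def rank_paths(links, start, end):
    """Return (score, path) pairs, strongest path first."""
    adjacency = build_adjacency(links)
    scored = [(alert_score(adjacency, p), p) for p in candidate_paths(adjacency, start, end)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical toy network with made-up link weights.
LINKS = {
    ("phone", "person_a"): 0.9,
    ("person_a", "address_x"): 0.8,
    ("address_x", "clinic"): 0.9,
    ("phone", "person_b"): 0.4,
    ("person_b", "clinic"): 0.5,
}

if __name__ == "__main__":
    for score, path in rank_paths(LINKS, "phone", "clinic"):
        print(f"{score:.3f}  {' -> '.join(path)}")
```

A product of weights rewards paths made of trusted links and penalizes long chains of weak ones, but other rules (for example, averaging weights or adding community information from Step 1) would be equally plausible readings of the description above.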

Figure 3: Entities of a node with associated links.

Figure 4: List of candidate paths prioritized by automatically calculated alert scores.

As shown in Figure 5, it is difficult to spot the target in lines of data that represent the path. But visualizing the data and relationships – as in Figure 6 – can help the analyst find the answer faster.

In this case, there are actually two correct paths linking the partial phone number and the clinic address: one is primary (yellow) and the other is secondary (green). They are both correct and have valid entities. The person of interest does not show up on the secondary path, but that path still validates the connections between the entities.

In Figure 6, the path highlighted in yellow suggests the person of interest is Tollman Took, who is represented three times along the path. Additional insight can be derived by looking at the thickness of the lines and second- and third-degree connections, which might be relevant to the investigation. The heavier lines represent links with higher certainty; the thinner lines represent more spurious connections or “fuzzy” matching.
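The idea of a "fuzzy" name match with a degree of certainty can be illustrated with Python's standard difflib module, which scores two strings between 0 and 1. This is a generic string-similarity sketch, not the matching method used by SAS Social Network Analysis; the 0.5 threshold is an arbitrary assumption, and the names are the ones mentioned in this paper.

```python
from difflib import SequenceMatcher


def name_similarity(query: str, candidate: str) -> float:
    """Case-insensitive similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()


def similar_names(query, candidates, threshold=0.5):
    """Return (score, name) pairs at or above the threshold, best match first.

    The threshold is an assumed cut-off for illustration; a real system
    would tune it and would likely compare more than the raw strings.
    """
    scored = [(name_similarity(query, name), name) for name in candidates]
    return sorted([pair for pair in scored if pair[0] >= threshold], reverse=True)


# Names taken from the paper's fictitious data set.
NAMES = ["Tom F Tuk", "Melliot Hornblower", "Tolman Fredegar Took", "Tollman Took"]

if __name__ == "__main__":
    for score, name in similar_names("Tollman Took", NAMES):
        print(f"{score:.2f}  {name}")
```

A score like this could play the role of the line thickness described above: an exact match would be drawn as a heavy, trusted link, while a lower-scoring variant spelling would be drawn as a thinner one.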

Figure 5: Summary information displays for a selected candidate path.

Figure 6: Visualized network after being organized by an analyst. The primary path is shown in yellow and the secondary path is shown in green. Note that the most similar person of interest appears three times (circled in red) along the yellow path as Tollman Took, Tolman Fredegar Took and Tom F Tuk. Fuzzy matching makes it possible to identify individuals despite incorrectly spelled names.

Key for types of data shown:

• Thick line: Heavily weighted, more trusted connection

• Thin line: Less weighted, less trusted connection

• HR: Hotel reservation

• TR: Travel reservation

• WP: White pages

• ID: Government issued ID

• HPA: Home purchase agreement

• CCCR: Credit card account info (e.g., where person lives)

• SRLU: Reverse telephone lookup

• CCTR: Credit card transaction

Findings

Using SAS Social Network Analysis to execute these steps, SAS software automated and accelerated parts of the overall process for getting analysts the information needed to do what formerly only people could do: make sense of the connections between data and data fragments and arrive at the correct findings. The SAS analysts looked at the prioritized candidate paths in Figure 4, investigated them and pieced together the history.

Regarding the questions “Who is the unknown subject?” and “Where is the unknown subject?”, the SAS team’s analysis revealed that Tollman Took was the individual in question who left the clinic at 440 East Madison Ave. in Bethesda, MD. In addition, the SAS team uncovered other relevant facts that would help analysts during an investigation, specifically:

• Tollman Took was most likely visiting Melliot Hornblower (based on his hotel reservation information), who lives in Bethesda. The partial phone number that he left in the clinic sign-in log is likely that of Tollman Took’s wife, Gloriana Brandybuck.

• Based on the fact that a credit card belonging to Brandybuck was also used at a Bethesda location during the time period of Took’s trip to Bethesda, it is possible that she accompanied him on the trip. Alternatively, he may have mistakenly taken and used his wife’s credit card on his trip.

• Took and Brandybuck are most likely married, based on a shared street address (Pin Oak Drive, NY) and on a name variation for Brandybuck (Gloria Took-Brandybuck) on a home purchase agreement for the same address.

• Inigo Maggot and Dudo Boffin share a travel reservation ID with Took. They may have accompanied Took on his trip to Bethesda. Boffin has a Cleveland address, implying an additional geographic dimension in the possible spread of flu from the Bethesda clinic.

• Took and Brandybuck may have a daughter (Gloriana Rose Took) living in Buffalo, NY.

Broadening the Applications of SAS® Security Intelligence

This Identity Discovery Challenge is just one example of how SAS software can be used to automate some of the tasks associated with analyzing big data and deliver a repeatable approach that can be used by analysts and investigators to find answers to the most complex intelligence questions. The software’s automated processes also save analysts huge amounts of time, freeing them to focus on the most likely targets so they can handle the growing volume of potential security threats.

As stated earlier, SAS Social Network Analysis is a key component of SAS Security Intelligence – a comprehensive set of tactical and strategic solutions that give intelligence analysts fast, reliable intelligence needed to effectively coordinate information and proactively work to prevent and deter crime, terrorism and other threats. And even though just one component of SAS Security Intelligence was used for the challenge, if given more information, some of the various other techniques and solutions could have been used.

As shown in Figure 7, SAS Security Intelligence delivers a diverse set of analytics capabilities to score behavior, entities and networks across multiple organizations.

SAS Security Intelligence generates highly accurate results through superior modeling and industry-specific analytic solutions. Only SAS Security Intelligence provides:

• An enterprise approach. A common analytics platform lets you thoroughly analyze data across the organization on a single platform to provide seamless protection against losses from security threats.

• Hybrid analytic approach. No single analytic approach is guaranteed to succeed on its own. This is why SAS incorporates multiple methods – business rules, anomaly detection, predictive data mining, social network analysis and more – to help connect the dots between seemingly unrelated events and hidden relationships.

• Faster implementation. SAS offers multiple deployment options – on-site, hosted, SaaS – infused with best practices from hundreds of successful implementations that minimize your budget expenditures.

Figure 7: The hybrid approach of SAS Security Intelligence uses multiple analytic methodologies and techniques to help analysts be more productive.

Learn More

As we demonstrated here, by successfully applying the latest statistical and mathematical techniques, SAS software can solve the most complex problems that governments face today. For example, for intelligence analysts, our solutions maximize the amount of quality intelligence that can be gleaned from the scattered information fragments embedded in enormous volumes of both structured and unstructured data. If patterns exist, SAS can help analysts find them. If connections can be made, SAS can help make them.

We do this by providing a complete analytics framework combined with decades of experience in analytic processes and methods that are ideal for tackling hard analytical problems. And because the analytics framework is modular, government agencies can expand, grow or modify their analytics environment as their needs grow.

SAS solutions can also manage enormous amounts of data – even in the very largest of enterprises. SAS products can help agencies capture, analyze and monitor all-source information, sift through enormous volumes of data, identify trends and match patterns hidden in big data – and give analysts and decision makers the additional insight they need in order to act. And perhaps most importantly, SAS returns value to the government by using automated analytic processes to deliver faster, better analysis to increase the efficiency and effectiveness of analysts, investigators and other resources critical to national security.

Learn more

To learn more about how your organization can benefit from SAS Security Intelligence, visit us online at: sas.com/national-security

About SAS

SAS is the leader in business analytics software and services, and the largest independent vendor in the business intelligence market. Through innovative solutions, SAS helps customers at more than 60,000 sites improve performance and deliver value by making better decisions faster. Since 1976 SAS has been giving customers around the world THE POWER TO KNOW®.

SAS Institute Inc. World Headquarters +1 919 677 8000

To contact your local SAS office, please visit: sas.com/offices

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies. Copyright © 2016, SAS Institute Inc. All rights reserved. 106137_G41277.1116