
Page 1: Information to Insight - wired.com · Information to Insight in a Counterterrorism Context Robert Burleson Lawrence Livermore National Laboratory UCRL-PRES-211319 UCRL-PRES-211466

Information to Insight in a Counterterrorism Context

Robert Burleson
Lawrence Livermore National Laboratory

UCRL-PRES-211319 UCRL-PRES-211466 UCRL-PRES-211485 UCRL-PRES-211467

This work was performed under the auspices of the U.S. Department of Energy by University of California, Lawrence Livermore National Laboratory under Contract W-7405-Eng-48

Page 2

We must be able to address the analysts’ requirements

• Strategic analysis
− See the “big picture” and how to counter terrorism
− Support decision makers in setting policies and priorities
− Integral to targeting technical and human source collection

• Tactical analysis
− Predict and warn of pending attacks
− Provide an understanding of our adversaries' current intentions and capabilities
− Allow the United States to act with precision both defensively and offensively

• Both strategic and tactical analysis require a system capable of fusing information obtained from very diverse sources…

The Analysis, Dissemination, Visualization, Insight, and Semantic Enhancement (ADVISE) system is being developed for DHS S&T to meet these requirements

Page 3

ADVISE lets us understand the information that characterizes our national security challenges

[Figure: diagram contrasting ~1990 with 2003 +. Recoverable labels follow.]

Volume and Disparity of Sources; Distribution and Automation

• Scalable, adaptive interface to disparate data sources with unique sensors
• Compatible interfaces for viewing, analysis, & insight
• Creating chains of relationships between disjoint information

Diagram labels: Sensors; Data; Information; Integration & Correlation; Viewing, Analysis, & Insight

Diagram labels: Organizations; Imagery; Networks; Sensors; Text reports

Diagram labels: Information Interface; Knowledge Interface; Semantic Graph; Template Subgraphs; Ontology

Page 4

What drives the design…

“Connect the Dots…”

• Scaling to massive data volume
• Ingest information from information sources
− 100’s of systems
− Real-time
− High-throughput
− Stove-piped by intent
• Support 100’s of analysts
− Event notification in near-real time
• Control access and protect privacy
• Responsive to change

Page 5

What to consider when scaling to massive levels

What do we want from the Knowledge Fusion engine?
• Relations between facts (nodes)
− Individual facts without relations are not particularly useful (might as well keep stovepipes)
• Relate facts (build the graph)…
− …at high ingest rates with results in real-time
• Responsive to change

What is important when scaling to massive sizes?
• An optimal model
− Use relations between facts (connectivity) to extract knowledge from data
• Query performance
− Key for high-complexity algorithms

Page 6

Semantic graphs provide the basis for these massive knowledge relationships

[Figure: example semantic graph. Which link joins which pair of nodes is not recoverable from the transcript.]

Node labels: sparkie.llnl.gov; LLNL; California; Livermore; Santa Clara; Fremont; Jon Miller; Jenifer Jones; David Smith; www.x.com; Company x; Company Y

Link labels: Uses; Emailed; Called; Works for; Owned by; Located in; Vendor to; Subcontractor to; Web connection to

• Information source
• Classification
• Time (effectively)
• Geo-location (if appropriate)
• Confidence
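A minimal sketch of how such a graph could be held in memory, assuming each link carries the attributes listed above (information source, classification, time, geo-location, confidence). The class names, field names, node types, and source identifiers are hypothetical, and the two example links are illustrative pairings of labels from the figure, not a reading of the original diagram.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    node_type: str  # hypothetical type names such as "Person" or "City"
    name: str


@dataclass(frozen=True)
class Link:
    subject: Node
    relation: str
    obj: Node
    source: str                  # information source
    classification: str
    time: Optional[str] = None   # left unset when unknown
    geo: Optional[str] = None    # geo-location, if appropriate
    confidence: float = 1.0


class SemanticGraph:
    """Stores links and indexes them by both endpoints."""

    def __init__(self):
        self.links = []
        self.adjacency = {}

    def add(self, link):
        self.links.append(link)
        self.adjacency.setdefault(link.subject, []).append(link)
        self.adjacency.setdefault(link.obj, []).append(link)

    def links_of(self, node):
        return self.adjacency.get(node, [])


graph = SemanticGraph()
jon = Node("Person", "Jon Miller")
llnl = Node("Organization", "LLNL")
livermore = Node("City", "Livermore")
graph.add(Link(jon, "Works for", llnl, source="report-1",
               classification="U", confidence=0.9))
graph.add(Link(llnl, "Located in", livermore, source="report-2",
               classification="U"))

for link in graph.links_of(llnl):
    print(link.subject.name, link.relation, link.obj.name, link.confidence)
```

Keeping the attributes on the link rather than on the node lets two sources assert the same relation with different confidence or classification.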

Page 7

The fused graph reveals connections and gaps not immediately apparent

Existing search tools can find documents that contain a given connection:
• “Person A” & “Person B”
• “Person A” & “Country X”
• “Person A” & “City Y”

Graph identifies connections that span several messages (sources):
• Previously unknown “Middlemen” (path traversal)
  [Figure labels: Person A; Person B; WMD Program; ??? (Person C)]
• Hidden common connection (unknown nodes)
  [Figure labels: Person A; Person B; Financial transactions; Material transhipment; ??? (Country X)]
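The path-traversal idea can be sketched with a breadth-first search over facts that each come from a different message. The facts, relation names, and source identifiers below are invented for illustration. The point is that no single message links Person A to the WMD Program, but the fused graph does.

```python
from collections import deque

# Each fact is (subject, relation, object, source). All values are made up.
facts = [
    ("Person A", "Called", "Person C", "message-1"),
    ("Person C", "Emailed", "Person B", "message-2"),
    ("Person B", "Works for", "WMD Program", "message-3"),
]


def build_adjacency(facts):
    """Treat every fact as an undirected edge between its two nodes."""
    adjacency = {}
    for subject, relation, obj, source in facts:
        adjacency.setdefault(subject, []).append((obj, relation, source))
        adjacency.setdefault(obj, []).append((subject, relation, source))
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest node path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor, _relation, _source in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


adjacency = build_adjacency(facts)
path = shortest_path(adjacency, "Person A", "WMD Program")
middlemen = path[1:-1]  # interior nodes of the path
print(path, middlemen)
```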

Page 8

Two facts "fuse" when they contain a common node with identical attribute values

[Figure: three facts that share nodes.]
• Bob, Sent e-mail to, Alice
• Alice, Traveled to, Chicago
• Bob, Subscribes to, Yahoo.com
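A minimal sketch of fusion, assuming a node's identity is its type plus all of its attribute values. The `FusedGraph` class and the type names ("Person", "City", "Domain") are hypothetical. The three facts are the ones shown on this slide.

```python
def node_key(node_type, attributes):
    """Identity of a node: its type plus every attribute value."""
    return (node_type, tuple(sorted(attributes.items())))


class FusedGraph:
    def __init__(self):
        self.nodes = {}     # node key -> attributes
        self.edges = set()  # (subject key, relation, object key)

    def add_fact(self, subject, relation, obj):
        s = node_key(*subject)
        o = node_key(*obj)
        # setdefault keeps the existing node when the key is already present,
        # which is what makes two facts fuse on a shared node.
        self.nodes.setdefault(s, subject[1])
        self.nodes.setdefault(o, obj[1])
        self.edges.add((s, relation, o))


bob = ("Person", {"name": "Bob"})
alice = ("Person", {"name": "Alice"})
chicago = ("City", {"name": "Chicago"})
yahoo = ("Domain", {"name": "Yahoo.com"})

g = FusedGraph()
g.add_fact(bob, "Sent e-mail to", alice)
g.add_fact(alice, "Traveled to", chicago)
g.add_fact(bob, "Subscribes to", yahoo)

# Six node mentions collapse into four nodes joined by three edges.
print(len(g.nodes), len(g.edges))
```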

Page 9

ADVISE canonicalizes data to maximize fusion and improve searches

[Figure: "Chicago", "chicago", and "windy city" become a single "Chicago" node after canonicalization and fusion.]

Applications can use any of the organization names to get the same result
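A minimal sketch of canonicalization, assuming a simple alias table plus case and whitespace normalization. The `ALIASES` table is hypothetical, and the slides do not say how ADVISE canonicalizes names.

```python
# Hypothetical alias table: lower-cased variant -> canonical name.
ALIASES = {"windy city": "Chicago"}


def canonicalize(name):
    """Collapse whitespace and case, then apply known aliases."""
    cleaned = " ".join(name.split()).lower()
    if cleaned in ALIASES:
        return ALIASES[cleaned]
    return cleaned.title()


variants = ["Chicago", "chicago", "windy city", "  Windy   City "]
canonical = {canonicalize(v) for v in variants}
print(canonical)
```

Because every variant maps to the same string before the node key is built, the facts that mention them fuse on one node. `str.title()` is a simplification that mishandles some names (those containing apostrophes, for example).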

Page 10

The ADVISE system model partitions the design

[Figure: three layers joined by two interfaces. Recoverable labels and captions follow.]

• Information Layer (Dynamic sources; Lists / Files; ...)
− The Information Interface supports multiple high-throughput distributed information systems that send facts directly to ADVISE.
• Knowledge Layer (Semantic Graph; Template Subgraphs; Ontology)
− The Knowledge Layer fuses facts and relations into a massive-scale, ontology-driven semantic graph.
• Application Layer (Visualization; Simulation; Network analysis; ...)
− New applications can utilize the semantic graph, template subgraphs, and ontology to develop complex insights

Interfaces: Information Interface; Knowledge Interface

Page 11

Creating entities and relationships from free text is critical

“BAGHDAD, Iraq (CNN) -- A hostage shown in a videotape on an Arabic language satellite TV network Wednesday is the American executive who was kidnapped Monday at a construction site in Baghdad, according to a U.S. Embassy official.

Jeffrey Ake, president and chief executive officer of a machine manufacturing firm, was seen in the video being held at gunpoint by militants.”

Terms marked in the text: Baghdad; construction site; U.S. Embassy official; kidnapped; Jeffrey Ake; militants

Country: Iraq
City: Baghdad, Iraq
Location: construction site
Person: U.S. Embassy official
Person: Jeffrey Ake

Relation: LOCATED_IN
Locatee: construction site
Locator: Baghdad, Iraq

Event: KIDNAPPING
Victim: Jeffrey Ake
Perpetrator: militants
Location: construction site

[Figure label: HARD]
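One way the extraction output above could be turned into graph facts, sketched under assumptions: the dictionary layout, the `to_triples` helper, and the convention of giving each event its own node (here `KIDNAPPING#1`) are hypothetical and are not described in the slides.

```python
extraction = {
    "entities": [
        ("Country", "Iraq"),
        ("City", "Baghdad, Iraq"),
        ("Location", "construction site"),
        ("Person", "U.S. Embassy official"),
        ("Person", "Jeffrey Ake"),
    ],
    "relations": [
        {"type": "LOCATED_IN", "Locatee": "construction site",
         "Locator": "Baghdad, Iraq"},
    ],
    "events": [
        {"type": "KIDNAPPING", "Victim": "Jeffrey Ake",
         "Perpetrator": "militants", "Location": "construction site"},
    ],
}


def to_triples(extraction):
    """Flatten relations and events into (subject, label, object) triples."""
    triples = []
    for relation in extraction["relations"]:
        roles = [key for key in relation if key != "type"]
        subject_role, object_role = roles  # binary relations only
        triples.append(
            (relation[subject_role], relation["type"], relation[object_role]))
    for index, event in enumerate(extraction["events"], start=1):
        event_node = f"{event['type']}#{index}"  # one node per event
        for role, value in event.items():
            if role != "type":
                triples.append((event_node, role, value))
    return triples


node_types = {name: node_type for node_type, name in extraction["entities"]}
triples = to_triples(extraction)
for triple in triples:
    print(triple)
```

The "construction site" node appears in both the relation and the event, so the two facts fuse on it once they are loaded into the graph.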

Page 12

Integrating knowledge extraction into ADVISE

[Figure: knowledge extraction within the ADVISE layers. Recoverable labels follow.]

• Information Layer; Knowledge Layer; Application Layer
• Knowledge Extraction
• Extraction Engine … Extraction Engine
• Louisiana Framework
• Extractor Specific Parser (shown twice)
• Ontology Translator & Plugins
• Spud Tagging Assistant
• Lists / Files

Page 13

Evaluating extraction engines

• Qualitative: Show resultant graph to analysts
− They hate it
• Quantitative: Compare engine output to an answer key
− Modified GATE to evaluate extraction engine results against one another or against a hand-annotated answer key
− Hand-annotated some documents (not fun)
− Can use documents entered via Spud

[Chart: vertical axis from 0 to 1 in steps of 0.1; categories "All Entities" and "All Links". Plotted values are not recoverable.]
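A sketch of what comparing engine output to an answer key could look like, assuming precision, recall, and F1 over exact matches. The slides do not name the measure behind the chart, so this choice is an assumption, and both the answer key and the engine output below are toy data, not results from the presentation.

```python
def score(predicted, answer_key):
    """Precision, recall, and F1 of predicted items against an answer key."""
    predicted, answer_key = set(predicted), set(answer_key)
    true_positives = len(predicted & answer_key)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(answer_key) if answer_key else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: (type, text) pairs. A hand-annotated key versus one engine's output.
answer_key = {
    ("Person", "Jeffrey Ake"),
    ("City", "Baghdad, Iraq"),
    ("Country", "Iraq"),
    ("Location", "construction site"),
}
engine_output = {
    ("Person", "Jeffrey Ake"),
    ("City", "Baghdad, Iraq"),
    ("Organization", "CNN"),
}

precision, recall, f1 = score(engine_output, answer_key)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

The same function applies to links if each link is represented as a hashable triple, which would give separate "All Entities" and "All Links" scores.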

Page 14

Current direction for text extraction

• Integration
− Improve usability of Louisiana
− Add graph interactivity to Spud
− Work on merging results from multiple engines
• Evaluation
− Evaluate more engines
  • AeroText and ClearForest on deck
− Look for applicable pre-tagged document corpora
− Build graph-comparison capability in ADVISE
• Collaboration

Page 15

Graph analysis environment

• Graph Metrics
• Pattern Analysis
• Component Analysis
• Strength of Association
• Community Analysis

Page 16

Component Analysis assists in the understanding of how graphs fuse

We build a semantic graph from various information sources

Some data will fail to fuse

Analyzing resulting components can provide us valuable information about data fusion

The graph is based on an ontology, which only allows certain relationships
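A minimal sketch of component analysis using a depth-first search for connected components. The node and edge lists are invented. The stray lowercase "chicago" node stands in for data that failed to fuse, for instance because it was never canonicalized.

```python
def connected_components(nodes, edges):
    """Return connected components as sets, largest first."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    components = []
    for node in nodes:
        if node in seen:
            continue
        seen.add(node)
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            component.add(current)
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


nodes = ["Bob", "Alice", "Chicago", "Yahoo.com", "chicago"]
edges = [("Bob", "Alice"), ("Alice", "Chicago"), ("Bob", "Yahoo.com")]

components = connected_components(nodes, edges)
for component in components:
    print(len(component), sorted(component))
```

Small components sitting beside one large component are a useful signal: they point to facts that should probably have fused but did not.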

Page 17

Community Analysis partitions the graph into clusters of “related” nodes

• Measure the “betweenness” of each link
• Eliminate the link with highest “betweenness”
• Stopping criterion – computed at each iteration to determine “ideal” partition

Our stopping criterion measures the density of links within communities relative to the density of links between communities; iterations stop when this is maximized
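The three steps above resemble divisive clustering by edge betweenness (the Girvan-Newman style of algorithm). The sketch below is an assumption-laden illustration, not the ADVISE implementation: it uses Newman's modularity as a stand-in for the stopping criterion, and instead of stopping at the maximum it removes every link and keeps the best partition seen.

```python
from collections import deque


def components(adj):
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            group.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(group)
    return result


def edge_betweenness(adj):
    """Brandes-style shortest-path betweenness for every undirected link."""
    score = {}
    for source in adj:
        dist, paths, parents = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb], paths[nb], parents[nb] = dist[node] + 1, 0.0, []
                    queue.append(nb)
                if dist[nb] == dist[node] + 1:
                    paths[nb] += paths[node]
                    parents[nb].append(node)
        credit = {node: 0.0 for node in order}
        for node in reversed(order):  # farthest nodes first
            for parent in parents[node]:
                share = paths[parent] / paths[node] * (1.0 + credit[node])
                edge = frozenset((parent, node))
                score[edge] = score.get(edge, 0.0) + share
                credit[parent] += share
    return score


def modularity(original_adj, communities):
    """Links inside communities versus what random placement would give."""
    m = sum(len(nbs) for nbs in original_adj.values()) / 2
    if m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        degree = sum(len(original_adj[n]) for n in community)
        inside = sum(1 for n in community
                     for nb in original_adj[n] if nb in community) / 2
        q += inside / m - (degree / (2 * m)) ** 2
    return q


def community_analysis(adj):
    work = {n: set(nbs) for n, nbs in adj.items()}
    best = components(work)
    best_q = modularity(adj, best)
    while any(work.values()):
        scores = edge_betweenness(work)
        a, b = max(scores, key=scores.get)  # link with highest betweenness
        work[a].discard(b)
        work[b].discard(a)
        parts = components(work)
        q = modularity(adj, parts)
        if q > best_q:
            best, best_q = parts, q
    return best, best_q


# Two triangles joined by a single bridge link c-d (made-up graph).
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
       "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
best, best_q = community_analysis(adj)
print(sorted(sorted(c) for c in best), round(best_q, 4))
```

Recomputing betweenness after every removal is what makes this approach expensive on large graphs, which is why query performance matters for high-complexity algorithms.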

Page 18

Community Analysis partitions the graph into clusters that may facilitate knowledge discovery

Key Uses for Graph Analysis:
• Examining the semantic graph at varying degrees of granularity
• Trials indicate a tendency to produce semantically homogeneous communities
• Metrics run on communities provide a local and more detailed analysis of a large semantic graph

Page 19

Graph Metrics helps in the understanding of what is in the graph

• Our library of graph metrics allows us to:
− Analyze high-level content
− Characterize our graph/communities
− Measure knowledge extraction performance

• Node/Link Type Frequencies
• Node Degree Distributions
• Path Analysis
• Ontology Utilization Metrics
• High Degree Node Statistics
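A few of the metrics listed above can be sketched with `collections.Counter`. The triples and type names below are illustrative, and the function is a hypothetical helper rather than part of the ADVISE metrics library.

```python
from collections import Counter

# Nodes are (type, name) pairs; the triples are made-up examples.
triples = [
    (("Person", "Bob"), "Sent e-mail to", ("Person", "Alice")),
    (("Person", "Alice"), "Traveled to", ("City", "Chicago")),
    (("Person", "Bob"), "Subscribes to", ("Domain", "Yahoo.com")),
]


def graph_metrics(triples):
    """Node/link type frequencies, degree distribution, highest-degree nodes."""
    link_types = Counter()
    degree = Counter()
    for subject, relation, obj in triples:
        link_types[relation] += 1
        degree[subject] += 1
        degree[obj] += 1
    node_types = Counter(node_type for node_type, _name in degree)
    degree_distribution = Counter(degree.values())
    top_degree = max(degree.values())
    high_degree_nodes = sorted(n for n, d in degree.items() if d == top_degree)
    return node_types, link_types, degree_distribution, high_degree_nodes


node_types, link_types, degree_distribution, high_degree_nodes = graph_metrics(triples)
print(node_types)
print(link_types)
print(degree_distribution)
print(high_degree_nodes)
```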

Page 20

Pattern Analysis determines potentially valuable information from patterns in the graph

• Identify rare and common patterns
• Pattern matching
• Fuzzy pattern matching
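Exact pattern matching can be sketched as a small backtracking search over triples, where terms starting with "?" are variables. This is a hypothetical illustration of the general technique, not the ADVISE matcher, and it does not attempt fuzzy matching.

```python
def unify(term, value, binding):
    """Match one pattern term against a value, extending the binding."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match_pattern(pattern, triples, binding=None):
    """Yield every variable binding under which all pattern triples exist."""
    binding = binding or {}
    if not pattern:
        yield dict(binding)
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        candidate = dict(binding)  # copy so failed attempts leave no trace
        if all(unify(term, value, candidate)
               for term, value in zip(first, triple)):
            yield from match_pattern(rest, triples, candidate)


triples = [
    ("Bob", "Sent e-mail to", "Alice"),
    ("Alice", "Traveled to", "Chicago"),
    ("Bob", "Subscribes to", "Yahoo.com"),
]

# "Who e-mailed someone who then traveled to Chicago?"
pattern = [
    ("?sender", "Sent e-mail to", "?traveler"),
    ("?traveler", "Traveled to", "Chicago"),
]

matches = list(match_pattern(pattern, triples))
print(matches)
```

Counting how often each pattern matches across the graph is one simple way to separate rare patterns from common ones.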

Page 21

Strength of Association allows nodes to be ranked according to their relative strength

• Allow pairs of nodes to be ranked according to their relative strength of association
• Allow multiple paths between two nodes to be ranked according to their relative strength

Topological strength (neighborhood)

Source-based weight (quantify source support)
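The slides name two ingredients, topological strength and source-based weight, without giving formulas. The sketch below fills them in with assumptions: Jaccard overlap of the two nodes' neighborhoods for topological strength, and the number of distinct sources asserting a direct link for source-based weight. The facts are invented.

```python
def neighbors(facts):
    """Undirected adjacency built from (node, node, source) facts."""
    adjacency = {}
    for a, b, _source in facts:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def topological_strength(adjacency, a, b):
    """Assumed measure: Jaccard overlap of neighborhoods, excluding a and b."""
    na = adjacency.get(a, set()) - {b}
    nb = adjacency.get(b, set()) - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def source_weight(facts, a, b):
    """Assumed measure: number of distinct sources asserting a link a-b."""
    return len({source for x, y, source in facts if {x, y} == {a, b}})


facts = [
    ("A", "B", "s1"),
    ("A", "B", "s2"),
    ("A", "C", "s1"),
    ("B", "C", "s3"),
    ("A", "D", "s1"),
]

adjacency = neighbors(facts)
pairs = [("A", "B"), ("B", "C"), ("A", "D")]
ranked = sorted(pairs,
                key=lambda pair: topological_strength(adjacency, *pair),
                reverse=True)
print(ranked)
print([source_weight(facts, *pair) for pair in pairs])
```

A path between two nodes could then be ranked by combining the per-link scores along it, for example by taking the minimum or the product, though that combination rule is also an assumption.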

Page 22

ADVISE supports scalable knowledge management across multiple missions

[Figure: several ADVISE instances serving different mission organizations. Recoverable labels follow.]

• Mission Organizations: IA; BTS Region; NBACC
• ADVISE: Scalable Knowledge Management and Integration
• Components shown for each instance: Semantic Graph; Template Subgraphs; Ontology; Knowledge Interface; Information Interface
• Instance labels: RTAS; Texas; TVIS
• Shared sources; Controlled sources
• Knowledge Discovery and Distribution
• Operational Centers
• IA Analysis; Border Ops; BKC

Page 23

Disclaimer and Auspices

This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial products, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or the University of California. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or the University of California, and shall not be used for advertising or product endorsement purposes.

This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48.

UCRL-PRES-211319 UCRL-PRES-211466 UCRL-PRES-211485 UCRL-PRES-211467