A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

U.S. Army Research, Development and Engineering Command. Visual Multidimensional Scaling (VMDS): an Intelligence Fusion Engine. Tim Hanratty (Lead), John Brand, Ann E. M. Bornstein, Andrew Niederer, John Richardson. US Army Research Laboratory, Computational and Information Sciences Directorate, Aberdeen Proving Ground, MD 21005


Post on 15-Jan-2016


Page 1: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

U.S. Army Research, Development and Engineering Command

Visual Multidimensional Scaling (VMDS): an Intelligence Fusion Engine

Tim Hanratty (Lead), John Brand, Ann E. M. Bornstein, Andrew Niederer, John Richardson

US Army Research Laboratory, Computational and Information Sciences Directorate

Aberdeen Proving Ground, MD 21005

Page 2: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Analysts need a method to determine similarity of objects under analysis to other objects in a background population, all described by sparse, non-normal data of different data types.

Multidimensional scaling (MDS) is an appropriate similarity analysis methodology but is difficult to use.

Page 3: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Why Multidimensional Scaling?

• The similarity analysis methodology must be tolerant of sparse data of dissimilar types, of varying reliability.

• Data on a person may include information such as tribal affiliation, gender, pulse rate, agreement of any bio-ID resulting from an encounter with identity documents on the person during the encounter, how many and what kinds of documents are on the person, number of entries in the intel data base linked to the bio-ID or claimed name of the person, etc.

• These data are

– Seldom distributed normally,

– Virtually certain to be incomplete, and

– Of differing data types (e.g., ratio data such as pulse rate, nominal data such as tribal membership).

Multidimensional Scaling (MDS) is a similarity methodology tolerant of these factors.
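The slides do not say which (dis)similarity function is applied to such mixed, incomplete records. As a minimal sketch, assuming a Gower-style coefficient (a common choice for mixed data types, not confirmed as the VMDS method), the dissimilarity of two persons can be averaged over only those attributes observed for both. The attribute layout, value ranges, and records below are hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing observation


def gower_dissimilarity(
    a: Sequence[Value], b: Sequence[Value], ranges: Sequence[Optional[float]]
) -> Optional[float]:
    """Mean per-attribute dissimilarity over attributes observed in both records.

    ranges[k] is the numeric range of attribute k, or None if k is nominal.
    Returns None when the two records share no observed attribute.
    """
    total, used = 0.0, 0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            continue  # sparse data: skip attributes missing in either record
        if r is None:
            total += 0.0 if x == y else 1.0  # nominal: match or no match
        else:
            total += abs(x - y) / r if r > 0 else 0.0  # ratio: scaled difference
        used += 1
    return total / used if used else None


# Hypothetical records: (tribal affiliation, gender, pulse rate, documents carried)
RANGES = (None, None, 80.0, 4.0)
p1 = ("tribe_a", "m", 72.0, 2.0)
p2 = ("tribe_b", "m", 112.0, None)
p3 = (None, None, None, 1.0)

print(gower_dissimilarity(p1, p2, RANGES))  # 0.5
print(gower_dissimilarity(p1, p3, RANGES))  # 0.25
print(gower_dissimilarity(p2, p3, RANGES))  # None
```

Skipping attributes that are missing in either record lets every pair with at least one shared observation contribute a dissimilarity, which is one way to tolerate sparse data.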

Page 4: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

The Visual Multidimensional Scaling (VMDS) similarity analysis engine simplifies and partially automates use of MDS, including data import/export and display of results.

A focus problem to develop the engine and its underlying methodology:

Identify High Value Individuals (HVIs) using similarity analysis of massive but sparse input data sets resulting from parallel streams of dissimilar data types, of varying reliability and arriving at different times, so that a field commander can take appropriate action:

– Release

– Release and watch

– Detain.

The Visual Multidimensional Scaling (VMDS) engine

Page 5: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Use of similarity analysis

Similarity analysis:

• Is complementary to graph analysis based on functional relationships, when functional relationships—A controls B, B provides services to C,D, and E, C has been seen with E and F, etc.—are known.

• Allows inferences when no information on functional relationships exists, that is, A resembles B, rather than A controls B.

• Determines resemblance based on observables to

– Cue the analyst to the importance of a subject to begin the surveillance/monitoring process,

– Indicate other subjects in the subject’s circle who may also be important, once functional information about a subject has been gathered,

– Act as a filter or as an additional tool to investigate a possible enemy cell.

• Is a mathematical implementation of what police and intelligence officers have always done—

– “He looks suspicious” integrates observational and other data to form a conclusion based on unspecified and possibly subliminal recognition of similarities to known groups.

Similarity analysis is complementary to the present method of graph analysis based on functional relationships. It is a first step to Level 3 Fusion.

Page 6: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Identity (ID) is a key

• A subject may be

– linked to a positive identity through biometrics and then through that ID to other data such as civil records, or

– known only by a name, possibly an alias or variant name.

• If we have a personal ID associated with a subject we may compile

– A Biometric Rap Sheet of ID events dealing with that subject (presently done at the Biometric Fusion Center)

– A list of intel reports referring to that subject (presently done at NGIC).

– The biometric information may then be fused with other data such as human observations, situational information, etc.

• If the person is known only by reports citing names which may be aliases, information on that person may be linked to those aliases and a pattern of similarities with key groups determined.

• If a subject has no traceable identity, that itself is a cue to military value.

• The data is almost always sparse and of varying reliability.

Page 7: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Similarity analysis using MDS

• MDS is a similarity analysis methodology based on dimension reduction.

• MDS maps the pairwise (dis)similarities of entities described in a high-dimensional input data set onto distances in a reduced-dimension solution space (typically 2 or 3 dimensions), chosen so that the distances follow the (dis)similarities in the higher-dimensional set.

• The (dis)similarities in the solution set are reflected by distances in the 2- or 3-dimension solution space produced by a set of coordinates describing the location of the entities.

• The points describing the entities in the solution space may be visualized to show the closeness and, hence, similarity of the entities to each other. The display and other measures may guide further investigation.

• Effective use of MDS requires substantial knowledge on the part of the analyst, experience in choosing analytical options, and prior preparation of data.

• This knowledge can be incorporated into an analytical engine that incorporates the needed background and facilitates data preparation.

• An application has been developed to partially automate and simplify the analysis, the Visual MDS (VMDS) application.
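The dimension reduction described above can be sketched with classical (Torgerson) MDS. This is an illustrative pure-Python version, not the VMDS code (which the slides say is built on R): it double-centres the squared distances and extracts the leading eigenvectors by power iteration. It assumes a symmetric distance matrix that is close to Euclidean; the function name and the four test points are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def classical_mds(dist: Matrix, dims: int = 2, iters: int = 500) -> Matrix:
    """Classical MDS: coordinates whose pairwise distances approximate dist."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances to obtain the inner-product matrix B.
    b = [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]
    coords = [[0.0] * dims for _ in range(n)]
    for c in range(dims):
        # Power iteration for the largest remaining eigenpair of B
        # (assumes that eigenvalue is non-negative, as for Euclidean input).
        v = [math.sin(i + 1 + c) for i in range(n)]  # fixed, non-degenerate start
        eigval = 0.0
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
            eigval = norm
        for i in range(n):
            coords[i][c] = math.sqrt(eigval) * v[i]
        # Deflate B so the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


# Hypothetical check: four points on a 4 x 3 rectangle, recovered from distances only.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist, dims=2)
recovered = [[math.dist(p, q) for q in coords] for p in coords]
```

The recovered coordinates are determined only up to rotation and reflection, so the check compares pairwise distances rather than the coordinates themselves.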

Page 8: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Development conducted with a notional persons data base

• Methodology development was done using a data base of invented persons described by notional, plausible information.

• Data include reasonable but incomplete situational, observational, biometric ID, biometric stress indicators, ID documents, records, and intel data.

• Some data in the notional data base is congruent to real or widely accepted surrogate data sources:

Intel messages are from STEF or modeled after STEF.

Situational, descriptive, and biometric ID data is congruent to real data from Biometric Task Force, including the Iraq Biometric Watch List and a sample Biometric Rap Sheet.

• Biometric stress indicators have been widely used but are not presently used in theater due to focus on ID. They are technically accessible and based on the well accepted “fight or flight” syndrome.

• Biometric stress indicators are a cue and may reflect innocent stress as well as guilty stress.

Page 9: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Notional persons data base

A data base of 52 notional persons was constructed based on the Soft Target Exploitation and Fusion (STEF) intel message set.

The 19 STEF persons are known only by reference to those names in intel messages and are terrorists of one kind or another.

An additional 33 persons were invented based on an overall scenario and individual encounter scenarios to justify the development of the known information on each person.

The 52 persons were from several population subgroups, including 10 innocents, 5 petty criminals, 8 militia, 10 non-STEF terrorists, and the 19 STEF terrorists.

A 53rd person was invented, an innocent, as a probe of the analysis methodology.
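As a quick consistency check of the counts stated above, the subgroup sizes can be tallied; the dictionary keys are just labels taken from the text.

```python
# Subgroup sizes as stated in the text.
subgroups = {
    "innocents": 10,
    "petty criminals": 5,
    "militia": 8,
    "non-STEF terrorists": 10,
    "STEF terrorists": 19,
}

background = sum(subgroups.values())                  # background population
invented = background - subgroups["STEF terrorists"]  # persons beyond the STEF set
with_probe = background + 1                           # plus the innocent probe

print(background, invented, with_probe)  # 52 33 53
```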

Page 10: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Development conducted with notional persons data base

Numerical attributes are in aqua

• The notional persons data base was originally embodied in Excel, a fast development environment.

• Excel does not portray associations with multiple entities easily, such as multiple identities.

• A relational data base is under development using Access.

The excerpt illustrates the ease of portrayal of the description of a person, including the abstraction of the descriptive data vector.

Page 11: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Surrogate data set congruent to real biometric ID data sets

Numerical attributes used in MDS, extracted from characteristics of notional persons data base

Attributes included in biometric watch list and biometric rap sheet—bio ID only

The biometric identity and personal descriptive information in the biometric data bases is a foundation of the notional persons data base.

The extent of the additional data beyond bio-ID—situational, biometric stress indicators, documentary, intel—is also clear.

Page 12: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Attribute vectors used as input to MDS analysis

Attribute categories (from the figure): situational; human observation; remote biosensed; direct biosensed; personal documents; civil, criminal records; intel data.

Heavy emphasis on biometrics: underlying ID is key, and biometrics is key to ID!

Page 13: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Visual MultiDimensional Scaling (VMDS) Program

The Visual MDS (VMDS) package is an integrated analysis and visualization engine developed using R to carry out MDS analyses and GGobi to display the results.

It is presently in beta version.

This package integrates and partially automates

• data import/export,

• MDS data map generation,

• a visualization tool based on GGobi.

VMDS will include specific entity/relations analysis tools.

Analytical entity/relations analysis functions may include methods from Outlier Analysis and Facet Theory.

Page 14: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

VMDS analysis options control panel

VMDS allows import of data in a commonly used format, .csv.
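A minimal sketch of such an import, assuming blank cells stand for missing observations (the slides do not specify how VMDS encodes missing data). The column names and sample rows are hypothetical.

```python
import csv
import io
from typing import Dict, List, Optional, Union

Record = Dict[str, Optional[Union[float, str]]]


def load_records(text: str, numeric: List[str]) -> List[Record]:
    """Parse CSV text; blank cells become None, listed columns become floats."""
    records: List[Record] = []
    for row in csv.DictReader(io.StringIO(text)):
        rec: Record = {}
        for key, raw in row.items():
            if key is None:
                continue  # ignore cells beyond the header width
            raw = (raw or "").strip()
            if raw == "":
                rec[key] = None  # missing observation
            elif key in numeric:
                rec[key] = float(raw)
            else:
                rec[key] = raw
        records.append(rec)
    return records


# Hypothetical sample with gaps, as expected for sparse field data.
SAMPLE = (
    "person_id,tribe,pulse_rate,n_documents\n"
    "P01,tribe_a,72,2\n"
    "P02,,112,\n"
    "P03,tribe_b,,1\n"
)
records = load_records(SAMPLE, numeric=["pulse_rate", "n_documents"])
```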

Analysis options are menu driven, which helps the analyst:

• MDS analysis type (classical, non-metric)

• (Dis)similarity function

• Distance function, e.g., city block, Euclidean, etc.

• Dimensions in the solution set (2-D, 3-D)

• Icon color keying.
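For reference, the two distance functions named in the options can be written as follows. This is a generic sketch; the function names are illustrative and are not taken from VMDS.

```python
from typing import Callable, List, Sequence

Point = Sequence[float]


def city_block(a: Point, b: Point) -> float:
    """City-block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance: square root of the sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def distance_matrix(points: List[Point], metric: Callable[[Point, Point], float]) -> List[List[float]]:
    """Pairwise distances under the chosen metric, as input to an MDS routine."""
    return [[metric(p, q) for q in points] for p in points]


print(city_block((0, 0), (3, 4)))  # 7
print(euclidean((0, 0), (3, 4)))   # 5.0
```

The two metrics generally give different distance matrices for the same data, which is why the choice is exposed as an analysis option.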

Data analysis will also be menu driven in later versions.

Page 15: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

VMDS visualization output

The VMDS visualization tool is GGobi. Its three-dimensional cue comes from the relative motion of the points in the constellation as it is rotated. Separation of groups is very clear on rotation but not in a static display.

VMDS analysis of 21-dimension version of notional persons data. Analyst used default analysis options.

Page 16: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

VMDS visualization output

Rotating the constellation shows separation of groups.

Page 17: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

3-D Rendering with static visual depth cues

The utility of static depth cueing is shown by the display above.

The 3-D coordinates of the reduced dimension solution set produced using PERMAP, an MDS engine used earlier in the development of the methodology, were inserted in a customized X3D visualizer that provides static depth cues.

The size of the spheres is indicative of closeness to the forward plane in the perspective view.

• The red sphere (solid white arrow) represents the individual under analysis, RI.

• Green spheres are innocents. RI clearly closely resembles innocents, with increasingly less resemblance to criminals (blue), militia (pinkish red), and various terrorists (yellow, gold, light cyan).

• Yellow and gold spheres represent the STEF terrorists; gold symbols are a hostage team that is a subset of the STEF terrorists.

• The smaller yellow sphere (dashed arrow) near RI is a STEF terrorist; its size indicates the 3-D perspective of distance in that viewing angle. It is not actually near the green spheres or near RI. This separation becomes clear on rotation of the constellation.
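The size-based depth cue can be illustrated with a simple perspective scale. The slides do not describe the projection used by the X3D visualizer, so the formula, the function name, and the focal length below are assumptions made only for illustration.

```python
def apparent_radius(base_radius: float, depth: float, focal_length: float = 10.0) -> float:
    """Drawn radius of a sphere at a given depth behind the forward plane.

    Assumed pinhole-style scaling: size falls off as focal / (focal + depth),
    so spheres farther from the forward plane are drawn smaller.
    """
    if depth < 0:
        raise ValueError("depth is measured from the forward plane and must be >= 0")
    return base_radius * focal_length / (focal_length + depth)


print(apparent_radius(1.0, 0.0))   # 1.0 (on the forward plane)
print(apparent_radius(1.0, 10.0))  # 0.5 (one focal length behind it)
```

Under such a scale, two spheres that overlap on screen but differ in drawn size lie at different depths, which is the situation described for the smaller yellow sphere near RI.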

Page 18: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Analyst report

• Subject of Remote Inquiry (RI) appears to most closely resemble innocents in the background population, and less closely resemble the groups that include petty criminals, militia, and several kinds of terrorists.

Analyst concludes that the subject is probably not a High Value Individual, most likely is an innocent.

Page 19: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Analyst conclusion

The ground truth is that RI is an innocent.

Analysis corroborates ground truth.

Page 20: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

View ahead

• Develop more quantitative method.

– Analysis based on map evaluation.

– Investigate quantitative methods for estimating resemblance such as Outlier Detection.

• Expand data set.

– Framework based on relatively small notional data set.

– Expanding scope of notional data set.

– Introducing problem of multiple personas vs. intrinsic identity.

• Obtain real data.

– Notional set plausible, reasonable, but need to compare with reality.

– Real data obtained from Biometric Task Force.

– Observed Tactical Network Topology Test.

Page 21: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

VMDS improvements

VMDS is being modified with additional features

• Static visual cues to 3-D depth

• User ability to designate groups and members of groups from the screen

• Computation of group centroid

• Computation of distances of selected entities to selected group centroids

• Computation and display of frequency histograms of distances of group members to group centroids, with distances of selected points to the centroids overlain on the respective histograms

• User blogs/notes
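The centroid, distance, and histogram computations listed above can be sketched as follows, assuming Euclidean distance in the 3-D solution space and fixed-width histogram bins (neither is specified in the slides). The group coordinates and the probe point are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: List[Point]) -> Tuple[float, ...]:
    """Coordinate-wise mean of the group members."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def distances_to_centroid(points: List[Point], center: Point) -> List[float]:
    """Euclidean distance of each point to the group centroid."""
    return [math.dist(p, center) for p in points]


def histogram(values: List[float], bin_width: float) -> Counter:
    """Frequency per bin; bin b covers [b * bin_width, (b + 1) * bin_width)."""
    return Counter(int(v // bin_width) for v in values)


# Hypothetical group in the 3-D solution space, plus one selected point.
innocents = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (2.0, 2.0, 0.0)]
center = centroid(innocents)
member_d = distances_to_centroid(innocents, center)
hist = histogram(member_d, bin_width=0.5)

probe = (1.0, 1.0, 3.0)
probe_d = math.dist(probe, center)
probe_bin = int(probe_d // 0.5)  # where the probe would be overlain on the histogram
```

Comparing `probe_bin` with the bins occupied by group members gives a first quantitative reading of how typical the selected point is for that group.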

Page 22: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

VMDS development

• VMDS will be exercised in a concept development exercise/demonstration at a quarterly Biometrics Task Force exercise

• A controlled experiment will evaluate the difference in an analyst’s ability to determine population grouping in simulated encounters with from 1 up to 500 subjects at a time. The notional data base has been bootstrapped to several hundred individuals to support this experiment.
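One plausible reading of "bootstrapped" is resampling the notional records with replacement. The sketch below assumes exactly that; the slides do not say whether attributes were also perturbed. The record layout and identifiers are hypothetical; the subgroup sizes are those given for the 52-person data base.

```python
import random
from typing import Dict, List

# Subgroup sizes of the 52-person notional data base, as stated in the text.
GROUP_SIZES = {
    "innocent": 10,
    "petty criminal": 5,
    "militia": 8,
    "non-STEF terrorist": 10,
    "STEF terrorist": 19,
}


def bootstrap(population: List[Dict[str, str]], size: int, seed: int = 0) -> List[Dict[str, str]]:
    """Draw `size` records with replacement; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [dict(rng.choice(population)) for _ in range(size)]


# Hypothetical minimal records: an identifier and a ground-truth group label.
population = [
    {"person_id": f"{group}-{i}", "group": group}
    for group, count in GROUP_SIZES.items()
    for i in range(count)
]
sample = bootstrap(population, 500, seed=1)
```

Because records are drawn with replacement, the enlarged sample keeps roughly the subgroup proportions of the original data base while containing repeated individuals.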

Page 23: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Summary

• A methodology has been developed and documented to perform a similarity analysis of a specific intelligence problem, determination of high value individuals.

• The methodology has been embodied in a beta version of a software package to partially automate the application of the methodology including display of the results.

• The software package will be exercised at a quarterly Biometric Task Force exercise.

• The utility of the package will be assessed in a controlled experiment with a bootstrapped population sample.

• This methodology is applicable to a general class of intelligence problems, similarity based analysis.

Page 24: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

BACKUPS

Page 25: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Example of relation of multiple personas to underlying ID

Person with underlying identity or verified ID (birth and/or other records). (Files are sometimes referred to as “personality files,” since the information gives a sense of who the subject is, or their personality. By extension, the person is sometimes referred to as a “personality.”)

Underlying ID: Abdul bin Zawahiri, illegitimate son of Achmed bin Zawahiri, born Mosul 3 December 1982….

Personas:

– Persona 1: Alternate legitimate name convention 1, “Abdul bin Zawahiri”

– Persona 2: Alternate legitimate name convention 2, “Abdul al-Tikriti”

– Persona 3: Alternate legitimate name convention 3, “Abdul al-Talebani”

– Persona 4: Deceptive (false) identity 1, “Dhul Fiqar”

– Persona 5: Incomplete information, only known as “Abdullah”

Encounters:

– Encounter 1. Detained at check point, subject used real name corroborated by bio-ID.

– Encounters 2, 3. Detention at check point, no bio-ID kit available. Subject’s bogus ID cards accepted as real, analyst unable to link to underlying identity.

– Encounter 4. Arrest in raid, police recognize as al Zawahiri, use bio-ID kit, determine underlying ID.

– Encounter 5. Informant report, no other info, no bio-ID, analyst unaware of ground truth ID.

Link labels in the diagram: “Bio-ID as” (two links), “Bio-ID unavailable” (two links), “no other info” (one link).

Page 26: A solution: the Visual Multidimensional Scaling (VMDS) intelligence data fusion engine

Possible representation of multiple personas and underlying ID

Excel representation of multiple personas—the underlying ID of an individual is assumed known, one Muhammad_al_Rekh. Al_Rekh is also known under three aliases used in three encounters, each encounter involving a different alias, with different situational data, a different claimed background for the false identity, different stress related biometric data. All the encounter aliases are linked biometrically to the same underlying identity.
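The linkage described here, several encounter aliases tied biometrically to one underlying identity, can be sketched as a small data structure. The class names, the bio-ID value, and the alias strings are hypothetical; the fourth encounter, without a bio-ID, is added only to show what happens when no biometric link exists.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Encounter:
    encounter_id: int
    claimed_name: str
    bio_id: Optional[str] = None  # None when no biometric match was recorded


@dataclass
class Identity:
    bio_id: str
    underlying_name: str
    encounters: List[Encounter] = field(default_factory=list)

    def aliases(self) -> List[str]:
        """Names claimed in linked encounters, other than the underlying name."""
        return sorted({e.claimed_name for e in self.encounters} - {self.underlying_name})


def link_encounters(identities: Dict[str, Identity], encounters: List[Encounter]) -> List[Encounter]:
    """Attach each encounter to its identity by bio-ID; return those left unlinked."""
    unlinked = []
    for enc in encounters:
        ident = identities.get(enc.bio_id) if enc.bio_id else None
        if ident is not None:
            ident.encounters.append(enc)
        else:
            unlinked.append(enc)
    return unlinked


# Hypothetical bio-ID and alias strings for the individual named in the text.
identities = {"BIO-0001": Identity("BIO-0001", "Muhammad_al_Rekh")}
unlinked = link_encounters(
    identities,
    [
        Encounter(1, "alias_1", "BIO-0001"),
        Encounter(2, "alias_2", "BIO-0001"),
        Encounter(3, "alias_3", "BIO-0001"),
        Encounter(4, "alias_4"),  # hypothetical: no bio-ID taken, so no link
    ],
)
print(identities["BIO-0001"].aliases())  # ['alias_1', 'alias_2', 'alias_3']
```

Keying identities by bio-ID rather than by name is what lets different aliases, each with its own situational and stress data, resolve to the same person.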