restructuring transactional data for link analysis in the fincen … · 2006-01-11 ·...

9
Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg** and Raphael W.H. Wong U.S. Department of the Treasury,Financial Crimes Enforcement Netnvork (FinCEN), 2070Chain Bridge Road, ViennaVA 22182 **Current address:National Association of Securities Dealers (NASD) Regulation, Inc., 9513 KeyWest Avenue, Rockville MD 20850 [email protected], [email protected] Abstract Due to the nature and costs of data collection, many real- world databases consist of large numbers of independent transactions. Finding evidence of structured groups of entities reflectedin this datais a task aptly suited to Link Analysis. However, the databases usually must be restructured to alloweffective search and analysis of the linkage structures hidden in the original transactions. The FinCEN AI System (FAIS)[Senator 1995] is an example such an application. We briefly discuss the process of database restructuring and show how it is used to support the discovery and analysis of evidence of money laundering in a database of cash transactions. Introduction Transactional Databases In our modern world, muchof human activity is initially recorded in terms of individual transactions. Both the government and commercial sectors produce tremendous quantities of transactional data, often with little additional analysis being done at the time of collection. Interpretation, search, organization, even error correction and re-formatting may be left for the analyst at somelater time. This maybe due to the cost of doing such analysis on all the data, or it may be that the particular questions to be answered were not known when the data collection system was designed. Legal and regulatory prohibitions mayalso stand in the wayof collecting certain data about every actor, indiscriminately. In any case, we are often faced with a large number of relatively uninteresting transactions, with little data in each * The authors of this paper are (or were) employees of the Financial CrimesEnforcement Network ~ of the U.S. Department of the Treasury, but this paper in no way represents an official policy statement of the U.S. Treasury Department or the U.S. Government. The views expressed herein are solely those of the authors. This paper implies no general endorsement of any of the particular products mentioned in the text. that might be used to select out the relevant ones for further analysis. In situations where we are looking for fraud in transactional data, the fraudulent activities are likely to be camouflaged to look like normal activities. The actors involved may spend much of their time in normal, uninteresting, activities and only occasionally in fraudulent ones. [Goldberg 1997] It is clear to many analysts who work with this data that the inter- relationships among transactions hold the key to select the proper subset and to understand the implicit activities hidden in the data -- activities which are incompletely reflected in the recorded transactions. The analysis of these relationships, called Link Analysis, is a vital technique used by law enforcement and intelligence analysts the world over. [Andrews 1990] Finding Hidden Structure Statistical and other data mining methods (which often are formulated to model and characterize populations of similar instances) can analyze sets of transactions to show consistent or novel relationships among data within a single transaction or among subsets of transactions which are clearly identified by inherent identifiers. However, fmding structural relationships among transactions, which reveal the structure of the enterprises or organizations involved, requires that the database be restructured to makethese relationships more explicit. [Goldberg 1995] Once restructured, the database may be queried using a number of well known approaches for on-line analytical processing (OLAP). The costs of restructuring, borne once, can then benefit a number of subsequent analyses. We will discuss three of the most common restructuring processes, which are employed by FAISin restructuring the Bank Secrecy Act (BSA) data for analysis, disambiguation, consolidation, and aggregation. We will then describe some of the specific challenges of doing this with the BSA data, and finally, we will demonstrate howthe restructured database supports two levels of link analysis for the detection of money laundering and other financial crimes. 38 From: AAAI Technical Report FS-98-01. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

Upload: others

Post on 19-Mar-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

Restructuring Transactional Data for Link Analysisin the FinCEN AI System

Henry G. Goldberg** and Raphael W.H. Wong

U.S. Department of the Treasury, Financial Crimes Enforcement Netnvork (FinCEN),2070 Chain Bridge Road, Vienna VA 22182

**Current address: National Association of Securities Dealers (NASD) Regulation, Inc.,9513 Key West Avenue, Rockville MD [email protected], [email protected]

AbstractDue to the nature and costs of data collection, many real-world databases consist of large numbers of independenttransactions. Finding evidence of structured groups ofentities reflected in this data is a task aptly suited to LinkAnalysis. However, the databases usually must berestructured to allow effective search and analysis of thelinkage structures hidden in the original transactions. TheFinCEN AI System (FAIS) [Senator 1995] is an example such an application. We briefly discuss the process ofdatabase restructuring and show how it is used to supportthe discovery and analysis of evidence of money launderingin a database of cash transactions.

Introduction

Transactional Databases

In our modern world, much of human activity is initiallyrecorded in terms of individual transactions. Both thegovernment and commercial sectors produce tremendousquantities of transactional data, often with little additionalanalysis being done at the time of collection.Interpretation, search, organization, even error correctionand re-formatting may be left for the analyst at some latertime. This may be due to the cost of doing such analysison all the data, or it may be that the particular questions tobe answered were not known when the data collectionsystem was designed. Legal and regulatory prohibitionsmay also stand in the way of collecting certain data aboutevery actor, indiscriminately.

In any case, we are often faced with a large number ofrelatively uninteresting transactions, with little data in each

* The authors of this paper are (or were) employees of theFinancial Crimes Enforcement Network~ of the U.S.Department of the Treasury, but this paper in no wayrepresents an official policy statement of the U.S. TreasuryDepartment or the U.S. Government. The views expressedherein are solely those of the authors. This paper impliesno general endorsement of any of the particular productsmentioned in the text.

that might be used to select out the relevant ones forfurther analysis. In situations where we are looking forfraud in transactional data, the fraudulent activities arelikely to be camouflaged to look like normal activities.The actors involved may spend much of their time innormal, uninteresting, activities and only occasionally infraudulent ones. [Goldberg 1997] It is clear to manyanalysts who work with this data that the inter-relationships among transactions hold the key to select theproper subset and to understand the implicit activitieshidden in the data -- activities which are incompletelyreflected in the recorded transactions. The analysis ofthese relationships, called Link Analysis, is a vitaltechnique used by law enforcement and intelligenceanalysts the world over. [Andrews 1990]

Finding Hidden StructureStatistical and other data mining methods (which often areformulated to model and characterize populations ofsimilar instances) can analyze sets of transactions to showconsistent or novel relationships among data within asingle transaction or among subsets of transactions whichare clearly identified by inherent identifiers. However,fmding structural relationships among transactions, whichreveal the structure of the enterprises or organizationsinvolved, requires that the database be restructured tomake these relationships more explicit. [Goldberg 1995]Once restructured, the database may be queried using anumber of well known approaches for on-line analyticalprocessing (OLAP). The costs of restructuring, borneonce, can then benefit a number of subsequent analyses.

We will discuss three of the most common restructuringprocesses, which are employed by FAIS in restructuringthe Bank Secrecy Act (BSA) data for analysis,disambiguation, consolidation, and aggregation. Wewill then describe some of the specific challenges of doingthis with the BSA data, and finally, we will demonstratehow the restructured database supports two levels of linkanalysis for the detection of money laundering and otherfinancial crimes.

38

From: AAAI Technical Report FS-98-01. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

Page 2: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

Transforming Transactions into Links and

Nodes

Nodes: Finding Relevant EntitiesAn essential, initial activity in preparing data for linkanalysis is to identify relevant entities and actors in thetransactional data. Applying domain specific knowledge isvery important for this task. For example, in a database ofmedical transactions, every unique individual must beconsidered a separate entity. However, in a database ofcommercial transactions, families or businesses mayrepresent the correct level of granularity for analysis.Representing the real-world entities which underlie thetransactions, and which are reflected in various identifiersin the data, is the role of consolidation. Correcting forerrors (and intentional misrepresentations) in this data the process of disambiguation.

These two activities are closely related and can be quiteresource consuming. For example, in [Stolfo 1997] aprocess called "merge-purge" is employed to disambiguateidentifiers on credit card transactions. No single identifieris relied upon exclusively, but a distance measureinvolving a number of identifiers is used through a cost-effective means of cross-comparing limited sub-sets oftransactions [Hernandez 1995]. The resulting sets ofrecords can then be consolidated into a profile of activitythat can be analyzed for the likelihood of fraud.

In cases where relatively error-free identifiers areavailable in the data, it is still necessary to consolidatemultiple transactions in order to evaluate the activities ofthe identified entities. Once the sets of records areconsolidated, straightforward aggregation functions suchas sums and averages, as well as more complex functions(e.g. determination of the most likely spelling of a namegiven a number of near misspellings), may be easilyapplied to produce a representative profile of theconsolidated records.

R

T T

R

/\

t

R il/LConsolidation 1

~J Consolidation 2

R R RPartition 2

Figure 1 : Multiple Consolidations

We should note that the activities of consolidation andaggregation are essential in many domains which deal withmultiple occurrences of similar activities. For example, ina task domain of learning abstract concepts from robotsensor data, [Rosenstein 1998] refers to clusteringinstances into categories, and abstracting from categories

to form prototypes. The critical point about data for linkanalysis, is that the consolidation and aggregation must notreplace the individual transactions, since they provide therelationship information which will eventually connect theconsolidated entities. In Figure 1 (from [Goldberg 1995])consolidation on transactions, T (defined to have twoactors), has identified a set of entities, R, which may belinked together by the transactions in which they"participate". Domain specific knowledge will indicatewhether the sets are disjoint (e.g. depositors and accounts),intersecting (e.g. doctors and patients)., or identical (e.g.traders in a stock market).

Links: Finding Meaningful RelationshipsIn order to support effective link analysis, linkagesbetween entities should represent meaningful relationshipsas defined by the domain. Thus, linking an individual withan account into which he makes large cash deposits ismeaningful for the task of fending money laundering, whilelinking him to the other depositors at that bank is probablynot. This is domain specific -- apparent only from ourunderstanding of how banks (and money laundering)operate.

Transactions in databases generally refer to entities(actors) by a f’Lxed set of identifiers. They may be viewedas vectors of identifiers of entities and transaction details,as in:

T=[il,i2,...,j 1 ,j2,...,tl,t2,...]where il, i2,.., are attributes of entity i;

j 1, j2 .... are attributes of entity j;and tl, t2,.., are attributes of the transaction.

disambiguation may then be seen as a clustering operationin the space(s) of attributes, I, J, etc. I may be identical J if the transactions relate entities from the samepopulation, e.g. in telephone calls between two phonenumbers, of they may be separate, e.g. deposits by a personinto a bank account. Such a transaction is seen asproviding primary evidence for a link, L[i,j]. It is therealization of these links in the supporting database whichallows rapid search and display of the linkage structuresinherent in the transactional data.

The transaction above also provides evidence foradditional links, since the association of attribute values isa relationship which may be of secondary interest to theanalyst as well. For example, after finding a set of bankdeposits which relate two individuals to some accountsowned by a business, further analysis might f’md that theyoccasionally use the same social security number. Thismay be a case of the same person using an alias, or of twopeople using one id. Knowledge of the transactioncollection methods will help to determine the a priorilikelihood of each. Finer grained link analysis will show

1 Consolidation may also be viewed as a kind of link

analysis, with an identity match providing a link betweentransactions. The consolidated set of transactions is thetransitive closure over these links.

39

Page 3: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

these associations, and may be of interest in evaluating thenetwork. We will see that these two levels of granularityof linkage are used by FAIS to support first search andretrieval and then detailed analysis of networks.

The FinCEN AI System: Cash Transactions to

Money Laundering Enterprises

The FinCEN AI System was designed to exploit the BankSecrecy Act data to support the detection and analysis ofoperations engaged in money laundering and otherfinancial crimes. We will provide a brief description of theproblem, the data, and the database. We will then discusshow the data is restructured for link analysis and how linkanalysis is performed. Finally, we will mention somechallenges which are seen as areas for futureimprovements in the system.

Subjects

Accounts

ransactions

Figure 2: Clusters of Transactions

Problem Description

Money Laundering is a complex process of placing theprofit, usually cash, from illicit activity into the legitimatefinancial system, with the intent of obscuring the source,ownership, or use of the funds. Money launderingtypically involves a multitude of transactions, perhaps bydistinct individuals, into multiple accounts with differentowners at different banks and other financial institutions.

Generally, money laundering schemes usually involvetwo phases: placement and layering. Money is placed invarious bank accounts, "spent" at casinos and otherbusinesses, and used to "purchase" phantom goods.Financial transactions are often structured to avoid thereporting rules. Multiple stages of money transfer areemployed to disguise the true origin and ultimateownership (layering). Money is transferred amongaccounts and in and out of the country by a variety ofinstruments, including checks, wire transfers, andsmuggled currency. The ownership of these accounts ishidden through a variety of means. Names and locationsof businesses involved are continually changed. Prior tostrict enforcement of the reporting laws, bank officialsoften cooperated wittingly or unwittingly in these schemes.There is still the danger that legitimate f’mancial (andother) businesses will be suborned by the criminalorganizations behind these activities.

The BSA Data. To combat money laundering, the BSArequires reporting of cash transactions in excess of$10,000. In addition, related transactions which inaggregate exceed the limit and any other transactionswhich appear suspicious to the bank officials are to bereported as well.

This record keeping preserves a financial trail forinvestigators to follow and allows the Government tosystematically scrutinize large cash transactions. FAISaccumulates about 12 million transactions per year. Themajority are CRTs (Currency Transaction Reports) fi!ed banks, credit unions, etc. A small number of relatedreports are filed by gambling casinos. Another, smallernumber of CMIR reports (Cash and Monetary Instruments)are accumulated. These are filed by anyone entering orleaving the USA with currency or other negotiableinstruments valued at $10,000 or more. The data reportedon the forms are subject to errors, uncertainties, andinconsistencies that affect both identification andtransaction information.

These three types of reports are designed to draw aprotective curtain around the legitimate financial industry,protecting it from most sources of undocumented largeamounts of cash. Since most criminal activities dependupon cash at the street level, it was believed that this datawould suffice. While other channels are available formoving cash (and new ones are coming in various forms ofelectronic currency and the Internet), these datainstruments have served to capture some "footprints" ofmany money laundering operations.

Consolidation and Aggregation in the FAIS Database.The FAIS database consists of individual transactionrecords of the three form types mentioned above. Subject(people and businesses) and bank account records arecreated during the process of disambiguating inputidentifiers and consolidating transactions with those ofprior subjects and accounts. This operation results inconsolidated sets of transactions (called clusters),illustrated in Figure 2, with associated collections of

1000000

100000

100OO

IOO0

Frequency

IO0

10

1

¯ : : : :" : ’

10 100 1000 1OO00

#of Links

Figure 3: Cluster "Fan-out"

4O

Page 4: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

FAIT Main Targeling Window

Case: <none>

4ii!i!ii!ia a f.address id = address.address id ii

ANCTa_a_f, aJi~ id = a]ias.aJias_id -AND a_a_f.fintran_id = flntran.fintran_ldAND (addresszip code LIKE "cJ2%’)AND fa~ as,ocoJostlon L KE "L QUOF~’,4,7

Locations )

Dates )

Age "~

Target ;-

EXEC Query) View Results

ImportFile ):i~ i-::

Clear,! ;i;. QUIT )~=;-ii :;

Figure 4: IQI Main Query Windowidentifiers, aggregated money and transaction data, and thelinkage of subjects and accounts to individual transactions(and thence to one another). The disambiguation processis tuned to be very conservative, since it is easier toaggregate further, if needed, than to split clusters thatmistakenly combined separate entities. This consolidationprocess transforms the database from a purely transactionalone to a form which can support direct link analysis via(efficient) database operations.

Figure 3 shows the distribution of cluster "fan-out" - thenumber of clusters immediately linked by one transactionto each cluster. This distribution is an estimate of the sizeof the networks which are available for analysis, althoughconservative disambiguation tends to make these smallerthan their true size, and as we will show, we rely uponhuman intervention to gather additional networks together.It may be seen that the distribution is highly skewed, witha few clusters (primarily businesses engaged in largelycash sales, such as supermarkets) linking to large numbersof others (usually employees and bank accounts of manyseparate retail branches). It is tempting to try to eliminatethis data from the input, however, due to regulatory as wellas practical reasons, this has not been done.

The Role of Link Analysis in FAISLink analysis plays a critical role in FAIS, both in networkacquisition and analysis. In this section, we discuss therole of link analysis in identifying relationships, exposingstructure of organizations, and characterizing the roles ofkey actors. We describe two levels of link formationwhich enable these analyses to be performed over a hugedatabase. Finally, we illustrate the process of finding andanalyzing a network in FAIS through a series ofillustrations.

Identifying Relationships. Money laundering is based onrelationships and the methods for hiding them from theknowledge of law enforcement. Disentangling the resultsof layering requires that we detect and represent therelationships hidden in the transactional data. Theserelationships include both the channels for movement ofmoney, and the ownership and operational relationshipsamong entities in the organization. Identifying the moneytransfer relationships allows users of FAIS to inferownership and operational relationships (e.g. commondepositor, joint ownership, multiple businesses owned by asingle entity, etc.)

Key Actors and Roles. Certain entities in the networkmay play essential roles in its operation, yet be hiddenfi-om immediate access in the transactions. Roles such ascourier, pass-through account, or shell corporation aremore readily seen in the structural map of the organizationthan in the raw data. Furthermore, poor qualitydisambiguation (due to the difficulties of dirty data,intentional misidentification, and missing transactions) canbe corrected by the analyst when common relationships areseen to imply common identity.

FAJTMain Targeting Window

Localion: : : :: -~

Figure 5: Constructing a Location Query

41

Page 5: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

Iniliai Report- Subjects

¯ Subjects v]. N~tworks v . Assodations ,--’~" Toolsr . D_isplay~

Search Parameters

24538 45 1984 PERSON-21915 ID-18153 LIQUOR STORE

25460 16 651 BUSINESS-21791 i ID-18153 LIQUOR STORE

2.545~J 15 618 PERSOI’,J-:?_1790 ID 2

24540 5 1E2 BUSINESS-20947 ID-2

883 4 115 PERSON-778 !D-12595 700124

884 4 115 BUSINESS-719 ID-101!2

2338 4 101 BUSINESS 1350 ID-1,9484

2337 4 107 PERSON-1949 ID- 18494 450222

24539 4 157 PERSON-20946 ID-19364

26674 3 174 BLIS!NESS-22789 ID- 19364

Structure of Enterprises. The overall structure of anorganization can explain a great deal about its operation.

Figure 6: Initial Subject Retrieval

recognition of this structure and the roles of the individualentities. (While most of the illustrations in this paper aretaken from a "scrubbed", demo database, Figure 13 is

Text taken from a real moneylaundering case. It isincluded to give anaccurate feel for the linkanalyses which FAISsupports.)

L

Monthly Aggregates for Cluster: 24538

TO CmipBo~d ) . ( Done )

Two Levels of LinkAnalysis

Database Restructuringfor Node Evaluation andRapid Search. As wehave described above, thedatabase of BSAtransactions isrestructured toconsolidate transactionsand link subjects andaccounts. Thisrestructuring makesexplicit in the databasethe subject-accountrelationships inherent inFigure 7: Cluster Details

In the final figure we see a large link diagram taken froma case involving many "legitimate" businesses funnelinglaundered money through a single account. Therelationship of the two sets of businesses and individualson either side of this account becomes clear through the

the individualtransactions (as well as relationships between multiplesubjects such as owner-depositor, or multiple accounts).The system can make use of these relationships to providerapid search and retrieval to the analyst. Additionally,consolidation links multiple identifiers (from many

42

Page 6: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

I- Accounts#-~counts -- " Net, vc&s ~ . . _A.ssoda’Jons r.~. Tools $3. _Display -.

laundering is known toinvolve certain typesof businesses. He canfocus on this location,and specify certaintypes of businesses.2

The initial retrievalis directed at a set ofsubjects (or accounts)fitting the analyst’sspecifications. Thisset (Figure 6) maycontain unrelatedclusters. Also,important ones may bemissing. However, theuser can inspect theentities retrieved indetail (Figure 7), sinceFigure 8: Linked Accounts

transactions) each subject or account, increasing the abilityof the analyst to retrieve relevant data based on a structurequery.

In addition to human evaluation of the clusters producedby this process, automated evaluation using rule-basedinference is also implemented. This aspect of the system isdiscussed in greater detail in [Senator 95].

the consolidationprocess associates

aggregated data from all transactions in the cluster with theentity. After isolating those clusters which seem to be ofinterest, the analyst can use the IQI to widen the search byretrieving other clusters linked to the specified set by oneor more transactions. Figure 8 shows all accounts linked tothe highlighted subjects selected in Figure 6.

Subjects

. Subjects v) Networks v" Associations ~’~ Tools v~,.Display v)

~C~;~ ~5‘~=-~‘~T:~:~‘£:%~,~‘?~‘~-~7~-‘:‘~‘~-~-~~‘~‘‘~:~-~ ~-~- ~~ ~o ~~~--?~ I[II ii I~Itr~ .~ ¢I ~ I~:~

,iiii~ i I¢,~ i~. i.l I~ H,, II"~I ~.-- i~,,~, .-- I II&~/4 IIIt~¢~, ~= ~

i- 24598 2 1998 PERSO -2191 ,D-18153 L,OOOR STORE24539 0 157i #-ER~ i ID-19364 LIQUOR STORE2454o 0 182" B-U~i-N~Z - ID-2 U~UOR STONE ~41i~?:i

~-i 25459 1 619 PERSON-21790 ID-2 LIQUOR STORE26674 0 174 BUSINESS-22789 ID-19364 LIQUOR STORE~--~-,~1

~I 25460 1 653 BUSINESS-21791 ’D-18153 LIQUOR STORE ~-.-’-~_~1~’~ 25464 O 67 PERSON-21797 ID-18071 6008.3 MANGER dQUOR STO~al~---i 25465 0 67 PERSON-21798 D-2 L QUOR STORE ~@~1---.~:-~ 25572 I 1098 BUSINESS-21916 ID-2 LIQUOR STORE ~~.-~ 28229 0 14 PERSON-24270 ID-19235 341211 OWNER LIQUOR STORE ~-~

o 98 us, ESS-31o2 ,D-2 uQUOR STORE

.... .< ~.,.:. ~ ~.~.~..~:.~ ~.~=;.~_~:~< _.~ ~_-~...-~:-.~__~..~-.~:.&-<~ ~~ Done --- ~-.~-~-~-"-~-~.4.I

Figure 9: Linked Subjects (2nd Ply)

Using the IQI for Network Acquisition. FAIS suppliesthe user with an Interactive Query Interface enabling rapidsearch and retrieval of relevant data. Figure 4 shows themain IQI window. Queries based on a number of differenttypes of identifier data are constructed (removing the needfor analysts to learn complex SQL). For example, Figure shows the construction of a location query. The analystmay have an interest in a particular location where money

This process may continue as long as necessary.

2 The latter is rather unreliable, since the "occupation or

business" field in the CTR is supplied by the transactorrather than the financial institution. However, these dataproblems may be compensated for by the consolidationand linkage searches implemented in FAIS.

43

Page 7: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

(Transitive closure is usually reached in a few iterations.)At any ply, the user may search for subjects or accountsdepending on their understanding of the structure of thenetwork and data.3 Figure 9 shows the set of subjectslinked to the accounts of Figure 8. Since further searchesyield no new clusters, the process is ended here and thesecond phase of link analysis is entered.

Detailed Link Analysis of Isolated Networks. A set ofclusters may be selected and extracted for analysis in AltaAnalytics’ NetMap tool [Davidson 1993] using one of anumber of prepared link analysis models. Since thedatabase links all transactions to the clusters, and sinceclusters are accumulated in each ply of the IQI searchprocess, the full set of transaction data may be specified inthis way. We go back to the raw transaction data toacquire the full set of relationships -- both the transactionalassociations, e.g. person X deposited into account Y, andthe consolidation associations, e.g. name A occurred withaddress B -- in a uniform model amenable to link analysis.

Figure 10 shows the initial NetMap display (wagonwheel format) of the data underlying the selected subjectsfrom Figure 9. This display may be manipulated in anumber of ways. For example, we remove thetransactions, since we are more interested in the entitiesinvolved, and convert the display to a node and edge formfamiliar to many law enforcement analysts (Figure 11).This display is shown in a partial state of beingmanipulated to bring out the structural aspects of theorganization behind these transactions. All underlyingdata which has been used to form linkages, and anyadditional data we feel may be useful is available for theanalyst to drill down, and can be seen in closeup zoom(Figure 12).

An example of a finalized display is shown in Figure 13.It is drawn from a successful case involving a large moneylaundering operation. The role of the circled account isvery clear from the structure of the network, as well as therelative roles of the businesses and individuals in the two

separate sub-nets. Common features ofthe subsets of clusters (e.g. ethnicorigin of the names of individuals andforeign destinations of money transfers)lead to further understanding of thehidden relationships of the organizationthus uncovered.

Figure 10: Initial Link Diagram (Wheel)

3 If the subject has a number of Casino or CMIR reports,

which do not link to accounts, the analyst would probablysearch for more subjects at the next ply.

ChallengesAmong the challenges for improvingFAIS, the following are especiallyrelevant to link analysis and the role itplays in the system. While FAIS hasbeen a great success in helping analystsin investigating leads or developingnew leads and in analyzing complexcriminal organizations, we haveidentified the following as high payoffor critical areas for additional effort.Network Analysis. It is apparent thatthe final stage of link analysis in FAISis still quite labor-intensive.Algorithmic support for graphical re-organization of networks would reduceanalyst time significantly, allowingmore time for investigative analysis.Other supporting link analysis"methods" are required as wellincluding for example, search forsimilar sub-nets and automatic

identification of "key" nodes. While the "man in the loop"mode of operation that FAIS implements has been onesource of its success, analysts need more powerful tools todeal with this data in a timely manner.

44

Page 8: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

nelmap - bsa

Figure 11 : Reorganizing Link and Node Diagram

Processing Time. One challenge for FAIS is to reduce thesearch time necessary to build the clustered subject andaccount nodes. We have shown how, oncebuilt, these database structures enable theanalyst to conduct their research on theBSA data in a more meaningful, abstractedspace of entities and relationships. The costof maintaining this transformed database asthe data grows over time is so high that anumber of simplifying assumptions havebeen made. The name and address matcheshave been decomposed into indexeddatabase queries to be computable. Searchtechniques from CBR may be useful here.Richer, more knowledge-based4

disambiguation, can be implemented in apractical system when better databasesearch techniques allow more processingtime to be allocated to this activity.

Merging Other Data with the BSA.Another challenge of interest to FinCEN isthe need to cross reference the informationin the BSA data with other sources of highvalue information. For example, the

4 For example, a better understanding of the nature of

ethnic name conventions and variations would improvesubject disambiguation.

Suspicious Activity Report (SAR) record, new data instrument now being collectedsince April 1, 1996, offers a potential sourceof linkages, associations, and relationshipsfrom information residing in different datasources. We see as critical the ability toidentify and cross reference common entities,and merge the linkage data to allow analysesover the expanded networks. While theexample mentioned above is an establisheddata source, and a special application may bedeveloped to support this merger, manysources of additional data are case-specific,e.g. seized records, the results of courtordered wiretaps, etc. An easy, uniformmethod for support these mergers is required.Such an approach might be based on anoverall semantic description of the domain, inwhich various data sources can be modeled.

Temporal Link Analysis. The fmalchallenge is our interest in finding temporallinkages in the data. Time is a key feature thatoffers an opportunity to uncover linkages thatmight be missed by more typical data analysisapproaches. A simple example of telephonecall records can make this point clear.Assume that we have the subpoenaed recordsof calls involving two telephone numbers, Aand B. They may contain records of two

other numbers, X and Y, which never contact one another.However, we might fred from temporal analysis that there

netm~lp -

Figure 12: Node Closeup

exists an inferred link by showing that a call from A to X istemporally proximate to a call from B to Y in a highpercentage of instances. We may infer that the two arecorrelated and hypothesize a linkage between X and Y,

45

Page 9: Restructuring Transactional Data for Link Analysis in the FinCEN … · 2006-01-11 · Restructuring Transactional Data for Link Analysis in the FinCEN AI System Henry G. Goldberg**

given our understood linkage of A and B. Anotherexample might involve deposit to an account. Periodicdeposits which fit the business cycle of the "legitimate"cover may be discounted, simplifying the analysis, while

~_ NETMAP Presentation Tool - lusrPapps/disUnetmap,4.13/nelma: =____

: File _E~t D_i~pl~j :Optior~ _Col~ LIP,# Ie~t HodB: ~: =::

Figure 13: Final Presentation of an Money Laundering Case

deposits at odds with this cycle may indicate the flow oflaundered funds (e.g. cash deposits by a ski resort in thesummer).5 These kinds of temporal analyses offer theopportunity for greatly improved detection of criminalactivity, as well as the opportunity to filter out legitimateactivities at an earlier stage in the process.

Summary

We have discussed the fundamental role of databaserestructuring and link analysis in detecting and analyzingcomplex activities hidden in transactional data. Wedescribed the processes of entity disambiguation andtransaction consolidation and the utility of creating explicitlinkages in the supporting database. We demonstrated how

5 Or they may indicate some other unusual activity. One

story of such an analysis involves the observation of agreat deal of cash being transferred from banks in aparticular city over one weekend. Observed andinvestigated, the reason turned out to be the Super Bowl.

these processes are combined in FAIS to support search,retrieval, and analysis of complex networks of transactionsand entities which are the database "footprint" of criminalorganizations engaged in money laundering and other

financial crimes. Finally we introducedsome areas where algorithmic support wouldgreatly improve the efficiency andeffectiveness of the system -- more powerfulnetwork analysis tools, better and fasterconsolidation, merger of other data into themodel, and temporal analysis of links.

References

Andrews, P. P. and Peterson, M. B. eds.1990. Criminal Intelligence Analysis.Loomis, CA: Palmer Enterprises.

Davidson, C. 1993. What Your DatabaseHides Away. New Scientist 1855:28-31 (9January).

Goldberg, Henry G. and Senator, Ted E.,"Restructuring Databases for KnowledgeDiscovery by Consolidation and LinkFormation," Proceedings of the FirstInternational Conference on KnowledgeDiscovery and Data Mining, Montreal,August, 1995.

Goldberg, Henry G. and Senator, Ted E.,"Break Detection Systems," Proc. ofWorkshop on ,41 and Fraud Detection, (at14th National Conferemnce on ArtificialIntelligence, AAAI-97), Providence, July1997.

Hemandez M A and Stolfo S J (1995), A Generalization Band Joins and The Merge/Purge Problem, Department ofComputer Science, Columbia University, New York, NY.

Rosenstein, Michael T. and Cohen, Paul R., "Conceptsfrom Time Series", Proc. 15th National Conf. on AI, July1998.

Senator, T.E., Goldberg, H.G., et. al., "The FInCENArtificial Intelligence System: Identifying Potential MoneyLaundering from Reports of Large Cash Transactions," inProc. 7th Annual Conf. IAAI, Aug. 1995.

Stolfo, Salvatore J., Fan, David W., et. al., "Credit CardFraud Detection Using Meta Learning: Issues and InitialResults," Department of COmputer Science, ColumbiaUniversity, New York, 1997.

46