itbi term paper by neha and avinandita


  • 8/9/2019 ITBI Term Paper by Neha and Avinandita


    A Study on Evolution of Data Mining Techniques Post 9/11

    A Study on Evolution of Data Mining Techniques in FBI Post 9/11

    Avinandita Sarkar, MBA

    Neha Harit, MBA

    Abstract

    This paper introduces the ways data mining techniques are used in counter-terrorism. It concentrates on the effects of the 9/11 attack on the data mining techniques used by the FBI, throwing light on the measures and initiatives taken up by the Federal Bureau of Investigation to counter terrorism. The paper concludes with a detailed description of how the FBI has incorporated and transitioned to the Investigative Data Warehouse (IDW) system.

    Introduction

    Data, and the information gathered from that data, is an extremely valuable organizational resource. As defined by Thuraisingham, data mining is the process of posing queries and extracting useful patterns or trends, often previously unknown, from large amounts of data using various techniques such as those from pattern recognition and machine learning. The use of data mining techniques in counter-terrorism applications has developed considerably over time. This paper provides an overview of the use of data mining for counter-terrorism. It also discusses data mining solutions that attempt to detect or prevent terrorism while maintaining some level of privacy, covering both non-real-time and real-time threats and examining how data mining in general, and web data mining in particular, could handle such threats. Along with this, an introduction to link analysis discusses how it is useful for detecting abnormal patterns.

    After introducing data mining techniques for counter-terrorism, the paper examines the transition in the FBI after the 9/11 attack in the U.S. After 9/11, certain actions were taken and new programs were introduced. The paper describes the Investigative Data Warehouse (IDW) architecture deployed by the FBI, discussing the datasets acquired along with the structure of the program from then to now.

    Survey of Literature

    Data mining is the process of extracting patterns from data, and it has become an increasingly important tool for transforming data into information. It is commonly used in a wide range of profiling practices, such as marketing, surveillance, scientific discovery, fraud detection and combating terrorism. Jesús Mena [1] wrote the first book to outline how data mining technologies can be used to combat crime in the 21st century. It introduces security managers, law enforcement investigators, counter-intelligence agents, fraud specialists and information security analysts to the latest data mining techniques and shows how they can be used as investigative tools. Written in clear, understandable language for novice readers, it provides instructions on how to search public and private databases and networks to flag potential security threats and root out criminal activities even before they occur. Hsinchun Chen [2] in his book gives a detailed analysis of advanced techniques and


    impact on privacy of personal data. This paper presents a research-in-progress study that investigates the need for an expanded role of ethics in data mining. [14] Countering Terrorism: Integration of Practice and Theory. [15] This article throws light on how the fast-growing FBI data mining system, billed as a tool for hunting terrorists, is being used in hacker and domestic criminal investigations, and on how it now contains tens of thousands of records from private corporate databases, including car-rental companies, large hotel chains and at least one national department store, as declassified documents obtained by Wired.com show. [16] This is a report to the National Commission on Terrorist Attacks upon the United States explaining the FBI's counterterrorism program. [17] Bhavani Thuraisingham in her book introduces how data mining has become a useful tool for detecting and preventing terrorism, explaining the technical challenges for data mining, the various types of terrorist threats, and how these techniques can provide solutions to counter terrorism.

    Data Mining Applications in Counter-Terrorism

    This section gives a high-level overview of how data mining, as well as web data mining, could help towards counter-terrorism. Web data mining goes beyond mining structured data: it includes mining unstructured data, mining for business intelligence, web usage mining and web structure mining. Data mining can contribute to counter-terrorism because extracting hidden patterns and trends from large quantities of data is very important for detecting and preventing terrorist attacks. We examine both non-real-time and real-time threats and observe how data mining and web data mining could handle them. In our terminology, web data mining encompasses data mining, since it deals with mining data on the web as well as mining structured and unstructured data.

    Data Mining for Handling Threats

    Data used for mining purposes in handling threats can be grouped in different ways. One example is grouping into information-related and non-information-related data. Another is grouping into real-time and non-real-time threats. These groupings are somewhat arbitrary in nature; e.g., a non-real-time threat becomes a real-time threat when a suspected terrorist decides to attack on a certain date.

    Non Real-time Threats:

    Non-real-time threats are threats that do not have any time constraints. Data might be collected over months and then analyzed to detect or prevent an attack that may or may not occur. For data mining to work effectively, many examples and patterns are needed, because patterns and historical data are used to make predictions. The prime requisite is good data with which to carry out data mining and obtain useful results. Barriers here include incomplete data and the unwillingness of organizations to share data, so mining tools have to make many assumptions about incomplete and unavailable data. An alternative is to carry out federated data mining under some federated administrator. The next step is to decide what data needs to be collected. Mostly, data about various people, such as where they come from, what they are doing and who their relatives are, is gathered, and groups are then formed of individuals having similar patterns. Individuals with criminal records are kept under high vigilance.
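    Forming groups of individuals with similar patterns is a clustering task. The sketch below uses plain k-means, a standard clustering algorithm chosen here for illustration; the paper does not say which algorithm is used. The two numeric features per person and the sample values are hypothetical, and the first k points are used as starting centroids only to keep the example deterministic.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    # Deterministic start for the sketch: the first k points are the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # no centroid moved: converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical profiles: (foreign trips per year, contacts already under investigation).
profiles = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
labels, centres = kmeans(profiles, k=2)
print(labels)  # → [0, 0, 1, 1, 0, 1]
```

    Membership in a cluster is only a pattern in the data, not evidence about any individual, which is why the text stresses human checking of mining results.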

    Once the data is collected, it is formatted and organized. Data may be structured or unstructured, and some of it may not be of much use; therefore, the data is segmented into critical and non-critical data. Once the desired outcomes are determined, the mining tools are used to start the mining process. After that comes the most complex part: deciding how useful the mining results are. The chances of getting a false positive or a false negative are quite high, and either result could be disastrous. At present, human specialists are needed to work with the mining tools. If a tool states that a certain person is a terrorist, the specialist will have to do further checking before arresting or detaining that person.

    A non-real-time threat could become a real-time threat. The challenge then is to find out exactly what the attack will be. This requires data mining tools that can continue reasoning as new information comes in; that is, as new information arrives, the warehouse needs to be updated, and the mining tools should be dynamic and take the new data and information into consideration in the mining process.
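    Why false positives are so likely can be shown with a simple base-rate calculation. The inputs below (population size, number of real threats, and a tool that is 99% accurate in both directions) are purely hypothetical assumptions for illustration, not figures from the FBI or the cited sources.

```python
def screening_outcomes(population, actual_threats, sensitivity, specificity):
    """Expected outcomes of screening a whole population with an imperfect tool.

    sensitivity: fraction of real threats the tool flags.
    specificity: fraction of innocent people the tool correctly clears.
    """
    innocent = population - actual_threats
    true_pos = actual_threats * sensitivity        # threats correctly flagged
    false_neg = actual_threats - true_pos          # threats missed
    false_pos = innocent * (1 - specificity)       # innocent people wrongly flagged
    precision = true_pos / (true_pos + false_pos)  # chance a flagged person is a real threat
    return {"true_pos": true_pos, "false_neg": false_neg,
            "false_pos": false_pos, "precision": precision}


# Hypothetical inputs: 300 million people, 3,000 real threats, 99% accurate both ways.
r = screening_outcomes(300_000_000, 3_000, sensitivity=0.99, specificity=0.99)
print(f"flagged correctly: {r['true_pos']:.0f}, flagged wrongly: {r['false_pos']:.0f}, "
      f"missed: {r['false_neg']:.0f}, precision: {r['precision']:.4%}")
```

    Under these assumptions about 2,970 real threats are flagged alongside roughly 3 million innocent people, while 30 threats are missed, so fewer than 1 in 1,000 flagged individuals is an actual threat. This is the arithmetic behind the need for specialists to check every result.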

    Real-time Threats:

    In the case of real-time threats there are time constraints: such threats may occur within a certain time, so an immediate response is required. There are several types of data mining techniques for real-time threats. Both hypothetical and simulated data need to be used, and as many similar examples as possible should be gathered from counter-terrorism specialists. Once the examples are gathered and the training of neural networks and other data mining tools has begun, the next task is deciding what sort of models to build. Handling real-time threats requires dynamically changing models, and this is the biggest challenge. Real-time data mining is a controversial topic, as many people believe it is an impossible task. Hence the challenge is to redefine data mining and figure out ways to handle real-time threats.

    Data emanating from sensors is a common source for analysis, e.g. from surveillance cameras placed in shopping centres, in front of embassies and in other public places. The data from these sensors has to be analyzed in real time to detect or prevent attacks. This raises questions of privacy and civil liberties. But the real dilemma is: what are the alternatives? Should privacy be sacrificed to protect the lives of millions of people? Policy makers and lawyers need to work together to come up with viable solutions.
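    A model that changes as sensor data arrives can be sketched with a streaming anomaly detector. This is an illustrative technique, not one named in the paper: it keeps a running mean and variance (Welford's online algorithm), scores each new reading against the model built from everything seen so far, and then folds the reading into the model. The threshold, warm-up length and readings are hypothetical.

```python
import math


class StreamingAnomalyDetector:
    """Flags readings that deviate strongly from everything seen so far.
    The model (running mean and variance, via Welford's algorithm) is
    updated after every reading, so it changes as new data arrives."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # standard deviations that count as anomalous
        self.warmup = warmup        # readings to observe before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the current model, then add x to the model.
        Returns True if x is anomalous."""
        anomalous = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))  # sample standard deviation
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                anomalous = True
        # Welford's online update of the mean and the sum of squares.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return anomalous


detector = StreamingAnomalyDetector(threshold=3.0, warmup=5)
# Hypothetical per-minute people counts from one camera at a public entrance.
readings = [10, 12, 11, 9, 10, 11, 10, 45, 12]
flagged = [i for i, x in enumerate(readings) if detector.update(x)]
print(flagged)  # → [7]
```

    Each reading costs constant time and memory, which suits real-time use. A design limitation is visible too: after the spike at index 7 is absorbed, the variance is inflated and later moderate deviations become harder to flag.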

    Analyzing the Techniques:

    The goal of data mining is to analyze data in order to make predictions and identify trends. This involves examining the various data mining outcomes and discussing how they could be applied to counter-terrorism. These outcomes are arrived at by making associations, link analysis, forming clusters, classification and anomaly detection. The techniques that produce these outcomes include neural networks, decision trees, market basket analysis, inductive logic programming, rough sets, link analysis based on graph theory, and nearest neighbour techniques. Data mining follows either top-down reasoning, which starts with a hypothesis and then determines whether it is true, or bottom-up reasoning, which starts with examples and works up to a hypothesis.

    Several data mining techniques are described below.
    Association techniques: An example of an association technique is market basket analysis, whose goal is to find which items go together.
    Clustering techniques: Clustering groups data into clusters of similar records, which are then analyzed.
    Anomaly detection: Anomaly detection is the technique of observing and analyzing deviations from the general pattern.
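    The core of market basket analysis, counting which items appear together, can be sketched as follows. This is a simplified pair-counting version (full algorithms such as Apriori extend it to larger item sets), and the baskets are hypothetical retail data. "Support" is the number of baskets containing both items; "confidence" is the fraction of baskets with one item that also contain the other.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Item pairs that appear together in at least min_support baskets,
    with their support and the confidence of each direction."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # de-duplicate and fix the order within pairs
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), count in pair_counts.items():
        if count >= min_support:
            rules[(a, b)] = {
                "support": count,
                "conf_a_to_b": count / item_counts[a],  # share of baskets with a that also have b
                "conf_b_to_a": count / item_counts[b],  # share of baskets with b that also have a
            }
    return rules


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "milk", "eggs"],
    ["beer", "chips", "bread"],
]
for pair, stats in frequent_pairs(baskets, min_support=2).items():
    print(pair, stats)
```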

    Link Analysis:

    Link analysis is a data mining technique that is especially useful for detecting abnormal patterns. It uses various graph theory techniques to reduce graphs into manageable chunks. The objective is to find interesting associations and then determine how to reduce the graphs to manageable results that do not grow combinatorially.
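    Two basic link analysis operations can be sketched with breadth-first search: cutting the graph down to the entities within a few links of a subject, and finding the shortest chain of links between two entities. The entity names and links are hypothetical; an edge here means two entities appear together in some record (e.g. a person and a phone number).

```python
from collections import deque


def build_graph(links):
    """Undirected adjacency sets from (entity, entity) pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighbourhood(graph, start, max_hops):
    """Entities within max_hops links of start, with their distance: a small,
    manageable subgraph instead of the whole link graph."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen


def shortest_link_path(graph, source, target):
    """Breadth-first search for the shortest chain of links, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in sorted(graph.get(node, ())):  # sorted for deterministic output
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


links = [("Person A", "Phone 1"), ("Person B", "Phone 1"), ("Person B", "Account 7"),
         ("Person C", "Account 7"), ("Person D", "Address 3")]
graph = build_graph(links)
print(neighbourhood(graph, "Person A", 2))
print(shortest_link_path(graph, "Person A", "Person C"))
```

    Limiting the search to a fixed number of hops is one simple way to keep results from becoming combinatorially explosive on a large graph.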

    A Note on Privacy

    The issue of privacy has been a topic of recent debate among counter-terrorism experts, civil liberties unions and human rights lawyers, who point out that gathering information about people, mining information about people, conducting surveillance activities and examining, say, e-mail messages and phone conversations are all threats to privacy and civil liberties. So the objective of data mining is shifting towards enhancing national security while at the same time ensuring the privacy of individuals. One proposed approach is to process privacy constraints in a database management system, with levels of privacy such as fully private, semi-private, etc. However, several sources have said that privacy-enhanced data mining may be time-consuming and may not scale. More investigation is required in this area to come up with viable solutions.
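    Processing privacy constraints inside the data layer can be sketched as a filter applied to every query result. The levels "fully private" and "semi-private" come from the text; the extra PUBLIC level, the attribute names and the constraint table are hypothetical assumptions for illustration.

```python
from enum import IntEnum


class Privacy(IntEnum):
    PUBLIC = 0          # assumed level; the text names only the two below, "etc."
    SEMI_PRIVATE = 1
    FULLY_PRIVATE = 2


# Hypothetical privacy constraints: the privacy level of each attribute.
CONSTRAINTS = {
    "name": Privacy.PUBLIC,
    "city": Privacy.SEMI_PRIVATE,
    "phone": Privacy.SEMI_PRIVATE,
    "medical_history": Privacy.FULLY_PRIVATE,
}


def apply_privacy_constraints(record, allowed):
    """Return only the attributes whose privacy level does not exceed what
    the requester may see; attributes without a constraint are treated as
    fully private."""
    return {
        field: value
        for field, value in record.items()
        if CONSTRAINTS.get(field, Privacy.FULLY_PRIVATE) <= allowed
    }


record = {"name": "J. Doe", "city": "Springfield", "phone": "555-0100",
          "medical_history": "withheld"}
print(apply_privacy_constraints(record, Privacy.PUBLIC))        # name only
print(apply_privacy_constraints(record, Privacy.SEMI_PRIVATE))  # name, city, phone
```

    Because the check runs on every record of every result, its cost grows with the data volume, which illustrates why privacy-enhanced mining is described as potentially time-consuming and hard to scale.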

    FBI Counter Terrorism Program Post 9/11

    Since the attack of September 11, 2001, the Federal Bureau of Investigation (FBI) has implemented a comprehensive plan that fundamentally transforms the organization to enhance its ability to predict and prevent terrorism. As part of this, it developed a three-step plan that provided immediate support to counterterrorism investigators and analysts. The plan transitions away from separate systems containing separate data (ACS, TelApps) towards the Investigative Data Warehouse (IDW), which contains all the data that can legally be stored together.
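    Moving from separate systems to one warehouse means mapping each source's records onto a common schema and indexing them together, so one lookup spans every source. The sketch below is a hypothetical illustration: the field names for the ACS-like and TelApps-like sources, and the choice of a normalized name as the index key, are assumptions, not the IDW's actual design.

```python
from collections import defaultdict

# Hypothetical per-source field mappings onto one common schema.
FIELD_MAPS = {
    "ACS": {"subj_name": "name", "case_no": "case_id"},
    "TelApps": {"subscriber": "name", "number": "phone"},
}


def normalize_name(name):
    """Upper-case and collapse whitespace so the same name matches across sources."""
    return " ".join(name.upper().split())


def consolidate(sources):
    """Map every source's records onto the common schema and index them by
    normalized name; each unified record remembers where it came from."""
    index = defaultdict(list)
    for source, records in sources.items():
        mapping = FIELD_MAPS[source]
        for rec in records:
            unified = {mapping[k]: v for k, v in rec.items() if k in mapping}
            unified["source"] = source
            index[normalize_name(unified["name"])].append(unified)
    return index


sources = {
    "ACS": [{"subj_name": "John  Doe", "case_no": "X-1"}],
    "TelApps": [{"subscriber": "john doe", "number": "555-0100"}],
}
index = consolidate(sources)
print(index["JOHN DOE"])  # one lookup returns the records from both systems
```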

    SCOPE:

    The initial step towards the IDW was the implementation of the Secure Counterterrorist Operational Prototype Environment (SCOPE) program. This program quickly consolidated counterterrorism information from various data sources, providing analysts at headquarters with access to more information in far less time than other FBI investigative systems. Although the SCOPE database gave the FBI an opportunity to test new capabilities in a controlled environment, it has since been replaced by the IDW.

    Investigative Data Warehouse

    The IDW, whose first phase was delivered in January 2004, now provides analysts with full access to investigative information within FBI files, including ACS and VGTOF data, open source news feeds, and the files of other federal agencies such as DHS. The IDW provides physical storage for this data and lets users access it without needing to know its physical location or format. The data in the IDW is at the Secret level, and the addition of TS/SCI-level data is in the planning stages.

    The FBI planned to enhance the IDW by adding data sources such as Suspicious Activity Reports and by making it easier to search. With this, agents and analysts using new analytical tools will be able to search rapidly for pictures of known terrorists and match or compare the pictures with other individuals in minutes rather than days. This will help in identifying relationships across cases. The major advantage of this deployment is that searching up to 100 million pages of international terrorism-related documents will take seconds.

    Master Data Warehouse

    The plan was to turn the IDW into a Master Data Warehouse (MDW) that would include, in addition to the investigative data, the administrative data the FBI requires to manage its internal business processes. The MDW would grow to eventually provide physical data storage for, and become the system of record for, all FBI electronic files.

    Analytical Tools

    To make the most of the data stored in the IDW, advanced analytical tools were planned. These tools allow FBI agents and analysts to look across multiple cases and data sources, identifying relationships and other pieces of information that were not readily available using older FBI systems. The tools make database searches simple and effective, give analysts new visualization, geomapping, link-charting and reporting capabilities, and allow analysts to request automatic updates to their query results whenever new, relevant data is downloaded into the database. Illustrations 1 to 3 give fictional examples of how some of these tools can assist in drawing connections between discrete pieces of information.

    FBI IDW Systems

    In August 2006, the Electronic Frontier Foundation (EFF) sought government records concerning the FBI IDW under the Freedom of Information Act (FOIA), and on October 17, 2006, EFF filed a lawsuit. The following information is based on the records released by 2009, along with public information about the IDW and the datasets included in the data warehouse.

    Overview of IDW

    The IDW is a centralized, web-enabled, closed-system repository for intelligence and investigative data. According to the documents, the FBI began spending on the IDW in 2002, and system implementation was completed in 2005. IDW 1.1 was released in July 2004 with enhanced functionality, including batch processing capabilities.


    The FBI worked with Science Applications International Corporation (SAIC), Convera and Chilliad to develop the project. By March 2006, the IDW had 53 data sources and over half a billion. By September 2008, the IDW had grown to nearly one billion.

    IDW System Architecture

    According to the FBI project description, the IDW system environment consists of a collection of UNIX and NT servers providing secure access to a cohort of very large-scale storage devices. The servers include application servers, web servers, relational database servers and security filtering servers. Users' desktop units access the IDW web application through FBINet, which provides browser-based access to the central database and its access control units. The entire configuration is designed to be scalable, allowing expansion as more data sources and capabilities are added.

    A DOJ Inspector General report explained: "Data processing is conducted by a combination of Commercial-Off-the-Shelf (COTS) applications, interpreted scripts, and open-source software applications. Data storage is provided by several Oracle Relational Database Management Systems (DBMS) and in proprietary data formats. Physical storage is contained in Network Attached Storage (NAS) devices and component hard disks. Ethernet switches provide connectivity between components and to FBI LAN/WAN. An integrated firewall appliance in the switch provides network filtering."

    IDW Subsystems

    According to the IDW Concept of Operations, the IDW has two main subsystems: IDW Secret (IDW-S) and IDW Special Projects Team (IDW-SPT). It also includes a development platform (IDW-D) and a subsystem for maintenance and testing (IDW-I).

    IDW Secret

    This is the main subsystem of the IDW, authorized to process classified national security data up to and including information designated Secret. Neither Top Secret data nor any Sensitive Compartmented Information (SCI) is authorized to be processed by this system. An IDW Top Secret/Sensitive Compartmented Information-level datamart appears to be in the planning stage. This system is the successor of the Secure Counter-Terrorism/collaboration Operation Prototype Environment.

    IDW Special Projects Team

    In November 2003, the Counterterrorism Division, along with the Terrorist Financing Operations Section (TFOS), started a special project to augment the existing IDW system with new capabilities for use by FBI and non-FBI agents on the Joint Terrorism Task Forces (JTTFs). The FBI Office of Intelligence is the executive sponsor of the IDW. The IDW Special Projects Team was originally initiated for the 2004 Threat Task Force. By May 2006, the Special Projects Team had provided services to 5 task forces or operations.

    As described by the FBI, the Special Projects Team (SPT) subsystem allows for the rapid import of new specialized data sources. These data sources are not made available to general IDW users but are instead provided to a small group of users who have a demonstrated "need-to-know". The SPT system is similar in function to the IDW-S system, the main difference being its set of data sources. The SPT system allows its users to access not only the standard IDW data store but also the specialized SPT data store.
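    The two access rules described for the subsystems, a classification ceiling on what IDW-S may hold and need-to-know access to the extra SPT data store, can be sketched as below. The function names, the user flag and the three-level scheme are hypothetical simplifications (real markings such as SCI compartments are ignored).

```python
from enum import IntEnum


class Level(IntEnum):
    UNCLASSIFIED = 0
    SECRET = 1
    TOP_SECRET = 2


SYSTEM_CEILING = Level.SECRET  # the subsystem may hold data up to and including Secret


def ingest(store, item, level):
    """Refuse data classified above what the subsystem is authorized to process."""
    if level > SYSTEM_CEILING:
        raise ValueError(f"{level.name} data is not authorized on this system")
    store.append(item)


def visible_stores(user, standard_store, spt_store):
    """Every user sees the standard data store; the specialized SPT store is
    added only for users flagged with a demonstrated need-to-know."""
    stores = [standard_store]
    if user.get("spt_need_to_know", False):
        stores.append(spt_store)
    return stores


standard, spt = [], ["specialized-source-1"]
ingest(standard, "report-1", Level.SECRET)
print(len(visible_stores({"spt_need_to_know": True}, standard, spt)))  # → 2
print(len(visible_stores({}, standard, spt)))                          # → 1
```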

    IDW Features

    Deputy Assistant Director Hulon also asserted that "when the IDW is complete, Agents, JTTF [Joint Terrorism Task Force] members and analysts, using new analytical tools, will be able to search rapidly for pictures of known terrorists and match or compare the pictures with other individuals in minutes rather than days. They will be able to extract subjects' addresses, phone numbers, and other data in seconds, rather than searching for it manually. They will have the ability to identify relationships across cases. They will be able to search up to 100 million pages of international terrorism-related documents in seconds." Since then, the number of records has grown tenfold.

    At the FBI, the Office of the Chief Technology Officer (OCTO) developed an alert capability that allows IDW users to create up to 10 queries and be automatically notified when a new document matching their search criteria is uploaded to the database. Users can search for terms within defined proximity parameters. For example, the search 'flight school' NEAR/10 'lessons' would return all documents where the phrase "flight school" occurred within 10 words of the word "lessons." Users can also specify whether they want exact searches or want the search tool to include synonyms and spelling variants for words and names.

    "IDW includes the ability to search across spelling variants for common words, synonyms and meaning variants for words, as well as common misspellings of words. If a user misspells a common word, IDW will run the search as specified, but will prompt the user to ask if they intended to run the search with the correct spelling."
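    The NEAR/10 proximity search and the saved-query alerts described above can be sketched as follows. This is our own reading of NEAR/n (both phrases present, in either order, with at most n words between them), not the IDW's actual search engine; the tokenizer, the class and its method names are hypothetical, while the limit of 10 saved queries comes from the text.

```python
import re

MAX_ALERT_QUERIES = 10  # per-user limit on saved alert queries described in the text


def tokenize(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_starts(tokens, phrase):
    """Indexes in tokens where the (possibly multi-word) phrase begins."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near(text, left, right, max_gap):
    """Both phrases occur, in either order, with at most max_gap words between them."""
    tokens = tokenize(text)
    n_left, n_right = len(tokenize(left)), len(tokenize(right))
    for i in phrase_starts(tokens, left):
        for j in phrase_starts(tokens, right):
            if j >= i + n_left:          # right phrase comes after left phrase
                gap = j - (i + n_left)
            elif i >= j + n_right:       # right phrase comes before left phrase
                gap = i - (j + n_right)
            else:                        # the two matches overlap
                continue
            if gap <= max_gap:
                return True
    return False


class AlertService:
    """Saved proximity queries that are re-run against each newly uploaded document."""

    def __init__(self):
        self.saved = {}  # user -> list of (left, right, max_gap)

    def subscribe(self, user, left, right, max_gap):
        queries = self.saved.setdefault(user, [])
        if len(queries) >= MAX_ALERT_QUERIES:
            raise ValueError(f"{user} already has {MAX_ALERT_QUERIES} alert queries")
        queries.append((left, right, max_gap))

    def on_upload(self, doc_id, text):
        """Return (user, doc_id) notifications for users with a matching query."""
        return [
            (user, doc_id)
            for user, queries in self.saved.items()
            if any(near(text, left, right, gap) for left, right, gap in queries)
        ]


alerts = AlertService()
alerts.subscribe("analyst-1", "flight school", "lessons", 10)
doc = "The subject asked the flight school about paying cash for lessons."
print(alerts.on_upload("doc-001", doc))  # → [('analyst-1', 'doc-001')]
```

    Re-running saved queries only against each new document, rather than the whole database, is what makes automatic notification cheap enough to offer to every user.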

    By 2006, the IDW was processing between 40,000 and 60,000 "interactive transactions" in any given week, along with between 50 and 150 batch jobs. An example of a batch process is where "the complete set of Suspicious Activity Reports is compared to the complete set of FBI terrorism files to identify individuals in common between them."
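    A batch comparison of two complete record sets can be sketched as a hash join on a matching key. The key used here (normalized name plus date of birth) and the record fields are hypothetical assumptions; exact keys like this miss spelling variants, which a real system would also have to handle.

```python
def match_key(person):
    """Hypothetical matching key: upper-cased, whitespace-normalized name plus
    date of birth (empty string when unknown)."""
    name = " ".join(person["name"].upper().split())
    return (name, person.get("dob", ""))


def individuals_in_common(sar_records, terrorism_files):
    """Compare two complete record sets and return the keys present in both.
    Building hash sets keeps the comparison roughly linear in the number of
    records instead of comparing every pair."""
    sar_keys = {match_key(p) for p in sar_records}
    file_keys = {match_key(p) for p in terrorism_files}
    return sorted(sar_keys & file_keys)


sars = [{"name": "Jane  Roe", "dob": "1970-01-01"}, {"name": "Ann Lee", "dob": "1980-02-02"}]
files = [{"name": "jane roe", "dob": "1970-01-01"}, {"name": "Ann Lee", "dob": "1981-03-03"}]
print(individuals_in_common(sars, files))  # → [('JANE ROE', '1970-01-01')]
```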

    Datasets in IDW

    According to various FBI documents, 38 data sources were included in the IDW on or before August 2004.

    Automated Case System (ACS), Electronic Case File (ECF)

    The dataset consists of ASCII flat files (metadata and document text) and WordPerfect documents, comprising the ECs, FD-302s, Facsimiles, FD-542s, Inserts, Transcriptions, Teletypes, Letterhead Memorandums (LHM), Memorandums and other FBI documents contained within ACS. ACS, the FBI's centralized electronic case management system, consists of the Investigative Case Management component, the Electronic Case File component and the Universal Index component.

    Secure Automated Messaging Network (SAMNet)

    ASCII files in standard cable traffic message format (all capitals with a specific header), consisting of all messaging traffic sent either from the FBI to other government agencies or from other government agencies to the FBI through the Automated Digital Information Network (AutoDIN), including Intelligence Information Reports (IIRs) and Technical Disseminations (TDs) from the FBI, Central Intelligence Agency (CIA), Defense Intelligence Agency (DIA) and others, from November 2002 to the present.

    Joint Intelligence Committee Inquiry (JICI) Documents

    Scanned copies (TIFF images and ASCII OCR text) of all FBI documents related to extremist Islamic terrorism between 1993 and 2002. These are counterterrorism files that were scanned into a database to accommodate the JICI's investigation into the attacks of September 11th.

    Open Source News

    The open source data collected for the FBI comes from the MiTAP system run by San Diego State University. MiTAP is a system that collects raw data from the internet, standardizes its format, extracts named entities and routes documents into appropriate newsgroups. This dataset is part of the Defense Advanced Research Projects Agency (DARPA) Translingual Information Detection, Extraction and Summarization (TIDES) Open Source Data project.

    Violent Gang and Terrorist Organization File (VGTOF)

    Lists of individuals and organizations that the FBI believes to be associated with violent gangs and terrorism, provided by the FBI National Crime Information Center (NCIC). It includes biographical data and photos of members of the identified groups, in the form of ASCII flat files (data/metadata) and JPEG image binaries (none, one or multiple per subject). The biographical data includes the individual's name, sex, race and group affiliation and, where possible, optional information such as height and weight; eye and hair colors; date and place of birth; and marks, scars and tattoos.

    CIA Intelligence Information Reports (IIR) and Technical Disseminations (TD)

    A copy of all IIRs and TDs at the Secret security classification or below that were sent to the FBI from 1978 to at least May 2004. Intelligence Information Reports are designed to provide the FBI with the specific results of classified intelligence collected on internationally based terrorist suspects and activities, chiefly abroad.

    IntelPlus Scanned Document Libraries

    Copies of millions of scanned TIFF-format documents and their corresponding OCR ASCII text related to the FBI's major terrorism-related cases. IntelPlus is an application that allows users to view "Table of Contents" lists from large collections of records. The user can display a document, whether it is in text form or in one of several graphic formats, and then print, copy or store the information. The application allows tracking of associated documents on related topics and provides a search capability.

    Financial Crimes Enforcement Network (FinCEN) Databases

    Data related to terrorist financing. "FinCEN requires financial institutions to preserve financial paper trails behind transactions and to report suspicious transactions to FinCEN for its database. FinCEN matches its database with commercial databases such as Lexis/Nexis and the government's law enforcement databases, allowing it to search for links among individuals, banks, and bank accounts." At least one of these databases includes all currency transaction report (CTR) forms on bank customers' cash transactions of more than $10,000: "In 2004, FinCEN first provided the FBI with bulk transfer of [CTRs]." Over 37 million CTRs were filed between 2004 and 2006.
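    The CTR rule and the search for links among individuals and bank accounts can be sketched over a list of transactions. The transaction fields and sample values are hypothetical; only the "more than $10,000 in cash" threshold comes from the text.

```python
from collections import defaultdict

CTR_THRESHOLD = 10_000  # cash transactions of more than $10,000, per the text above


def reportable_cash_transactions(transactions):
    """Transactions that fall under the CTR rule described above."""
    return [t for t in transactions if t["cash"] and t["amount"] > CTR_THRESHOLD]


def shared_accounts(transactions):
    """Link individuals who transact through the same bank account."""
    people_by_account = defaultdict(set)
    for t in transactions:
        people_by_account[t["account"]].add(t["person"])
    return {acct: sorted(people) for acct, people in people_by_account.items() if len(people) > 1}


transactions = [
    {"person": "P1", "account": "ACC-1", "amount": 12_500, "cash": True},
    {"person": "P2", "account": "ACC-1", "amount": 3_000, "cash": True},
    {"person": "P3", "account": "ACC-2", "amount": 10_000, "cash": True},   # not more than $10,000
    {"person": "P4", "account": "ACC-3", "amount": 50_000, "cash": False},  # not a cash transaction
]
print(reportable_cash_transactions(transactions))
print(shared_accounts(transactions))  # → {'ACC-1': ['P1', 'P2']}
```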

    Terrorist Financing Operations Section Databases

    According to Dennis Lormel, Section Chief of the Terrorist Financing Operations Section, TFOS has a "centralized terrorist financial database which the TFOS developed in connection with its coordination of financial investigation of individuals and groups who are suspects of FBI terrorism investigations. The TFOS has cataloged and reviewed financial documents obtained as a result of numerous financial subpoenas pertaining to individuals and accounts. These documents have been verified as being of investigatory interest and have been entered into the terrorist financial database for linkage analysis. The TFOS has obtained financial information from FBI Field Divisions and Legal Attaché Offices, and has reviewed and documented financial transactions. These records include foreign bank accounts and foreign wire transfers."

    Foreign Financial List

    Copies of information concerning terrorism-related persons, addresses and other biographical data submitted to U.S. financial institutions from foreign financial institutions.

    Selectee List

    Copies of a Transportation Security Administration (TSA) list of individuals that the TSA believes warrant additional security attention prior to boarding a commercial airliner. According to Michael Chertoff, "fewer than" 16,000 people were designated "selectees" as of October 2008.

    Terrorist Watch List (TWL)

    The FBI Terrorist Watch and Warning Unit (TWWU) list of names, aliases and biographical information regarding individuals submitted to the Terrorist Screening Center (TSC) for inclusion in the VGTOF and TIPOFF watch lists. Also called the Terrorist Screening Database (TSDB), the database "contained a total of 724,442 records as of April 30, 2007."

    No Fly List

    A copy of a TSA list of individuals barred from boarding a commercial airplane. According to Michael Chertoff, 2,500 people were on the "no fly" list as of October 2008.

    Universal Name Index (UNI) Mains

    A copy of index records for all main subjects of FBI investigations, except certain records that might reveal people in witness protection or informants. "A main file name is that of an individual who is, himself/herself, the subject of an FBI investigation."

    Universal Name Index (UNI) Refs

    A copy of index records for all individuals referenced in FBI investigations, except certain records that might reveal people in witness protection or informants. A "reference is someone whose name appears in an FBI investigation. References may be associates, conspirators, or witnesses."

    Department of State Lost and Stolen Passports

    A copy of records pertaining to lost and stolen passports. "The Consular Lost and Stolen Passports (CLASP) database includes over 1.3 million records concerning U.S. passports. All passport applications are checked against CLASP, PIERS [Passport Information Electronic Records System], the Social Security Administration's database, and the Consular Lookout and Support System (CLASS), which includes information provided by the Department of Health and Human Services (HHS) and law enforcement agencies such as the Federal Bureau of Investigations (FBI) and U.S. Marshals Service." "The overall CLASS database of names has risen to over 20 million records in recent years, including millions of names of criminals from FBI records provided to the State Department under the terms of the USA PATRIOT Act." "The Online Passport Lost & Stolen System permits citizens to report a lost or stolen passport." It includes "Name, date of birth (DOB), social security number (SSN), address, telephone number, and e-mail address," as reported by the citizen.

    Department of State Diplomatic Security Service

    A copy of past and current passport fraud investigations from the "DOS DDS RAMS database." The Records Analysis Management System (RAMS) database "allows all Field Offices, Resident Agent Offices (RAO) and the Bureau of Diplomatic Security to track, maintain, and efficiently share law enforcement investigative case information. RAMS contains CLASSIFIED information." By September 2005, the Department of State was "developing a 'Knowledge Base' on-line library that will be a 'gateway' to passport information, anti-fraud information, and relevant databases. All passport field agencies and centers can use this system to submit anti-fraud information such as exemplars of genuine and mala fide documents, fraud trends in their respective regions, and other information that will be instantly available throughout the department."

    Conclusion

    This paper gives an overview of the use of data mining applications in counter-terrorism. It briefly describes the different types of perceived threats (real-time threats and non-real-time threats) and analyzes the techniques used to handle them. It also briefly covers the ethical dilemma of individual privacy.

    This is followed by a detailed study of the FBI Counter Terrorism Program after 9/11. After 9/11, the FBI went through a transition from separate systems containing separate data (ACS, TelApps) towards the Investigative Data Warehouse (IDW), which contained all the data that could legally be stored together. The details of the IDW are discussed extensively. The final topics of discussion are the analytical tools used to analyze the data stored in the IDW and the various sources from which this data was gathered.

    References

    1. Jesús Mena, Investigative Data Mining for Security and Criminal Detection.
    2. Hsinchun Chen, Edna Reid, Joshua Sinai, Andrew Silke and Boaz Ganor, Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security.
    3. Bhavani M. Thuraisingham, Web Data Mining and Applications in Business Intelligence and Counter-Terrorism.
    4. Cecilia S. Gal, Paul B. Kantor and Bracha Shapira, Security Informatics and Terrorism: Patrolling the Web.
    5. Mark Last and Abraham Kandel, Fighting Terror in Cyberspace.
    6. Drew Clark, Justice Officials Defend Data Mining as Anti-Terror Tool, National Journal's Technology Daily, November 15, 2002.
    7. Bryan D. Kreykes, Data Mining and Counter-Terrorism: The Use of Telephone Records as an Investigatory Tool in the War on Terror.
    8. Bruce Schneier, Why Data Mining Won't Stop Terror (commentary), 03.09.06.
    9. Ellen Nakashima, FBI Shows Off Counterterrorism Database, 2006. http://www.washingtonpost.com/wp-dyn/content/article/2006/08/29/AR2006082901520.html
    10. MATRIX Data Mining System Is Unplugged, 2005. http://www.privacyinternational.org/article.shtml?cmd[347]=x-347-205261
    11. Robb S. Todd, FBI's New Data Warehouse A Powerhouse, 2006. http://www.cbsnews.com/stories/2006/08/30/terror/main1949643.shtml
    12. Report on the Investigative Data Warehouse, 2009. http://www.eff.org/issues/foia/investigative-data-warehouse-report
    13. James Lawler, A Study of Data Mining and Information Ethics in Information Systems Curricula.
    14. Countering Terrorism: Integration of Practice and Theory, An Invitational Conference, FBI Academy, Quantico, Virginia, February 28, 2002.
    15. Ryan Singel, Newly Declassified Files Detail Massive FBI Data-Mining Project, 2009. http://www.wired.com/threatlevel/2009/09/fbi-nsac/
    16. A Report to the National Commission on Terrorist Attacks upon the United States: The FBI's Counterterrorism Program, 2001.
    17. Bhavani Thuraisingham, Data Mining for Counter-Terrorism.

    Illustrations

    Illustration 1


    Illustration 2

    Illustration 3