Uncovering the Dark Web: A Case Study of Jihad on the Web
Hsinchun Chen
Artificial Intelligence Lab, Department of Management Information Systems, The University of Arizona, Tucson, AZ 85721, USA. E-mail: [email protected]
Wingyan Chung
Department of Operations and Management Information Systems, Leavey School of Business, Santa Clara University, Santa Clara, CA 95053, USA. E-mail: [email protected]
Jialun Qin
Management Department, College of Management, University of Massachusetts Lowell, Lowell, MA 01854, USA. E-mail: [email protected]
Edna Reid
Department of Library Science, Clarion University, Clarion, PA 16214, USA. E-mail: [email protected]
Marc Sageman
The Solomon Asch Center for Study of Ethnopolitical Conflict, University of Pennsylvania, Philadelphia, PA 19104, USA. E-mail: [email protected]
Gabriel Weimann
Department of Communication, University of Haifa, Haifa 31905, Israel. E-mail: [email protected]
While the Web has become a worldwide platform for communication, terrorists share their ideology and communicate with members on the "Dark Web," the reverse side of the Web used by terrorists. Currently, information overload and the difficulty of obtaining a comprehensive picture of terrorist activities hinder effective and efficient analysis of terrorist information on the Web. To improve understanding of terrorist activities, we have developed a novel methodology for collecting and analyzing Dark Web information. The methodology incorporates information collection, analysis, and visualization techniques, and exploits various Web information sources. We applied it to collecting and analyzing information from 39 Jihad Web sites and developed visualizations of their site contents, relationships, and activity levels. An expert evaluation showed that the methodology is very useful and promising, with high potential to assist in the investigation and understanding of terrorist activities by producing results that could help guide both policymaking and intelligence research.
Received September 20, 2006; revised June 29, 2007; accepted January 4, 2008
© 2008 ASIS&T. Published online 7 April 2008 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/asi.20838
JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 59(8):1347–1359, 2008
1. Introduction
The Internet has evolved into a global platform through which anyone can conveniently disseminate, share, and communicate ideas. Despite its many advantages, misuse of the Internet has become ever more serious. Terrorist organizations, extremist groups, hate groups, and racial supremacy groups are using the Web to promote their ideology, to facilitate internal communications, to attack their enemies, and to conduct criminal activities. Warnings have been issued that terrorists may launch attacks on critical infrastructure such as major e-commerce sites and governmental networks (Gellman, 2002). Insurgents in Iraq have posted Web messages asking for munitions, financial support, and volunteers (Blakemore, 2004). It has therefore become important to obtain from the Web intelligence that permits better understanding and analysis of terrorist and extremist groups. We define this reverse side of the Web as the "Dark Web," the portion of the World Wide Web used to help achieve the sinister objectives of terrorists and extremists.
Currently, intelligence from the Dark Web is scattered across diverse information repositories that investigators must browse manually to learn their content. Much of the information stored in search engine databases could be collected and analyzed for transformation into intelligence and knowledge that would enhance understanding of terrorists' activities. However, search engines often overwhelm users by producing laundry lists of irrelevant results, creating information overload. Related but unfocused information makes it difficult to obtain a comprehensive description of a terrorist group or a terrorism topic. Many Web resources contain information about terrorism, but a relatively small proportion comes from terrorist groups themselves, and data on the Web often are not persistent and may be misleading. Many terrorist Web sites do not use English, so investigators who do not read the language of a site may be unable to understand its content.
In this article, we address the aforementioned problems by proposing and implementing a semiautomated methodology for collecting and analyzing Dark Web information. Leveraging human precision and machine efficiency, the methodology consists of several steps, including collection, filtering, analysis, and visualization of Dark Web information. We used this methodology to collect and analyze data from 39 Arabic terrorist Web sites and conducted an evaluation of the results. This research aimed to study the extent to which the methodology can assist terrorism analysts in collecting and analyzing Dark Web information. From a broader perspective, this research contributes to the development of the new science of Intelligence and Security Informatics (ISI), the study of the use and development of advanced information technologies, systems, algorithms, and databases for national security-related applications through an integrated technological, organizational, and policy-based approach (Chen, 2005; Strickland & Hunt, 2005). We believe that many existing computer and information systems techniques need to be reexamined and adapted for this unique domain to create new insights and innovations.
The rest of this paper is structured as follows. The second section reviews terrorists' use of information technologies to facilitate terrorism, information services for studying terrorism, and advanced techniques for collecting and analyzing terrorism information. The third section describes a methodology for collecting and analyzing Dark Web information. The fourth section illustrates the use of the methodology in a case study of Jihad on the Web (where "Jihad" is an Islamic term referring to a holy war waged against enemies) and discusses the evaluation results. The last section concludes the study and discusses future directions.
2. Literature Review
2.1. Terrorists' Use of the Web
Recent studies have shown how terrorists use the Web to facilitate their activities. Tsfati and Weimann used the names of terrorist organizations to query six search engines and found 16 relevant sites in 1998 and 29 such sites in 2002 (Tsfati & Weimann, 2002). Their analysis of site content revealed heavy use of the Web by terrorist organizations to share ideology, to provide news, and to justify the use of violence. Relying on open source information (e.g., court testimony, reports, Web sites), researchers at the Institute for Security Technology Studies identified five categories of terrorist use of the Web (Technical Analysis Group, 2004): propaganda (to disseminate radical messages); recruitment and training (to encourage people to join the Jihad and get online training); fundraising (to transfer funds and conduct credit card fraud and other money laundering activities); communications (to provide instruction, resources, and support via email, digital photographs, and chat sessions); and targeting (to conduct online surveillance and identify vulnerabilities of potential targets such as airports). Among these, use of the Web as a propaganda tool has been widely observed.
Identified by the U.S. Government as a terrorist site, Alneda.com called itself the "Center for Islamic Studies and Research," a bogus name, and provided information for Al Qaeda (Thomas, 2003). For group members (insiders), terrorists use the Web to share motivational stories and descriptions of operations. For mass media and non-members (outsiders), they provide analysis of and commentary on recent events on their Web sites. For example, Azzam.com urged Muslims to travel to Pakistan and Afghanistan to fight the "Jewish-backed American Crusaders." Qassam.net appealed for donations to purchase AK-47 rifles (Kelley, 2002). Al Qaeda and some humanitarian relief agencies used the same bank accounts via www.explizit-islam.de (Thomas, 2003).
Terrorists also share ideologies on the Web that provide religious commentaries to legitimize their actions. Based on a study of 172 members participating in the global Salafi Jihad, Sageman concluded that the Internet has created a concrete bond between individuals and a virtual religious community (Sageman, 2004). His study reveals that the Web appeals to isolated individuals by easing loneliness through connections to people sharing some commonality. Such a virtual community offers a number of advantages to terrorists. It is no longer tied to any nation, fostering a priority of fighting against the far enemy (e.g., the United States) rather than the near enemy. Internet chat rooms tend to encourage extreme, abstract, but simplistic solutions, thus attracting most potential Jihad recruits, who are not Islamic scholars. The anonymity of Internet cafés also protects the identity of terrorists. However, Sageman does not consider the Internet to be a direct contact with Jihad, because devotion to Jihad must be fostered by an intense period of face-to-face interaction. In addition, existing studies of terrorists' use of the Web mostly use a manual approach to analyze voluminous data. Such an approach does not scale to the rapid growth of the Web and the frequent changes of terrorists' identities on the Web.
2.2. Information Services for Studying Terrorism
Despite the public nature of the Web, terrorists often try to prevent authorities from tracing their Web addresses and activities, which has prompted several information services to monitor the Web sites of militant Islamic groups and to provide access to translated versions of information posted there. The Jihad and Terrorism Project was developed by the Middle East Media Research Institute to bridge the language gap between the West and the Middle East by providing timely translations of Arabic, Farsi, and Hebrew documents (Middle East Media Research Institute, 2004). The Project for the Research of Islamist Movements (www.e-prism.org) studies radical Islam and Islamist movements, focusing primarily on Arabic sources. These projects provide access to an array of information such as translated news stories, transcripts, video clips, and training documents produced by terrorists, but fall short of supporting analysis and visualization of terrorist data from the Dark Web (Project for the Research of Islamist Movements, 2004).
2.3. Advanced Information Technologies for Combating Terrorism
Since the 9/11 attacks, there has been increased interest in using information technologies to counter terrorism. A study conducted by the U.S. Defense Advanced Research Projects Agency shows that its collaboration, modeling, and analysis tools sped up analysis (Popp, Armour, Senator, & Numrych, 2004), but these tools were not tailored to collecting and analyzing Web information. Although new approaches to terrorist network analysis have been called for (Carley, Lee, & Krackhardt, 2001), existing efforts have remained mostly small scale; they have relied on manual analysis of a specific terrorist organization and did not include resources generated by terrorists in their native languages. For instance, Krebs manually collected data from English news releases after the 9/11 attacks and studied the network surrounding the 19 hijackers (Krebs, 2001). Although automated social network analysis techniques have been proposed to analyze and portray criminal networks, it is not clear whether the techniques are applicable to the mostly unstructured textual and multimedia data in terrorist Web sites (Xu & Chen, 2005). Their use of structured data from a police department database also does not help in understanding terrorist Web sites. Other advanced information technologies with the potential to help analyze terrorist data on the Web include information visualization and Web mining.
Information visualization technologies have been used in many domains (Zhu & Chen, 2005), such as criminal analysis (Chung, Chen, Chaboya, O'Toole, & Atabakhsh, 2005) and business stakeholder analysis (Chung, 2007). For example, multidimensional scaling (MDS) algorithms are a family of techniques that portray a data structure spatially, where the coordinates of data points are calculated by a dimensionality reduction procedure (Young, 1987). MDS has been used in many different applications. Chung and his colleagues developed a new browsing method based on MDS to depict the competitive landscape of businesses on the Web (Chung, Chen, & Nunamaker, 2005). He and Hui applied MDS to displaying author cluster maps in their author co-citation analysis (He & Hui, 2002). Eom and Farris applied MDS to author co-citation in the decision support systems (DSS) literature from 1971 through 1990 to find contributing fields to DSS (Eom & Farris, 1996). Kealy applied MDS to studying changes in knowledge maps of groups over time to determine the influence of a computer-based collaborative learning environment on conceptual understanding (Kealy, 2001). Although much has been done in different domains to visualize relationships of objects using MDS, we found no attempts to apply it to discovering terrorists' use of the Web.
Web mining is the use of data mining techniques to automatically discover and extract information from Web documents and services (Chen & Chau, 2004; Etzioni, 1996). Chen and his colleagues (Chen, Fan, Chau, & Zeng, 2001) showed that integrating meta-searching with textual clustering tools achieved high precision in searching the Web. Web page classification, a process of automatically assigning Web pages to predefined categories, can be used to assign pages to meaningful classes (Mladenic, 1998). Web page clustering, a process of identifying naturally occurring subgroups among a set of Web pages, can be used to discover trends and patterns within a large number of pages (Chen, Schuffels, & Orwig, 1996). Although a number of Web mining technologies exist (e.g., Chen & Chau, 2004; Last, Markov, & Kandel, 2006), there has not yet been a comprehensive methodology to address the problems of collecting and analyzing terrorist data on the Web. Unfortunately, existing frameworks using data and text mining techniques (e.g., Nasukawa & Nagano, 2001; Trybula, 1999) do not address issues specific to the Dark Web.
To our knowledge, few studies have used advanced Web and data mining technologies to collect and analyze terrorist information on the Web, though these technologies have been widely applied in other domains such as business and scientific research (e.g., Chung et al., 2004; Marshall, McDonald, Chen, & Chung, 2004). New approaches to collecting and analyzing terrorist information on the Web are needed.
3. A Methodology for Collecting and Analyzing Dark Web Information
3.1. The Methodology
To address threats from the wide range of information sources that terrorists and extremists use to spread their ideas and to conduct destructive activities, we propose a semiautomated methodology integrating various information collection and analysis techniques with human domain knowledge. Figure 1 shows the methodology, which aims to help human investigators obtain Dark Web intelligence through four components: information sources, collection methods, filtering, and analysis.
Information sources consist of a wide range of providers of terrorist or terrorism information on the Web. Some of these are readily accessible (e.g., search engines), while others, such as terrorism incident databases and Web sites developed and maintained by terrorists and their supporters, can be reached only with the help of domain experts.
[Figure 1: flowchart of the methodology. Stages: Information Sources (the Web and the Dark Web; search engines, terrorist group Web sites, propaganda Web sites, publications on terrorism); Collection Methods (domain spidering, back-link search, group/personal profile search, meta-searching, downloading from Internet archives and forums; Section 4.1.1); Filtering (domain knowledge, linguistic knowledge; Section 4.1.2); Analysis (indexing, extraction, classification, clustering, visualization; Section 4.1.3).]
FIG. 1. A methodology for collecting and analyzing Dark Web information.
Collection methods make possible automatic searching, browsing, and harvesting of information from identified sources. Domain spidering starts with a set of relevant seed URLs and relies on an automatic Web page collection program, often called a spider or crawler, to harvest Web pages linked to the seed URLs. Back-link search, supported by some search engines such as Google (www.google.com) and AltaVista (www.altavista.com, acquired by Overture, which was then acquired by Yahoo! in 2003), finds Web pages that have hyperlinks pointing to a target Web domain or page. It helps investigators trace activities of terrorist supporters and sympathizers, whose Web pages often reference terrorist sites (e.g., to glorify martyrs' actions or to show concurrence with terrorist attacks). Group/personal profile search, exemplified by major Web portals such as Yahoo! (members.yahoo.com) and MSN (groups.msn.com), reveals the profiles of groups or individuals who share the same interests. Terrorists and their supporters may put hot links in their profiles, which allow investigators to discover hidden linkages. Meta-searching uses related keywords as input to query multiple search engines, from which investigators or automated programs can collate top-ranked results and filter out duplicates to obtain highly pertinent URLs of terrorist Web sites. With careful formulation of search terms and appropriate linguistic knowledge, they can obtain highly relevant results. For example, searching for the Arabic name of Usama Bin Laden in multiple search engines returns a mix of news articles about terrorism and terrorist Web sites, while augmenting "Usama Bin Laden" with the keyword "Sheikh" (the head of a tribe or leader in Arabic), which Al Qaeda frequently uses to refer to Bin Laden, can yield more relevant terrorist and supporter Web sites. Downloading from Internet archives and forums exploits the temporal dimension of Web information. For instance, the Internet Archive (www.archive.org) offers access to historical snapshots of Web sites. Usenet discussion forums provide a wealth of textual communication that can be mined for hidden patterns over time.
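The domain spidering step described above can be illustrated with a minimal sketch. It is not the spider used in this study: the function `crawl_bfs`, the caller-supplied `get_links` callback, the depth limit, and the toy link graph are all assumptions introduced for illustration, and a real spider would fetch and parse live pages instead of reading a dictionary.

```python
from collections import deque


def crawl_bfs(seed_urls, get_links, max_depth=2):
    """Breadth-first harvest of pages reachable from the seed URLs.

    get_links is a hypothetical caller-supplied function returning the
    outgoing links of a URL. Returns a dict mapping each discovered URL
    to the depth at which it was first found (seeds are depth 0).
    """
    depth_of = {url: 0 for url in seed_urls}
    queue = deque(seed_urls)
    while queue:
        url = queue.popleft()
        if depth_of[url] >= max_depth:
            continue  # do not expand pages sitting at the depth limit
        for target in get_links(url):
            if target not in depth_of:  # skip pages already harvested
                depth_of[target] = depth_of[url] + 1
                queue.append(target)
    return depth_of


# Toy link graph standing in for the live Web (all names are made up).
TOY_WEB = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["c.example", "seed.example"],
    "b.example": ["c.example"],
    "c.example": ["d.example"],
    "d.example": [],
}

pages = crawl_bfs(["seed.example"], lambda u: TOY_WEB.get(u, []), max_depth=2)
print(sorted(pages.items()))
```

Recording the depth of each page is a deliberate choice in this sketch: the same information is what a link-weighting scheme based on page level would need later.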
Filtering involves sifting through collected information and removing irrelevant results, a task that requires domain knowledge and linguistic knowledge. Domain knowledge refers to knowledge about terrorist groups, their relationships with other terrorist and supporter groups, their presence on and usage of the Web, as well as their histories, activities, and missions. Linguistic knowledge deals with terms, slogans, and other textual and symbolic clues in the native languages of the terrorist groups. Filtering can be automatic or manual, depending on requirements for efficiency of the process and precision of the results. Typically, manual filtering achieves high precision, but it is less efficient and relies on domain experts who have years of experience in the field. Automatic filtering is very efficient because it often uses computers and machine learning to process large amounts of data, but the results are less precise. Investigators can obtain high-quality data for analysis from filtered repositories.
Analysis provides insights into data and helps investigators identify trends and verify conjectures. Several functions support these analytical tasks. Indexing relates textual terms to individual Web pages, thereby supporting precise searching of the pages. Extraction identifies meaningful entities such as terrorist names, frequently used slogans, and suspicious terms. Classification finds common properties among entities and assigns them to predefined categories to help investigators predict trends of terrorist activities. Clustering organizes entities into naturally occurring groups and helps to identify similar terrorist groups and their supporters. Visualization presents voluminous data in a format perceivable by human eyes, so investigators can picture the relationships within a network of terrorist groups and recognize their underlying structure.
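As one illustration of the indexing function described above, the following sketch builds a simple inverted index that maps terms to the pages containing them. Whitespace tokenization, the function names `build_index` and `search`, and the sample pages are assumptions for illustration only; text in the sites' native languages would need language-specific tokenization.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased term to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    return index


def search(index, query):
    """Return the URLs of pages containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical page texts used only to exercise the functions.
sample_pages = {
    "site1.example/news": "weekly news update and forum links",
    "site2.example/forum": "forum rules and member news",
    "site3.example/about": "about this site",
}
sample_index = build_index(sample_pages)
print(search(sample_index, "news forum"))
```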
3.2. Discussion of the Methodology
Although the Internet has been publicly available since the 1990s, the Dark Web emerged only in recent years. The lack of a useful methodology designed for Dark Web data collection and analysis has limited the capability to fight terrorism. As discussed above, the proposed methodology incorporates various data and Web mining technologies while still allowing human domain knowledge to guide their application. Its semiautomated nature combines machine efficiency with human precision, a useful complement to computers, which usually fail to detect deception and ambiguity on the Dark Web. Its coverage of a wide variety of data sources and techniques ensures comprehensive Dark Web data collection, a challenge often faced by terrorism and intelligence analysts. Therefore, the methodology and its integration and application of data and Web mining technologies to Dark Web analysis are novel contributions to ISI research.
4. Jihad on the Web: A Case Study
To demonstrate the value and usability of our methodology, we applied it to collecting and analyzing the use of the Web for "Jihad," an Islamic term referring to a holy war waged against enemies as a religious duty. Believers contend that those who die in Jihad become martyrs and are guaranteed a place in paradise. In recent decades, the concept of Jihad has been used as an ideological weapon to combat Western influences and secular governments and to establish an ideal Islamic society (Encyclopedia Britannica Online, 2007). Jihad supporters are closely related to terrorist groups while maintaining anonymity on the Web. For example, prior to the 9/11 attacks, Al-Qaeda members sent each other thousands of messages in a password-protected section of an extreme Islamic Web site (Anti-Defamation League, 2002). Terrorist groups such as Hamas, Hizbollah, and Palestinian Islamic Jihad also use Web sites as propaganda tools. We describe the steps of applying the methodology below (see Figure 1). The data described below were collected in 2004.
4.1. Application of the Methodology
4.1.1. Collection. To collect data, we first identified four suspicious URLs through Web searching, consulting published terrorism reports, and performing personal profile searches on Yahoo. (For example, we searched for "hizbollah" in Google and found its URL among the top-ranked results.) These URLs are Palestinian Islamic Jihad (PIJ; www.qudsway.com), Hizbollah (www.hizbollah.org), the military wing of Hamas (www.ezzedeen.net), and an Arabic Web site with a pro-Jihad forum (www.al-imam.net). A 2003 U.S. Department of State report confirmed PIJ, Hizbollah, and Hamas to be terrorist or terrorist-affiliated groups (Department of State, 2003). Though Al-Imam.net is not classified as a terrorist organization, it contains pro-Jihad forums in which messages and links to terrorist Web sites are posted. We then used the back-link search function of Google to obtain several hundred URLs that point to the four suspicious URLs. Because Dark Web information can be scattered across many different sources and can change quickly over time, the several methods used to identify the four initial URLs enabled us to cover a broader scope and more timely content than relying only on published reports (e.g., the U.S. Department of State's annual report). While different initial URLs and different times of data collection could affect the content of the data collected, we believe that the four chosen URLs are representative of the Dark Web. It would be an interesting future direction to study the extent to which data collection affects the quality of analysis results.
4.1.2. Filtering. We conducted two rounds of filtering. First, we manually filtered out unrelated sites, such as news or governmental Web sites that only report or discuss terrorist activities, religious Web sites with no reference to Jihad or violence, and political Web sites with no mention or approval of terrorist activities. We retained Web sites of terrorist organizations, those of terrorist leaders, and those that praise terrorists or their actions. Forty-six sites remained after this round of filtering.
Second, with the help of a native Arabic speaker (who is not a terrorism expert), we manually added 14 terrorist and supporter sites identified by querying Google with keywords (in Arabic) that we had found in the terrorist and supporter sites. Such keywords included the leaders' and organizations' names in Arabic ("mojahedin iran," "markaz dawa," etc.). To limit the scope of analysis, we considered only the top 50 results returned by the search engine for each query. In addition, we manually removed 21 sites from the set of all sites obtained, based on their relevance to the domain. This round of filtering and refining resulted in 39 Arabic Web sites: 24 terrorist sites and 15 supporter sites.
4.1.3. Analysis. We performed clustering, classification, and visualization on the 94,326 Web pages collected by crawling the 39 terrorist and supporter sites using an exhaustive breadth-first search spidering program (with a maximum depth of 10 levels). The first analysis task was clustering, for which we took as input the 46 Web sites identified in the first round of filtering (see paragraph 1 of Section 4.1.2). Clustering involves calculating a similarity between each pair of Web sites in our collection to uncover hidden Web communities. We define similarity to be a real-valued multivariable function of the number of hyperlinks in one Web site (A) pointing to another Web site (B) and the number of hyperlinks in the latter site (B) pointing to the former site (A). In addition, a hyperlink is weighted inversely to how deep it appears in the Web site hierarchy. For instance, a hyperlink appearing on the homepage of a Web site is given a higher weight than hyperlinks appearing at a deeper level. Specifically, the similarity between Web sites A and B is calculated as follows:
$$\mathrm{Similarity}(A, B) = \sum_{\text{all links } L \text{ between } A \text{ and } B} \frac{1}{1 + lv(L)}$$
where lv(L) is the level of link L in the Web site hierarchy, with the homepage as level 0 and the level increasing by 1 with each level down in the hierarchy. Using these heuristics, a computer program automatically extracted hyperlinks on Web pages and calculated the similarities between sites.
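The similarity measure defined above can be computed with a few lines of code. The sketch below is an illustration and not the program used in the study; the function name `similarity_matrix`, the tuple layout of the link records, and the sample links are assumptions. Each link between two sites, in either direction, contributes 1 / (1 + lv(L)) to the score of that pair.

```python
from collections import defaultdict


def similarity_matrix(links):
    """Accumulate Similarity(A, B) over all inter-site hyperlinks.

    links is an iterable of (source_site, target_site, level) tuples, where
    level is the depth of the page holding the link (homepage = 0).
    Returns a dict keyed by the alphabetically sorted site pair.
    """
    sim = defaultdict(float)
    for source, target, level in links:
        if source == target:
            continue  # links inside a single site are not inter-site links
        pair = tuple(sorted((source, target)))  # A->B and B->A share one key
        sim[pair] += 1.0 / (1 + level)
    return dict(sim)


# Hypothetical link records: A links to B from its homepage and from a
# level-2 page, B links back from a level-1 page, A links to C at level 1.
sample_links = [
    ("A", "B", 0),
    ("A", "B", 2),
    ("B", "A", 1),
    ("A", "C", 1),
    ("A", "A", 0),
]
scores = similarity_matrix(sample_links)
print(scores)
```

For the pair (A, B) in this example the terms are 1/1, 1/3, and 1/2, so homepage links dominate the score, as the weighting intends.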
In the second analysis task, we classified the sites by their affiliations with terrorist groups, ideologies, and religions, and by their Web site attributes. Our native Arabic speaker manually identified the affiliations of all the Web sites according to their site content. Although we relied on an Arabic speaker, the components of the methodology are generic enough to be applicable to other domains. The choice of this particular Arabic speaker (who, again, is not a terrorism expert) also would not affect the results. Table 1 shows the details of the Web sites and their affiliations.
In addition to using affiliations, we classified the sites by how terrorists and their supporters use the Web to facilitate their activities. From our literature review, we identified six types of terrorist use of the Web and 27 unique Web site attributes. Table 2 presents these attributes categorized under the six types. Following this coding scheme, the Arabic speaker manually read through all the subject Web pages to record terrorist uses of the Web. As in a study of the openness of government Web sites (La Porte, Jong, & Demchak, 1999), our coding recorded whether an attribute existed on a Web site (i.e., binary scoring). Manual coding of each Web site required 45 minutes to 1 hour.
To reveal patterns of terrorist Web site existence and the degree of a site's activities, we performed two types of visualization in the third analysis task: multidimensional scaling and snowflake visualizations.
Multidimensional scaling visualization provided a high-level picture of all the terrorist groups and their relationships. We used multidimensional scaling (MDS) to transform a high-dimensional similarity matrix into a set of two-dimensional coordinates (Young, 1987). While other visualization techniques might have been applicable, we chose MDS because it suits the current data structure and provides a vivid picture summarizing terrorist groups' relationships. Figure 2 shows these relationships; the sites appear as nodes, and lines connect pairs of sites that have at least one hyperlink pointing from one site to the other. Using the similarity matrix as input, the MDS algorithm calculated the coordinates of each site and placed the sites in a two-dimensional space where proximity reflects similarity. Upon closer examination of the figure, seven clusters of sites emerge. (The numbers in parentheses refer to the sites in Table 1. The sites identified by URL rather than by number were filtered out in the second round of filtering but appeared in the collection after the first round.)
(1) Hizballah Cluster (# 7, 11, 12, hizbollah.org, and intiqad.org) contains the Web site of the Hizballah group (www.hizbollah.org) and its affiliated sites, such as the Hizbollah E-magazine (www.intiqad.org), the Hizbollah Support Association (# 11), and the site of Sayyed Hassan Nasrollah (# 12), a major leader of Hizbollah.
(2) Palestinian Cluster (# 4, 5, 6, 9, 13, 14, 15, 36, and h4palestine.com) includes militant groups fighting against Israel (e.g., Al-Aqsa Martyrs Brigade, Hamas). There are links between sites of the same group (e.g., # 4 and 14) and links between sites of different groups (e.g., # 9 and 6).
(3) Al Qaeda Cluster (# 26, 28, 31, 35, 37, and sahwah.com) includes Web sites of supporters of Salafi groups, which are often linked to each other in their "Other friendly Web sites" section. They use their Web sites heavily to propagate their ideology. For example, Al-ansar.biz posted a video of the beheading of Nicholas Berg, one of the first civilians killed by terrorists (Newman, 2004). Alsakifah.org provides an online discussion forum.
(4) Caucasian Cluster (# 10, 34, kavkazcenter.com, kavkaz.tv, kavkazcenter.net, and kavkazcenter.info) consists of Web sites that link to Chechen rebels and provide news updates from Chechen areas. For example, Qoqaz.com has documented operations against the Russian military.
(5) Jihad Supporters (# 29, 30, 32, 33, clearguidance.blogspot.com, and ummanews.com) consists of Web sites providing news and general information on the global Jihad movement. These sites are rarely linked to each other and often play a propaganda role that targets outsiders.
(6) Hizb-Ut-Tahrir (# 27, hizb-ut-tahrir.org, expliciet.nl, khilafah.com, and hilafet.com) contains a non-terrorist political group, Hizb-Ut-Tahrir, dedicated to the restoration of Islamic law and Khilafah (global leadership of Muslims). It has a presence in many Arab countries (e.g., Lebanon, Jordan) and some European countries. For instance, Expliciet.nl is a Dutch Web site based in the Netherlands.
(7) Tanzeem-e-Islami Cluster (tanzeem.org) consists of a single site representing the Pakistani Tanzeem-e-Islami party, with no clear ties to terrorism.
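The MDS step described above can be illustrated with a minimal sketch of classical (Torgerson) MDS in plain Python. This is not the authors' implementation, and the source does not say which MDS variant was used. The function name `classical_mds`, the conversion of similarity to dissimilarity as 1 - s, and the four-site similarity matrix are all assumptions made for illustration.

```python
import math

def classical_mds(similarity, dims=2, iters=200):
    """Embed items in `dims` dimensions from a symmetric similarity matrix.

    Assumptions (not from the source): similarities lie in [0, 1] with 1 on
    the diagonal, and the leading eigenvalues of the centered matrix are
    positive, which holds when the dissimilarities are close to Euclidean.
    """
    n = len(similarity)
    # Squared dissimilarities, using 1 - s as the dissimilarity.
    d2 = [[(1.0 - similarity[i][j]) ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d2]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (d2 is symmetric, so column
    # means equal row means).
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of the current matrix.
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue; clamp negatives to zero.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        vv = sum(x * x for x in v)
        eigenvalue = max(sum(v[i] * bv[i] for i in range(n)) / vv, 0.0)
        scale = math.sqrt(eigenvalue / vv)
        for i in range(n):
            coords[i][k] = scale * v[i]
            # Deflate so the next pass finds the next eigenpair.
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j] / vv
    return coords

# Hypothetical similarity matrix: sites 0 and 1 are alike, as are sites 2 and 3.
similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.9],
    [0.1, 0.1, 0.9, 1.0],
]
coords = classical_mds(similarity)
for site, (x, y) in enumerate(coords):
    print(f"site {site}: ({x:+.3f}, {y:+.3f})")
```

In the resulting layout, the two similar pairs land close together and far from each other, which is the "proximity reflects similarity" property the clusters rely on. For larger matrices a numerical library's eigensolver would be a more robust choice than power iteration.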
Snowflake visualization supports analysis of different dimensions (or categories) of activities of a Web site cluster. It originates from the star plot, which has been widely used to display multivariate data (Chambers, Cleveland, Kleiner, & Tukey, 1983). A snowflake shown in Figure 2 represents a terrorist site cluster. Figure 3 shows five snowflake diagrams, each representing the degree of activity of terrorist/supporter groups in the five terrorist clusters (Clusters 1–5) described above. (Clusters 6 and 7 are not included because they do not contain terrorist sites.) The six sides of a snowflake represent the six dimensions of terrorist use of the Web, as shown in Table 2 and explained above. Each of these six dimensions represents a normalized scale between 0 and 1 (activity index), showing the degree of activity on that dimension.
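To make the geometry of a snowflake concrete, the following sketch places six values on evenly spaced axes, as a star plot does. It is an illustration only, not the authors' plotting code: the function name `snowflake_vertices`, the choice to start at the top and proceed clockwise, and the sample values are assumptions. Only the six dimension names come from Table 2.

```python
import math

# The six dimensions of terrorist use of the Web named in Table 2.
DIMENSIONS = [
    "Communications",
    "Fundraising",
    "Sharing ideology",
    "Propaganda (insiders)",
    "Propaganda (outsiders)",
    "Virtual community",
]

def snowflake_vertices(values, radius=1.0):
    """Return one (x, y) vertex per dimension for a star-plot polygon.

    Each value must lie in [0, 1]; it is plotted along its own axis at a
    distance of value * radius from the center.
    """
    if any(v < 0.0 or v > 1.0 for v in values):
        raise ValueError("values must be normalized to the range [0, 1]")
    count = len(values)
    vertices = []
    for index, value in enumerate(values):
        # Assumed layout: first axis points up, later axes go clockwise.
        angle = math.pi / 2 - 2 * math.pi * index / count
        vertices.append((radius * value * math.cos(angle),
                         radius * value * math.sin(angle)))
    return vertices

# Hypothetical activity values for one cluster, in DIMENSIONS order.
sample = [0.6, 0.1, 0.9, 0.7, 0.5, 0.2]
for name, (x, y) in zip(DIMENSIONS, snowflake_vertices(sample)):
    print(f"{name:24s} ({x:+.2f}, {y:+.2f})")
```

Connecting the vertices in order yields the polygon; a cluster that is active on a dimension pushes that vertex toward the outer edge.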
1352 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGYJune 2008DOI: 10.1002/asi
TABLE 1. Analysis of Jihad terrorist groups and their supporters' sites.

No. | Name | URL^a | Description^b | Terrorist group^c | Religion

Terrorist Groups' Web Sites (total: 24)
1 | Special Force | www.specialforce.net | Provides computer game replicating the fighting scenes between Lebanese resistance and Israeli occupiers | Hizballah | Shia Muslim
2 | Palestine Info in Urdu | palestine-info-urdu.com | Hamas news Web site in Urdu | Hamas | Sunni Muslim
3 | Al-Manar | web.manartv.org | The Web site of Al-Manar, the TV channel of Lebanese Hizballah | Hizballah | Shia Muslim
4 | Abrarway | www.abrarway.com | News Web site of Islamic Jihad of Palestine Guerrilla group | Palestinian Islamic Jihad | Sunni Muslim
5 | Islamic Jihad Mail | www.jimail.com | News Web site of Islamic Jihad of Palestine Guerrilla group | Palestinian Islamic Jihad | Sunni Muslim
6 | Ezz-al-dine Al-Qassam | www.ezzedeen.net | A general portal of Izz-Edeen Al-Qasam | Hamas | Sunni Muslim
7 | Hizbollah | www.hizbollah.tv | The official Web site of Hizballah Organization | Hizballah | Shia Muslim
8 | Info Palestina | www.infopalestina.com | Hamas information and news Web site in Malay | Hamas | Sunni Muslim
9 | Kataeb Al Aqsa | www.kataebalaqsa.com | The official Web site of Al Aqsa Martyrs Brigades | Al-Aqsa Martyrs Brigade | Secular
10 | Kavkaz | www.kavkaz.org.uk | The news Web site of Chechen guerrilla fighters | Islamic International Brigade, Special Purpose Islamic Regiment, Riyadus-Salikhin Reconnaissance and Sabotage Battalion of Chechen Martyrs | Sunni Muslim
11 | Moqawama | www.moqawama.tv | Web site of the Hizballah's support group | Hizballah | Shia Muslim
12 | Nasrollah | www.nasrollah.org | Hizballah leader's site (Sheikh Hassan Nasrollah) | Hizballah | Shia Muslim
13 | Alshohada | www.b-alshohda.com | Web site of Hamas and Islamic Jihad dedicated to martyrs | Hamas, Palestinian Islamic Jihad | Sunni Muslim
14 | Quds Way | www.qudsway.com | Provides general news of Islamic Jihad of Palestine | Palestinian Islamic Jihad | Sunni Muslim
15 | Rantisi | www.rantisi.net | Web site of Abdel Aziz Al Rantisi, a Hamas leader | Hamas | Sunni Muslim
16 | People's Mojahedin of Iran | www.iran.mojahedin.org | Web site posting statements by the People's Mojahedin Organization | Mujahedin-e Khalq Organization | Secular
17 | National Council of Resistance of Iran | www.iranncrfac.org | Official Web site of the Foreign Affairs Committee of the National Council of Resistance of Iran | Mujahedin-e Khalq Organization | Secular
18 | Iranian People's Fadaee Guerrillas | www.siahkal.com | The memorial Web site of the Iranian People's Fadaee Guerrillas | Mujahedin-e Khalq Organization | Secular
19 | The Organization of Iranian People's Fedaian | www.fadai.org | The Organization of Iranian People's Fedaian (Majority) official Web site | Mujahedin-e Khalq Organization | Secular
20 | Organization of Iranian People's Fedayee Guerrillas | www.fadaian.org | Organization of Iranian People's Fedayee Guerrillas memorial Web site | Mujahedin-e Khalq Organization | Secular
21 | The Union of People's Fedaian of Iran | www.etehadefedaian.org | News and information Web site of the Union of People's Fedaian of Iran | Mujahedin-e Khalq Organization | Secular
22 | Revolutionary People's Liberation Front | www.dhkc.net | Revolutionary People's Liberation Front official Web site. Provides news and statements of the organization | Revolutionary People's Liberation Army/Front | Secular
23 | DHKC International | www.dhkc.info | Web site of DHKC in Turkish | Revolutionary People's Liberation Army/Front | Secular
24 | Crusade Begins | jorgevinhedo.sites.uol.com.br | The Brazil-based Web site links to Lashkar-e-Taiba, a terrorist organization based in Pakistan | Lashkar-e Tayyiba | Sunni Muslim

Supporters' Web Sites (total: 15)
25 | Al Ansar | www.al-ansar.biz | Provides support to Al Qaeda organization, as well as articles about the Salafi Sunni Ideology | Al Qaeda | Sunni Muslim
26 | Alokab | www.alokab.com | Provides articles about the Salafi Sunni Ideology and the Jihadist movement | Al Qaeda | Sunni Muslim
27 | Alsakifah Forum | www.alsakifah.org | Provides educational services and a forum dedicated to the discussion of the Salafi Ideology | Al Qaeda | Sunni Muslim
28 | Cihad | www.cihad.net | A general Jihad Web site providing information about all Jihad activities around the world | Al Qaeda | Sunni Muslim
29 | Clear Guidance Forum | www.clearguidance.com | Forum of Jihad supporters | Al Qaeda | Sunni Muslim
30 | Sheikh Hamid Bin Abdallah Al Ali | www.h-alali.net | Salafi Educational Web site with some Jihad ideas | Al Qaeda | Sunni Muslim
31 | Jihadunspun | www.jihadunspun.com | Pro-Jihad news Web site | Al Qaeda | Sunni Muslim
32 | Maktab-Al-Jihad | www.maktab-al-jihad.com | Pro-Jihad news Web site | Al Qaeda | Sunni Muslim
33 | Qoqaz | www.qoqaz.com | Jihad news from the Caucasus | Islamic International Brigade, Special Purpose Islamic Regiment, Riyadus-Salikhin Reconnaissance and Sabotage Battalion of Chechen Martyrs | Sunni Muslim
34 | Supporters of Shareeah | www.shareeah.org | A general portal dedicated to the Jihadist movement | Al Qaeda | Sunni Muslim
35 | Moltaqa | www.almoltaqa.org | Hamas Forum | Hamas | Sunni Muslim
36 | Saraya | www.saraya.com | Pro-Jihad Web site | Al Qaeda | Sunni Muslim
37 | Osama Bin Laden | 1osamabinladen.5u.com | A Web site dedicated to Osama Bin Laden | Al Qaeda | Sunni Muslim
38 | Tawhed | www.tawhed.ws | Pro-Jihad Web site | Al Qaeda | Sunni Muslim
39 | The Right Word | www.rightword.net | Pro-Al Qaeda Web Portal | Al Qaeda | Sunni Muslim

^a Some of the URLs and sites may have been changed at the time of reading due to the rapid change of the Dark Web.
^b The descriptions are obtained from the Web sites.
^c Descriptions of these terrorist groups appear in the U.S. Department of State Report Pattern of Global Terrorism, 2002.
TABLE 2. Categories of terrorist use of the Web and Web site attributes.

Category | Attribute | Description
Communications | E-mail | Any listed email address or feedback form.
Communications | Telephone (including Web phone) | Telephone numbers of organization officials.
Communications | Multimedia tools | Video clips of bombings and other activities. Video, sound recording & game (e.g., leaders' messages and instructions).
Communications | Online feedback form | Allow the user to give feedback or ask questions to the Web site owners and maintainers.
Communications | Documentation | Report, book, letter, memo and other resources provided (e.g., in pdf, Word, and Excel formats).
Fundraising | External aid mentioned | Other groups or governments supporting the organization.
Fundraising | Fund transfer | Fund transfer methods.
Fundraising | Donation | Donations under the form of direct bank deposits.
Fundraising | Charity | Donations to religious welfare organizations associated with terrorist organization.
Fundraising | Support groups | Suborganizational structures charged with the fundraising program.
Fundraising | Others | Other attributes belonging to this category.
Sharing ideology | Mission | The major goals of the organization (e.g., destruction of an enemy state, liberation of occupied territories).
Sharing ideology | Doctrine | The beliefs of the group (e.g., religious, communist, extreme right).
Sharing ideology | Justification of the use of violence | Ideology condones the use of violence to accomplish goals (e.g., suicide bombing).
Sharing ideology | Pinpointing enemies | Classifies others as either enemies or friends (e.g., U.S. is enemy, Taliban regime is friendly).
Propaganda (insiders) | Slogans | Short phrases with religious or ideological connotations.
Propaganda (insiders) | Dates | Mentions dates in the history of the terrorist group, such as the date of a major attack.
Propaganda (insiders) | Martyrs' description | Lists the names of members who died in terrorism related operations or descriptions of the circumstances.
Propaganda (insiders) | Leaders' name(s) | Terrorist group's leader(s) name as claimed by the Web site.
Propaganda (insiders) | Banner and seal | Banner depicting representative figures, graphical symbols, or seals of the organization.
Propaganda (insiders) | Narratives of operations and events | Provides narratives of the operations and attacks of the group.
Propaganda (insiders) | Others | Other attributes belonging to this category.
Propaganda (outsiders) | Reference to media coverage | For example, the Web site criticizes Western media coverage of events with explicit mention of outlets of events such as CNN, CBS.
Propaganda (outsiders) | News reporting | Group's own interpretation of events.
Virtual community | Listserv | Automatic mailing list server that broadcasts to everyone on the list.
Virtual community | Text chat room | Virtual room where a chat session takes place. Text messaging chat session such as ICQ.
Virtual community | Message board | Allows members to post and read messages online.
Virtual community | Web ring | A series of web sites linked together in a ring that by clicking through all of the sites in the ring the visitor will eventually come back to the originating site.
[Figure 2: two-dimensional MDS map of the numbered Web sites, with labeled groups 1. Hizballah Cluster, 2. Palestinian Cluster, 3. Al-Qaeda Cluster, 4. Caucasian Cluster, 5. Jihad Supporters, and 6. Hizb-Ut-Tahrir Cluster.]
FIG. 2. Clustering and visualization of terrorist Web sites (The numbers refer to those appearing in Table 1)*.
The activity index of Cluster c on dimension d was calculated by the following formula:

\[
\text{Activity Index}(c, d) = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} w_{i,j}}{m \times n},
\quad \text{where } w_{i,j} =
\begin{cases}
1 & \text{if attribute } i \text{ occurs in Web site } j \\
0 & \text{otherwise}
\end{cases}
\]

n = total number of attributes in the specified dimension d; m = total number of Web sites belonging to the specified Cluster c.
The closer the activity index is to 1, the more active a cluster is on that dimension. This index reveals the areas in which the terrorist groups are active and hence provides investigators and analysts with clues about how to devise strategies to combat a group.
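As a worked illustration of the formula, the sketch below computes the activity index from a binary attribute-by-site matrix. The function name `activity_index` and the example matrix are hypothetical; the computation itself follows the definition of w, n, and m given above.

```python
def activity_index(occurrence):
    """Compute the activity index of one cluster on one dimension.

    occurrence[i][j] is 1 if attribute i occurs in Web site j and 0
    otherwise, so the matrix has n rows (attributes in the dimension)
    and m columns (Web sites in the cluster).
    """
    n = len(occurrence)
    if n == 0 or len(occurrence[0]) == 0:
        raise ValueError("need at least one attribute and one Web site")
    m = len(occurrence[0])
    if any(len(row) != m for row in occurrence):
        raise ValueError("every attribute row must cover the same Web sites")
    total = sum(sum(row) for row in occurrence)
    return total / (m * n)

# Hypothetical dimension with 4 attributes observed across 3 Web sites.
example = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
print(activity_index(example))  # 9 occurrences / (3 sites * 4 attributes) = 0.75
```

Because the sum is divided by m times n, the index stays between 0 and 1 regardless of how many sites a cluster contains or how many attributes a dimension has, which is what allows the six dimensions to share one scale in a snowflake.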
4.2. Results and Discussions
Our preliminary observations show that the methodology yielded promising results. For example, it identified Web sites affiliated with 10 of the 26 groups classified as Jihad terrorist organizations in the U.S. State Department report on terrorism. Al-Ansar.biz (# 26), the site that posted the beheading video of Nicholas Berg, posted messages from Al Qaeda leaders such as Osama Bin Laden, Ayman Al-Zawahiri, and Al-Zarqawi, praising their attacks on enemies. Another site, Tawhed.com (site 39), posted a poem praising the 9/11 attacks. The rhetoric of the poem commonly appears in many Al Qaeda affiliated Web sites, referring to the Americans as "crusaders." Words like "Sunna" and "Jamah" reflect the branch of Islam to which the Salafi groups belong.
From the snowflake diagrams (Figure 3), we found that terrorists and supporters use the Web heavily to share ideology and to propagate ideas, especially to their members. For example, the Palestinian cluster (Cluster 2) actively shares its ideology and heavily uses the Web as a propaganda tool for members. The Web sites in this cluster support liberation of Palestine, pinpoint and criticize their enemies, and describe details of operations and rationales supported by Quran verses. In contrast, Jihad supporters (Cluster 5) rarely use the Web for propaganda but share ideology and communicate there. The Hizbollah cluster (Cluster 1) resembles the Palestinian cluster in heavy use of the Web for sharing ideology and insider propaganda. For example, the sites in this cluster glorify martyrs and leaders and also were used moderately for outsider propaganda and communications. In all five clusters, we found little evidence of using the Web for fundraising or building a virtual community. Probably such uses have gone underground or do not appear on the Web.

[Figure 3: five snowflake diagrams, one per cluster. Activity index values by dimension:]

Cluster | Communications | Fundraising | Sharing ideology | Propaganda (insiders) | Propaganda (outsiders) | Virtual community
Cluster 1: Hizballah Cluster | 0.53 | 0.20 | 0.92 | 0.72 | 0.50 | 0.13
Cluster 2: Palestinian Cluster | 0.43 | 0.10 | 0.81 | 0.81 | 0.44 | 0.35
Cluster 3: Al-Qaeda Cluster | 0.52 | 0.12 | 0.85 | 0.30 | 0.30 | 0.32
Cluster 4: Caucasian Cluster | 0.60 | 0.10 | 0.50 | 0.50 | 0.50 | 0.40
Cluster 5: Jihad Supporters | 0.40 | 0.05 | 0.50 | 0.21 | 0.38 | 0.20

FIG. 3. Snowflake visualization of five terrorist site clusters.
4.3. Expert Evaluation and Results
Based on the above results, we invited a terrorism expert to evaluate the methodology. A senior fellow of the U.S. Institute of Peace in Washington, D.C., the expert is a professor of communication at a major research university in Israel. Having expertise in modern terrorism and the Internet, he has published more than 80 refereed journal articles and books and is a frequent speaker at international conferences on counterterrorism. This expert also leads a team of about 16 research assistants who regularly monitor 4,300 sites on the Dark Web for terrorist activities. The approach he and his team use to collect and analyze terrorists' use of the Web is largely manual, relying on laborious human browsing and monitoring of selected Web sites. His experience in manual analysis served as a contrast to our methodology, which automated part of the Dark Web data collection and analysis. We decided to use expert validation instead of other evaluation methods for two reasons: (1) a lab experiment is not suitable because typical experimental subjects do not have much knowledge of the Dark Web, and (2) it is not feasible to invite terrorists to participate in an interview or empirical evaluation. The expert was not involved in writing this article.
The evaluation was conducted using an unbiased structured questionnaire and a formal procedure. We showed the results to our expert and asked him to provide detailed comments on the categorization of Web sites and attributes, the visualization and clustering of terrorist groups, and the usability of the snowflake visualization. In general, he deemed the results to be very promising and the methodology design to be excellent. He believed that this was the start of very important research that will result in a useful database and a reliable methodology to update and maintain the database.
The expert was greatly impressed by the visualization and clustering capabilities of the methodology, and he provided valuable comments on our work. However, he said that the 39 Web sites shown in Table 1 do not represent the entire population of terrorist Web sites, the number of which he estimated to be over four thousand. Because we focused only on Middle Eastern terrorist groups (rather than all terrorist groups in the world), we believe that our methodology has yielded representative results and has automated much of the manual work of identifying and analyzing terrorist Web sites. He suggested adding qualitative measures such as persuasive appeals, rhetoric, and attribution of guilt to the Web site attributes shown in Table 2. We believe that these important attributes are difficult to incorporate into the automated processing of our methodology because of their qualitative nature. He considered the clustering and visualization shown in Figure 2 to be very important because of its usefulness to investigation of terrorist activities on the Web. He called the snowflake visualization very accurate and very useful to investigation of terrorist Web sites but criticized the way we created linkages among Web sites. He suggested considering textual citations and other references in addition to hyperlinks.
Overall, the expert agreed that the results were very promising because they offer useful investigation leads and would be very helpful in improving understanding of terrorist activities on the Web. Because of this expert's strong qualifications and relevant experience, we believe that the evaluation results can accurately reflect the effectiveness of the methodology. These results also contributed to advancing the ISI discipline by showing the applicability of the methodology to Dark Web data collection and analysis.
5. Conclusions and Future Directions

Collecting and analyzing Dark Web information has challenged investigators and researchers because terrorists can easily hide their identities and remove traces of their activities on the Web. The abundance of Web information has made it difficult to obtain a comprehensive picture of terrorists' activities. In this article, we have proposed a methodology to address these problems. Using advanced Web mining, content analysis, visualization techniques, and human domain knowledge, the methodology exploited various information sources to identify and analyze 39 Jihad Web sites. Information visualization was used to help identify terrorist clusters and to understand terrorist use of the Web. Our expert evaluation showed that the methodology yielded promising results that would be very useful in assisting investigation of terrorism. The expert considered the visualization results very useful, having potential to guide policymaking and intelligence research. Therefore, this research has contributed to developing a useful methodology for collecting and analyzing Dark Web information, applying the methodology to studying and analyzing 39 Jihad Web sites, and providing formal evaluation results of the usability of the methodology.
We are pursuing a number of directions to further our research. As terrorists often change their Web sites to remove traces of their activities, we plan to archive the Dark Web content digitally and apply our methodology to tracing terrorist activities over time. We will develop scalable techniques to collect such volatile yet valuable content, to visualize large volumes of Dark Web data, and to extract meaningful entities from terrorist Web sites. These efforts will help investigators trace and prevent terrorist attacks.
6. Acknowledgments

This research was partly supported by funding from the U.S. Government Department of Homeland Security and Corporation for National Research Initiatives and by Santa Clara University. We thank contributing members of the University of Arizona Artificial Intelligence Lab for their support and assistance.
References

Anti-Defamation League. (2002). Jihad Online: Islamic Terrorists and the Internet. Retrieved March 26, 2008, from http://www.adl.org/internet/jihad_online.pdf
Blakemore, B. (2004, November 23). Web posting may provide insight into Iraq insurgency. ABC News. Retrieved March 26, 2008, from http://abcnews.go.com/WNT/story?id=277421
Carley, K.M., Lee, J.-S., & Krackhardt, D. (2001). Destabilizing networks. Connections, 24(3), 31–34.
Chambers, J., Cleveland, W., Kleiner, B., & Tukey, P. (1983). Graphical methods for data analysis. Wadsworth International Group (Belmont, CA) and Duxbury Press (Boston, MA).
Chen, H. (2005). Introduction to the special topic issue: Intelligence and security informatics. Journal of the American Society for Information Science and Technology, 56(3), 217–220.
Chen, H., & Chau, M. (2004). Web mining: Machine learning for Web applications. In M.E. Williams (Ed.), Annual review of information science and technology (Vol. 38, pp. 289–329). Medford, NJ: Information Today, Inc.
Chen, H., Fan, H., Chau, M., & Zeng, D. (2001). MetaSpider: Meta-searching and categorization on the Web. Journal of the American Society for Information Science and Technology, 52(13), 1134–1147.
Chen, H., Schuffels, C., & Orwig, R. (1996). Internet categorization and search: A self-organizing approach. Journal of Visual Communication and Image Representation, 7(1), 88–102.
Chung, W. (2008). Visualizing E-Business Stakeholders on the Web: A Methodology and Experimental Results. International Journal of Electronic Business, 6(1), 25–46.
Chung, W., Chen, H., Chaboya, L.G., O'Toole, C., & Atabakhsh, H. (2005). Evaluating event visualization: A usability study of COPLINK Spatio-Temporal Visualizer. International Journal of Human-Computer Studies, 62(1), 127–157.
Chung, W., Chen, H., & Nunamaker, J.F. (2005). A visual framework for knowledge discovery on the Web: An empirical study on business intelligence exploration. Journal of Management Information Systems, 21(4), 57–84.
Chung, W., Zhang, Y., Huang, Z., Wang, G., Ong, T.-H., & Chen, H. (2004). Internet searching and browsing in a multilingual world: An experiment on the Chinese business intelligence portal (CBizPort). Journal of the American Society for Information Science and Technology, 55(9), 818–831.
Department of State. (2003). Patterns of Global Terrorism 2002. The United States Government. Retrieved March 26, 2008, from http://www.state.gov/s/ct/rls/crt/2002/
Encyclopedia Britannica Online. (2007). Jihad. In Britannica Concise Encyclopedia. Retrieved March 26, 2008, from http://www.britannica.com/ebc/article-9368558
Eom, S.B., & Farris, R.S. (1996). The contributions of organizational science to the development of decision support systems research subspecialties. Journal of the American Society for Information Science, 47(12), 941–952.
Etzioni, O. (1996). The World Wide Web: Quagmire or gold mine? Communications of the ACM, 39(11), 65–68.
Gellman, B. (2002, June 27). Cyber-attacks by Al Qaeda feared. Washington Post, p. A01. Retrieved March 26, 2008, from http://www.washingtonpost.com/ac2/wp-dyn/A50765-2002Jun26
He, Y., & Hui, S.C. (2002). Mining a Web citation database for author co-citation analysis. Information Processing and Management, 38(4), 491–508.
Kealy, W.A. (2001). Knowledge maps and their use in computer-based collaborative learning. Journal of Educational Computing Research, 25(4), 325–349.
Kelley, J. (2002, July 10). Militants Wire Web With Links to Jihad. USA Today. Retrieved March 26, 2008, from http://www.usatoday.com/news/world/2002/07/10/web-terror-cover.htm
Krebs, V.E. (2001). Mapping network of terrorist cells. Connections, 24(3), 43–52.
La Porte, T.M., Jong, M. d., & Demchak, C.C. (1999). Public Organizations on the World Wide Web: Empirical Correlates of Administrative Openness. Paper presented at the Proceedings of the 5th National Public Management Research Conference, College Station, TX.
Last, M., Markov, A., & Kandel, A. (2006). Multi-Lingual Detection of Terrorist Content on the Web. Paper presented at the Proceedings of the PAKDD'06 International Workshop on Intelligence and Security Informatics, Singapore, Springer, Berlin/Heidelberg, pp. 16–30.
Marshall, B., McDonald, D., Chen, H., & Chung, W. (2004). EBizPort: Collecting and analyzing business intelligence information. Journal of the American Society for Information Science and Technology, 55(10), 873–891.
Middle East Media Research Institute. (2004). Jihad and Terrorism Studies Project. Retrieved March 26, 2008, from http://www.memri.org/jihad.html
Mladenic, D. (1998). Turning Yahoo into an automatic web page classifier. Paper presented at the Proceedings of the 13th European Conference on Artificial Intelligence, Brighton, UK.
Nasukawa, T., & Nagano, T. (2001). Text analysis and knowledge mining system. IBM Systems Journal, 40(4), 967–984.
Newman, M. (2004, May 11). Video appears to show beheading of American civilian. The New York Times.
Popp, R., Armour, T., Senator, T., & Numrych, K. (2004). Countering terrorism through information technology. Communications of the ACM, 47(3), 36–43.
Project for the Research of Islamist Movements. (2004). PRISM. Retrieved March 26, 2008, from http://www.e-prism.org
Sageman, M. (2004). Understanding terror networks. Philadelphia, PA: University of Pennsylvania Press.
Strickland, L.S., & Hunt, L.E. (2005). Technology, security, and individual privacy: New tools, new threats, and new public perceptions. Journal of the American Society for Information Science and Technology, 56(3), 221–234.
Technical Analysis Group. (2004). Examining the cyber capabilities of Islamic terrorist groups. Hanover, NH: Institute for Security Technology Studies at Dartmouth College.
Thomas, T.L. (2003, Spring). Al Qaeda and the Internet: The danger of cyberplanning. Parameters, 112–123.
Trybula, W.J. (1999). Text mining. In M.E. Williams (Ed.), Annual review of information science and technology (Vol. 34, pp. 385–419). Medford, NJ: Information Today, Inc.
Tsfati, Y., & Weimann, G. (2002). Terror on the Internet. Studies in Conflict & Terrorism, 25, 317–332. Retrieved March 26, 2008, from http://www.terrorism.com/
Xu, J., & Chen, H. (2005). Criminal network analysis and visualization. Communications of the ACM, 48(6), 100–107.
Young, F.W. (1987). Multidimensional scaling: History, theory, and applications. Hillsdale, NJ: Lawrence Erlbaum Associates.
Zhu, B., & Chen, H. (2005). Chapter 4: Information Visualization. In B. Cronin (Ed.), Annual Review of Information Science and Technology (Vol. 39, pp. 139–177). Medford, NJ: Information Today, Inc.