
The B.E. Journal of Economic Analysis & Policy

Advances
Volume 13, Issue 3 2012 Article 3

FORENSIC ECONOMICS

Identifying Terrorists using Banking Data

Steven D. Levitt∗

∗University of Chicago Department of Economics, slevitt@midway.uchicago.edu

Recommended Citation
Steven D. Levitt (2012) “Identifying Terrorists using Banking Data,” The B.E. Journal of Economic Analysis & Policy: Vol. 13: Iss. 3 (Advances), Article 3. DOI: 10.1515/1935-1682.3282

Copyright ©2012 De Gruyter. All rights reserved.

Brought to you by | Georgia Institute of Technology
Authenticated

Download Date | 11/24/14 10:12 PM


Identifying Terrorists using Banking Data∗

Steven D. Levitt

Abstract

The fight against terrorism requires identifying potential terrorists before they have the opportunity to act. In this paper, we investigate the extent to which retail banking data – which as far as we know are not currently used by anti-terror intelligence agencies in any systematic manner – are a useful tool in identifying terrorists. Using detailed administrative records of a large British bank, we demonstrate that a number of variables in the data are strongly correlated with terrorism-related activities. Having both an Islamic given name and surname, not surprisingly, is among the strongest of these predictors, but a wide range of other demographic characteristics and behaviors observed in the data are also correlated strongly with terrorist involvement. The real key to our method, however, rests on the identification of one particular pattern of banking behavior (what we call “Variable Z”) which dramatically improves our ability to identify terrorists. Our model is demonstrated to have substantial power to identify terrorists both within sample and out of sample.

KEYWORDS: terrorism, forensic economics

∗We would like to thank Gary Becker, Stephen Dubner, John List, Kevin Murphy, and Chad Syverson for helpful discussions on this topic, as well as numerous individuals who are employed in the anti-terror intelligence effort and by the bank which provided the data for the analysis. Lint Barrage, Adam Castor, Dana Chandler, Steve Cicala, Marina Niessner, and Dhiren Patki provided outstanding research assistance. Correspondence should be addressed to Steven Levitt, Department of Economics, University of Chicago, 1126 E. 59th Street, Chicago, IL 60637. The second author, an employee of the bank, writes under a pseudonym.


SECTION I: INTRODUCTION

Nearly 3,000 people died as a result of terrorist acts carried out on September 11, 2001. The full costs of terrorism, however, stretch far beyond the pain and suffering of the direct victims of a particular attack. Terrorism induces fear and disutility among the broader population (Becker, 2004). One manifestation of this fear is behavioral distortions that are far more extreme than might appear to be warranted based upon the actual risk of being a victim of a terror attack. These behavioral distortions are accompanied by additional costs. The substitution away from air travel, which is much safer than traveling by automobile, contributed to a spike in motor vehicle fatalities after the September 11th attacks (Blalock, 2009). On a much grander scale, terrorist activities were one of the primary motivations for initiating the wars in Afghanistan and Iraq, during which it is estimated that as many as 873,000 civilians¹ and 6,598 American soldiers² have died. Even terrorist attempts that fail can impose large costs. Eight years after Richard Reid’s bungled attempt to detonate a shoe bomb on a transatlantic flight, airline travelers continue to be required to remove their shoes during security screening.³

Preventing terrorism is a difficult task because there is almost no limit to the variety of potential terrorist attacks. While the September 11th attack was a large, well-coordinated, and well-funded effort, much simpler schemes have also proven effective. During the three-week period when the “Washington snipers” were shooting innocent victims largely at random, economic and other activities in the area were sharply curtailed as a consequence of the actions of a man, a child, and a rifle. Because there are so many potential targets, focusing efforts on safeguarding these targets is a difficult and costly endeavor.

An alternative approach to terrorism focuses on identifying individuals likely to engage in terror acts. Anti-terror efforts of this kind are built on three sources of information: human informants; surveillance of communications via phone, email, or internet; and following the international money trail. In this paper, we explore a fourth potential source of information that, up until now, has not been used extensively in the fight against terrorism:⁴ the data generated by daily retail banking transactions. Specifically, we combine depersonalized administrative data from a large British bank with a host of other publicly available data sources to investigate whether personal banking data can be used as a tool for identifying terrorists. These data include demographic characteristics of the customers, account characteristics, and banking transactions.

1 Estimates of civilian casualties come from the Casualties in Afghanistan & Iraq project of www.unknownnews.org as of August 10th, 2010.

2 Estimates of U.S. service member deaths since the beginning of the Afghanistan and Iraq military operations come from the Faces of the Fallen project of the Washington Post as of October 31, 2012 (http://apps.washingtonpost.com/national/fallen/).

3 Over 500 million travelers pass through security in American airports each year. If each of them has spent just one minute removing their shoes and putting them back on, then since the policy was instituted this additional step has absorbed roughly 8,000 person-years of traveler time.

4 This assertion is based on private conversations with numerous leading figures in the intelligence community engaged in fighting terrorism.
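The arithmetic behind footnote 3 can be checked directly. The sketch below takes the traveler count and the one-minute figure from the footnote; the eight-year window is an assumption drawn from the "eight years after" wording in the introduction, and all names are illustrative.

```python
# Figures from footnote 3; YEARS_OF_POLICY is an assumption, not stated in the footnote.
TRAVELERS_PER_YEAR = 500_000_000
MINUTES_PER_TRAVELER = 1
YEARS_OF_POLICY = 8

MINUTES_PER_YEAR = 60 * 24 * 365


def person_years_lost(travelers_per_year, minutes_each, years):
    """Convert total minutes spent by all travelers into person-years."""
    total_minutes = travelers_per_year * minutes_each * years
    return total_minutes / MINUTES_PER_YEAR


estimate = person_years_lost(TRAVELERS_PER_YEAR, MINUTES_PER_TRAVELER, YEARS_OF_POLICY)
print(f"{estimate:,.0f} person-years")
```

Under these assumptions the result is a little over 7,600 person-years, which is in line with the footnote's figure of roughly 8,000.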

Over the period from October 2007 until February 2009, more than 100 of the bank’s millions of customers were arrested or investigated by law enforcement for suspected terrorist activity.⁵ We denote these individuals “positives.” In the statistical analysis, we attempt to identify similarities in traits and patterns of behavior among these positives that distinguish these individuals from the rest of the sample. Perhaps unsurprisingly, the single best demographic predictor in the data set is having Islamic first and last names. This characteristic alone increases the likelihood of being a positive fifty-fold. Positives also tend to be young and male. A number of behavioral characteristics also prove useful in identifying positives: living in close proximity to mosques, types of financial products used, whether the person owns their place of residence, and the fraction of transactions made during traditional Muslim prayer hours on Fridays. In addition to these variables, there is one further behavioral indicator which proves to be extremely powerful in predicting positives – so powerful that the cooperating bank has asked us not to disclose the precise nature of the measure in the interest of national security. We refer to this variable henceforth as “Variable Z.”⁶ Without revealing too much about Variable Z, we can say that the idea for it came out of economic theory, and the select individuals who have been told the nature of Variable Z immediately recognized why it would be such an effective predictor. On the other hand, we have had many people attempt to guess the identity of Variable Z, and none of those guesses has been remotely close to accurate.

The model estimated in this paper has substantial power to identify positives in the data.⁷ The overall prevalence of positives in the bank’s customer pool is 0.00073 percent, or roughly 1 in 140,000. Among the actual set of positives, the estimated predicted likelihood of being positive is approximately 700 times higher than for the population as a whole.⁸ For roughly six percent of the positives, the predicted likelihood of their being a terrorist is estimated to be greater than 0.8 percent; among the overall bank population, only one in 65,000 people crosses that threshold.

5 These arrests were made prior to our study and were not based on our analyses.

6 Indeed, maintaining the confidentiality of Variable Z is consistent with prior academic research on fraud detection which, for obvious reasons, has paid more attention to analytical tools than to specific methods of detection (Bolton and Hand 2002).

7 The model we develop targets a very specific type of terrorist; clearly in another time and place, a different model would be necessary, although we suspect that the same principles would be at work.

8 In making these predictions for a particular positive, we estimate the model using all of the data except the information for that one individual, and then apply the model that excludes that individual to that person’s data. This avoids the obvious bias that arises if one fits the model using this individual’s data and then makes predictions based on that fit.
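The leave-one-out procedure described in footnote 8 can be sketched as follows. This is only an illustration: a within-group positive rate stands in for the paper's probit, and the data, function names, and group labels are all hypothetical.

```python
def fit_rate_model(rows):
    """Toy stand-in for the probit: the positive rate within each feature group."""
    totals, positives = {}, {}
    for features, label in rows:
        totals[features] = totals.get(features, 0) + 1
        positives[features] = positives.get(features, 0) + label
    return {key: positives[key] / totals[key] for key in totals}


def predict(model, features):
    # Groups never seen in training get a predicted rate of 0.0.
    return model.get(features, 0.0)


def leave_one_out_predictions(rows):
    """Predict each row from a model fitted on every other row."""
    predictions = []
    for i, (features, _label) in enumerate(rows):
        training = rows[:i] + rows[i + 1:]
        predictions.append(predict(fit_rate_model(training), features))
    return predictions


# Hypothetical data: (feature group, 1 if positive else 0).
rows = [("a", 1), ("a", 0), ("a", 0), ("b", 0), ("b", 0)]
in_sample = [predict(fit_rate_model(rows), features) for features, _ in rows]
held_out = leave_one_out_predictions(rows)
print(in_sample[0], held_out[0])
```

For the single positive in this toy data, the in-sample prediction is pulled upward by that person's own record, while the held-out prediction is not. That gap is the bias the footnote refers to.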

The model also generates strong predictions regarding particular individuals who have not been identified by authorities as positives, but appear to have sets of characteristics that make them likely to be positives. Based on data from October 2007 to February 2009, we identified 90 bank customers whom we viewed as serious terrorist risks. Over the next 14 months, 0.00054 percent of the bank’s customers were arrested on terrorism-related charges.⁹ Two of these arrests were of individuals identified on our 90-person watch list. The likelihood of this occurring by chance is vanishingly small.¹⁰
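One way to gauge how small that chance is would be a binomial tail probability. The sketch below assumes, purely for illustration, that each of the 90 listed customers independently faces the same 0.00054 percent arrest probability as the average customer; the paper does not say how its own assessment was made, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), summing the upper tail directly."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


WATCH_LIST_SIZE = 90          # from the text
ARREST_RATE = 0.00054 / 100   # 0.00054 percent over 14 months, from the text

p_two_or_more = binomial_upper_tail(2, WATCH_LIST_SIZE, ARREST_RATE)
print(f"P(at least 2 of 90 arrested by chance) = {p_two_or_more:.2e}")
```

Under these assumptions the probability is on the order of one in ten million. Summing the upper tail directly, rather than subtracting the lower tail from one, avoids losing precision when the answer is this close to zero.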

The analysis in this paper contributes to two separate economic literatures. The first of these is what some have dubbed “forensic economics” (Zitzewitz, 2012), in which economists use subtle data patterns to ferret out evidence of cheating and corruption in settings as varied as tax evasion (Fisman and Wei, 2004), illegal weapons dealing (DellaVigna and La Ferrara, 2007), procurement (e.g. Di Tella and Schargrodsky, 2003; Olken, 2007), corruption (Fisman, 2001; Olken, 2009), employee sabotage (Krueger and Mas, 2004), cheating on standardized tests (Jacob and Levitt, 2003), and sports (Duggan and Levitt, 2002; Zitzewitz, 2006).

Terrorism poses unique challenges not present in these earlier forensic applications. First, most prior examples of forensic economics have centered on detecting cheating or fraud in a narrowly defined context (e.g. cheating on standardized tests, collusive bidding, the building of roads, etc.) after the fact. The goal of this analysis is different: to identify potential terrorists before they have actually carried out the terrorist attack. A second difference between this application and prior research is that in the settings studied previously, the prevalence of cheating was generally high among the target population, whereas terrorists represent an extremely small share of the population. Reliable estimates of the number of terrorists are difficult to obtain, but given the near absence of terrorism on United States soil since the attacks that took place on September 11, 2001, it would not seem unreasonable to argue that the prevalence of terrorists in the United States is less than one per one million residents. In the United Kingdom, the target of this analysis, terrorist acts and arrests of suspected terrorists have been more frequent in recent years, suggesting the prevalence of terrorists is likely to be greater. According to Jonathan Evans, the Director-General of MI-5, there were 4,000 operating terrorists as of November 2007, which implies roughly 65 terrorists per million residents of the UK.¹¹

9 In order to protect anonymity of the bank, we report only the percentage and not the exact number.

10 Upon discovery that our methods appeared efficacious, the cooperating bank provided the list of names to the appropriate authorities. As of this writing, we have received no information regarding the value of the list in the war on terror.

The second literature with which this paper connects is the growing body of research focused directly on terrorism and its effects. The work that is most similar in spirit is Krueger (2007), which addresses many terrorism-related issues, including the fact that terrorists are disproportionately drawn from amongst the relatively well educated. Krueger (2007) identifies characteristics observed among terrorists, but unlike the current work, does not attempt to extend his analysis to predicting which individuals are likely to be terrorists. Pape (2005), based on an analysis of an exhaustive database of suicide bombing and terrorist attacks between 1980 and 2003, concludes that foreign occupation is the strongest predictor of terrorism. A number of papers have tried to measure the costs associated with terror. Becker and Rubinstein (2004) focus on the fear-related costs of terrorism. Other research has tried to measure its impact on economic activity (Abadie and Gardeazabal, 2008; Blomberg, Hess, and Orphanides, 2004; Eckstein and Tsiddon, 2003; Zussman and Zussman, 2006).

The remainder of this paper is structured as follows. Section II describes the data used in the analysis. Section III presents the statistical model and the results. Section IV concludes.

SECTION II: DATA DESCRIPTION

Three main sources of data are used in this analysis. The first source of data comes from the depersonalized administrative records of a large bank in the United Kingdom. Because of privacy concerns, only data with all personal identifiers removed (including names, account numbers, and addresses) have been made available to members of this project who are not bank employees. As a further privacy safeguard, the bank provided only coarse data on variables such as age and when the account was opened. In addition to demographic and account data, there are also records corresponding to individual banking transactions, such as debit card purchases. As with the demographic information, the transactions data provided to researchers outside the bank have been veiled in a manner that fully protects the privacy of bank customers.

The second data source used in the analysis is a list of people who have been arrested or investigated on terrorist charges in the United Kingdom. This list of names was constructed primarily based on information provided at the website www.salaam.co.uk, which has a database of arrests on terrorism-related charges. Information from this list was supplemented with data from other newspaper reports of terrorism arrests, as well as customer accounts whose banking records had been requested by law enforcement in anti-terror investigations. Of the names gathered from these various sources, 112 were determined to be bank customers.¹²

11 Rise in number of terrorists. 05 November 2007. Manchester Evening News. Retrieved: 26 April 2010 (http://www.manchestereveningnews.co.uk/news/special_reports/editors/s/1022830_rise_in_number_of_terrorists)

The third source of data compiled for the paper is a list of predominantly Islamic names, which was constructed using a variety of public sources including telephone books from Islamic nations and baby naming books. Because of the great variety in names, a large number of less common names are missing from the list, introducing noise into our name-based classifications.

CHOICE OF SAMPLE

With millions of customers, the bank’s activities produce an extraordinary amount of data. Consequently, the bank keeps data in active storage for only 14 months at a time. After that period, the data are archived. We were not able to obtain access to any archived data for this study.

Because of the massive scale of the data, our main analysis is limited to a small subset of the bank’s customers that includes the 112 positives as well as a random sample of roughly 19,000 individuals that over-weights people with given names or surnames that appear on the list of Islamic names described above. The sampling rate of individuals with no Islamic names is roughly one in 2,000. For those with exactly one name identified as Islamic, the sampling rate is about one in 100. People with both names on the Islamic list are sampled at a rate of one in 35. Ultimately, we have between 5,800 and 7,000 individuals in the sample in each of the three categories. Additionally, after the importance of Variable Z became apparent, a separate extract of 294 customers with high values for Variable Z was carried out. This last extract is not used in estimation of the model, but rather, only to identify those non-positive customers with the greatest likelihood of being involved in future terrorist activities. The high Variable Z sample, like the positives, includes all individuals who fit that category.
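Because the three strata are sampled at very different rates, each sampled customer stands in for a different number of customers in the full population. The sketch below turns the approximate sampling rates quoted above into inverse-probability weights and uses the sample counts from the N row of Table 1 to back out implied population shares. The dictionary layout and names are illustrative, not the authors' code.

```python
# Sample counts are the N row of Table 1; "one_in" values are the approximate
# sampling rates quoted in the text. Stratum labels are illustrative.
STRATA = {
    "no Islamic names": {"sampled": 6_805, "one_in": 2_000},
    "one Islamic name": {"sampled": 6_210, "one_in": 100},
    "both Islamic names": {"sampled": 5_802, "one_in": 35},
}


def implied_population(strata):
    """Scale each stratum's sample count by its inverse sampling rate."""
    return {name: s["sampled"] * s["one_in"] for name, s in strata.items()}


def population_shares(strata):
    """Share of the implied customer population that falls in each stratum."""
    counts = implied_population(strata)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


shares = population_shares(STRATA)
for name, share in shares.items():
    print(f"{name}: weight {STRATA[name]['one_in']}, implied share {share:.1%}")
```

Since the quoted rates are only approximate, the implied shares should be read as rough. They land close to, but not exactly on, the population shares reported in column 1 of Table 1.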

Table 1 presents summary statistics for our six different groups: the bank’s overall population of customers, the positives, the three random samples stratified by Islamic name status, and the sample of non-positive individuals who have high values of Variable Z. Fewer than 1 in 100,000 of the bank’s customers are positives. We explicitly exclude positives from the other samples, although given their rarity in the data, very few positives would be expected in the sample sizes we use. The three variables included under the heading Islamic name are mutually exclusive indicator variables corresponding to whether both the given name and surname are identified as Islamic, neither the given name nor the surname of the individual is identified as Islamic, or only the given name or surname is identified as Islamic. Those with Islamic names are overrepresented in the list of positives. Whereas only 1.3 percent of the bank’s customers have two Islamic names, 72.3 percent of the positives fall into that category. The share of positives with one Islamic name is almost three times higher than the corresponding share in the overall customer pool.

12 Since the bank provided us with depersonalized data, we are unable to ascertain what fraction of the 112 arrestees was convicted on terrorism charges. However, data from the UK Home Office reveal that between 2001 and 2012, 23 percent of the cumulative 2174 individuals arrested under anti-terror laws have been convicted (http://www.homeoffice.gov.uk/publications/science-research-statistics/research-statistics/counter-terrorism-statistics/hosb1112/).
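The overrepresentation described above can be expressed as a ratio of shares and, via Bayes' rule, as a conditional prevalence. The sketch below uses only figures reported in Table 1 and the overall prevalence quoted in the introduction; the variable and function names are illustrative.

```python
# Shares in percent, read from columns 1 and 2 of Table 1.
CUSTOMER_SHARE = {"both": 1.3, "first only": 2.4, "last only": 2.8}
POSITIVE_SHARE = {"both": 72.3, "first only": 8.0, "last only": 6.3}
BASE_RATE = 0.00073 / 100  # overall prevalence of positives, from the introduction


def lift(positive_share, customer_share):
    """How many times more common a trait is among positives than among customers."""
    return positive_share / customer_share


lift_both = lift(POSITIVE_SHARE["both"], CUSTOMER_SHARE["both"])
lift_one = lift(
    POSITIVE_SHARE["first only"] + POSITIVE_SHARE["last only"],
    CUSTOMER_SHARE["first only"] + CUSTOMER_SHARE["last only"],
)

# Bayes' rule: P(positive | trait) = P(trait | positive) * P(positive) / P(trait).
posterior_both = BASE_RATE * lift_both

print(f"lift, both names: {lift_both:.1f}")
print(f"lift, one name: {lift_one:.2f}")
print(f"P(positive | both names) = {posterior_both:.4%}")
```

The first ratio is consistent with the "fifty-fold" figure in the introduction and the second with "almost three times higher." The conditional prevalence remains a small fraction of one percent because the base rate is so low.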

The next row in the table corresponds to Variable Z, which is a continuous variable with most of its mass near zero and a long right tail. For the banking population as a whole, the mean of Variable Z is very close to zero. Among the positives, the mean of Variable Z is near 20. Those with two Islamic names have a mean value of Variable Z that is much higher than those with no Islamic name, but still over 70 times lower than that of positives. Inclusion in the non-randomly drawn sample in the final column is predicated on having a high value for Variable Z, and that fact is reflected in the high mean for that variable in the last column: 306.

The next row in the table corresponds to gender. The bank’s overall pool of customers is split nearly evenly by gender. In contrast, more than three-quarters of the positives are male. Males are also overrepresented in the Islamic name samples, most likely because our algorithm for identifying Islamic names works better for males. Age reflects a series of indicator variables corresponding to whether the individual falls into various age windows. Almost two-thirds of the bank customers are over the age of 45, whereas less than 10 percent of the positives fall into this category. Half of the positives are between the ages of 26 and 35. Those with Islamic names and the sample of high Variable Z individuals are also younger on average than the bank’s average customer. The next three sets of indicators correspond to marital, employment, and residential status. These variables are captured at the time that a customer signs up with the bank and are only sporadically updated over time, limiting their usefulness. The next set of variables reflects patterns of ATM usage: the average value of ATM withdrawals, and the percentage of ATM withdrawals made during the nighttime (between 8pm and 6am) or during periods that coincide with Muslim prayers on Friday. Positives are more likely to make late-night withdrawals and much less likely to make withdrawals during Friday prayers (although, interestingly, among those with Islamic names, withdrawals in these time windows are not that much lower than for the overall customer base).


Table 1: Summary Statistics (Means)

| | Overall bank customers (1) | Positives (2) | Islamic Name: Both names Muslim (3) | Islamic Name: One Muslim Name (4) | Islamic Name: No Muslim Names (5) | High Variable Z sample (6) |
|---|---|---|---|---|---|---|
| N | 18,929 | 112 | 5,802 | 6,210 | 6,805 | 294 |
| Positive | 0.0007% | 100.0% | | | | |
| Muslim Names: None | 93.5% | 13.4% | | | 100.0% | 54.4% |
| Muslim Names: First name only | 2.4% | 8.0% | | 45.8% | | 3.1% |
| Muslim Names: Last name only | 2.8% | 6.3% | | 54.2% | | 13.9% |
| Muslim Names: Both | 1.3% | 72.3% | 100.0% | | | 28.6% |
| Variable Z | -0.039 | 20.339 | 0.274 | 0.197 | -0.057 | 305.658 |
| Gender and Age | | | | | | |
| Male | 49.1% | 78.6% | 65.8% | 56.2% | 48.5% | 55.8% |
| Age: Under 16 | 0.1% | 0.0% | 0.0% | 0.3% | 0.1% | 0.0% |
| Age: 16 to 25 | 4.6% | 21.4% | 8.1% | 15.9% | 3.9% | 18.0% |
| Age: 26 to 35 | 8.3% | 50.0% | 18.3% | 21.8% | 7.5% | 28.6% |
| Age: 36 to 45 | 19.4% | 18.8% | 25.3% | 19.5% | 19.3% | 25.5% |
| Age: Over 45 | 67.4% | 9.8% | 48.2% | 42.3% | 69.1% | 27.6% |
| Age: Unknown | 0.1% | 0.0% | 0.1% | 0.1% | 0.1% | 0.3% |
| Marital Status | | | | | | |
| Single | 20.8% | 50.9% | 22.8% | 31.8% | 20.1% | 38.8% |
| Married | 55.4% | 37.5% | 62.6% | 40.2% | 56.1% | 44.6% |
| Other (Widowed, divorced, etc.) | 10.2% | 1.8% | 4.9% | 7.0% | 10.4% | 5.4% |
| Unknown | 13.7% | 9.8% | 9.7% | 21.0% | 13.4% | 11.2% |
| Employment Status | | | | | | |
| Employed | 47.8% | 57.1% | 46.1% | 52.7% | 47.6% | 61.9% |
| Self-Employed | 4.3% | 5.4% | 8.7% | 6.2% | 4.1% | 3.4% |
| Retired | 19.3% | 0.0% | 10.6% | 9.5% | 20.0% | 3.7% |
| Unemployed | 1.7% | 6.3% | 3.7% | 3.3% | 1.6% | 4.8% |
| Full-time Student | 2.4% | 13.4% | 5.3% | 8.9% | 2.0% | 4.4% |
| Housewife | 3.5% | 4.5% | 10.0% | 4.9% | 3.3% | 12.2% |
| Unknown | 21.0% | 13.4% | 15.7% | 14.5% | 21.4% | 9.5% |
| Residential Status | | | | | | |
| Owner | 55.2% | 11.6% | 51.6% | 35.6% | 56.4% | 21.1% |
| Renter | 17.7% | 45.5% | 19.4% | 28.1% | 17.1% | 42.5% |
| With parents | 7.0% | 22.3% | 14.4% | 15.2% | 6.5% | 18.0% |
| Other | 3.4% | 5.4% | 4.3% | 4.4% | 3.3% | 6.5% |
| Unknown | 16.6% | 15.2% | 10.4% | 16.7% | 16.7% | 11.9% |
| Proximity to Mosque | 10.4% | 32.1% | 22.0% | 16.0% | 9.9% | 24.8% |
| ATM Usage | | | | | | |
| Average withdrawal amount (£) | 84 | 97 | 98 | 77 | 84 | 73 |
| % of Withdrawals during Nighttime (8pm-6am) | 5.2% | 15.6% | 8.6% | 8.6% | 4.9% | 8.3% |
| % of Withdrawals during Friday prayer (10am-12pm) | 4.2% | 1.9% | 2.8% | 3.3% | 4.3% | 3.1% |
| % of Withdrawals during Friday prayer (12-1pm) | 2.0% | 1.2% | 1.6% | 1.8% | 2.0% | 1.7% |
| % of Withdrawals during Friday prayer (1-3pm) | 3.7% | 3.0% | 3.3% | 3.6% | 3.7% | 3.1% |
| Types of financial products | | | | | | |
| Business customer | 2.5% | 2.7% | 2.3% | 5.7% | 2.3% | 2.7% |
| Debit/credit cards | 56.6% | 52.7% | 55.4% | 45.1% | 57.3% | 49.0% |
| Loans (excluding Mortgages) | 43.0% | 19.6% | 30.1% | 32.4% | 43.8% | 26.5% |
| Mortgages | 9.9% | 0.9% | 6.5% | 7.9% | 10.1% | 5.8% |
| Life Insurance | 41.0% | 25.9% | 34.5% | 31.6% | 41.6% | 39.8% |
| Savings products | 65.3% | 25.9% | 51.6% | 54.0% | 66.1% | 50.3% |
| Extras | 0.9% | 0.9% | 0.3% | 0.9% | 0.9% | 0.7% |
| Longterm | 47.0% | 7.1% | 30.6% | 29.1% | 48.2% | 20.7% |

Notes: Column 1 shows weighted averages for the overall bank customers who were randomly sampled at different rates depending on their first and last names. Column 2 shows data for the 112 positives we identified. Columns 3 through 5 separate out our random sample by name status and Column 6 shows only the sample that we specially selected because they were high on Variable Z.
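The table notes describe column 1 as a weighted average over the name-stratified samples. As a rough check, the sketch below recombines columns 3 through 5 using the population shares reported in column 1 (with the two one-name rows summed) and compares the result with the column 1 entries for two rows. Positives are ignored here because they make up only about 0.0007 percent of customers. The names are illustrative and this is not the authors' weighting code.

```python
# Population shares (percent) from column 1 of Table 1; the two one-name rows are summed.
SHARES = {"none": 93.5, "one": 2.4 + 2.8, "both": 1.3}

# Stratum means from columns 5, 4, and 3 of Table 1 respectively.
MALE_PCT = {"none": 48.5, "one": 56.2, "both": 65.8}
VARIABLE_Z = {"none": -0.057, "one": 0.197, "both": 0.274}


def weighted_mean(values, weights):
    """Average of stratum values, weighted by each stratum's population share."""
    total_weight = sum(weights.values())
    return sum(values[k] * weights[k] for k in weights) / total_weight


male_overall = weighted_mean(MALE_PCT, SHARES)
z_overall = weighted_mean(VARIABLE_Z, SHARES)
print(f"Male, overall: {male_overall:.1f}%")
print(f"Variable Z, overall: {z_overall:.3f}")
```

Both values agree with column 1 (49.1 percent and -0.039) to within rounding, which supports reading column 1 as the share-weighted combination of the three random samples.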

Proximity to mosque is an indicator variable that takes on a value of one if the customer’s postal code is within one mile of a registered mosque. Ten percent of all customers fall into this category. Twenty-two percent of those with two Islamic names live near a mosque; nearly one-third of the positives do.¹³ The remaining rows are indicators for the types of products associated with the customer’s accounts. These categories are not mutually exclusive. The precise definitions of these variables are presented in the data appendix. Positives are less likely to use a number of the bank’s services, including loans, mortgages, savings accounts, and “long-term” products such as life insurance.

13 Other than proximity to a mosque, no other geographic identifiers (e.g. region of the country) were made available for the analysis.


Table 2: Distribution of Variable Z

| | Overall bank customers (1) | Positives (2) | Islamic Name: Both names Muslim (3) | Islamic Name: One Muslim Name (4) | Islamic Name: No Muslim Names (5) | High Variable Z sample (6) |
|---|---|---|---|---|---|---|
| % Positive on Variable Z | 0.24% | 8.93% | 0.41% | 0.14% | 0.04% | 100.00% |
| Distribution of Variable Z (when Variable Z > 0) | | | | | | |
| 10th percentile | 52.5 | 113.7 | 51.6 | 37.6 | 93.2 | 72.3 |
| 25th percentile | 71.8 | 150.6 | 56.5 | 74.8 | 93.2 | 132.1 |
| Median | 109.0 | 169.8 | 75.1 | 190.7 | 165.4 | 209.7 |
| 75th percentile | 190.7 | 229.1 | 127.9 | 318.2 | 195.1 | 339.1 |
| 90th percentile | 318.2 | 509.1 | 182.9 | 480.2 | 195.1 | 474.3 |
| Mean | 155.2 | 234.5 | 102.8 | 208.0 | 151.2 | 305.7 |

Notes: The first row illustrates the percentage of customers in each of our groups who have positive values of Variable Z. In the bottom portion of the table, we include only those people who have positive values of Variable Z and show those values at different points in the distribution. The last row shows the mean value of Variable Z conditional on the customer having a positive value of Variable Z.

Because Variable Z plays such a critical role in the analysis, Table 2 provides greater detail on the distribution of Variable Z in the various samples. The top row of the table shows the share of individuals with positive values for Variable Z. The subsequent rows present the value of Variable Z at various points in the distribution, conditional on having a positive value. Only 0.24 percent of banking customers have a positive value for Variable Z, compared to 8.93 percent of the positives. Among those with two Islamic names, 0.41 percent have a positive value for Variable Z. Conditional on having a positive value for Variable Z, there are no strong patterns of difference across the positives and the randomly drawn samples.

SECTION III: ESTIMATION APPROACH AND RESULTS

We estimate probit models in which the dependent variable is an indicator variable equal to one if the customer has been identified as a suspected terrorist by the police (i.e. is a “positive”), and equal to zero otherwise.¹⁴ Included on the right-hand side of the equation are all of the variables described in the summary statistics above. The results are presented in Table 3. Column 1 pools all of the data. Columns 2 and 3 divide the sample by Islamic name status. Column 2 is restricted to those with both Islamic given names and surnames, and column 3 limits the sample to those with only one Islamic given name or surname or with no Islamic names. In all columns, the estimates are probability weighted to make the data set representative of the bank’s overall customer pool (column 1), or subsets of the data by Islamic name status (columns 2 and 3). The values reported in the table are the z-statistics for each variable from the probit estimation. Because the coefficients have no easy interpretation with respect to magnitude, the discussion of Table 3 focuses on statistical significance; in Table 4 we present results that speak more directly to the magnitude of the impact of the key explanatory variables.¹⁵ For those sets of variables which are mutually exclusive and exhaustive indicators (e.g. housing status, age, etc.) the omitted category is identified in the table notes.

¹⁴ Here, as elsewhere in the paper, we exclude the special extract of high Variable Z individuals, since it was not randomly drawn, but rather constructed ex post to allow identification of the most suspicious customers.
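The estimation step described above can be illustrated with a minimal sketch of a probability-weighted probit. This is not the estimation code used for the paper: it fits a probit with a single indicator regressor by Fisher scoring, using only the Python standard library. The toy data, the weights and the function name `fit_weighted_probit` are hypothetical; the indicator stands in for one covariate, and the weights play the role of the sampling weights that make the data representative of the customer pool.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: its cdf is the probit link


def fit_weighted_probit(x, y, w, max_iter=100, tol=1e-12):
    """Fit P(y = 1) = Phi(b0 + b1 * x) by Fisher scoring with weights w."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # gradient of the weighted log-likelihood
        i00 = i01 = i11 = 0.0    # expected (Fisher) information matrix
        for xi, yi, wi in zip(x, y, w):
            eta = b0 + b1 * xi
            p = min(max(STD_NORMAL.cdf(eta), 1e-12), 1.0 - 1e-12)
            dens = STD_NORMAL.pdf(eta)
            score = wi * dens * (yi - p) / (p * (1.0 - p))
            info = wi * dens * dens / (p * (1.0 - p))
            g0 += score
            g1 += score * xi
            i00 += info
            i01 += info * xi
            i11 += info * xi * xi
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = i00 * i11 - i01 * i01
        step0 = (i11 * g0 - i01 * g1) / det
        step1 = (i00 * g1 - i01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical toy data: (x, y, weight). Each row stands for `weight` customers.
rows = [(0, 0, 990.0), (0, 1, 10.0), (1, 0, 90.0), (1, 1, 10.0)]
x, y, w = zip(*rows)
b0, b1 = fit_weighted_probit(x, y, w)

# With one binary regressor the model is saturated, so the fitted
# probabilities reproduce the weighted share of positives in each group.
p_without = STD_NORMAL.cdf(b0)       # weighted share for x = 0 (10 / 1000)
p_with = STD_NORMAL.cdf(b0 + b1)     # weighted share for x = 1 (10 / 100)
```

In a real application a statistics package would handle many regressors and report the z-statistics directly; the hand-rolled loop is used here only to keep the sketch self-contained.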

Column 1 presents the pooled results. As would be expected given the summary statistics presented earlier, having two Islamic names enters strongly positive with a z-stat over 18. Having either a first or last name identified as Islamic is also associated with an increased likelihood of being a positive, but to a much lesser extent. Variable Z enters with a positive sign and a z-stat over six. Only a handful of the other covariates achieve statistical significance in column 1. One of these is being older than 45 years of age (relative to the omitted category of age 25 or less), which enters negatively. Being a renter or having an unknown housing status (relative to being a home owner) enters positively. Living close to a mosque also carries a point estimate that is positive and significant. Having a savings account enters negatively and with significance. The fraction of a customer’s ATM transactions that occurred during Friday prayers also enters negatively, although none of the three Friday prayer windows reaches conventional significance levels in column 1 of Table 3. Note that a number of variables which were highly correlated with being positive in the raw data (e.g. being male, having “long-term” products with the bank) are at most marginally significant after controlling for other factors.

¹⁵ Note that we do not follow the more common practice of reporting the implied marginal effects of the probit estimates evaluated at the sample mean in Table 3. Because the fraction of positives is so small in our data set, evaluating marginal effects at the mean, or even at the 90th percentile of the distribution, proves not to be particularly informative.
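The point about marginal effects can be made concrete with the probit formula itself: the marginal effect of a covariate is its coefficient multiplied by the standard normal density at the fitted index, and that density is minute when the baseline probability is of the order of the 0.000007 predicted mean reported for the overall sample in Table 4. The sketch below is illustrative only; the coefficient value is hypothetical and is not an estimate from the paper.

```python
from statistics import NormalDist

std_normal = NormalDist()

baseline_prob = 0.000007   # predicted mean, overall sample (Table 4, column 1)
coef = 1.0                 # hypothetical probit coefficient on an indicator

# Index value at which the probit predicts the baseline probability.
index_at_mean = std_normal.inv_cdf(baseline_prob)

# Marginal effect at the mean: density at the index times the coefficient.
marginal_effect = std_normal.pdf(index_at_mean) * coef

# Discrete change in probability when the indicator switches from 0 to 1.
discrete_change = std_normal.cdf(index_at_mean + coef) - baseline_prob

print(f"index at mean:    {index_at_mean:.2f}")
print(f"marginal effect:  {marginal_effect:.2e}")
print(f"discrete change:  {discrete_change:.2e}")
```

Under these assumptions the derivative-based marginal effect is roughly an order of magnitude smaller than the change in predicted probability from switching the indicator on, which is one way to see why marginal effects at the mean say little about behavior in the right tail.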


Table 3: Z-scores for Probit Estimation

                                            Overall bank    Two Muslim      One or No
                                            customers       Names           Muslim Names
                                            (1)             (2)             (3)

Muslim Names: First name only               4.44***                         4.68***
Muslim Names: Last name only                2.07*                           3.00***
Muslim Names: Both                          18.09***
Variable Z                                  6.50***         5.79***         4.47***

Gender and Age
Male                                        2.20*           2.69**          0.79
Age: 26 to 35                               0.78            2.19*           -2.06*
Age: 36 to 45                               -2.25*          -0.65           -2.79**
Age: Over 45                                -4.50***        -3.33***        -3.50***

Employment Status
Self-Employed                               -0.43           -0.55           0.24
Unemployed                                  1.52            1.89            0.19
Full-time Student                           0.02            1.18            -1.16
Homemaker                                   -0.28           -0.12           1.01

Marital Status
Single                                      1.64            0.87            0.95
Married                                     1.65            0.63            1.87

Residential status
Renter                                      3.91***         3.54**          1.35
With parents                                2.12*           1.87            1.44
Other                                       1.96*           1.45            1.40
Unknown                                     4.57***         3.75***         2.42*

Proximity to mosque                         2.51*           0.92            3.43***

ATM Usage
Average withdrawal amount                   2.56*           1.51            2.44*
Withdrew during Nighttime (8pm-6am)         2.74**          2.23*           2.01*
Withdrew during Friday prayer (10am-12pm)   -1.50           0.34            -2.04*
Withdrew during Friday prayer (12-1pm)      -0.81           -1.00           -0.07
Withdrew during Friday prayer (1-3pm)       -1.44           -0.02           -1.80

Types of financial products
Business customer                           0.89            -0.63           1.50
Debit/credit cards                          2.33*           2.03*           1.37
Loans (excluding Mortgages)                 0.69            0.18            0.80
Mortgages                                   -1.02           -0.28           -
Life Insurance                              -0.84           -1.46           0.07
Savings products                            -4.05***        -3.42***        -2.18*
Extras                                      1.36            1.99*           -
Longterm                                    -1.94           -0.66           -2.05*

Number of observations                      18,929          5,883           13,046

Notes: Column 1 shows estimated z-scores for each variable for our entire sample of randomly sampled customers and positives. Because the high Variable Z customers were non-randomly selected, we cannot use them for estimation. Column 2 estimates a separate model for customers with two Islamic names. Column 3 estimates the model for customers with zero or one Islamic names. Significance levels are indicated by * p < .050, ** p < .010, *** p < .001. Omitted categories include: females under 25 (for age and gender), those who are employed (as well as a small number of other miscellaneous employment statuses such as unknown, no response, other or retired), persons divorced or separated (or with unknown/no responses), and homeowners.
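For readers who want to relate a z-statistic to the star notation in the table notes, the following sketch converts a z-statistic into a two-sided p-value and then into stars. It assumes the statistics are compared against a standard normal distribution with two-sided tests; the paper states only the p-value thresholds, so that convention, and the function names, are assumptions made for illustration.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a z-statistic under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def significance_stars(z):
    """Map a z-statistic to the star thresholds given in the table notes."""
    p = two_sided_p_value(z)
    if p < 0.001:
        return "***"
    if p < 0.010:
        return "**"
    if p < 0.050:
        return "*"
    return ""


# A few entries taken from Table 3.
examples = [
    ("Muslim Names: Both (col. 1)", 18.09),
    ("Variable Z (col. 1)", 6.50),
    ("Male (col. 1)", 2.20),
    ("Male (col. 3)", 0.79),
]
for label, z in examples:
    print(f"{label:30s} z = {z:6.2f}  {significance_stars(z)}")
```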


Because having two Islamic names is so strongly correlated with the dependent variable, columns 2 and 3 divide the sample according to whether the customer has two Islamic names.¹⁶ Column 2 presents the results for those with two Islamic names. With one exception, the variables that were statistically significant in the pooled regression continue to be significant and enter with the same sign in column 2. The lone exception is the variable measuring proximity to a mosque, which remains positive, but is no longer statistically significant. A few additional variables gain statistical significance when the sample is restricted to those with two Islamic names: being male is associated with a higher rate of being positive, as is having opened an account within the past two years, being unemployed, and having “extras” associated with the account.

The estimates in column 3, which excludes those with two Islamic names, generally paint a similar picture to those of column 1, which includes the whole sample. One difference is that the indicators for 26-35 and 36-45 year olds take on statistically significant negative coefficients of roughly the same magnitude as that for customers over the age of 45. Among the population that does not have two Islamic names, it is those aged 25 or under who are most heavily represented among the positives.

DOES THE MODEL HAVE SUFFICIENT EXPLANATORY POWER TO BE OF USE IN

PROSPECTIVELY IDENTIFYING POSSIBLE TERRORISTS?

One goal of this analysis is to provide an additional tool for anti-terror law enforcement efforts. A necessary condition in that endeavor is that the model generates sufficiently strong predictions in the right-hand tail to warrant allocating investigative resources towards those deemed suspicious.¹⁷ Table 4, which presents the distribution of fitted values from the three specifications in Table 3, sheds light on that question. The columns of Table 4 correspond to the same columns in Table 3, i.e. column 1 includes the entire sample, column 2 is limited to those with two Islamic names, and column 3 excludes those with two Islamic names. The top panel of Table 4 shows estimated values for the randomly drawn sample that is representative of the bank’s overall customer base. The second panel shows results for the actual positives in the sample. The bottom panel presents estimates for the non-randomly drawn subset of the bank’s customers who have high values for Variable Z. For the top and bottom panels, the numbers presented in the table are the fitted values from the probits in Table 3. For the positives, we generate these fitted values by running the same specifications reported in Table 3, but excluding that particular individual from the specification when creating the predicted value. This approach guarantees that no information about a specific positive is used when constructing his or her predicted value.

¹⁶ Note that only the positives with two Islamic names are included in column 2, and the opposite is true in column 3. We pool those observations with fewer than two Islamic names because the frequency of positives is so low among this group, especially for those with no Islamic names. There are only eight positives among the more than ten million customers in this category, and thus little information to identify the coefficients.

¹⁷ Of course, a second necessary condition to make this model useful to legal authorities is that the predictions of the model in the extreme right tail are accurate. Evidence on this point will arise with the passage of time as more positives appear in the data. If, indeed, anti-terror authorities (who have been provided a list of the most suspicious individuals by the bank) are sufficiently moved by the arguments in this paper to investigate those deemed most suspicious, the learning process will occur much more rapidly.
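The leave-one-out construction of fitted values for the positives can be sketched as follows. To keep the example short and self-contained, the probit is replaced by a much simpler stand-in model (the weighted share of positives within a group); the data, group labels and function names are hypothetical. Only the hold-out logic is meant to mirror the procedure described above.

```python
def fit_group_shares(rows):
    """Stand-in 'model': weighted share of positives within each group."""
    totals, positives = {}, {}
    for group, is_positive, weight in rows:
        totals[group] = totals.get(group, 0.0) + weight
        positives[group] = positives.get(group, 0.0) + weight * is_positive
    return {g: positives[g] / totals[g] for g in totals}


def leave_one_out_predictions(rows):
    """Predicted value for each positive from a model fitted without it."""
    predictions = []
    for i, (group, is_positive, _weight) in enumerate(rows):
        if not is_positive:
            continue
        held_out = rows[:i] + rows[i + 1:]      # drop this positive only
        model = fit_group_shares(held_out)
        predictions.append(model.get(group, 0.0))
    return predictions


# Hypothetical data: (group, positive indicator, sampling weight).
rows = [
    ("two_names", 1, 1.0),
    ("two_names", 1, 1.0),
    ("two_names", 1, 1.0),
    ("two_names", 0, 997.0),
    ("other", 1, 1.0),
    ("other", 0, 9999.0),
]

in_sample = fit_group_shares(rows)              # uses every observation
out_of_sample = leave_one_out_predictions(rows)  # one value per positive
```

In this toy example the in-sample share for the first group is 3/1000, while each of its positives receives a leave-one-out prediction of 2/999, and the lone positive in the second group receives a prediction of zero because nothing else in its group is positive.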

The top panel of Table 4 presents results for the random draw of bank customers. As would be expected given the low frequency of positives in the data, the predicted mean for this group is vanishingly small: 0.000007 for the overall sample, and still quite small (0.000392) in column 2 when the sample is restricted to those with two Islamic names. Even at the 99th percentile of the distribution of all bank customers, the estimated likelihood of being positive is only .000085, or about one in 12,000.

The predicted mean for the actual positives (0.005184) in the middle panel of Table 4 is not high in absolute terms – the model predicts that there would be only about .58 positives among the 112 true positives, whereas in reality all of the positives were positives – but the difference relative to the predictions for the random sample of customers is impressive. The model identifies the actual positives in the sample as being roughly 700 times more likely to be positive than a customer randomly drawn from the bank’s pool (.005184 versus .000007). If one restricts the sample of positives to those with two Islamic names, they are over five times more likely to be positive than a randomly drawn bank customer with two Islamic names (.001998 versus .000392).¹⁸ Actual positives with two Islamic names (.001998) have a predicted value that is roughly 90 times larger than positives with one or no Islamic names (.000023).

¹⁸ It is important to stress that there is nothing mechanical about this relationship since the model that generates the predicted value for each positive in the sample excludes that individual.
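The comparisons in the preceding paragraphs are simple ratios of the mean predicted probabilities in Table 4. The short sketch below recomputes them from the table entries (converted from percentages to fractions); the variable names are ours, and the results are only as precise as the rounded table values.

```python
# Mean predicted probabilities from Table 4, expressed as fractions.
random_all = 0.000007          # random sample, all customers
random_two_names = 0.000392    # random sample, two Islamic names
positives_all = 0.005184       # positives, all customers
positives_two_names = 0.001998 # positives, two Islamic names
positives_other = 0.000023     # positives, one or no Islamic names
p99_random_all = 0.000085      # 99th percentile, random sample, all customers
n_positives = 112

expected_positives = n_positives * positives_all          # about 0.58
lift_vs_pool = positives_all / random_all                 # roughly 700
lift_two_names = positives_two_names / random_two_names   # just over 5
two_vs_other = positives_two_names / positives_other      # roughly 90
one_in = 1.0 / p99_random_all                             # about 12,000

print(f"expected positives among the 112: {expected_positives:.2f}")
print(f"positives vs. random customer:    {lift_vs_pool:.0f}x")
print(f"within two-name customers:        {lift_two_names:.1f}x")
print(f"two names vs. one or no names:    {two_vs_other:.0f}x")
print(f"99th percentile, one in:          {one_in:,.0f}")
```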


Table 4: Predicted Probability of Being a Positive at Different Points in the Distribution

                    All           Two Muslim    One or No
                    customers     Names         Muslim Names
                    (1)           (2)           (3)

Random sample
Mean                0.0007%       0.0392%       0.0002%
Median              0.0000%       0.0060%       0.0000%
90th percentile     0.0004%       0.1086%       0.0003%
99th percentile     0.0085%       0.3924%       0.0033%

Positives
Mean                0.5184%       0.1998%       0.0023%
Median              0.0676%       0.1072%       0.0008%
90th percentile     0.4280%       0.5324%       0.0064%
99th percentile     5.5188%       1.4764%       0.0119%

High Variable Z
Mean                5.3669%       14.6104%      2.2638%
Median              0.0805%       0.9256%       0.0116%
90th percentile     3.4010%       95.4286%      0.6030%
99th percentile     100.0000%     100.0000%     89.4580%

Notes: Columns 1-3, defined as before, show the predicted probabilities for being positive based on the three models in Table 3. The top panel shows various points in the distribution for the randomly selected sample. The middle panel shows estimated probabilities for the positives (when making these estimates, each positive is first excluded from estimation). The bottom panel shows people who were selected for having high values of Variable Z. Because the sample of High Variable Z individuals is so small, we omit the results for the 99th percentile for this group.

The final panel of Table 4 reports results for individuals with high Variable Z. The entries in this bottom panel should be thought of differently than the other entries in the table, because this high-value sample is not randomly drawn, but rather explicitly selected on this trait. Because there is almost no data in this range for Variable Z included in the estimating specifications, extrapolating the estimates to this sample is highly speculative and the functional form assumptions of the probit will be critically important, since these values for Variable Z are largely out of sample.¹⁹ With that important caveat squarely in mind, the bottom panel reports results for this group. This sample has by far the highest mean predicted value for being positive (far higher, even, than the actual positives): 0.053669.

For our high Variable Z sample with two Islamic names, the predicted mean is .146104, more than 370 times the rate obtained by selecting randomly from among people who have two Islamic names (.000392).

Because our primary interest is in the ability of the model to identify suspects in the far right-hand tail of the distribution, we report other moments of the data in addition to means. The same patterns emerge when moving from the means of the data into the right tail. For instance, the model’s predictions for the actual positives, evaluated at the 90th percentile of the data, are more than 1,000 times larger than for the randomly drawn sample (.004280 versus .000004).

The extreme results for the specially drawn sample of high Variable Z customers highlight the critical role that this variable plays, especially in the upper tail of the distribution; the 90th percentile individual in our high Variable Z sample is estimated to have a 3.4 percent chance of being a positive (and among those with two Islamic names, the estimated probability is near one), although again we must emphasize that this estimate is based on extrapolating the probit results far out of sample.²⁰

The analysis above focuses on the performance of the full model.

Additional perspective can be gained by analyzing how the performance of the

model degrades when particular information sources are not utilized. Table 5

presents these results. Each entry in the table corresponds to the expected number

of positives who would be identified based on the number of individuals

identified as suspicious (the rows) and the particular model under consideration

(the columns). Moving down the rows of the table, we systematically reduce the

number of customers flagged as suspicious. In the top row, the 10,000 customers

with the highest predicted values from the model are flagged; in the bottom row,

only 250 customers are flagged.

¹⁹ Under a logit specification, the results do not change significantly except that predictions in the far right tail become slightly more extreme.

²⁰ The key feature of Variable Z, relative to our other explanatory variables, is that it is a continuous variable with a long right tail, whereas the other covariates are indicator variables. Consequently, the potential value of Variable Z in identifying terrorists is far out of proportion with its degree of statistical significance in the probit.


Table 5: The Tradeoff Between False and True Positives Across Models

Number of     At least 1    2 Islamic   Full model      Full model   Full model with   Full model
people        Islamic       Names       without names   without Z    Z but without
identified    Name                      or Z                         names
              (1)           (2)         (3)             (4)          (5)               (6)

10,000        0.20          3.98        6.62            25.95        10.98             30.10
5,000         0.10          1.98        4.12            15.78        8.92              19.80
2,500         0.05          0.99        2.61            9.18         7.28              13.14
1,000         0.02          0.39        1.33            4.34         5.99              8.07
500           0.01          0.20        0.69            2.41         5.27              5.89
250           0.00          0.10        0.46            1.33         4.41              4.40

Notes: The table entries are the predicted number of terrorists identified, when using the model specified at

the top of the column. Each row corresponds to a differing stringency of screen, e.g. the top row reports the

expected number of terrorists among the 10,000 individuals with the highest predicted likelihood of being a

terrorist; row two is the same information, but for only the 5,000 most likely terrorists according to each

model. Expected values are determined by multiplying the actual flagged list size by the mean fitted

likelihood of being positive within that list. For positives, fitted values are generated by running the model,

but excluding that particular individual from the specification when creating the predicted value.
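The calculation described in the notes, flagged list size multiplied by the mean fitted likelihood within the list, can be sketched as below. The fitted probabilities are hypothetical, and the function name `expected_positives` is ours; the sketch assumes that the list consists of the customers with the highest fitted values.

```python
def expected_positives(fitted_probs, list_size):
    """Expected number of positives among the highest-scoring customers."""
    flagged = sorted(fitted_probs, reverse=True)[:list_size]
    if not flagged:
        return 0.0
    mean_fitted = sum(flagged) / len(flagged)
    # Actual flagged list size times the mean fitted likelihood in the list.
    return len(flagged) * mean_fitted


# Hypothetical fitted probabilities for eight customers.
fitted = [0.40, 0.002, 0.10, 0.0001, 0.05, 0.0003, 0.25, 0.001]

for size in (4, 2):
    print(size, round(expected_positives(fitted, size), 3))
```

Tightening the screen from four customers to two lowers the expected number of positives found (0.80 versus 0.65 in this toy example) but raises the expected hit rate per customer screened, which is the tradeoff the rows of Table 5 trace out.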

The columns of Table 5 correspond to different models. In column 1, the only information exploited is whether a person has at least one Islamic name. Column 2 conditions only on a person having two Islamic names. Column 3 uses all the variables in the model except the names and Variable Z. Column 4 is based on the full model, including names, but excluding Variable Z. Column 5 uses the full model, including Variable Z, but leaves out the names variables. The final column corresponds to the full model. The rows of the table capture how tight the screen is.

Columns 1 and 2 are pure religious profiling. As is evident from column 1, one Islamic name, by itself, is not a powerful signal. Screening 10,000 randomly chosen individuals fitting that criterion would yield only one-fifth of a terrorist. Screening purely on two Islamic names does better, but still performs quite poorly compared to the fuller models. An important shortcoming of names as an indicator is that they do not make strong predictions in the far tail: there are many thousands of customers with Islamic names, and each of these is predicted to be equally likely to be a terrorist when only names are used.

Column 3 takes the opposite approach to Columns 1 and 2, excluding

information on Islamic names. Variable Z is excluded as well. Column 3

outperforms the name-only models across the board, and does especially well as

the number of suspicious individuals screened is reduced.


Column 4 uses names and behavioral data – everything except Variable Z. Combining names and behavior produces results that are far better than relying on either one alone. For instance, when 10,000 customers are screened as suspicious, nearly four times as many positives are detected as when information on names is excluded (25.95 versus 6.62), despite the fact that names, by themselves, are not particularly good predictors.

Columns 5 and 6 add Variable Z, first without names included, and then in the full model in the final column. When a wide net is cast for suspects, Variable Z is not that helpful because it takes on a large value for very few bank customers. The power of Variable Z, however, becomes clear as the number of suspicious customers screened shrinks. If only the 250 most suspicious customers are considered, including Islamic names in the model adds nothing (4.40 expected positives with names versus 4.41 without), implying that large values of Variable Z are equally predictive of terrorists whether Islamic names are present or not. When casting a narrow net, the inclusion of Variable Z more than triples the power of the model to identify terrorists (4.40 versus 1.33 expected positives among the 250 most suspicious customers).

A PROSPECTIVE TEST OF THE MODEL

In September 2009, based on the data set described above, we delivered to the bank a list of 90 customers who had not previously been investigated on terrorist charges, but who appeared to be at high risk for such activities. We assembled this list based on the regression analysis described above, in combination with a more subjective analysis of Variable Z.²¹ At the time the list was compiled, the data we had received from the bank stopped in February 2009. Thus, the period from February 2009 to May 2010 provides a prospective, out-of-sample test of the model’s predictions.

Over the 16 month period of the test, the fraction of the bank’s customers

who became “positives” (i.e. became suspected of terrorism by the authorities)

was 0.00055 percent; roughly one in 180,000 bank customers became a positive

over this period. For those on our watch list, however, 2.22 percent (2 of 90 were

arrested) became positives. The individuals on the watch list became terrorism

suspects at a rate that was 4,000 times greater than that of the general banking

population.

While two successful predictions out of 90 may not sound particularly impressive, it is nonetheless a difficult feat to accomplish.²² The likelihood that a randomly drawn subset of ninety bank customers would include two new positives in this time frame is less than one in 8 million. The odds against even one positive out of ninety, based on a random draw, are 2,025 to 1. To put our results further into perspective, TSA recently released the results of its Screening of Passengers by Observation Techniques (SPOT) program, an anti-terror initiative carried out in U.S. airports over the period May 29th, 2004 to August 31st, 2008 in which TSA agents trained as Behavioral Detection Officers identified suspicious looking travelers based upon behavioral and appearance indicators. Over that time period, more than 150,000 travelers were singled out for further investigation by the screeners. These 150,000+ investigations failed to yield a single terrorism-related arrest.²³

²¹ It is impossible to describe in precise detail how we carried out this subjective analysis without revealing the nature of Variable Z.

²² Indeed, based on our model, we had roughly estimated that the expected number of transitions to positive in this time window for those on our watch list was approximately one.

Upon the discovery that our watch list had succeeded in predicting arrests

out of sample, the bank forwarded the watch list to the relevant anti-terror

agencies.
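The magnitudes quoted in this section can be checked with a short binomial calculation. The sketch assumes that each of the 90 customers in a random draw becomes a positive independently with probability equal to the base rate of 0.00055 percent; that independence assumption, and the variable names, are ours. The results are close to, though not identical to, the rounded figures quoted above, which depend on the exact base rate used.

```python
import math

base_rate = 0.0000055   # 0.00055 percent of customers became positives
watch_list = 90
hits = 2

# Rate on the watch list relative to the overall customer base.
lift = (hits / watch_list) / base_rate

# Probability that a random draw of 90 customers contains exactly two positives.
p_two = (math.comb(watch_list, hits)
         * base_rate ** hits
         * (1.0 - base_rate) ** (watch_list - hits))

# Probability that a random draw of 90 customers contains at least one positive.
p_at_least_one = 1.0 - (1.0 - base_rate) ** watch_list

print(f"lift over base rate:        {lift:,.0f}")
print(f"one in (exactly two hits):  {1.0 / p_two:,.0f}")
print(f"one in (at least one hit):  {1.0 / p_at_least_one:,.0f}")
```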

SECTION IV: CONCLUSION

Combining a variety of data sources, most notably account information from a

large British bank, this paper analyzes the correlates of terrorist involvement, and

explores the extent to which incorporating this type of data might be of use in the

fight against terrorism. A number of demographic factors, especially having an

Islamic first and last name, are strong predictors of being arrested for terrorist

activities. Additionally, a number of behaviors, most notably our Variable Z,

which we do not fully reveal because of its explanatory power, correlate with

terrorism. The efficacy of our approach was subsequently verified in a

prospective, out-of-sample test.

The analysis in this paper provides further evidence that the tools of forensic economics can be applied to pressing social issues, not simply to trivialities like sumo wrestling (Duggan and Levitt, 2002) or figure skating (Zitzewitz, 2006). More broadly, this paper provides an example of the unique possibilities that arise out of academic-business collaborations (see Levitt and List, 2009). The human capital required to carry out analyses such as those in this paper is scarce outside of academia; businesses are the repository of data of a scale and scope far beyond what academics typically have available.

²³ U.S. Government Accountability Office. May 3 2010. Efforts to Validate TSA’s Passenger Screening Behavior Detection Program Underway, but Opportunities Exist to Strengthen Validation and Address Operational Challenges. Publication No. GAO-10-763. Available from http://www.gao.gov/new.items/d10763.pdf


DATA APPENDIX

DEMOGRAPHIC VARIABLES:

Positive: Indicator variable of whether the bank customer was a suspected

terrorist. The customer was either identified from publicly available data on

terrorism-related arrests from salaam.co.uk or from a terrorism-related inquiry by

law enforcement about a customer’s account.

Muslim name status: Islamic first and last names taken from the Contemporary

African Database to include 9,014 common names from Egypt, Tunisia, Morocco,

Libya, Algeria and several other North African or heavily-Islamic countries.

Proximity to mosques: Whether the postal code of a customer’s address is located

within 1 mile of a mosque.

Variable Z: A variable associated with a particular pattern of banking behavior

which dramatically improves our ability to identify terrorists. Because of its

predictive power, we have been asked not to make the nature of the variable

known.

Gender: Gender of the primary account holder.

Age variables: To protect confidentiality and ensure anonymity, data on the customers’ age were only made available over ranges: Under 16, 16 to 25, 26 to 35, 36 to 45 and over 45.

Residential Status: This variable describes the living situation of primary account

holder. Includes owner (with and without a mortgage), tenant (private or

government-sponsored), with parents, or other.

Employment status: Broadly-defined category of employment: Employed,

Unemployed, Retired, Full-time student, Housewife, Self-employed, and

Unknown. Although we also have limited data on job types (further broken down

into categories such as Manager, Clerical, Laborer, etc.), our sample of positives

is not large enough to make use of the data.

ACCOUNT VARIABLES:

Types of financial products: List of what other account types are associated with the account. Business – whether the customer account is a business account. Cards – whether there are associated debit or credit cards. Mortgages – whether the customer has a mortgage outstanding. Loans – whether the customer has outstanding loans or other borrowing products (excluding mortgages). Longterm – includes insurance, any kind of investment, retirement or pension product. Protections – income protection or other life insurance products (excluding home or building insurance). Insurance – home and building insurance. Savings – includes Savings Products, Term Deposits, Stepped Bonds and Flexible Savings accounts. Extras – includes miscellaneous financial products (i.e., Marketlink, Practice Call, Business Cheque products not included in the Business category).

ATM USAGE PATTERNS:

Frequency of withdrawals during Muslim prayer times: The percentage of a customer’s ATM withdrawals that occurred during Friday prayer hours (10am-12pm, 12-1pm, 1-3pm).

Fraction of nighttime withdrawals at ATMs: The percentage of a customer’s

ATM withdrawals that occurred at night (from 8pm until 6am).

Average ATM withdrawal amount: Average amount withdrawn per ATM

transaction.

REFERENCES

Abadie, Alberto, and Javier Gardeazabal (2008). “Terrorism and the World Economy,” European Economic Review, vol. 52(1), pp. 1-27.

Becker, Gary S., and Yona Rubinstein (2004). “Fear and the Response to Terrorism: an Economic Analysis,” Working paper, University of Chicago.

Blalock, G., Kadiyali, V., Simon, D.H. (2009). “Driving Fatalities After 9/11: a Hidden Cost of Terrorism,” Applied Economics, vol. 41(14), pp. 1717-1729.

Blomberg, S.B., Hess, G., Orphanides, A. (2004). “The Macroeconomic Consequences of Terrorism,” Journal of Monetary Economics, vol. 51(5), pp. 1007-1032.

Bolton, Richard, and David Hand (2002). “Statistical Fraud Detection: A Review,” Statistical Science, vol. 17(3), pp. 235-249.

DellaVigna, Stefano, and Eliana La Ferrara (2007). “Detecting Illegal Arms Trade,” mimeo, U.C. Berkeley.

Di Tella, Rafael, and Ernesto Schargrodsky (2003). “The Role of Wages and Auditing During a Crackdown on Corruption in the City of Buenos Aires,” Journal of Law and Economics, vol. 46(1), pp. 269-92.


Duggan, Mark, and Steven Levitt (2002). “Winning Isn't Everything: Corruption in Sumo Wrestling,” American Economic Review, vol. 92(5), pp. 1594-1605.

Eckstein, Zvi, and Daniel Tsiddon (2004). “Macroeconomic Consequences of Terror: Theory and the Case of Israel,” Journal of Monetary Economics, vol. 51(5), pp. 971-1002.

Fisman, Raymond (2001). “Estimating the Value of Political Connections,” American Economic Review, vol. 91(4), pp. 1095-1102.

Fisman, Raymond, and Shang-Jin Wei (2004). “Tax Rates and Tax Evasion: Evidence from ‘Missing Imports’ in China,” Journal of Political Economy, vol. 112(2), pp. 471-500.

Jacob, Brian, and Steven Levitt (2003). “Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating,” Quarterly Journal of Economics, vol. 118(3), pp. 843-877.

Krueger, Alan (2007). What Makes a Terrorist: Economics and the Roots of Terrorism. Princeton: Princeton University Press.

Krueger, Alan, and Alexandre Mas (2004). “Strikes, Scabs, and Tread Separation: Labor Strife and the Production of Defective Bridgestone/Firestone Tires,” Journal of Political Economy, vol. 112(2), pp. 253-289.

Levitt, Steven, and Stephen Dubner (2009). SuperFreakonomics: Global Cooling, Patriotic Prostitutes, and Why Suicide Bombers Should Buy Life Insurance. New York: William Morrow.

Levitt, Steven, and John List (2009). “Field Experiments in Economics: The Past, the Present, and the Future,” European Economic Review, vol. 53(1), pp. 1-18.

Olken, Benjamin (2007). “Monitoring Corruption: Evidence From a Field Experiment in Indonesia,” Journal of Political Economy, vol. 115(2), pp. 200-249.

Olken, Benjamin, and Patrick Barron (2009). “The Simple Economics of Extortion: Evidence from Trucking in Aceh,” Journal of Political Economy, vol. 117(3), pp. 417-452.

Pape, Robert (2005). Dying to Win: The Strategic Logic of Suicide Terrorism. New York: Random House.

Zitzewitz, Eric (2006). “Nationalism in Winter Sports Judging and its Lessons for Organizational Decision Making,” Journal of Economics and Management Strategy, vol. 15(1), pp. 67-99.

Zitzewitz, Eric (2012). “Forensic Economics,” Journal of Economic Literature

vol. 50(3), pp.731-69.

Zussman, Asaf, and Noam Zussman (2006). “Assassinations: Evaluating the

Effectiveness of an Israeli Counterterrorism Policy Using Stock Market Data,”

Journal of Economic Perspectives, vol. 20(2), pp. 193–206.
