In Data We Trust? The Big Data Capabilities of the National Counterterrorism Center
A THESIS
SUBMITTED TO THE
INTERSCHOOL HONORS PROGRAM IN INTERNATIONAL
SECURITY STUDIES
Center for International Security and Cooperation
Freeman Spogli Institute for International Studies
STANFORD UNIVERSITY
By
Benjamin Mittelberger
May 2016
Adviser:
Dr. Martha Crenshaw
Abstract
The National Counterterrorism Center (NCTC) sits as the central nexus for counterterror intelligence analysis and strategy for the entire American intelligence community (IC). It exists to offset the fragmentation issues that came to light in the aftermath of the 9/11 attacks. Its role is to integrate all types of intelligence information using traditional and computational Big Data techniques. Unfortunately, the NCTC is not in a position to take full advantage of contemporary Big Data analytics, severely reducing its analytical capabilities in the current information environment. These weaknesses stem from its organizational structure, not its technological capabilities.
The NCTC’s deficiencies manifest themselves both in its internal structure and in its position within the community as a whole. Internally, the NCTC does not have the information-sharing environment that it should and provides poor incentives for collaboration and innovation. Externally, it has a poorly defined role within the community and lacks the power required to truly centralize Big Data analytics. The private sector is leaps and bounds ahead of the government when it comes to the implementation of Big Data analytics. Internet giants such as Google and Facebook may serve as models for future improvements to the NCTC. With full understanding that certain limitations apply to government agencies, this thesis provides recommendations on how the NCTC may improve its ability to take advantage of large-scale data analytics and provide a better service to the rest of the Intelligence Community.
Acknowledgements
This thesis would have been impossible without the support from my adviser, Professor Martha Crenshaw. I started the process completely clueless, and she guided me through it expertly. She began nudging me in the right direction six months before I even started writing. She has taught me so much, most notably how much I have left to learn. Without her direction, I would not have been able to formulate a reasonable question, let alone answer it. I would also like to thank Professor Coit Blacker. His pointed questions often forced me to defend and sometimes reconsider my positions. They were not always convenient to struggle through but eventually helped me produce better work. It was nice to know he was always there to keep me honest. I also want to thank Shiri Krebs and Shelby Speer for the support that they gave me throughout the process. To the rest of my CISAC cohort, I’m not sure that I could have finished this without all of you. From groggily meeting each other at six a.m. for a day at Gettysburg, to commiserating over tacos about the state of our theses, your presences have kept me grounded in this process. I won’t soon forget the extent of our late night working sessions and long conversations. I’ve made some amazing friends that I hope to keep for a long time, regardless of where we all end up. To my friends outside CISAC, thank you for keeping me sane as I worked through this project. I was not always the most fun to be around, but I appreciate that you kept me around anyway. Finally, to my parents, I’m not sure I can really thank you in a paragraph in the acknowledgements section of my thesis. Just know that everything that I’ve been able to accomplish stems from the opportunities and guidance that you gave me.
Table of Contents
Abstract .......... ii
Acknowledgements .......... iii
Introduction .......... 1
The Evolution of the Modern Counterterrorism Information Environment .......... 5
    Defining “Intelligence”: Evidence and the Knowledge Production Process .......... 14
    Modern Counterterrorism Intelligence Paradigms .......... 22
A Novel Intelligence Toolbox: Computational Analytics in Practice .......... 28
    The growth of “Big Data” .......... 29
    Computational Methods in Counterterrorism .......... 31
    Technical Requirements for Big Data success .......... 36
    Organizational Requirements for Big Data Success .......... 39
    Conclusion .......... 51
Organizational Successes and Failures of the NCTC .......... 53
    Successful components of the National Counterterrorism Center’s Structure .......... 54
    Concrete Points of Failure in the Counterterror Community .......... 58
    Implicit Organizational Deficiencies Within the NCTC .......... 66
    External Organizational Deficiencies of the NCTC .......... 74
    Conclusion .......... 78
Looking to the Future: Innovative Models for the NCTC .......... 80
    Modern Big Data Tools .......... 81
    Successful Big Data Structures .......... 84
    The Comparability of the NCTC .......... 95
    Conclusion .......... 99
Conclusion .......... 102
    Recommendations for the National Counterterrorism Center .......... 107
    Lessons Beyond the National Counterterrorism Center .......... 112
Works Cited .......... 114
Introduction
Visiting the National Counterterrorism Center (NCTC) can seem like a trip to a
blockbuster movie set. The high levels of security expected of a clandestine intelligence agency
are all there: the armed guards, the metal detectors, even the radiation sealing boxes for all
electronics. Suited analysts and uniformed armed forces personnel fill the hallways.
In the bowels of the building sits a cavernous central command center with rows of desks,
each equipped with a set of triple monitors, all facing an array of wall-mounted screens. These
desks are manned 24 hours per day by analysts who monitor the intelligence streaming in from
the entirety of the counterterrorism intelligence community. In theory, these analysts receive the
most time critical and relevant information in counterterror intelligence.
The rest of the building also pulses with intelligence activity. Agencies from all over the
intelligence community are represented, working together in order to centralize American
counterterror efforts in the Global War on Terror. Everyone is trying to identify and prevent the
next inevitable terror plot.
The futuristic command space is a microcosm of the NCTC’s role in the counterterror
intelligence community. It is sleek, new, and places an emphasis on the use of modern
technology to perform intelligence work. It also represents one of the most modern
organizational developments in intelligence: a move to centralize counterterror information in
the Office of the Director of National Intelligence (ODNI). The creation of the NCTC represents the union of two ideas: that centralized organizational structures, combined with modern computational technology, can offset intelligence failures.
It is apparent that technology alone has been and will be insufficient to successfully
combat terrorism. Originally, I thought the focus of this thesis should be on the specific
computational methods the NCTC should employ or the type of data and algorithms it might use.
As I began this project, I (an idealistic computer scientist) thought I could write my own
analytical tools in order to prove that these methods were feasible. Surely the Islamic State’s use
of social media, which often foreshadowed its intent, could be mined and analyzed using “Big
Data” techniques. Methods of analyzing datasets of this size have been around for a decade. Why
is the NCTC unable to tackle this threat? All that I thought was needed was someone competent
to unearth the required information.
But, after weeks of fruitless search through millions of social media posts mentioning
ISIS, I was stumped. I did not even know what I was looking for. Even if I did, I most likely
would have been unable to find it. I did not have the necessary tools to take advantage of the Big
Data that I had collected; I needed more robust hardware and software. However, the technology
was only one piece of the puzzle. I realized I needed topic experts to give me background on the
situation. I needed foreign language speakers to translate the dozens of languages I was
encountering. I needed tips on suspicious actors. I needed access to law enforcement to get
warrants for more information. I needed a dozen more resources. As it turns out, the process of
generating counterterror intelligence is extremely difficult and requires contributions from a diverse set of skills.
The NCTC was created to provide these required resources. It is the organization that
theoretically combines all-source counterterror information in the Office of the Director of
National Intelligence (ODNI). Given the tools and analysts it has at its disposal, the NCTC is in a
better position to solve Big Data problems. However, it is not clear that it is able to conclusively
solve these analytic puzzles. Counterterror intelligence failures have plagued its record and
continue to do so. Major blunders, leading to harsh investigations, have created some doubt
about its effectiveness. Additionally, it has been the subject of many other critical reports, further
exposing flaws in its structure and processes. Nevertheless, even given these assessments, it is
still not obvious how to fix these problems.
There is no doubt that hardworking analysts are producing quality work at the center.
Furthermore, there is no question that it has considerable Big Data analytic means. But its
concrete failures and implicit faults raise questions about the efficacy of its Big Data efforts. As
computational analytics are becoming increasingly central in a data-rich world, the NCTC must
be prepared to lead the charge to transform counterterror analytics. However, it must first be able
to effectively leverage the technology that it has acquired. Given the doubts about the NCTC’s
performance, this thesis seeks to answer the question: Does the NCTC possess the requisite
structures and processes to successfully conduct intelligence operations in a Big Data
environment?
To address this question, this thesis is split up into five chapters. The first chapter
addresses counterterror intelligence in a modern context. Information environments have become
increasingly complex, and traditional analytic methods are unable to keep up. The second
chapter addresses the computational methods that have been developed in response to shifting
data needs. It explores the concept of Big Data and the organizational structures required to
leverage it effectively.
Chapter Three applies the ideas of modern Big Data analytics to the National
Counterterrorism Center’s capabilities. It assesses the performance of the center and examines
the ways in which it does not fulfill the organizational requirements of computational analytics.
It concludes that the NCTC has significantly improved the analytical capability of the
intelligence community, but it still has many structural and procedural flaws that must be
addressed.
Chapter Four looks to the private sector for inspiration on productive Big Data structures.
Through an analysis of the Internet Giants, Google and Facebook, alongside an analysis of
smaller analytic startups, a path forward may become clearer for the NCTC. The comparability
of these private sector organizations to the NCTC is also discussed.
The thesis concludes with a summary of the findings and a series of recommendations for
the NCTC. The recommendations include a call for the reorientation of the center’s goals, an
expanded approach to innovation, and an increased focus on employee quality. These
recommendations are given with an understanding that forward progress is difficult, but also
with the hope that some steps can be taken to better position the National Counterterrorism
Center in the complex world of Big Data analytics.
Chapter 1
The Evolution of the Modern Counterterrorism Information Environment
“There are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know. And if one looks throughout the history of our country and other free
countries, it is the latter category that tend to be the difficult ones.” -Donald Rumsfeld, 2002
The counterterror intelligence community has undergone major changes in recent
decades. Once relegated to the obscure, understaffed, and underfunded departments in various
scattered agencies across the American intelligence community, it is now one of the main
focuses of American security. Significant resources and effort have gone into improving its
performance and reach. Understanding the motivations for this transformation is necessary to
explore the community’s goals, methods, strengths, and weaknesses.
This chapter first investigates the developmental history of the community from a
structural perspective. The community has suffered from cultural, ethnographic, and
organizational issues for decades. Internal and external fragmentation have plagued its data
collection and analysis abilities, creating a system that was effectively designed to miss modern
informational trends. The community has gone through a tortured reform process, going through
major upheaval in the aftermath of the 9/11 attacks, culminating with the creation of the National
Counterterrorism Center (NCTC).
1 Rumsfeld, Donald. “DoD News Briefing - Secretary Rumsfeld and Gen. Myers.” February 12, 2002. http://archive.defense.gov/Transcripts/Transcript.aspx?TranscriptID=2636.
The adjustments made to the structure and methods of the intelligence community have
come in response to a shifting informational environment. This is one that is far more
decentralized, unstructured, and larger in scale than traditional methods were capable of
monitoring and interpreting. The explosion in computational technology has driven fundamental
changes in the ways that intelligence is collected and analyzed. The new security environment,
combined with the development of modern tools, has transformed the fundamental goals and
models of counterterror intelligence.
The Organizational Evolution of Counterterror Intelligence
A Brief History of Intelligence
American conceptions of intelligence only came to mature in the decades following
World War II. The CIA was created in 1947 to serve as a central nexus of intelligence for the
intelligence community (IC) as a whole. Other agencies, such as the NSA in 1952, joined the
fray as the Cold War escalated. In all, seventeen agencies work in conjunction to bring together
intelligence for policymakers in the White House, Capitol, and Pentagon.
Each agency has a specific role: to collect a certain type of intelligence, analyze it, and provide conclusions based on its collected data. These agencies have evolved in isolation,
building up extensive areas of expertise in their own specific fields. For example, the National
Geospatial Intelligence Agency (NGA) deals exclusively with aerial imagery as a source of
intelligence. Using a variety of different camera systems and detection methods, they screen out
what they consider unimportant images and propagate images of interest up the chain of
command. They are not responsible for providing or analyzing any other sources of intelligence
information. This type of isolation creates self-contained agencies that have their own structure,
culture, and operating procedures.2 The intelligence produced by these compartmentalized
organizations is presented up the hierarchy where it must be combined.
The segmentation of the IC ran deep, even manifesting itself at the level of the individual.
The employees of an agency like the CIA were often hired, evaluated, and promoted directly
within the agency, rarely interacting professionally with anyone outside of their own cloistered
setting.3 The clandestine nature of their work further isolated them from the outside world, often
preventing them from speaking about their professional lives with almost anyone outside of their
“home turf.” What resulted were strong internal communities, with intense loyalty to one’s own
agency and little regard for the intelligence community as a whole.4 In fact, the environment
fostered negativity among many of the members of the community, where an “us versus them”
attitude dominated interagency discourse.
The resulting structure was one of competition instead of collaboration. Information
became an asset to be used for personal or institutional gain, not something to be shared with the
rest of the community. Strict processes for “protecting” information on a need-to-know basis
choked off flows of information between agencies. Only high-ranking members of the
community could see anything other than the distilled conclusions from a massive pool of
collected data. Instead of working together, agencies worked in parallel down to the individual
level, generating their own intelligence products.
For decades, the security environment that existed suited the segmented nature of these
agencies. The separate agencies of the IC had co-evolved with the actions of the Soviet Union
and therefore were structured in ways that matched the Soviet structure. This meant that each
2 Zegart, Amy. Spying Blind: The CIA, the FBI, and the Origins of 9/11. Princeton Press, 2007, 67.
3 Zegart, Spying Blind, 68.
4 Nolan, Bridget. “Information Sharing And Collaboration in the United States Intelligence Community: An Ethnographic Study of the National Counterterrorism Center.” PhD Dissertation, University of Pennsylvania, 2013, 57.
agency could work independently and still provide sufficiently detailed intelligence on its target.
Additionally, the Cold War presented an adversary that was monolithic in nature. There was little
confusion about the location or the intent of the Soviet Union, meaning agile inter-agency
collaboration was rarely necessary. As a result, a combination of the separate intelligence
conclusions from each agency was enough to make coherent and very often correct decisions by
senior leadership.
The fall of the Soviet Union marked a shift in the operational requirements as the
monolithic threat collapsed. The result for the intelligence community was a certain listlessness in
action. There simply was no imposing enemy force to motivate work by the community. It was
not immediately obvious what the new threat was, if it even existed. Clearly there had to be a
shift in strategy, and intelligence officials commissioned dozens of reports on the new
intelligence environment, searching for new improvements and directives for the modern
American intelligence apparatus.5 Zegart counted 514 recommendations in the reports relevant to
counterterror intelligence. The recommendations focused on issues ranging from intelligence
targets to organizational structure. Divisions ran deep within the community as officials grappled
with the findings of the new reports and how to implement their suggestions. In the end, few
recommendations were ever actually implemented. Only 10% of the recommendations were
successfully fulfilled, with another 11% being partially implemented.6 80% resulted in no action
at all.
Hints of new security threats started to appear in the decade preceding 9/11. The
destabilization of Eastern Europe and the Middle East began to produce weaker states and rogue
groups. In 1998 a middle-aged Saudi exile named Osama Bin Laden faxed an article from
5 Zegart, Spying Blind, 29.
6 Zegart, Spying Blind, 36.
Afghanistan to a London-based Islamic newspaper, declaring a fatwa in the name of a “World
Islamic Front”.7 A fatwa is a scholarly interpretation of Islamic law, which Bin Laden used in
order to call for the murder of any American, anywhere in the world. In the same article he
praised the mounting terror attacks against Americans, including the 1993 World Trade Center
bombing and the 1993 Somali firefights, which caused an American military pullout. He even
claimed that he would bring the fight to American soil in the near future. The intelligence
community noted his declarations, and a few analysts were assigned to track him. Al’Qaeda was
on the IC’s radar, and President Clinton even authorized a plan to kill Bin Laden.8
However, there was a general lack of concern about his ability to actually threaten the
United States beyond simple car bombings.9 There were no major shifts in intelligence focus, or
in the methods employed to conduct counterterrorism intelligence. It just did not seem that a man
relegated to one of the poorest and most remote regions on earth could be capable of anything
truly destructive.
The 9/11 Attacks and Their Implications
The attacks on September 11th caught the counterterror intelligence community
completely off-guard. They simply had no idea that this attack was coming. However, in another
strange sense, there was little surprise that an attack had occurred.10 The Al ‘Qaeda threat had
been present for years, and those assigned to track it knew something was going to happen.
Furthermore, there had been a decade of ignored recommendations and non-adaptation to a new
7 Zelikow, Phillip. “The 9/11 Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the United States,” July 22, 2004. https://www.gpo.gov/fdsys/pkg/GPO-911REPORT/content-detail.html, 47.
8 The New York Times. “Many Say U.S. Planned for Terror but Failed to Take Action.” The New York Times, December 30, 2001, sec. National. http://www.nytimes.com/2001/12/30/national/30TERR.html.
9 Zelikow, 9/11 Commission Report, 74-85.
10 Zegart, Spying Blind, 86.
intelligence environment. Senior leadership at the national intelligence level acknowledged the
importance of counterterror intelligence, but was either unable or unwilling to overcome massive
organizational inertia to make major changes and accommodate better intelligence practices.
Analysts in many of the isolated agencies reported a sense of dread and helplessness as they tried
desperately to conceptualize what they considered to be a looming threat.11
9/11 spurred an unprecedentedly large investigation into the intelligence failures leading up to the attack. The federal government commissioned a massive report, and over three years the commission combed through logs and timelines to determine what had gone so wrong that there had been no
warning for an attack of this scale.12 The resulting 600-page report explored the history of the
intelligence community and its investigations of Al ‘Qaeda. Social scientists also rushed to
uncover the causes for the failure of such a large and powerful system. The result was a growing
body of scholarship focusing on the deficiencies of the intelligence community and how to fix
them.13 Investigators also discovered a number of missed opportunities to uncover the plans for 9/11
before it happened. No single agency was to blame; the failures were spread across the IC.
Unsurprisingly, the majority of the failures by the community can be attributed to poor
organizational management and cultural issues. The inter-agency nature of counterterror
intelligence simply was not constructed to deal with a well-funded clandestine group that
spanned multiple continents. The traditional setup that had dominated intelligence efforts broke
down targets by country or region and completely separated domestic and foreign intelligence
gathering. This resulted in both internal and inter-agency fragmentation. No single person or
11 Zelikow, 9/11 Commission Report, 259.
12 Zelikow, 9/11 Commission Report.
13 Silke, Andrew. “Research On Terrorism.” In Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security. University of East London School of Law, 2008, 29.
agency was definitively in charge of monitoring Al ‘Qaeda, providing disjointed and incomplete
monitoring.14
An illustrative example of internal fragmentation occurred in early 2000, as the CIA was
tracking suspected Qaeda operatives travelling through Southeast Asia. The CIA clandestine
officers believed (correctly) that the suspects were planning something big. However, the way in
which they were tracking them was disorganized and improperly segmented. The CIA had
distinct sets of personnel that focused on each nation state, meaning that which officers conducted the surveillance depended on the location of the targets. As the Qaeda operatives
travelled from Malaysia to Thailand, the jurisdiction for their surveillance changed hands. The
Malaysian CIA outpost desperately sent messages to the Thailand-based operatives, but by the
time the messages were received, the suspects had disappeared.15 In a post-9/11 investigation, it
was found that two of these suspects continued on to California, where they planned the 9/11
attacks for 18 months. They had both managed to obtain US visas – despite their high-risk
designation by the CIA.
A lack of inter-agency communication allowed the Qaeda operatives to enter
the country and make arrangements without detection. The FBI, the agency in charge of
domestic counterterror intelligence, had poor procedures for procuring necessary information
from other sources, namely the CIA. In mid-2001, flight school employees tipped off the FBI
about a group of suspicious men who had requested Boeing 747 flight simulator lessons.16 The
FBI was even in possession of their names and addresses. However, because they did not
regularly communicate with the CIA about their operations, they did not know that the men that
they were following were known Al ‘Qaeda agents who had eluded surveillance and capture
14 Zegart, Spying Blind, 103.
15 Zegart, Spying Blind, 100-104.
16 Zelikow, 9/11 Commission Report, 273.
abroad. The pieces of the puzzle were all there: the CIA knew that these men were members of a
militant group dedicated to the slaughter of Americans and the FBI knew where they were and
what they were doing. Unfortunately, the fragmentation within and between the agencies
prevented the puzzle pieces from being put together.
Movements Toward Reform
The 9/11 attacks provided the necessary impetus to begin instituting substantive
organizational and procedural change in the intelligence community. The bureaucratic inertia
that had plagued the community for a decade only became surmountable after a tragedy of this
scale. Within two months, major legislation was drafted to improve US counterterror intelligence
efforts.17 The PATRIOT Act of 2001 significantly increased both manpower and funding for
counterterror activities, including intelligence efforts. It created new laws and requirements for
border crossing and money transfers and set more relaxed limits on government surveillance. It
also made attempts to remedy some of the fragmentation problems that existed within the
counterterror intelligence community. Information sharing requirements were laid out,
incentivizing the dissemination of collected data.
Three years after the signing of the PATRIOT Act, Congress drafted a more
comprehensive bill to fix the major weaknesses within the US intelligence community.18 The
Intelligence Reform and Terrorism Prevention Act of 2004 introduced sweeping changes in the process
and structure of counterterrorism intelligence. Most notably, it commissioned the creation of a
National Counterterrorism Center to serve as the central nexus for American counterterror
17 Uniting And Strengthening America By Providing Appropriate Tools Required To Intercept And Obstruct Terrorism (USA PATRIOT ACT) Act of 2001, 2001. http://www.gpo.gov/fdsys/pkg/PLAW-107publ56/pdf/PLAW-107publ56.pdf.
18 Intelligence Reform and Terrorism Prevention Act of 2004, 2004. http://www.nctc.gov/docs/pl108_458.pdf.
efforts. The stated mission of the center is: “To serve as the primary organization in the United
States Government for analyzing and integrating all intelligence possessed or acquired by the
United States Government pertaining to terrorism and counterterrorism.” It has the additional
responsibility of coordinating overall counterterror strategic operations for the United States by
“integrating all instruments of power, including diplomatic, financial, military, intelligence,
homeland security, and law enforcement.” However, this is in a purely managerial sense, as the
center is not responsible for any execution of counterterror operations.
The NCTC also integrates the staff of the 16 agencies that conduct counterterror
intelligence operations.19 Close to 50% of its staff are NCTC-specific employees, including analysts, managers, and other staff. The other 50% are veterans of other parts of the IC who are in the
NCTC on loan for a few years at a time. This is meant to foster better inter-agency relationships
and give different personnel the opportunity to broaden the way in which they conceive of
counterterror intelligence operations. The hope is that these employees will no longer experience
cultural barriers between them and employees at other agencies, fostering better collaborative
efforts.
Overall, the NCTC is an agency that was created to offset the entrenched structural
deficiencies that have existed within the intelligence community for decades. Instead of tearing
down the existing structure and rebuilding an entirely new system, lawmakers created a central
nexus to which all relevant information must flow and where all long-term strategy decisions
must be made. It takes hard lessons – learned in the years picking up the pieces from 9/11 – and
turns them into policies that can be applied to the community as a whole. In essence, the NCTC
is a modern agency, meant to respond to the requirements of a modern intelligence environment.
19 Nolan, 4.
Defining “Intelligence”: Evidence and the Knowledge Production Process
Defining intelligence in a rapidly shifting information environment is difficult. On a basic
level, intelligence in this context refers to knowledge that is integral to effective tactical and
strategic decision-making at many levels of government.20 Though there is significant debate
among scholars about the formal definition of intelligence, the CIA weighed in with this
relatively concise description: “Reduced to its simplest terms, intelligence is knowledge and
foreknowledge of the world around us—the prelude to decision and action by U.S.
policymakers.”21 However, it is important to note that intelligence is not simply information
about a specific problem that needs to be addressed. It is knowledge that is produced as the end
result of an arduous intelligence creation process. The knowledge produced is a prerequisite
understanding the operational environment in which policy decisions will be made.
Intelligence production requires three main steps: evidence collection, analysis, and
synthesis.22 Evidence collection refers to the creation of data points for analysts to work with,
through a variety of means. The result of evidence collection is a pool of unstructured and
unlinked snippets of data that require further refinement. The next step, analysis, refers to the
process of resolving data points into fundamental parts in order to begin building up a contextual
model of the data. This step is important for determining both what the data can reveal and
what it cannot. Analysis is also essential to establish what data points are missing from the
previously collected evidence. Finally, synthesis is the process of constructing feasible
20 Waltz, Edward. Knowledge Management in the Intelligence Enterprise. Artech House, 2003, 16-18.
21 Warner, Michael. “Wanted: A Definition of ‘Intelligence.’” Journal of the American Intelligence Professional 46, no. 3 (2002). https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol46no3/article02.html#rfn7.
22 Lazaroff, Mark. “Anticipatory Models for Counter-Terrorism.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006, 52-53.
explanations or solutions from the fundamental components that were produced from analysis.
This step creates intelligence, as it provides knowledge of a situation and its context, not simply
factoids describing it. Without this intelligence it is often impossible to produce sound policy
decisions.
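As a rough illustration of this three-step process, the Python sketch below models collection, analysis, and synthesis as a simple pipeline. The function names, the toy data, and the notion of a "dominant topic" are hypothetical simplifications introduced purely for illustration; they do not describe any real intelligence system.

```python
# A minimal sketch of the three-step intelligence production process described
# above: collection -> analysis -> synthesis. All names and data are
# hypothetical simplifications, not a description of any real agency system.

def collect_evidence(sources):
    """Collection: gather raw, unstructured snippets from many sources."""
    return [snippet for source in sources for snippet in source]

def analyze(snippets):
    """Analysis: resolve each snippet into more fundamental parts (here, tokens)."""
    return [{"text": text, "tokens": text.lower().split()} for text in snippets]

def synthesize(structured):
    """Synthesis: combine the analyzed parts into a contextual conclusion."""
    tokens = [t for item in structured for t in item["tokens"] if len(t) > 3]
    dominant = max(set(tokens), key=tokens.count) if tokens else None
    return {"evidence_count": len(structured), "dominant_topic": dominant}

if __name__ == "__main__":
    sources = [["suspicious wire transfer in Brussels", "travel booking to Brussels"],
               ["chatter mentions a Brussels meeting"]]
    print(synthesize(analyze(collect_evidence(sources))))
```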
Evidence Collection
Data is the raw material from which intelligence is formed.23 In order to construct
comprehensive intelligence reports, agencies must first collect as much relevant data as possible.
Failure to collect enough meaningful data can render a final analysis impossible or incomplete.
American counterterrorism intelligence agencies rely on many different methods for data
collection. Perhaps the most traditional method still in use today is the process of human
intelligence (HUMINT). The CIA is the primary producer of human intelligence in a foreign
setting, while the FBI conducts the majority of human intelligence domestically.24 Gathering
information through HUMINT entails human espionage through the use of spies. Traditionally,
the CIA would commission intelligence officers: spies who infiltrate foreign soil and make
contact with foreign government officials in order to extract information. While traditional
intelligence officers no doubt still work for American agencies, modern counterterrorism
intelligence requires a much more diverse set of agents, spanning domestic and foreign, state and non-state actors.
Human intelligence is often pre-structured by its source and provides data that is useful
for its specificity and context-aware origins. For example, an informant in a possibly radicalized
group could provide detailed and relevant information on the future actions of that group. These
23 Waltz, 19.
24 Margolis, Gabriel. “The Lack of HUMINT: A Recurring Intelligence Problem.” Global Security Studies 4, no. 2 (Spring 2013). http://globalsecuritystudies.com/Margolis%20Intelligence%20(ag%20edits).pdf, 45.
hints come from a person who is cognizant of the relevant factors that shape analytical
conclusions, and therefore the information requires less structuring by intelligence analysts. On
the other hand, human intelligence is difficult to implement and maintain, requiring extensive
planning, training and resources. It is also risky for the agents providing the information,
meaning the source can be cut off without warning. This makes HUMINT an exceedingly
valuable but often unreliable source of basic evidence for intelligence production.
Geospatial intelligence, commonly known as GEOINT, is based upon collecting images
from various optical, infrared, and radar sensors. The most commonly used sources are high-
resolution satellite images and footage from unmanned aerial vehicles. Though it is primarily
used as tactical information for military operations, GEOINT is still useful for modern
counterterrorism intelligence purposes.25 It can provide nearly real-time tracking of uncovered
targets, giving analysts better indications of terrorist capabilities. It is also useful for determining
physical, not simply conceptual, relationships between different intelligence targets. The
downside of GEOINT is its subjective nature, requiring extensive inference from a viewer to
make any conclusions. The rooftop of a suspicious building or a group of vehicles moving in a
line can be helpful as supplementary pieces of evidence but are not necessarily noteworthy in a
contextual vacuum. Judgments based on imagery must therefore be made in conjunction with
conclusions from other sources of evidence.
Signals Intelligence (SIGINT) is the act of intercepting and decoding adversarial
communications, generally through electronic means. Dating back to the days of World War II,
this mostly involved capturing radio waves sent between enemy military commanders.
Contemporary SIGINT now encompasses a significantly larger set of communications.26
25 Margolis, 47.
26 Margolis, 48.
Everything from satellite messages to electronic information being sent on the open Internet can
theoretically be captured and treated as SIGINT information. The technical prowess required to
implement successful SIGINT operations is significant, and the National Security Agency (NSA)
employs an estimated 35-40 thousand people to devise systems of interception for the world’s
communications.27 As intelligence targets shift away from traditional military targets to
accommodate counterterror efforts, signals intelligence probes must be more diverse and
involved than they have ever been before. Capabilities have evolved to the point where major
constitutional objections have been raised about the extent of American signals collection.28
The value of signals intelligence lies in its unfiltered nature. Parties that are unaware of
any eavesdropping are likely to communicate candidly, often providing significant insights into
their intentions. A classic example of SIGINT interception is the capture of Osama Bin Laden’s
satellite phone communications in the late 1990’s.29 Foundational knowledge about Al ‘Qaeda
was generated from the content of his conversations with other ranking operatives.
Unfortunately, when he discovered that his calls were being monitored, he immediately dropped
his use of satellite phones, closing off an invaluable source of information. This example exposes
a weakness in SIGINT operations: an adversary must be using an electronically interceptable
method of communication for it to be of any value.
The aforementioned types of information collection are all methods that somehow
procure data from sources that were previously inaccessible. However, not all evidence is
intentionally hidden from view. Often, valuable pieces of information are openly available
27 NSA. “NSA 60th Anniversary Book,” 2012. https://www.nsa.gov/about/cryptologic-heritage/historical-figures-publications/nsa-60th/, 3.
28 Greenwald, Glenn, Ewen MacAskill, and Laura Poitras. “Edward Snowden: The Whistleblower behind the NSA Surveillance Revelations.” The Guardian, June 11, 2013. http://www.theguardian.com/world/2013/jun/09/edward-snowden-nsa-whistleblower-surveillance.
29 Margolis, 50.
through a variety of public sources. Using widely accessible information for intelligence
purposes is known as open source intelligence (OSINT).30 Through the compilation of data from
publications, databases, and other large unstructured datasets, counterterror analysts can create
detailed and comprehensive overviews of emergent targets and situations.
Potential open sources for intelligence data are plentiful. Analysts can read newspapers
and magazines from regions of interest, or access public information repositories that store
aggregate socio-economic statistics and records. They can even go as far as data-mining financial
transaction logs from the Internet, creating real time tools to monitor suspicious activity and
provide supplementary evidence to other investigations. Complex interactions between separate
sources may reveal insights that were not previously discernable. However, it is essential to note
that OSINT is not a replacement for other types of classified “procurement” methods of data
collection. Instead it provides a backdrop upon which more targeted information sources can be
placed.
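To make the idea of a real-time open-source monitor slightly more concrete, the sketch below scans a stream of public records for terms of interest. The feed, the watch list, and the record format are invented for illustration only; an operational system would rely on far richer models than simple keyword matching.

```python
# A hedged sketch of a simple open-source monitoring tool: scan a stream of
# public records and flag any that mention a watched term. The feed, terms,
# and record format are hypothetical; real systems go far beyond keywords.

WATCH_TERMS = {"recruitment", "wire transfer", "explosives"}

def flag_records(records, watch_terms=WATCH_TERMS):
    """Yield an alert for every record that mentions a watched term."""
    for record in records:
        text = record.get("text", "").lower()
        hits = sorted(term for term in watch_terms if term in text)
        if hits:
            yield {"source": record.get("source"), "matched": hits}

if __name__ == "__main__":
    sample_feed = [
        {"source": "regional newspaper", "text": "Recruitment drive draws local scrutiny"},
        {"source": "public forum", "text": "Discussion of the weather"},
    ]
    for alert in flag_records(sample_feed):
        print(alert)
```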
The strength of open source data gathering, namely its breadth and contextual value, is
also one of its weaknesses.31 The open source world is an incoherent stream of data. If it is not
mined with skill and specificity, it can confound an analytical team, rendering them unable to
produce any meaningful conclusions from the information that they have collected. However, the
variety and breadth of collection must be large and diverse enough to ensure that enough relevant
information is being collected to adequately back analytical conclusions. Furthermore, the
structure of the data is often extremely heterogeneous. Intelligence departments must work
tirelessly to extract the information from whatever medium they are pursuing and make it
30 Wiil, Uffe. Counterterrorism and Open Source Intelligence. Springer, 2011, 1.
31 Wiil, 3.
consumable by a larger audience base. This includes acts of translation, content tagging, and
extensive summarization.
Information Technology and Knowledge Production
The information revolution of recent decades has precipitated major changes in the ways
that intelligence collection and analysis are undertaken. More specifically, the combination of
two separate but related trends – increased globalization and technological requirements of many
activities – is shaping modern approaches to counterterror intelligence.32
First, agencies can significantly improve the rate at which they collect vast amounts of
data from sources that vary in context, content, language, and geographic locality. With the aid
of computational techniques, agencies can accrue information at dizzying rates. The extreme
breadth of the Internet reveals massive opportunities for open source and signals intelligence.
Retrieval of an effectively unlimited amount of open data from nearly any geographic region
becomes simple and easy. Online communications and public Internet posts also have become
ubiquitous, even among clandestine organizations, allowing for extremely comprehensive signals
interception. The constitutionality of these actions, while important and requiring further
consideration, is not the focus of this thesis.
The recent technological acceleration even applies to more traditional forms of
intelligence gathering, such as human intelligence. Take, for example, a 16-gigabyte flash drive
(as of now, considered to be fairly small), which can fit in the pocket of any informant. This
32 Choucri, Nazli, Stuart Madnick, and Michael Siegel. “Improving National and Homeland Security Through Context Knowledge Representation and Reasoning Technologies.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
flash drive can store slightly fewer than 10 million pages of text, significantly more than can be
jammed into a briefcase.
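The figure of roughly ten million pages can be checked with back-of-the-envelope arithmetic, assuming plain text at about one byte per character and on the order of 2,000 characters per printed page (both assumptions chosen only for illustration):

```python
# Back-of-the-envelope check of the storage claim above. Assumptions: plain
# ASCII text at ~1 byte per character and ~2,000 characters per printed page.
drive_bytes = 16 * 10**9            # a 16-gigabyte flash drive (decimal GB)
bytes_per_page = 2_000              # assumed size of one page of plain text
pages = drive_bytes // bytes_per_page
print(f"{pages:,} pages")           # 8,000,000 pages: slightly fewer than 10 million
```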
The truly difficult task in modern knowledge production comes after the collection phase.
Open source and signals intelligence have already been shown to be extremely unstructured and
difficult to make sense of. However, the real challenge may come from integrating all of the
aforementioned intelligence sources into a single knowledge repository. Human intelligence may
not be easily integrated with geospatial, and may differ greatly in structure from collected signals
intelligence. And yet, it is imperative that they be integrated in some way so as to build a
complete picture of an operating environment. Without doing so, intelligence operations would
simply create siloed data stores that cannot be cross-referenced.
To integrate intelligence data after it is collected, its meaning must first be resolved, then
structured appropriately, then collated, and finally integrated into an already existing knowledge
store.33 Given the size of the data being collected in a modern intelligence context, it is
impossible to expect analysts to be able to perform all of these steps on each piece of
intelligence. The tasks must be done at least semi-autonomously, using computational tools to
infer content, proper structure, and relevance to existing data.34 Semi-autonomous here means that a human verifies the correct behavior of the computational algorithms.
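The following sketch illustrates what such a semi-autonomous arrangement might look like: items the algorithm labels with high confidence are handled automatically, while low-confidence results are queued for a human analyst. The classifier, the threshold, and the sample items are placeholders, not a description of any deployed system.

```python
# A minimal human-in-the-loop ("semi-autonomous") sketch: the algorithm handles
# confident cases itself and routes uncertain ones to an analyst for review.
# The fake classifier and the threshold are purely illustrative.

AUTO_ACCEPT_THRESHOLD = 0.9   # assumption: tuned to an acceptable error rate

def classify(item):
    """Placeholder for an automated labeling step that returns a confidence score."""
    confidence = 0.95 if "known phrase" in item else 0.4   # stand-in for a real model
    return {"item": item, "label": "relevant", "confidence": confidence}

def route(items):
    accepted, review_queue = [], []
    for result in map(classify, items):
        if result["confidence"] >= AUTO_ACCEPT_THRESHOLD:
            accepted.append(result)        # handled autonomously
        else:
            review_queue.append(result)    # sent to a human analyst for verification
    return accepted, review_queue

accepted, needs_review = route(["report with known phrase", "ambiguous intercept"])
print(len(accepted), "auto-accepted;", len(needs_review), "queued for human review")
```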
There are major complications to writing these algorithms, and they must be at least
somewhat context-aware as they resolve the meaning of diverse documents.35 Take, for example,
the challenges of language differences. An American human intelligence informant may submit a
comprehensive report on suspicious activity that is occurring in “Brussels.” An intercepted text
message (SIGINT) between two suspected French terrorists might mention a possible future
33 Choucri, 145.
34 Choucri, 141.
35 Choucri, 141.
attack in “Bruxelles.” Police surveillance videos (GEOINT) from Belgian police could capture
suspicious activity occurring in front of a specific location in the city of “Brussel.” Taken
together, these pieces of information could be critical for foiling a possible terrorist plot in the
city. And yet, the key piece of information that links them all is distinct for each source. A
human analyst could see this with ease, as they understand the language barrier context, and
could infer that they are all referring to the same city. However, a computer is not this smart, and
must be specifically programmed to make these types of inferences on its own. Computational
methods already exist for scenarios exactly like this, but making them function well in the general
case is an extremely difficult engineering challenge.
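A toy version of this particular scenario is sketched below: a hand-built alias table maps the language-specific variants of the city's name onto one canonical entity so that the three reports can be linked. The alias table and report tuples are invented for illustration; general-purpose entity resolution is, as noted, far harder.

```python
# A toy illustration of the resolution problem described above: map
# language-specific variants of a place name to one canonical entity so that
# reports from different sources can be linked. The alias table is hand-built
# for this one case; the general problem is far harder.

PLACE_ALIASES = {
    "brussels": "Brussels (BE)",    # English (the HUMINT report)
    "bruxelles": "Brussels (BE)",   # French (the intercepted SIGINT)
    "brussel": "Brussels (BE)",     # Dutch (the Belgian GEOINT metadata)
}

def canonical_place(mention):
    """Return the canonical entity for a raw place-name mention, if known."""
    return PLACE_ALIASES.get(mention.strip().lower())

reports = [("HUMINT", "Brussels"), ("SIGINT", "Bruxelles"), ("GEOINT", "Brussel")]
linked = {}
for source, mention in reports:
    linked.setdefault(canonical_place(mention), []).append(source)
print(linked)   # {'Brussels (BE)': ['HUMINT', 'SIGINT', 'GEOINT']}
```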
It is important to note that what requires integration in the previous example is not the
data itself. Video footage cannot be “integrated” with an intercepted text message. What is
instead being integrated is the context. This is the required step for intelligence data to become
intelligence knowledge. Only once the data has been collected, resolved into its constituent parts,
and integrated with real-world context can it be of any real use for making policy decisions.
A complex skillset is required in order to actually perform analysis on these massive and
heterogeneous datasets. Analysts must be capable of using computational techniques for
managing, structuring, and integrating complex data. This requires a culture of competence,
where analysts iteratively improve their abilities in order to complete increasingly complex tasks.
They must continually evaluate both themselves and their methods in order to stay ahead in an
intelligence environment where the amount of valuable data is skyrocketing while the proportion
of useful information to total collected is shrinking rapidly.
Modern Counterterrorism Intelligence Paradigms
The Limits of Intelligence
There is a common expectation by the public that intelligence collection and synthesis
should result in a certain level of omniscience. Many assume that given the afforded resources
and manpower, counterterror agencies should not be surprised by anything. However, it is not
possible to siphon up every piece of data possible, build a one-to-one symbolic map of the
universe, then bring retribution to every terrorist and militant, and prevent all future terror
attacks. Intelligence production must instead manage the complications of the modern
information environment by creating intelligent models to improve situational awareness and
reduce the probability of successful attacks.
It is therefore imperative to temper expectations looking back as well as going forward.
There is a tendency to blame intelligence failures for many of the attacks that Americans have
experienced, from Pearl Harbor to 9/11.36 Even though it may seem obvious in retrospect that
many opportunities to interdict the plot were missed, this may simply be a case of hindsight bias,
also known as creeping determinism. Though intelligence reform in the 1990s might have made it much more likely that the plot would have been stopped, it is impossible to prove either way. Many convince
themselves in the aftermath that there were objectively enough clues to “connect the dots,” and
stop the worst terror attack in American history. To do so would be to assume the omniscience of
intelligence efforts, and to misunderstand their true purpose.
Traditional Intelligence Paradigms
36 Lazaroff 59.
Intelligence is gathered in order to produce better contextual knowledge with which to
make policy decisions. Decision makers with knowledge of relevant factors and possible
consequences have a distinct advantage over those who operate in relative darkness. Their
strategic knowledge is broader and deeper, and their tactical tools are more refined. However, in
order to make hard choices, there have to be criteria upon which to judge the knowledge that is
generated during intelligence production. Models must be developed that can define the process
of transforming contextual knowledge into action.
Traditional intelligence paradigms developed during the Cold War era were based upon
the actions of Soviet counterintelligence and military divisions.37 These paradigms were formed
in response to observational data that had been collected over decades of experience with Soviet
grand strategy and tactics. The models focused on large-scale changes. Major events such as tank
division deployment, naval fleet movement, ICBM silo construction, rocket launches and trade
embargoes were used to make large-scale decisions. Once evidence of these events was found,
there was a defined process for moving forward with the investigation.
A classic example of the traditional intelligence model is the Cuban Missile Crisis of
1962. Geospatial intelligence from U-2 spy planes gave very clear indications of what the
intelligence community calls “signatures” of Soviet long-range missile sites.38 First they noticed
the extremely wide roads, with massive circular turns designed to accommodate large missile
trucks. Next they noticed the classic triple fence perimeter that was instituted at every other
known Soviet long-range missile silo. From these signatures, they inferred that the Soviet Union
had constructed a missile-launching platform on the island of Cuba. They were right.
37 Zegart, Spying Blind, 69.
38 Sagan, Scott, and Kenneth Waltz. The Spread of Nuclear Weapons: An Enduring Debate. New York, NY: W. W. Norton, 1995, 62.
In this example, intelligence analysts received geospatial data to analyze. They did not
have to integrate this information significantly with other forms of intelligence. Instead, they
simply applied the data that they had received to an empirically proven model. This model was
based upon the past observation of known missiles sites. Extensive knowledge of other parts of
the Soviet military or political structure was not necessary to come to a correct conclusion. The
knowledge production process was very straightforward: source → data → model → conclusion.
The overarching model that guided these intelligence efforts against the Soviet Union is
known as a reductionist model.39 This model decomposes a problem (or adversary) into its
constituent parts, where solving each part solves the “problem”. In order to function a
reductionist model requires strict causal relationships through every part of the process, creating
a chain of cause-and-effect relationships that fully explain a conclusion. This model assumes that
the final synthesis of intelligence is equal to the sum of its parts. An advantage to a reductionist
model is the awareness that it gives to gaps in information; it is often obvious what pieces are
missing from the puzzle. On the other hand, the strict causal nature of reductionism requires a
pre-defined model where data can be plugged in to produce relevant conclusions.
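The reductionist logic can be caricatured in a few lines of code: pre-defined signatures are matched against incoming observations, the conclusion is simply the sum of the matched parts, and anything missing is reported as an explicit gap. The signatures echo the Cuban Missile Crisis example above; the data structures are illustrative only.

```python
# A caricature of the reductionist model: match observations against
# pre-defined "signatures" and conclude only when every required part is
# present. The signatures echo the Cuban Missile Crisis example; all data
# here is illustrative.

MISSILE_SITE_SIGNATURES = {
    "wide roads with large-radius turns",
    "triple fence perimeter",
}

def reductionist_assessment(observations):
    """Conclude 'probable missile site' only if every signature is observed."""
    matched = MISSILE_SITE_SIGNATURES & set(observations)
    missing = MISSILE_SITE_SIGNATURES - matched
    return {
        "conclusion": "probable missile site" if not missing else "inconclusive",
        "matched": sorted(matched),
        "missing": sorted(missing),   # the model makes information gaps explicit
    }

u2_imagery = ["wide roads with large-radius turns", "triple fence perimeter"]
print(reductionist_assessment(u2_imagery))
```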
The attacks of 9/11 were an indication that the reductionist model no longer worked as
well as it had in the past, and that a major re-imagining of intelligence was required in order to
tackle a threat as decentralized and unpredictable as terrorism. Both the methods used to collect
intelligence and the models to which they were applied needed to be rebuilt for the modern era.
39 Lazaroff, 55.
Modern Counterterror Intelligence Paradigms
Modern intelligence paradigms have begun to move away from static reductionist
models, and move toward dynamic emergent models.40 Emergent models eschew the assumption
that future outcomes are predictable simply by matching observable causes with their associated
effects. Reductionist models, in their aggressive decomposition of problems, isolate potential
causal properties from each other, eliminating the possibility for interaction. During the
combination phase, where the sub-problems are recombined back into the whole, the model no
longer matches reality. Because there is no room for interaction between the collected data
points, the overall conclusion can be interpreted incorrectly, or may be completely incoherent.
The shift from static to dynamic models is the result of a security environment that has
transformed from a complicated one to a complex one. Complicated systems can be modeled in
siloed manners, with many analysts working extremely deeply in one subject or another,
funneling their conclusions up a hierarchy to a central command. Complex systems have
properties that exhibit extensive co-evolution, where changes in one part of a suspected causal
chain can cause ripples through an entire security environment. For example, 20th century
intelligence analysts looking for possible expansions of communism into the third world did not
have to communicate extensively with analysts charged with tracking Soviet weapons
developments. These problems, while both connected to a single monolithic entity, were distinct.
On the other hand, modern counterterror analysts studying the border permeability between
Afghanistan and Pakistan must also monitor the weapons capabilities of militant groups in the
area, or at least be in constant contact with analysts who do. Changes in one area of interest can
cause major changes in the other.
40 Lazaroff 56.
The shift from complicated to complex security problems requires emergent intelligence
paradigms because often no pre-defined pattern exists to which analysts can apply their data
points. They must instead dynamically monitor the situation, and synthesize an emerging pattern
from the observation of the interacting elements of the environment. This marks a move away
from traditional conceptions of intelligence predicting the future, to a model where intelligence
efforts are made to anticipate what future patterns might look like, based on data collected today.
It is no longer as simple as saying that past patterns will take on the same form as future ones.
Instead, the anticipatory paradigm avoids producing models that attribute cause where it may not
exist. Lazaroff refers to this phenomenon as “pattern entrainment,” where analysts repeatedly
rely on a certain pattern, leading to unpleasant surprises down the road.41
Applying this model to the 9/11 attacks can give insight into how predictions were so
wrong about the capabilities of Osama Bin Laden and Al ‘Qaeda. Counterterror agencies were
extremely focused on previously detected patterns of terrorist plots, namely bombings and
shootings.42 They collected extensive evidence and attempted to map it to previously attempted
plots in the United States and abroad. They were not spending nearly enough time looking for
types of attack that diverged from the traditional methods. There is no guarantee that this type of
analysis will stop an attack, but it can increase the probability of doing so.
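The contrast between pattern entrainment and a more anticipatory posture can be sketched in code: one triage rule drops anything that does not match a known plot type, while the other escalates unfamiliar activity for closer review. The plot types and reports are invented for illustration and are not drawn from any real watch list.

```python
# A sketch contrasting pattern-entrained triage (only known plot types are
# tracked) with an anticipatory triage that escalates unfamiliar activity.
# The plot types and sample reports are invented for illustration.

KNOWN_PLOT_TYPES = {"car bombing", "shooting"}

def entrained_triage(report_type):
    """Pattern entrainment: anything outside the known patterns is dropped."""
    return "tracked" if report_type in KNOWN_PLOT_TYPES else "dropped"

def anticipatory_triage(report_type):
    """Anticipatory analysis: unfamiliar patterns are escalated, not dropped."""
    return "matches known pattern" if report_type in KNOWN_PLOT_TYPES else "novel, escalate"

for report in ["car bombing", "flight training by non-pilots"]:
    print(f"{report}: entrained={entrained_triage(report)}, "
          f"anticipatory={anticipatory_triage(report)}")
```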
In practice, implementing a system that handles emergent intelligence models is difficult.
It requires constant dynamism and iteration in the realms of collection, analysis, and synthesis.
Furthermore, the complexity of interaction between intelligence data points requires extensive
integration efforts of both information and analysis. Collaboration between all types of data
collection specialists and analysts is therefore required to conduct successful counterterror
41 Lazaroff 57. 42 Zelikow, 9/11 Commission Report, 73.
intelligence operations. However, simply sharing data and conclusions is not enough. As
Lazaroff puts it: “the real value of collaboration is about sharing context (thinking), not data.”43
True collaboration reveals that intelligence conclusions are more than simply a sum of the data
points that support them.
Conclusion
Modern paradigms for counterterror intelligence modeling are no longer as focused on
pure prediction of the future. They exist more abstractly and dynamically, ready to adapt to a
dizzyingly large set of complex interactions of intelligence targets. Major strides have been taken
to develop methods and structures to support the new models of intelligence. Computational
advances in collection have allowed for larger datasets to be captured. Information technology
has bolstered capabilities in managing and structuring collected data. Though still in their
infancy, computer algorithms have also provided value in semi-autonomously analyzing and
synthesizing new knowledge out of structured information. These methods will be explored in
the following chapter.
The organizational challenges that have plagued the intelligence community for decades are obstacles to the proper implementation of new intelligence generation methods. The traditional structure cannot meet the technical requirements of the new environment, and the collaborative elements of knowledge integration have proven untenable in the multi-agency setup. The National Counterterrorism Center seeks to remedy these issues, but given the entrenched problems that continue to face the community, it confronts a substantial challenge in implementing a dynamic, modern paradigm of counterterror intelligence.
43 Lazaroff 61.
Chapter 2
A Novel Intelligence Toolbox: Computational Analytics in Practice
“There were five exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days”
– Eric Schmidt, CEO of Google, 20101
The new information environment as encountered by contemporary intelligence analysts
has precipitated a new wave of computational tools. These tools are designed to accommodate
the massive amounts of data now collected on a daily basis. This chapter explains the
function and basic implementation of these relevant new technologies. Though they are
perpetually in development, computational methods already provide value in the intelligence
production process. Modern algorithms allow machines to ingest and understand both structured
and unstructured information sources. They allow for enhanced investigative breadth and depth
in the form of data management and comprehensive visualization tools. Finally, initial forays are
being made into predictive analytics, in which computers forecast changes in real-world models
that they construct from collected data.
The evolution of big data technology has significantly increased the feasibility of many
of these analytical techniques. The computational power that was out of reach for most
organizations ten years ago is becoming a reality today. Big data analytics are becoming
ubiquitous in both private sector and government agencies. There are especially major pushes in
1 Schmidt, Eric. Presented at the Techonomy Conference, 2010, http://techonomy.com/tag/eric-schmidt/.
the public sector to become more technologically advanced, and the barrier to entry is constantly
lowering.
However, as more organizations attempt to jump on the Big Data bandwagon, they may
be outpacing their ability to actually use it to its full potential. Relying on Big Data analysis is a
large departure from traditional forms of intelligence collection, counterterror or otherwise.
Tried-and-true systems of communication, hierarchy, and even analytical culture may no longer make sense in a data-oriented organization. While the first half of the chapter describes the computational methods, the second half details the structural and organizational steps required to take full advantage of Big Data analytics. Big Data's unique qualities often flip classical conceptions of management, meaning major departures from standard practice may be needed. While the observations about organizational shifts are presented in a more abstract manner, they are still relevant to multi-agency organizations such as the National Counterterrorism Center and will be applied directly to its structure in Chapter Three.
The growth of “Big Data”
As the information revolution accelerated through the 1990’s, organizations around the
world began to capture increasing amounts of data. The ever-growing speed of computation
coupled with plummeting data storage prices fueled an entirely new practice of data-collection.2
It seemed as if everything could be stored: financial transactions, medical records, historical data, entertainment media, user profiles, and the record of effectively any task completed with a computer.
Nothing was too inconsequential to be captured for analysis, as it might be useful later.
2 Manyika, James, Michael Chui, and Brad Brown. “Big Data: The next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute, June 2011, l2.
The datasets that many private and public organizations had captured quickly ballooned in volume. By the late 1990’s, these datasets had already surpassed the abilities of human comprehension in both size and scope. By the early 2000’s, some massive data repositories had grown beyond what most commodity hardware could process. Enterprises began to develop tools to deal with a phenomenon that was just beginning to have a name: “Big Data.” The data was so big, in fact, that traditional methods of storage and analysis were simply unable to manage it; it instead required complex custom data-centers. The term Big Data does not (and likely never will) have a
specific meaning. Due to constantly improving computational capabilities, there is no strictly
defined threshold on size and no precise measure of required information complexity. However,
there are three factors that are inherent to any Big Data problem: volume, velocity, and variety.3
Volume corresponds to the amount of data that must be stored. It is generally agreed that if a dataset can fit into a single rack of servers (around 100 terabytes as of 2016), then it most likely cannot be classified as Big Data.4 The rest of the chapter will use this definition because it means that the data can only be stored using technologically heavyweight solutions. For
example, the modern datacenters of the Internet giants are massive. As far back as 2014, Amazon
was reported to have over two million separate servers, and though Google remains evasive on
providing a hard number, it is rumored to have far more.5 Velocity refers to the pace at which
more data is collected, requiring constant upgrades in storage capacity and sophisticated methods
to configure new data warehouses. The amount of new information created on the Internet each day is enormous. For instance, as of 2015, Facebook logs over 4.75 billion posts per
3 Mills, Steve, and Steve Lucas. “Demystifying Big Data: A Practical Guide To Transforming The Business of Government.” IBM, 2012. https://www-304.ibm.com/industries/publicsector/fileserve?contentid=239170, 2 4 Mills, 4. 5 Clark, Jack. “5 Numbers That Illustrate the Mind-Bending Size of Amazon’s Cloud.” Bloomberg Business, November 2014. http://www.bloomberg.com/news/2014-11-14/5-numbers-that-illustrate-the-mind-bending-size-of-amazon-s-cloud.html.
day.6 Each of these posts is logged with a specific user, time, content, and context. That is on top
of the 300 million photos, 4.5 billion “likes,” and 10 billion messages that Facebook also
processes daily. Finally, variety corresponds to the heterogeneous nature of data that is stored by
a typical data-oriented organization. On a basic level, there is the difference between tabular data (i.e., stored in rows and columns, such as in an Excel document) and unstructured data. Unstructured data
could be anything: a news article, an image, or even a sound recording. The combination of these
three “V” factors makes Big Data extremely hard to store efficiently.
The three V’s of Big Data, while proving difficult to manage, can provide significant
advantages to organizations that can harness them.7 Large-scale storage and analytics can help
uncover patterns that are hidden in the data. Economic trends, unpredictable correlations, and
unforeseen interactions all come into focus when a larger snapshot of the information
environment is analyzed. In counterterrorism intelligence, analyzing massive datasets can
provide value that is found nowhere else.
Computational Methods in Counterterrorism
The first step in knowledge production is the collection and storage of information.
Counterterrorism intelligence can benefit from analyzing the vast amounts of data that come in
through its various intelligence channels. Unfortunately, the volume, velocity, and variety of the
data make it impossible for analysts to structure by hand.8 This is an especially difficult task
given the real-time nature of counterterrorism. There is simply no way that an analyst or a group
6 Ho, Kevin. “41 Up-to-Date Facebook Facts and Stats,” April 2015. http://blog.wishpond.com/post/115675435109/40-up-to-date-facebook-facts-and-stats. 7 Wegener, Rasmus. “The Value of Big Data: How Analytics Differentiates Winners.” Bain & Company, 2013. http://www.bain.com/Images/BAIN%20BRIEFThevalueofBigData.pdf. 8 Mills, 3
of analysts can stay on top of a data pipeline that comprises even a tiny fraction of a modern
system’s data collection capacity. In order to process the flood of data coming in, computational
methods for tagging and structuring are required.
A large amount of raw data without basic structuring or information retrieval tagging is
effectively useless.9 Having a database filled exclusively with text files (articles, books,
communications, etc.) is only marginally better than having hundreds of reams of papers sitting
in boxes. All the information that one might require is technically present but is not easily
knowable. Finding any specific piece of information in an unstructured pool of data requires
searching through everything, hoping to find a specific keyword or sentence. Relating one
snippet of an article to another is an arduous process and has to be done using multiple passes
through millions (or possibly billions) of files. Performing high-quality analytics on data stored
in this manner means slow and incomplete information retrieval, leading to lower quality
conclusions.
Incoming information must therefore be processed in ways that determine its meaning, relate it to other pieces of data in the database, and then store it in an easily searchable format.
There are various techniques that are used to accomplish this, and all of them involve adding
significant amounts of metadata to each incoming piece of evidence.10 Analysts working with
properly structured data can quickly find relevant pieces of evidence using a variety of searching
methods. They can also discover new data, as relevant information can be presented that the
analyst did not necessarily know existed.
9 Boschee, Elizabeth, and Natarajan Premkumar. “Automatic Extraction of Events from Open Source Text for Predictive Forecasting.” In Handbook of Computational Approaches to Counterterrorism, 1st ed. Springer Science, 2013, 51. 10 Schrodt, Philip, and David Brackle. “Automated Coding of Political Event Data.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013, 32.
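To make the preceding description concrete, the following is a minimal sketch, in Python, of the kind of metadata tagging such preprocessing performs. The field names, watchlist terms, and gazetteer entries are invented for illustration; operational systems tag far richer metadata and rely on trained extractors rather than simple keyword matching.

import datetime
import re

# Hypothetical watchlist and gazetteer terms, for illustration only.
KNOWN_GROUPS = {"group_a", "group_b"}
KNOWN_LOCATIONS = {"kabul", "peshawar"}

def tag_document(doc_id, text, source):
    """Attach basic metadata so a raw text report becomes searchable by facet."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {
        "doc_id": doc_id,
        "source": source,  # e.g., open-source reporting, liaison material
        "ingested_at": datetime.datetime.utcnow().isoformat(),
        "groups_mentioned": sorted(tokens & KNOWN_GROUPS),
        "locations_mentioned": sorted(tokens & KNOWN_LOCATIONS),
        "length_chars": len(text),
    }

record = tag_document("rpt-001", "Reporting from Kabul mentions group_a activity.", "open_source")
print(record["locations_mentioned"])  # ['kabul']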
However, algorithms that can accomplish accurate tagging and information extraction
are exceedingly difficult to implement correctly. No group of engineers can think of every
possible way in which a certain idea may be expressed in natural language. There are too many
variables to consider and too many exceptions to rules. And yet, this impossible task is what they
are assigned to do. Many clever methods have been developed to build these meaning extractors.
The most popular approach is to use algorithms that teach themselves, or train, on how to
interpret vast amounts of information and extract meaning correctly.11 This process involves
running algorithms on a training dataset to learn the patterns and rules required for real-world
information extraction. The learned rules and patterns are then evaluated against a previously
unseen test dataset. Performance is gauged by measuring correct and incorrect extractions and
information tags. Generally, many iterations of training must occur for these tools to be deemed
adequate.12 In fact, knowledge extraction algorithms are often built with a combination of engineering talent and pure trial and error. Additionally, these algorithms are continually
evolving as data changes and new improvements are found. The resulting software products must
be powerful, highly mutable, and extremely fast.
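As an illustration of this train-and-evaluate cycle, the sketch below uses the open-source scikit-learn library to train a toy text classifier and score it on previously unseen examples. The snippets and labels are invented; a real extraction system would train on a large annotated corpus and tag many fields rather than assign a single label.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline

# Invented training data: snippets labeled with the event type they describe.
train_texts = ["explosion reported near the market",
               "officials met to discuss border security",
               "device detonated outside the compound",
               "delegation held talks on trade policy"]
train_labels = ["attack", "meeting", "attack", "meeting"]

# Previously unseen test data used to gauge performance.
test_texts = ["blast struck a checkpoint", "ministers convened for negotiations"]
test_labels = ["attack", "meeting"]

# Train on the training set, then measure correct and incorrect labels on the test set.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)
predictions = model.predict(test_texts)
print(accuracy_score(test_labels, predictions))  # fraction of correct extractions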
Speed becomes a high-value objective because the information revolution has also given
rise to “real-time” data streaming. Real-time information is ingested from the outside world at
such rates that analysts can see an extremely up-to-date picture of the data that exist. Computational methods that provide rich information extraction and tagging at reasonably real-time rates can give employees knowledge stores that allow them to significantly outperform those without them.13
11 Boschee, 54. 12 Sharkey, Brian. “Information Processing at Very High Speed Data Ingestion Rates.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006, 3. 13 Mills, 7.
However, even the most powerful information preprocessor cannot fully bridge the gap
between humans and important data. The methods used for large-scale data storage use formats
that are completely incomprehensible to humans. Analysts cannot sift through and organize data
that is indexed (sorted) in machine-optimized ways. They need special programs that are able to
traverse and present the massive information sets that are stored within databases. Classic
programs used for data analysis include R and Stata; however, these programs begin to lose
effectiveness as data sizes get extremely large.14 More advanced and specialized programs are
used to deal with massive and constantly changing big data projects.
These programs represent the second step in the computational intelligence generation
process. They allow the current information in the database to be knowable as opposed to simply
being present. They offer the ability to sort and aggregate, opening up opportunities for data
transformation and summarization. Analysts can visualize and reshape the data, finding links that
the automatic preprocessors may have missed. They can also concretize the abstract patterns that
exist in the collected information store.
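A minimal sketch of this kind of sorting and aggregation, using the pandas library on an invented table of tagged incident records (the column names and values are hypothetical):

import pandas as pd

# An invented table of tagged incident records.
events = pd.DataFrame({
    "region":     ["A", "A", "B", "B", "B"],
    "type":       ["bombing", "shooting", "bombing", "bombing", "kidnapping"],
    "casualties": [3, 1, 5, 2, 0],
})

# Aggregate and summarize: incident counts and casualties per region and type,
# sorted so the heaviest categories surface first for the analyst.
summary = (events.groupby(["region", "type"])["casualties"]
                 .agg(["count", "sum"])
                 .sort_values("sum", ascending=False))
print(summary)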
A classic example of this type of concretization is computational network analysis.15 In
an abstract sense, social networks are easy to conceive of, but difficult to visualize in their
entirety. Relationships between entities vary in type, strength, and direction, and as more entities
are added, complexity grows quickly.16 Formal analysis of these networks has existed for
decades, but has often been limited in scope and dynamism by older methods. Networks of
thousands of people were stored as matrices or even using pen and paper. Modern software
packages allow for these networks to be mapped and presented to the user in more complete and
14 Pavlo, Andrew. “A Comparison of Approaches to Large-Scale Data Analysis.” Paper presented at ACM SIGMOD International Conference on Management of data New York, NY, 2009. http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf, 2. 15 Subrahmanian, V.S. Handbook of Computational Approaches to Counterterrorism. Springer Science, 2013, 3 16 Everton, Sean. Disrupting Dark Networks. Cambridge, UK: Cambridge University Press, 2012, 3.
coherent ways. The networks can be displayed in two- and three-dimensional spaces. Links are easily
created, hidden, changed, and destroyed. Entities can have helpful metadata attached to them,
and various graphing algorithms can even produce structure in networks that seem chaotic. These
tools enable the mapping of previously incomprehensible networks, providing useful context on
various terrorist groups.17
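As a simple illustration, the sketch below builds a small, invented associate network with the open-source networkx library and computes betweenness centrality, one common way such tools surface structure in an otherwise chaotic-looking graph. The node names are placeholders.

import networkx as nx

# A small, invented associate network.
G = nx.Graph()
G.add_edges_from([
    ("A", "B"), ("A", "C"), ("B", "C"),   # a tightly connected cell
    ("C", "D"), ("D", "E"),               # D bridges the cell to E
])

# Betweenness centrality highlights brokers whose removal would fragment the network.
betweenness = nx.betweenness_centrality(G)
print(sorted(betweenness.items(), key=lambda kv: kv[1], reverse=True))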
Powerful software can even provide tools that find “hidden” links within networks, given
a set of other relevant factors.18 Similar to the way in which a company like Facebook might
present a list of “suggested friends,” a counterterror approach may present a list of possible
associates. This is necessary because data that allows for the modeling of terrorist social
networks is notoriously lacking in breadth and consistency, leading to incomplete representations
of reality. In the past, using existing social ties to determine possible missing links was
cumbersome and impossible to do at scale. New software that uses machine learning to train on
network patterns can perform prediction on thousands of links simultaneously.19 While not
perfectly accurate, these approaches can give clues to analysts on which directions to further
investigate.
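The heuristic below, again using networkx on an invented network, scores unconnected pairs by the overlap of their neighborhoods (the Jaccard coefficient). It is only a stand-in for the learned link-prediction models cited above, but the output, a ranked list of possible hidden ties, has the same general form.

import networkx as nx

# The same kind of invented associate network as above.
G = nx.Graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])

# Score candidate pairs by shared neighborhood; higher scores suggest a possible
# hidden tie worth investigating further.
candidates = nx.jaccard_coefficient(G, [("A", "D"), ("B", "E")])
for u, v, score in sorted(candidates, key=lambda t: t[2], reverse=True):
    print(f"possible link {u}-{v}: score {score:.2f}")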
Network analysis is only one example of an entire suite of computational tools that
analysts might have at their disposal. There are a number of ways in which software developers
have enabled the dissection, investigation, and visualization of data. These tools can help
analysts make better investigatory decisions.
There is room for more involvement from algorithmic analysis in the intelligence
community. There are movements pushing for a far more automated approach to the
17 Everton, 12. 18 Fire, Michael, and Rami Puzis. “Link Prediction in Highly Fractional Data Sets.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013, 3. 19 Fire and Puzis, 292.
entire knowledge production process.20 These take the form of end-to-end prediction algorithms.
Currently, methods for predicting terrorist activity are still in their infancy. The operational environment is
exceedingly complex, and the methods of collection and analysis have not yet reached a level
where definitive structure can be created. Nevertheless, some researchers are performing
experimental research in this field and are producing computational models that can allow for
automated event prediction.21 These models involve structuring text and events into data chunks
that can be processed quickly by computers. The algorithms then train on the processed events
and produce models that theoretically allow for future event prediction.
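The following sketch conveys the general shape of such a pipeline without reproducing any particular cited model: invented feature vectors summarizing a group's recent activity are used to train a generic classifier, which then scores a new observation window.

from sklearn.linear_model import LogisticRegression

# Invented feature vectors per observation window, e.g.
# [statements_issued, skirmishes, weapons_seizures], labeled 1 if an attack
# followed in the next window and 0 otherwise. Values are illustrative only.
X_train = [[5, 0, 1], [2, 3, 4], [0, 1, 0], [4, 2, 3], [1, 0, 0], [3, 4, 2]]
y_train = [0, 1, 0, 1, 0, 1]

# Train on processed historical events, then score a new window. This is a
# generic classifier sketch, not the stochastic opponent modeling cited above.
model = LogisticRegression().fit(X_train, y_train)
print(model.predict_proba([[2, 2, 3]])[0][1])  # estimated probability of a follow-on attack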
The effectiveness of experimental techniques notwithstanding, computational methods
provide the ability to accommodate the growing amounts of data that counterterror analysts face.
Properly applied, they can significantly improve the investigative capabilities of employees.
However, the technical systems that can store the required volume of data and run the algorithms are not exactly straightforward to set up.22
Technical Requirements for Big Data Success
The foundation of any computational platform is the hardware that it runs on. Big Data
systems are assembled using a combination of many different hardware building blocks. The
most fundamental block of these systems is the server (computer). It provides the storage,
20 Mannes, Aaron. “Qualitative Analysis & Computational Techniques for the Counter-Terror Analyst.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013, 84. 21 Sliva, Amy. “SOMA: Stochastic Opponent Modeling Agents for Forecasting Violent Behavior.” In Handbook of Computational Approaches to Counterterrorism. Springer Science, 2013. 22 Mills, 8.
networking, and processing power that houses and parses the data for the system.23 In large-scale
data clusters, these servers can be multiple times more powerful than the average personal
computer. However, in the context of massive datasets, a single machine is a drop in the bucket.
As mentioned earlier, it takes millions of machines to provide the computational power required
of many large Internet companies. This method of utilizing multiple computers to execute tasks
in parallel is known as “distributed computing.”24
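A toy illustration of the distributed-computing pattern follows, splitting work across local processes with Python's standard library rather than across a real cluster (which would use thousands of machines and frameworks such as Hadoop or Spark). The text shards are invented.

from collections import Counter
from multiprocessing import Pool

def count_words(shard):
    """Process one shard of the data independently (the 'map' phase)."""
    return Counter(shard.lower().split())

if __name__ == "__main__":
    shards = ["intercepted message about the meeting",
              "meeting location changed twice",
              "second message confirms the location"]
    with Pool(processes=3) as pool:
        partial_counts = pool.map(count_words, shards)   # work done in parallel
    total = sum(partial_counts, Counter())               # merge results (the 'reduce' phase)
    print(total.most_common(3))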
The creation of a modern data cluster from these fundamental building blocks is far from
an easy task. It requires herculean efforts from experts in many different fields. First, datacenters
use an enormous amount of power and produce huge amounts of heat. They require specialized
infrastructure that can handle the electricity requirements of both running computers and cooling
them. Furthermore, given the real-time nature of modern intelligence, there must be multiple
redundant systems to ensure that the datacenters do not fail for any reason. For example, most
datacenters have backup generators to ensure service even when the power goes out. Second,
these centers are far more than simply stacks of computers on shelves. They are highly complex
networks of machines, all interconnected in ways that optimize network reliability within the constraints of physics. The science of building datacenters has been developed over
decades, and it becomes more complex every year.25
The market for computers evolves extremely quickly. According to Moore’s law,
compute power in new processors doubles every 18 months.26 This is great news for data
scientists, but a headache for datacenter architects. While data scientists reap the rewards of
23 Singh, Arjun. “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network.” Paper presented at ACM Sigcomm. London, UK, 2015. http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43837.pdf, 14. 24 Valacich, Joseph, and Christoph Schneider. “Managing the Information Systems Infrastructure.” In Information Systems Today: Managing in the Digital World, 2013, 129. 25 Singh, 3. 26 Valacich, 134.
faster processors, datacenter architects must deal with computer clusters that are perpetually in
the process of becoming obsolete. Clusters must therefore be built in modular ways that facilitate
constant replacement of machines to keep up with contemporary processing speeds.
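The 18-month doubling rule quoted above implies a simple growth estimate, sketched below: under that assumption, hardware purchased five years ago delivers roughly a tenth of the per-machine performance of hardware purchased today.

# Relative processing power after t years, under an assumed 18-month doubling period.
DOUBLING_PERIOD_YEARS = 1.5
for years in (1.5, 3, 5, 10):
    growth = 2 ** (years / DOUBLING_PERIOD_YEARS)
    print(f"{years} years -> {growth:.1f}x")
# 1.5 years -> 2.0x, 3 years -> 4.0x, 5 years -> 10.1x, 10 years -> 101.6x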
Given these difficulties, maintaining a modern datacenter would be out of reach for all
but the richest and most tech-savvy organizations. Recently, however, large strides have been
made in the private sector to provide data center solutions for less capable enterprises (including
the government). The data-management industry has grown so large that open-source software has been created to manage clusters, making it much easier both to set up and to operate compute clusters.27 This software is created as a collaborative effort in the community and is completely free to use, even in a commercial context. While the technical capabilities required to actually build a center remain exceedingly high, access to these skills has increased dramatically, significantly reducing the barrier to entry and putting big data analytics within reach of many more organizations.
These private-sector solutions have become so comprehensive that some massive companies are moving
the entirety of their operations to cloud services provided by other enterprises. For example,
Netflix, which at times can account for thirty percent of American Internet traffic, delegates the
entirety of its hardware infrastructure to Amazon’s cloud services.28 Netflix’s delegation of
responsibility to Amazon for its cluster management allows it to focus on its core services as
opposed to putting significant resources into its own hardware infrastructure.
Government agencies have the same access to these types of private sector contractors,
meaning that even though they do not have comparable technical capabilities, they are still able
27 “Apache Hadoop,” n.d. http://hadoop.apache.org/. 28 Meyer, Robinson. “The Unbelievable Power of Amazon’s Cloud.” The Atlantic, April 23, 2015. http://www.theatlantic.com/technology/archive/2015/04/the-unbelievable-power-of-amazon-web-services/391281/.
to field large compute clusters for their needs. The question is no longer whether they can build a
datacenter effectively, but instead whether they are able to use it.
Organizational Requirements for Big Data Success
Simply possessing the requisite technical capabilities to run a functional big data
operation is not enough to generate valuable, data-driven insights. The fastest, most powerful,
and most intuitive tools are of no use if an organization is not set up in a way to take advantage
of them. The machines and algorithms on their own will not produce any meaningful
intelligence; knowledge is still only relevant in the context of people and the decisions that they
make. Big data analytics therefore cannot be integrated without a focus on the humans that it
affects at every level of the organization. There have to be structures and processes in place that
allow for the seamless use of big data with other activities. Without them, there may be
fragmented and inconsistent application of analytic capabilities, leading to mismatched goals and
conclusions.29
There are innumerable organizational factors that are relevant when considering the
overall effectiveness of an organization. The majority of these will not be addressed. However,
the organizational requirements specific to deploying big data operations can be separated
into four main categories:
1. Commitment to big data analytics
2. Open information sharing environments
3. Feedback channels and iteration
4. Engineering talent and culture
29 Galbraith, Jay. “Organization Design Challenges Resulting From Big Data.” Journal of Organization Design 3, no. 1 (2014), 3.
Failure to sufficiently meet these requirements can significantly reduce the effectiveness of big data infrastructure. Each step of the intelligence generation process relies on specific organizational conditions to ensure that it is completed correctly. Poor organizational structure can result in decreased knowledge caliber and scope. In fact, expensive computational architecture can have a negative impact on the organization by diverting money and human resources to fruitless pursuits. Furthermore, none of these factors exists in a vacuum, and the implementation of one requires consideration of the others as well.
Organization-Wide Commitment to Big Data
The first step to creating a big data optimized structure is to understand and frame the
goals of the organization in the context of large-scale analytics.30 Implementing a massive-scale
big data infrastructure is a huge undertaking, and the design process requires significant
forethought. Operational requirements must be strictly defined, and each module of software and
hardware must be built for a purpose. Building complex software products is notoriously
difficult, and having poorly defined capabilities can lead to massive headaches down the road. In
fact, it has been acknowledged for decades that the planning phase should actually take up the
largest proportion of time for software development, even over the actual implementation and
testing of the software.31
Lack of adequate planning has proved disastrous for government software projects in the
past. For example, the FBI’s “Trilogy” information technology modernization efforts went
catastrophically wrong throughout the entire process. In 2004, after years of development and
nearly half a billion dollars invested, the FBI’s “modernized” virtual case file (VCF) system was
30 Mills, 7 31 Brooks, Frederick. The Mythical Man Month. Addison-Wesley, 1974, 20.
still mostly non-functional and had almost no buy-in from analysts.32 The undertaking was such
a mess that the FBI requested the assistance of the National Research Council (NRC) to
determine the cause for the failure.33 After a thorough investigation, the NRC determined that
among the many faults, the major problem was a lack of understanding of the operational
requirements of the VCF system. When the project hit the difficulties inevitable in any large undertaking
and the system became more complicated, there was no adequate plan to keep the project moving
forward. As the project became more derailed, implementation became more haphazard until it
came apart at the seams.
Even when the software project does not completely fail, bad decisions made in the
planning and early implementation phases can have extremely negative consequences down the
road. Inconsistent design choices often cause mismatches between system components and
provide major barriers to the extensibility of the product. This is a well-explored concept in the
software industry and is known as “technical debt.”34 Cutting corners to save time or money incurs debt that must be repaid later in the form of extra labor. Features that were
hastily implemented may need to be modified to connect properly with new components.
Complex documentation that was inadequately compiled has to be edited or even rewritten.
These debts often come with interest, meaning they can take longer to rectify than it would have
taken to simply do it correctly the first time. Sometimes architectures are so poorly conceived
that they cannot be modified and must be completely redesigned.
32 Knorr, Eric. “Anatomy of an IT Disaster: How the FBI Blew It.” InfoWorld, March 21, 2005. http://www.infoworld.com/article/2672020/application-development/anatomy-of-an-it-disaster--how-the-fbi-blew-it.html. 33 Lin, Herbert, and James McGroddy. “A Review of the FBI’s Trilogy Information Technology Modernization Program.” National Research Council, 2004. 34 McConnel, Steve. “Managing Technical Debt.” International Conference on Software Engineering, 2013. http://2013.icse-conferences.org/documents/publicity/MTD-WS-McConnell-slides.pdf.
Once the big data system is operational, it must still be integrated effectively into daily
activities. In order to do this, the organization as a whole must be committed to the integration of
computational analytics at every relevant level.35 Data-driven analytics must be seen as one of
the essential functions of the organization, equal to the many other essential functions, such as
intelligence collection or administrative support. However, there is a key difference between
these traditional functions and data science. Data amplifies the effectiveness of other functions of
the enterprise while being of significantly less utility in a vacuum.36 Without organization-wide
commitment, data-driven analytics can be sidelined in a bureaucracy whose inertia pulls its daily
workflows in directions that do not include data science.
There are many ways to accomplish this, but nearly all of them involve enhancing the
bureaucratic influence of data analytics. This extends all the way up to the executive level (C-suite).37 Some organizational theorists posit that a truly committed big data organization must have a Chief Digital Officer (CDO), or some managerial equivalent.38 Just as Chief Financial Officers (CFOs) are responsible for organization-wide finances and Chief Operating Officers (COOs) are responsible for organization-wide operations, CDOs should oversee digital analytics at every level. This gives data science a representative at higher levels of management,
bolstering the integration process. Without a high-ranking advocate, data scientists may find
themselves unable to get the resources and credibility that they need to effect real change.
However, the shift in influence should not be confined to the top levels of the
organization. Those who are making important decisions are often not the ones generating the
insights from big data. A function of committing to big data analytics is imparting additional
35 Galbraith, 3. 36 Galbraith, 12. 37 Grossman, Robert. “Organizational Models for Big Data and Analytics.” Journal of Organization Design 3, no. 1 (2014), 21. 38 Galbraith 4.
agency to data scientists to make their own analytical choices and letting them influence larger
organizational decisions. Just as data scientists need an executive championing their insights,
they also need their own expanded powers in order to be truly embedded in an organization.39
Integrating large-scale data analytics into well-established bureaucratic workflows is a
challenging and involved process. It requires comprehensive and precise planning, an overall
commitment to the use of computational analytics, and a delegation of power to lower-level
analysts. Without making these changes, the existing bureaucratic inertia can push data science
to the fringes of usefulness, all but guaranteeing that it remains a highly specialized and only
marginally useful tool.
Collaborative Information Environments
The modern information environment is changing rapidly, and new analytic processes
must be used to tackle it. As explored in Chapter One, the heterogeneity and size of modern data
mean that any single piece may be relevant for many different analytical investigations. The
growth of open source and signals intelligence sources dictates that the complexity of analytical
workflows also grows.40 The interconnectedness of information and increasing globalization
mean that many analytical conclusions must also draw on a larger body of data in order to be
relevant. Furthermore, as mentioned in the previous section, when analytics becomes embedded
in traditionally non-technical departments, data scientists no longer work in a single “big data”
group, but instead with the teams in which they are integrated. This has the potential to isolate
data scientists from each other.
39 Grossman, 23. 40 Choucri, Nazli, Stuart Madnick, and Michael Siegel. “Improving National and Homeland Security Through Context Knowledge Represenation and Reasoning Technologies.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006, 155.
These broadening information sets and changing organizational structures must be offset
by increased collaboration between departments and agencies.41 Traditionally, teams produce the
work that they are assigned and then report it up the chain of command. However, instead of
simply reporting their conclusions vertically in the hierarchy, they should be sharing their
processes horizontally with other teams. This allows them to hone both their datasets and their
methods. The collaborative effects go both ways, and teams that help others also often find they
are performing at a higher level due to increased horizontal access.
The improved performance is brought about by what Grossman terms a “critical mass” of
data scientists.42 He defines critical mass as the required number of employees whose combined
skills encompass what is needed to derive insights. A large number of employees becomes
necessary because each individual data scientist and analyst might only possess a small subset of
the domain knowledge required to create a full picture from the provided set of data. In the
intelligence fields, these domains exist in two categories: analytical and technical. There must be
enough analysts and data scientists who are able to adequately manage big datasets (technical) while at the same time providing meaningful insights (analytical). These two categories have their own
domain spaces in which employees have even more specific expertise. An analytical problem’s
required skillset can span the specialties of analysts from multiple teams or even agencies, and
isolating employees from each other shrinks the available talent pool.
Avoiding isolation involves creating an information environment that is based around
openness and collaboration. To the extent that it is possible (private and proprietary data can
cause issues), data should be made available to all analysts and data scientists. Beyond simply
data, organizational incentives must exist for teams to collaborate with each other. Pitting teams
41 Sukumar, Sreenivas, and Regina Ferrell. “‘Big Data’ Collaboration: Exploring, Recording and Sharing Enterprise Knowledge.” Journal of Information Services and Use 33, no. 3 (July 2013), 259. 42 Grossman, 21.
against each other, or preferring one piece of analysis over another, can produce competitive
effects that reduce collaboration. There should be systems to encourage camaraderie across
teams, even in an informal sense. In fact, a meta-analysis of organizational studies has found that
employees often perform better when they are constantly in contact with people outside of their
team due to constant exposure to new ideas.43
Beyond just making data available, concrete steps must be taken in order to produce a
structure that facilitates effective collaboration. The first focuses on the structure of the data itself. Each
team may collect and structure its data in a unique way, especially if the nature of the
information is distinct from that of other teams. Having knowledge of one’s own data structure
and content is what Sukumar calls “Domain Knowledge.”44 Lack of domain knowledge about a
specific data store can stupefy even the best analyst. It is therefore necessary to impose structural
requirements for each team to document the format of its own data and expose this
documentation to the rest of the organization. This is done in two ways. First, a team must
explain how its data is stored. What technology is used? What are the names of different
categories in the store? What are the sources for this data? Second, a team must explain why the
data is stored the way it is. Without this crucial information, other teams may have issues integrating it into their own analytic workflows and may run into problems that have already been solved elsewhere.
Domain knowledge also extends past the syntactic representation of the data and into the
semantics of the data. The data semantics are the actual knowledge content of the data, not just
the way in which it is formatted. What context does the data exist in? What conclusions have
already been found? What conclusions have not been found? Requirements for a general
43 Owen-Smith, Jason. “Workplace Design, Collaboration, and Discovery,” 2013. http://sites.nationalacademies.org/cs/groups/dbassesite/documents/webpage/dbasse_085437.pdf 44 Sukumar, 260.
semantic description of a dataset can be exceedingly helpful for outside teams. Otherwise, they
may have to re-do analytical work that has already been done. Requiring semantic
documentation can improve collaborative efforts and also help teams understand their own data.
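A minimal sketch of the kind of dataset descriptor a team might publish to meet both the syntactic and semantic documentation requirements described above. The schema and field values are hypothetical; a real organization would standardize the format and store descriptors in a shared catalog.

# Hypothetical dataset descriptor, for illustration only.
dataset_descriptor = {
    "name": "regional_incident_reports",
    # Syntactic documentation: how the data is stored and why.
    "storage": {
        "technology": "document store",
        "fields": ["doc_id", "source", "date", "region", "text", "tags"],
        "rationale": "free-text reports arrive unstructured; tags are added at ingest",
    },
    "sources": ["open-source media", "liaison reporting"],
    # Semantic documentation: what the data means and what has been done with it.
    "context": "covers incidents in regions X and Y since 2012",
    "conclusions_drawn": ["seasonal pattern in cross-border activity"],
    "open_questions": ["attribution of the 2014 incident cluster"],
    "contact": "team-alpha-analytics",
}
print(dataset_descriptor["context"])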
There must be strict requirements about effective horizontal communication, data
sharing, and documentation. The system for collaboration and sharing should be clear, and
analysts should be encouraged to reach out horizontally across the organization. Left without
specific processes and structures that abet communication, analytic teams may flounder as they find themselves up against problems that they lack either the data or the expertise to solve.
Feedback Loops and Iteration
Building software and analytical models is difficult. It is an attempt to map an
exceedingly complex real world onto a digital representation that can be used to find novel
insights. Even with clearly defined operational goals, creating well-functioning tools and
accurate models is an undertaking that cannot be done all at once and must be executed in many
iterative steps. The goals of a big data enterprise must take into account the gradual nature of
software development and define checkpoints along the way.45
In his seminal work, No Silver Bullet, Frederick Brooks (a Turing Award recipient and
former Director of Engineering at IBM)46 likens large projects to organic processes – they grow
and evolve over time and should be functional at nearly every stage of the process. As tools are
45 Mills, 28. 46 Frederick Brooks is considered one of the fathers of modern software development. His works on the organizational theory of software development earned him the Turing Award in 1999. The Turing Award is the most prestigious award in the field of computer science. Brooks, Frederick. “No Silver Bullet -- Essence and Accident in Software Engineering.” University of North Carolina at Chapel Hill, 1986. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1663532.
developed, they must be evaluated, then augmented, and eventually expanded in scope. Attempts
to reach end goals in a single iteration will often lead to overly complex solutions that tend to
collapse under their own weight.
Analytical goals function in a similar way. Models start with smaller, more specific
focuses, which then broaden as the methods are honed and improved. However, big data
analytics solutions must be even more dynamic and capable of change than infrastructural
software. While traditional product development has to respond to market trends, analytical
focuses can change on a daily basis, with new types of data, techniques, and questions coming in
constantly.47 Any data analytics team that is unable to adapt quickly to changing information
environments will not perform well.
In a traditional hierarchical structure, achieving effective iteration becomes much more
difficult. When managers set tasks for lower-level employees, they expect their goals to be achieved as they conceived of them. Unfortunately, in an iterative environment it is impractical to have a one-way channel of communication for task delegation.48 Managers do not have a monopoly on knowledge of the models being deployed and therefore should solicit feedback from lower-level data scientists on possible next steps. Having a structure in which
analysts and data scientists simply execute tasks that are assigned to them from above silences a
major source of domain knowledge and hobbles the iterative process. Querying for feedback
from all levels of the organization about problems and possible new directions allows for more
effective iterations.
On the other hand, hierarchy does have its uses, and eventually information does have to
be funneled up to important decision makers at higher levels. Director-level choices must be
47 “Fisher, Danyel. “Interactions With Big Data Analytics.” Interactions, June 2012, 52. http://dl.acm.org/citation.cfm?id=2168943 48 Galbraith, 7.
made after a careful combination of lower-level analytical conclusions. Even with open
information sharing environments, most analytical teams are not able to see the entire picture put
together by all teams in the organization and are therefore unable to see how they are performing
in their tasks. Are their areas of focus relevant to the organization’s goals? Are their models
providing actionable insights? Are their conclusions promoting correct decisions? Data scientists
require constant feedback on their performance from above. Without it, the next step in the
iterative process is difficult to determine.
The analytical requirements of an organization are constantly evolving and must
therefore be approached with a structure that abets dynamic and additive changes. Two-way
vertical communication between different levels of the organizational hierarchy must be
implemented. Otherwise, neither managers nor data scientists will get the feedback that they need in order to determine the next steps for their work.
Analytic/Engineering Talent and Culture
Regardless of the powerful technology that is in place and the expertly crafted
organizational processes that have been constructed, in the end, an organization is only as
effective as its employees. They will be the ones leveraging the technology to create actionable
insights. A good structure can help employees do their jobs better, but it cannot do their jobs for them. It can, however, improve the quality of the work of analysts and data scientists
in the organization. Strategies that attract talent, leverage it correctly, and retain it long-term will
pay dividends, even if they are difficult and expensive to implement.49
49 Davenport, Thomas, and D.J. Patil. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review, October 2012.
It seems obvious that better data scientists will produce higher-quality work in less time. Those with superior training, more experience, and greater aptitude can respond to evolving scenarios in more effective ways. They can see paths to insights where others might have missed them and can reveal patterns in data that seemed impossible to find. One factor that still separates data scientists from other types of analysts is their ability to code. They must still be engineers and should be evaluated as such. However, what is not obvious is how much better one engineer can be than another.
Engineering effectiveness is difficult to quantify. It is still a debated topic of study, but current research suggests that so-called “great” engineers can be entire orders of magnitude better than simply average ones on various metrics.50 Studies have found that the best programmers can code nearly twenty times faster than average, debug twenty-five times faster, and write programs that execute ten times faster. They are also capable of building solutions that average engineers are simply unable to conceive of.51 On the flip side, incompetent engineers can actually produce negative progress in the form of technical debt from poorly written code.
Unfortunately, simply adding more engineers to a team cannot be a replacement for a
lack of “great” engineers. While having more team members can help speed up projects and
improve their quality (especially when attempting to reach a critical mass), simply adding
employees can have diminishing returns. In fact, in his famous book, The Mythical Man Month,
Brooks claims that adding more engineers to an already late project often can make it even later,
more disorganized, and of lower quality.52 The structural challenges that arise from this
phenomenon thus become: how can an organization attract and retain “great” employees?
50 McConnel, Steve. “Origins of 10X – How Valid Is the Underlying Research?,” n.d. http://www.construx.com/10x_Software_Development/Origins_of_10X_%E2%80%93_How_Valid_is_the_Underlying_Research_/. 51 Brooks, The Mythical Man Month, 32. 52 Brooks, The Mythical Man Month, 25.
Furthermore, how can it provide an environment in which these employees will realize their
potential?
First, there must be incentives for data scientists to join an organization in the first place.
In the still nascent world of big data analytics, experienced talent is hard to find, and companies
compete ruthlessly in the big data space.53 Basic rules of supply and demand drive salaries up
significantly, and most data scientists start well above six figures, even in entry-level positions.
With so many options, data scientists are not simply looking for a job; they are looking for the highest-paying offers and the most meaningful and interesting work. Furthermore, given
that they have a high degree of choice, they can afford to demand an influential role on the team
that they are joining. Will their suggestions be listened to? Will they be given opportunities to be
independent and take control of their tasks? No one likes being a second-class employee.
Once hired, engineers must be incentivized to work hard and excel at what they do.54
Clear paths of advancement in both pay and responsibility must exist. Without these paths,
engineers may not feel the desire to exceed the demands of their position, causing the quality of
their work to stagnate. At best, a lack of motivation can cause employees to produce sub-par
results. At worst, it can create incentives for employees to look for opportunities at other
organizations.
Retaining motivated engineers for longer periods of time also improves their overall
effectiveness. Not only do they gain domain experience in their specific field, they also gain
institutional knowledge: an understanding of the processes of an organization.55 This improves
the efficiency of their work as they have an easier time navigating the idiosyncrasies of their
respective team that may stymie a junior engineer.
53 Davenport, 3. 54 Eccles, Robert. “The Performance Measurement Manifesto.” Harvard Business Review, February 1991, 135 55 Sukumar, 262
Finally, as outlined in the section on organizational commitment to big data, data
scientists need the freedom to be entrepreneurial in their pursuits.56 Micromanagement and strict
delegation of tasks and workflows are detrimental both to the working environment and to the
overall effectiveness of the data scientists. Obviously there must be an overarching strategic
theme, but engineers should have the freedom to pursue their own leads in pursuit of that goal.
As they explore, they find novel solutions that may not have been discovered had they stuck to
the beaten path of established protocol.
Overall, engineers, and analytical employees in general, are resources that must be
recruited and retained through organizational decisions. Poor structural and procedural layouts
can provide disincentives for talent to join the organization, or stifle the talent that already works
there. Agencies that are attempting to answer some of the world’s most difficult questions,
working with some of the most obfuscated data sets, need quality analysts. Without them, they
will no doubt fail in their mission.
Conclusion
The growth of big data has fundamentally changed the way in which many organizations,
including counterterror agencies, approach analytics. Over time, the techniques used to structure
and analyze these massive datasets have been refined. Novel approaches to information tagging,
visualization, and even prediction have been developed. The technical requirements for
implementing these algorithms in practice remain high. Datacenters are still extremely difficult
and expensive to build, maintain, and upgrade. However, the evolving big data market has
56 Newport, C.L., and D.G. Elms. “Effective Engineers.” International Journal of Engineers 13, no. 5 (1997), 331
provided solutions that significantly lower the barrier to entry on big data analytics, shifting the
challenges from technical to organizational ones.
In an emerging field, the correct ways to integrate large-scale data analytics into
organizations, let alone multi-agency government ones, remain on the cutting edge of
organizational theory. Traditional views on organizational structure must be challenged in order
to fully leverage the power of big data. Strict hierarchies and top-heavy power structures need to
be eschewed in favor of more distributed systems of influence. Cross-team informational barriers
need to be brought down and incentives to share and collaborate need to be put in place.
Similarly, communication between different levels of the structural hierarchy should be
prioritized as a necessary objective. Finally, recruiting and retaining top engineering and
analytical talent should become a top goal for any organization hoping to derive meaning from
big data.
These organizational structures are much easier to describe in theory than to implement
in practice. Most organizations that require this level of structure have huge amounts of
organizational inertia that stifles change, especially changes as large as those proposed in this chapter. Viewing these organizational requirements through the lens of intelligence agencies makes the changes appear even more daunting. The 9/11 Commission found that the Intelligence Community's structural reform attempts had floundered for a decade before 9/11. Though the National Counterterrorism Center was established as a new organization in order to develop these structures from scratch, it remains to be seen whether it has been successful in its endeavor.
Chapter 3
Organizational Successes and Failures of the NCTC
New data environments and the information revolution have brought new operational
requirements for the Intelligence Community. The National Counterterrorism Center was
established as a way to create a structure that was better equipped to produce intelligence in this
modern, complex environment. However, the NCTC's existence does not necessarily signal the end
of the community’s information woes. Integrating data and generating actionable insights using
modern information technology are daunting organizational tasks, and the traditional structures
of the Intelligence Community are deep-seated.
This chapter first explores the positive contributions that the NCTC has made to the
community. It has brought about many constructive changes to the way that intelligence is
generated, especially with respect to interagency communication and information sharing.
Additionally, its existence has solidified the modern paradigm of counterterror intelligence,
which embraces complexity and deep cross-referential analysis.
However, despite the positive changes that it has made, the NCTC still exhibits many
organizational flaws, some being nearly identical to those that motivated its creation. This
chapter investigates these issues in two distinct ways. First, it highlights specific instances in
which the NCTC, in coordination with the larger Intelligence Community, has been found to fail
in its mission. It walks through the specific failures and relates them to the organizational issues that they either caused or revealed. The two chosen points of failure are the 2009
Christmas Bomber and the currently unfolding foreign fighter crisis.
Finally, the investigation moves toward more implicit flaws that have not necessarily
contributed directly to the concrete failures outlined above. These flaws are found both within
the internal structure of the NCTC and in the relationships that it maintains with other agencies
in the IC. Furthermore, in the context of Big Data analytics, the NCTC presents many barriers to
the implementation and use of computational techniques, choking off sources of innovation for
the modern era. The existence of these flaws demonstrates that the NCTC is far from fulfilling its
mission of fully integrating all-source intelligence.
Successful components of the National Counterterrorism Center’s Structure
Immediately following the events of 9/11, a major investigation was launched in order to
determine why the attack had come without warning. How had the United States intelligence
community (IC), the most capable in the world, completely missed a plot this ambitious? After
two and a half years of dedicated effort, a team of nearly fifty people produced The 9/11
Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the
United States.1 Through exhaustive work, the team managed to trace a path through
millions of documents and decisions in multiple agencies, pinpointing the exact moments at
which major failures occurred.
The report allowed for specific structural components of the IC to be analyzed. For years,
leaders in the community had been pushing for reform.2 Dozens of reports produced hundreds of
structural and analytical suggestions to advance the IC into the modern world. The
recommendations had gone largely ignored for a decade. However, the 9/11 Commission Report
1 Zelikow, Phillip. “The 9/11 Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the United States” July 22, 2004. https://www.gpo.gov/fdsys/pkg/GPO-911REPORT/content-detail.html. 2 Zegart, Amy. Spying Blind: The CIA, the FBI, and the Origins of 9/11. Princeton, NJ: Princeton Press, 2007.
demonstrated that the proposed changes were credibly grounded in reality. The inter-agency
fragmentation that had been identified now came to the forefront. The information that could
have potentially stopped the 9/11 terrorists had been collected but not analyzed by the agencies
or shared with the appropriate parties. Members of the IC were coming to realize that the modern
counterterror intelligence environment was far different from what they had previously thought. The
work done in this period laid the foundations for the modern conceptions of effective intelligence
and agency organization.
The most concrete effect of the report was the establishment of the National Counterterrorism Center by the Intelligence Reform and Terrorism Prevention Act of 2004 (IRTPA).3 Though the bill does not specifically mention the intelligence failures leading up to 9/11, those events are clear motivators for the center. The legislation specifically mentions access to “all-source intelligence” and “independent, alternative analyses” as main goals for the new agency. With these objectives in mind, work began on building the new center.
Improved information access and dissemination
The most logical goal of creating a central hub for the counterterror intelligence
community was the centralization of information. Instead of sitting locked away in various
separate agency databases, information would also reside in a single repository, where those with
access to the NCTC data-stores could reach the databases of the interagency community. This
arrangement was meant to prevent the kind of missed opportunities that preceded the 9/11 attacks.
The classic example is the fundamental break between FBI and CIA data-stores. Separately,
each housed ominous hints of what might come, but together the two
3 Intelligence Reform and Terrorism Prevention Act of 2004, 2004. http://www.nctc.gov/docs/pl108_458.pdf.
provided a relatively clear picture of what was going on: foreign al-Qaeda agents had entered
the United States, were taking flying lessons, and should at the very least have been monitored if
not apprehended. Even without specific monitoring, the FBI was in possession of the full names,
addresses, bank information, and telephone numbers of some of the terrorists.4 However, they
simply did not know that these men were dangerous.
The NCTC’s information centralization was aimed at alleviating this problem. First, it
houses the data of more than thirty different intelligence and law-enforcement networks, making
them available to the interagency community.5 Furthermore, it maintains a data-store known as
the Terrorist Identities Datamart Environment (TIDE), which stores information on international
terrorists. This data is compiled from a variety of domestic and international sources, providing
real-time information for analysts in many agencies.
The successful combination of these databases is a major technical, political and
bureaucratic feat that should not be overlooked. It is exceedingly hard to convince agencies to
part with any of their precious data and difficult to implement such integration organizationally.6
Furthermore, the technical requirements are massive, because many agencies use custom
technologies that complicate integration. The success that it has had here proves
that the NCTC is capable of achieving very technically challenging goals. Future federal
information centralization can be built upon the foundation laid by the NCTC’s data integration
efforts.
4 Zegart, 156. 5 Best, Richard. “The National Counterterrorism Center (NCTC)—Responsibilities and Potential Congressional Concerns.” Congressional Research Service, December 2011. https://www.fas.org/sgp/crs/intel/R41022.pdf, 5. 6 Peled, Alon. “Coerce, Consent, and Coax: A Review of U.S. Congressional Efforts to Improve Federal Counterterrorism Information Sharing.” Terrorism and Political Violence 1, no. 18 (August 2014).
Interagency Communication and Collaboration
The creation of the NCTC has done more than simply house more data under a single
roof. It has also produced a better environment for actual collaboration with the data. Before the
creation of the center, there was a dearth of opportunities to communicate directly with other
agencies, deepening the siloed nature of the IC. Each department simply collected its own data,
analyzed it in its own informational context, and reported it up the chain of command. Final
decision-makers relied on extensive contextual aggregation at top levels to piece together the
varied reports into a single counterterror intelligence product.
Another of the NCTC’s stated goals is to provide a more comprehensive layer of
integration below the Director of National Intelligence level.7 The language of its defining
legislation suggests that information be funneled through the NCTC before it is presented to
policymakers. Reports are therefore available to more members in the intelligence community,
broadening overall understanding of the issues and allowing for alternative analyses.
Furthermore, the NCTC claims to improve the “situational awareness” of the community
as a whole. It does so by hosting three secure video conference calls each day. Some of these
calls happen in the early hours of the morning – regular schedules do not apply to counterterror
intelligence. The meetings occur between NCTC employees and employees of various agencies,
ensuring that they are constantly in contact. This type of contact no doubt fosters better
collaboration between the agencies.
The efforts put forth by the center put the IC leaps and bounds ahead of where it was
when the World Trade Center came down in 2001. The IC had not yet adapted to the modern
counterterror environment, and the NCTC brings a more modern structural element. However, it
is not entirely clear how much better equipped the IC is in its battle against terrorism. Even with
7 Intelligence Reform and Terrorism Prevention Act of 2004.
the massive changes instituted in 2004, major intelligence failures still happen, sometimes for
reasons similar to those of September 11, 2001.
Concrete Points of Failure in the Counterterror Community
Despite the massive budgetary and personnel increases in the past decade and a half, the
counterterror intelligence community still experiences unacceptable failures. First, as noted in
Chapter 1, it is unreasonable to expect any type of intelligence agency to be 100% accurate on
every piece of information. This type of “creeping determinism” produces expectations of
analysts that are impossible to satisfy and eventually counterproductive.
On the other hand, the members of the IC must learn from their mistakes and take
responsibility when they fail to do so. When major intelligence signals are missed due to
structural deficiencies that the NCTC was supposed to solve, there must be some accountability.
Therefore, while this is not an attempt to condemn the employees working at the National
Counterterrorism Center and its sister agencies, it is a condemnation of the structural weaknesses
that reduce their chances of success. Sometimes these weaknesses make it impossible for
analysts to correctly do their jobs. The two presented examples demonstrate the deficiencies that
reduce the effectiveness of the NCTC.
The 2009 Christmas Bomber
On Christmas Day 2009, Umar Farouk Abdulmutallab, a 23-year-old Nigerian man,
boarded a Detroit-bound plane in Amsterdam. In his underwear he had hidden a non-metallic
pouch filled with the chemical pentaerythritol tetranitrate, a major ingredient in some plastic
explosives.8 As the flight began its descent into Detroit Metropolitan airport, Abdulmutallab
went into the bathroom, where he attempted to inject the pouch with another liquid as a reagent
to begin an explosive reaction. Thankfully, he somehow botched the injection and only managed
to start a fire instead of setting off an explosion. Though there were no air marshals on the flight,
other passengers managed to subdue him and put out the fire that he had started.
Chemists posit that the amount of explosive that he had was more than enough to
puncture the hull of a commercial airliner and would most likely have done serious damage to
the airplane.9 It is impossible to know whether or not he could have brought down the aircraft,
but if he had, 278 lives would have been lost. Airline security is supposedly extremely robust,
especially following the events of 9/11. Furthermore, the flight was entering the US, meaning it
should have met American security standards. This raises the question: how did this kind of attack
get through? What failures occurred that allowed this to happen?
The Senate Intelligence Committee put together an extensive report on the failures of the
intelligence community in this instance, and the findings were not favorable for any agency. The
National Counterterrorism Center sits at the center of these agencies and is intended to be the
glue that holds together the analytical efforts of the entire community.
First, the report identifies the Department of State as having made mistakes with
Abdulmutallab’s multiple-reentry visa to the United States. He had originally applied for it in
2008 while in a Master’s program in mechanical engineering in London. However, after he
abandoned his family to attend an extremist training camp in Yemen, his father went to the US
8 Burr, Richard. “Unclassified Executive Summary of the Committee Report on the Attempted Terrorist Attack on Northwest Airlines Flight 253,” May 2010. http://www.intelligence.senate.gov/publications/report-attempted-terrorist-attack-northwest-airlines-flight-253-may-24-2010. 9 Johnson, Carrie. “Explosive in Detroit Terror Case Could Have Blown Hole in Airplane, Sources Say.” The Washington Post, December 29, 2009, sec. Nation. http://www.washingtonpost.com/wp-dyn/content/article/2009/12/28/AR2009122800582.html.
embassy in Nigeria to report that his son had likely been radicalized.10 This information went to
both the State Department and the CIA. The State Department opted not to revoke
Abdulmutallab’s visa, despite the credible accusation made by his own father. However, even
had his visa been revoked, there was no automated electronic way for the State Department to
notify related parties (such as the NCTC), or even the airlines that would be processing him.11 It
is the responsibility of the NCTC to ensure that information like this gets disseminated to the
entire community, instead of being ignored.
The failures did not end with the State Department. The CIA had previously generated
reports on Abdulmutallab due to his involvement in Yemeni extremist camps. However, the
existence of this information was not reported widely, and many CIA offices had no idea that it
existed.12 As a result, regional divisions that were not focused on Yemeni extremism did not
search databases that contained the reports related to Abdulmutallab. The information was spread
across too many different sources for them to find anything coherent. And finally, the
information that was collected by the CIA was not given to the NCTC until after the attacks had
occurred. This means that the CIA was in possession of highly relevant counterterror intelligence
that it had opted to withhold from the rest of the community. Had the CIA shared the reports, it is
more likely that other agencies would have been able to identify the threat that Abdulmutallab
posed. This exact type of information centralization is one of the stated goals of the NCTC.
Investigations of the FBI revealed more failures. Even if all of the CIA information had
been centralized correctly in the NCTC, it may not have mattered. In the aftermath of the failed
attack, investigators found that critical FBI analysts did not have access to the CIA data streams
10 DeYoung, Dan Eggen, Karen, and Spencer S. Hsu. “Plane Suspect Was Listed in Terror Database after Father Alerted U.S. Officials.” The Washington Post, December 27, 2009, sec. Nation. http://www.washingtonpost.com/wp-dyn/content/article/2009/12/25/AR2009122501355.html. 11 Burr, 4. 12 Burr, 5.
coming through the NCTC.13 Due to a basic technical misconfiguration of security profiles, the
information was blocked from showing up in database searches, meaning the FBI personnel were
not even aware that they were unable to access critical information. Had the terror attempt not
occurred, it is likely that this misconfiguration would have persisted for a long time.
Finally, the rigid and incomplete standards set up between the agencies prevented
Abdulmutallab from being nominated for the no-fly list.14 Because each agency only had smaller
pieces of the puzzle, none had the entire picture of his extremism. This initial information
fragmentation in itself is a consequence of the complex intelligence environment and cannot be
avoided: no agency can expect to find all of the puzzle pieces on its own. However, there was no
specific mechanism to begin the process of integrating the information. No agency was tasked
with seeding an initial, though incomplete, profile that could be built upon by other
organizations. Overly complicated or rigorous standards for establishing new entries hindered the
ability of the NCTC to even begin the integration process.
The withholding of information and lack of collaboration (or any type of communication,
really) caused Abdulmutallab to be excluded from any type of potential terrorist database,
despite the wealth of evidence against him. Basic technical glitches added additional problems
on top of the basic organizational ones, creating an intelligence environment that removed the
possibility of predicting the threat that Abdulmutallab presented. He was therefore able to get on
a plane to Detroit with explosives sewn into his clothing, and only luck prevented the deaths of
hundreds of civilians. These intelligence failures occurred in spite of the NCTC’s specifically
stated goals to prevent them.
13 Burr, 8. 14 Burr, 11.
The Foreign Fighter Phenomenon
The Syrian civil war has raged since 2011. The clashes between Bashar Al-Assad’s
government forces and an array of opposition groups have seriously destabilized the region. The rebel
fighters splintered into subgroups and reformed in new ways. Eventually a new power emerged
from the chaos: the Islamic State of Iraq and Syria (ISIS).15
From the outset of the war, civilians from western countries began to leave their homes to
fight in the struggle against what they considered Assad’s “oppressive” regime. Huge numbers of
civilian deaths and the alleged use of chemical weapons further fueled the influx of what
eventually became known as “foreign fighters.” As the dynamic of the conflict changed and ISIS
gained ground and manpower, the foreign fighters began joining almost exclusively its ranks.
Massive recruitment campaigns began to flood the Internet as ISIS recruiters took to the web in
order to bolster their numbers.16
As its forces swelled with local and foreign fighters alike, ISIS became bolder, declaring
itself a renewed caliphate in mid-2014, and capturing major Iraqi cities such as Ramadi, Falluja,
and Mosul. With its newfound influence, it accelerated its recruitment apparatus and began to
engage heavily with the western world, convincing uncommitted extremists to join the nascent
caliphate. The methods used for recruitment often begin in publicly accessible channels over
open social media sites such as Facebook, Twitter, or Tumblr. Within months of the declaration
of the caliphate, an estimated ten thousand fighters arrived in Syria from nearly 80 countries.17 It
is estimated that hundreds of the recruited fighters are American citizens.
15 Wood, Graeme. “What ISIS Really Wants.” The Atlantic, March 2015. http://www.theatlantic.com/magazine/archive/2015/03/what-isis-really-wants/384980/. 16 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel.” Homeland Security Committee, September 29, 2015. https://homeland.house.gov/wp-content/uploads/2015/09/TaskForceFinalReport.pdf. 17 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel,” 10.
The foreign fighters that make their way to Iraq and Syria pose grave threats to the
American homeland. These threats manifest themselves in two main ways. First, they provide
English-speaking recruiters to ISIS, increasing its outreach in western spheres of influence.
These recruiters are capable not only of convincing more civilians to join the war in Syria and
Iraq, but also of inspiring local acts of violence. In fact, as more foreign fighters have joined
ISIS, the number of ISIS-inspired terror attacks in the west has gone up dramatically. In 2015,
the number of attempted and successful attacks was nearly double that of 2014 (37 compared to
20).18
Concrete examples of ISIS’s increased influence continue to manifest themselves.
Perhaps one of the most notable is the San Bernardino attack, in which 14 people were killed and
21 people were injured. The couple responsible for the violence had pledged their allegiance to
the Islamic State.19
A possibly more dangerous consequence of the foreign fighter phenomenon is that of
fighters who return to their home countries. These are men and women who have received training
and combat experience in a brutal civil war. Their expertise could prove deadly upon their return.
One of the most pertinent examples of the threats of returning foreign fighters is the November
2015 Paris terror attack. The mastermind of the attacks was a Belgian citizen named Abdelhamid
Abaaoud.20 He had previously made his way to Syria to fight for the Islamic State and received
combat training and experience. He eventually returned to Paris, where he planned the attacks
that took the lives of 130 people and injured nearly 400 more.
18 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel,”15. 19 Koren, Marina. “How the San Bernardino Shooters Planned for Jihad.” The Atlantic, December 9, 2015. http://www.theatlantic.com/national/archive/2015/12/san-bernardino-shooters-radicalization/419610/. 20 Freytas-tamura, Aurelien Breeden, Kimiko De, and Katrin Bennhold. “Call to Arms in France Amid Hunt for Belgian Suspect in Paris Attacks.” The New York Times, November 16, 2015. http://www.nytimes.com/2015/11/17/world/europe/paris-terror-attack.html.
Out of the hundreds of Americans that have attempted to get to Syria and Iraq, only 28
have been successfully interdicted.21 Though many have died or opted to stay in the region, 40
American foreign fighters are thought to have made it back into the United States. Each
represents a possible threat. One has already been apprehended for plotting an attack against a
US military base.22 An extensive report conducted by the Department of Homeland Security
(DHS) has concluded that the counterterror community is not reacting effectively to this growing
threat and must improve its interdiction capability. Furthermore, the NCTC has acknowledged
the central role that it is playing in the identity resolution of potential foreign fighters leaving or
entering the country.23 This is not to say that the foreign fighter threat is the fault of the NCTC. It
is not expected to solve the problem on its own. However, it is not fulfilling its role, and
exploring this failure can provide a useful example to investigate the ways in which the NCTC’s
structural flaws manifest themselves.
The NCTC has not been able to prove that its watch listing capabilities have improved
since failures in 2009 (Christmas) and 2013 (Boston). There have been no independent reviews
of its progress. Even so, what has been reported by DHS does not seem promising. In fact,
despite efforts to truly centralize information, the IC still relies on two separate terrorist watch
list databases. The NCTC manages the aforementioned TIDE data instance, while the FBI
maintains its own “Terrorist Screening Database” (TSDB).24 Both databases often contain only
partial information on hundreds of thousands of suspected terrorists. There is no guarantee that
the databases do not contain partially overlapping information, causing further information
21 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel”, 23. 22 “Columbus, Ohio Man Charged with Providing Material Support to Terrorists.” Department of Justice, April 2015. https://www.fbi.gov/cincinnati/press-releases/2015/columbus-ohio-man-charged-with-providing-material-support-to-terrorists. 23 Rasmussen, Nicholas. Hearing before the House Committee on Homeland Security “Countering Violent Islamist Extremism: The Urgent Threat of Foreign Fighters and Homegrown Terror,” February 12, 2015, 3. 24 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel,” 26.
fragmentation. Additionally, these databases are difficult to access, even for appropriate analysts
outside of the center. For example, the TSA reported that when it submitted a pattern-matching
query to the NCTC to run against its TIDE instance, the results came back from the NCTC
eighteen months after the initial request.25 By that time, the results were mostly meaningless. The
delay effectively signaled that the TSA does not have access to this database.
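The fragmentation described above is easiest to see in miniature. The sketch below, written in Python with entirely hypothetical records and field names (it does not reflect the actual TIDE or TSDB schemas), shows how two watchlists can each hold a partial, slightly different record of the same person, and why a basic identity-resolution step is needed before either list is complete.

```python
# A minimal sketch, assuming hypothetical records and field names (not the real
# TIDE or TSDB schemas), of why two partially overlapping watchlists fragment
# information: the same person can appear in both lists under slightly different
# records, and neither list alone is complete.
from difflib import SequenceMatcher

tide = [
    {"name": "Umar Farouk Abdulmutallab", "dob": "1986-12-22", "passport": None},
    {"name": "John Q. Militant", "dob": None, "passport": "X1234567"},
]
tsdb = [
    {"name": "Umar F. Abdulmutallab", "dob": "1986-12-22", "passport": "A8765432"},
]

def similar(a, b, threshold=0.8):
    """Crude fuzzy name comparison; real identity resolution is far more involved."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def merge_candidates(list_a, list_b):
    """Pair records that likely describe the same person and combine their fields."""
    merged = []
    for a in list_a:
        for b in list_b:
            same_dob = a["dob"] and b["dob"] and a["dob"] == b["dob"]
            if same_dob and similar(a["name"], b["name"]):
                # Each field falls back to whichever list actually has a value
                merged.append({key: a[key] or b[key] for key in a})
    return merged

print(merge_candidates(tide, tsdb))
# [{'name': 'Umar Farouk Abdulmutallab', 'dob': '1986-12-22', 'passport': 'A8765432'}]
```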
Additionally, the interagency community, including the NCTC, continues to disregard the
vital role that state and local fusion centers can serve in the intelligence process.26 Federal
intelligence services often fail to query local fusion centers for data, alienating a potentially
critical source of counterterror information. They rarely share their own data with the state and
local centers, meaning local investigations are often working partially blind. These state and
local fusion centers are important because they are often the closest geographically to suspected
terrorists and have a better understanding of the local environment.
Finally, and perhaps ironically, the DHS report itself represents an organizational schism
that exists in the community. Among the many recommendations found in the report, DHS posits
that it should be given more responsibility to coordinate terrorist watch-list databases, a task that
aligns better with the stated goals of the National Counterterrorism Center. Having both parties
attempting to centralize information in different locations is confusing and counterproductive.
These conflicts of interest in terms of responsibility only add to the fragmentation of data in the
intelligence community. This position taken by the DHS exacerbates the problems with the
power dynamics that exist within the IC.
25 Inspector General Roth, John. TSA: Security Gaps : Statement of John Roth Inspector General, Department of Homeland Security, Before the Committee on Oversight and Government Reform. US House of Representatives, November, 2015. https://oversight.house.gov/wp-content/uploads/2015/11/11-3-2015-Committee-Hearing-on-TSA-Roth-DHS-OIG-Testimony.pdf, 5. 26 Best, 2.
Implicit Organizational Deficiencies Within the NCTC
While the National Counterterrorism Center has exhibited concrete technical and
organizational deficiencies, it still suffers from a number of other issues that have not manifested
themselves directly in post-event reports. This does not mean that these structural problems have
not contributed to the aforementioned intelligence failures. It is likely that each of these has
exacerbated the reported issues in their own way.
Organizationally, the National Counterterrorism Center is in a unique position. It operates
in two distinct spheres of influence and responsibility. The first is inherent to any organization:
its internal processes. Any organization has multiple interacting components that must function
on their own and integrate smoothly with all other components. As discussed in Chapter Two,
this is not an easy feat to achieve. The second sphere is external: the NCTC’s position in the
interagency community. The NCTC is tasked with making a multi-agency group emulate a
centralized organization. The abstract concept of interacting components remains relevant, where
each agency in the space represents a structural component of the larger organization.
Ultimately, each component should integrate successfully with all others in order to achieve the
goals of the community.
This puts the NCTC in an exceedingly difficult position, as it must not only worry about
the function of its own complex inner-workings, but also about the inner-workings and interactions of
the rest of the agencies in the counterterror intelligence community. Because it is responsible for
the goals of the community as a whole, the internal processes that exist in agencies outside of its
explicit control directly affect its ability to achieve those goals. Therefore, the failures of these
agencies as they interact in the interagency space can be seen as structural flaws that the NCTC
is responsible for.
Information Sharing and Access
The NCTC is meant to be a melting pot of analysts from a variety of agencies, a place
where they step out of their isolated information siloes and collaborate on important intelligence
issues. The first step to this collaboration is the sharing of “proprietary” data from the
intelligence collection pipelines of the individual agencies. As this is one of the main goals of the
center, it should have a well-defined set of processes for its analysts to follow when engaging in
information sharing. Likewise, it should have a robust and intuitive information-sharing platform
for the analysts to use.
Unfortunately, this is not the case. The information environment is set up in such a way
that it delegates responsibility to data-collectors to define who is allowed to access the
information.27 These data-collectors are wildly inconsistent in their decisions and do not have a
strictly defined process to determine what classification the information should receive. Often, this
information is only disseminated to those that have a very clear and documented “need-to-know”
reason for accessing the data.
The data collectors are held responsible for their choice of classification long after they
make their decision. It is a naïve assumption that these collectors will err on the side of
openness, especially given the current climate in classification matters. For example, in the
Hillary Clinton email case, there are talks of indictments over emails that were retroactively
classified.28 This practice is not uncommon and creates incentive structures against sharing. This
means that any information that could possibly be sensitive in any way will be largely
27 Putbrese, Daniel. “Intelligence Sharing: Getting the National Counterterrorism Analysts on the Same Data Sheet.” Atlantic Council International Security Papers, 2006. http://www.atlanticcouncil.org/publications/reports/intelligence-sharing-getting-the-national-counterterrorism-analysts-on-the-same-data-sheet, 13. 28 Twitter, Krishnadev Calamur. “Some Clinton Emails Were Retroactively Classified.” NPR.org. Accessed April 13, 2016. http://www.npr.org/sections/thetwo-way/2015/05/22/408774111/state-department-to-release-more-clinton-emails-today.
inaccessible. Only analysts with an inarguable “need” to know the contents of the data will have
access.29 As a result, even though shared databases do exist, not everything that could be in them
is actually stored. Finally, in a modern data environment, where a massive amount of
information is being collected each day, having humans read and classify all information by hand
creates an unacceptable bottleneck. This bottleneck significantly slows the process of
information dissemination.
Furthermore, the data that does manage to get past the need-to-know filter may not be
easily navigable. As mentioned in the previous chapter, shared information requires significant
syntactic and semantic documentation to truly be of use, otherwise many analysts must orient
themselves on their own. This process is arduous and has a steep learning curve. In descriptions
of the database structures at the NCTC, it seems that these data stores are simply made available
to analysts with little documentation. 30
Finally, while the centralization of this data is technically impressive, it still falls short of
what is necessary. In fact, it still manifests many of the problems that the NCTC was supposed to
solve. In the late 1990s, a large technological problem came to the fore: each organization stored
its data in its own format and structure. This came to be known as “stove-piping,” where each
organization would funnel its data through its own specific pipeline. These pipelines were
inaccessible and inscrutable to others. Unfortunately, while analysts at the NCTC technically
now have access to more of these “stove pipes,” as of 2013, they still could not access them all in
a single search. The difficulty compounds when they attempt to switch databases, as the security
measures prompt them for passwords upon entry to each agency’s data-space.
29 Putbrese, 14. 30 Nolan, Bridget. “Information Sharing And Collaboration in the United States Intelligence Community: An Ethnographic Study of the National Counterterrorism Center.” PhD Dissertation, University of Pennsylvania, 2013.
These informational barriers can make many forms of computational analysis impossible,
especially when an analysis is attempting to synthesize information from multiple contexts.
Instead of being able to take advantage of modern advances in computational speed and data
access, data scientists might need to feed in one data point at a time, effectively removing the
advantage of even having a high-speed computational tool. Even if the analysts are able to access
all data-points in an entire database at a time, the fact that information is spread across more than
thirty different password-protected barriers makes the task an arduous one and difficult to iterate
on.
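A small sketch can make the cost of this stove-piped access concrete. The agency names, credentials, and query interface below are hypothetical stand-ins, not real systems; the point is only the structural difference between querying many separate, separately authenticated stores and querying one consolidated index.

```python
# A minimal sketch of the "stove-pipe" access pattern described above. The agency
# names, credentials, and stores are placeholders; the point is that every extra
# store adds another authentication step and another separate query, which makes
# iterative, data-intensive analysis slow and brittle.

AGENCY_STORES = {
    # agency -> its own isolated record store (placeholder data)
    "CIA": [{"subject": "A. Example", "note": "travel to Yemen"}],
    "FBI": [{"subject": "A. Example", "note": "visa application"}],
    "NSA": [{"subject": "B. Example", "note": "intercepted call"}],
}

def authenticate(agency, credentials):
    """Stand-in for a per-agency login prompt; assume it succeeds if a token exists."""
    return credentials.get(agency) is not None

def stove_piped_search(term, credentials):
    """One login and one separate query per agency -- roughly how analysts work today."""
    hits = []
    for agency, store in AGENCY_STORES.items():
        if not authenticate(agency, credentials):
            continue  # no access, and the analyst may never know what was missed
        hits += [(agency, rec) for rec in store if term in rec["subject"]]
    return hits

def unified_search(term, index):
    """A single query against one consolidated index -- the integrated alternative."""
    return [rec for rec in index if term in rec["subject"]]

creds = {"CIA": "token1", "FBI": "token2"}  # note: no NSA credential
print(stove_piped_search("A. Example", creds))
consolidated = [rec for store in AGENCY_STORES.values() for rec in store]
print(unified_search("A. Example", consolidated))
```

In practice access controls would still apply to a consolidated index; the sketch only contrasts the query mechanics.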
The flaws that exist within the NCTC’s data sharing model cause serious issues that must
be addressed. The process for determining what information to share is poorly defined. The
incentives for sharing are inverted, limiting the opportunities for the spread of information.
Finally, the information that is shared is not centralized in a way that promotes contextual
integration. It instead simply makes disjointed data stores available for individual queries. What
exists at the NCTC is much better than no access and no sharing, but the system must improve in
order to facilitate truly big-data oriented analytic practices.
Personnel Quality
The driving forces behind any organization are the employees that work there. The
NCTC’s final product, centralized and cross-referenced analysis, is a direct result of the analytic
teams that work there. The quality of this final product is dependent on the quality of the analysts
and data scientists doing the analysis. The employees working in counterterror must have
expertise in a wide range of topics and tools. They must also be willing to share their specific
talents with the rest of the community to get the best analysis. Finally, these employees must be
incentivized to work hard toward a mission and must have attainable objectives to achieve.
There are few signals coming directly from the NCTC about the quality of its analysts
and data scientists. However, the majority of its analytical and engineering staff comes from
other agencies, meaning the quality of its staff depends on the quality coming from CIA, NSA,
FBI, and others.31 In this space, there are plenty of signals that federal agencies are having issues
hiring quality new employees, especially among those under thirty.32 The intelligence and law
enforcement sectors are hit especially hard given the higher analytical and technical capabilities
that are often required of their employees.
Numerous factors affect the desirability of the intelligence community for talented
engineers, analysts, and data scientists. Two important ones that surface constantly are salary and
operational freedom. Operational freedom refers to an engineer or analyst’s ability to make some
of his or her own choices on what work to pursue and what tools to use.
First, the salaries offered in government positions are simply not competitive with
salaries in the private sector.33 In a sample of self-reported salaries from the company-rating site
Glassdoor.com, NSA and CIA positions for analysts and engineers reported an average salary of
close to $80,000.34 On the other hand, Facebook’s reported average base salary for data scientists
31 Nolan, 71. 32 Feintzeig, Rachel. “U.S. Struggles to Draw Young, Savvy Staff.” Wall Street Journal, June 11, 2014, sec. Careers. http://www.wsj.com/articles/u-s-government-struggles-to-attract-young-savvy-staff-members-1402445198. 33 Lunney, Kelly. “Public-Private Sector Pay Gap Remains at 35 Percent.” Government Executive. http://www.govexec.com/pay-benefits/2014/10/public-private-sector-pay-gap-remains-35-percent/96830/. 34 “CIA Salaries.” Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/CIA-Salaries-E41381.htm. “NSA Salaries.” Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/NSA-Salaries-E41534.htm. Glassdoor is a website dedicated to housing employee reviews, salary reports, and interview reviews. The content is self-reported, but with enough corroborating stories, can be deemed mostly trustworthy.
is closer to $140,000, with $40,000 in stock options, and a $100,000 signing bonus.35 It is not
uncommon for many people to take a slight pay cut to work for a cause that they believe in, but
when the difference is this colossal, only the most dedicated will choose federal intelligence
work. Unfortunately, dedication to the cause is not a replacement for talent and ability.
The working environment presented at the National Counterterrorism Center can deter
even those who are dedicated to service. Making an impact as an individual can be extremely
difficult. The working culture is notoriously rigid, with analysts and data scientists given little
freedom to pursue their own leads.36 The practice of “tasking,” in which a higher-level manager
or policymaker assigns a very specific question to a lower-level analyst, is commonplace,
meaning analysts have much less agency in choosing important problems to tackle.
The restriction of essential tools can also serve as a deterrent to would-be employees.
Security and structural requirements of the respective agencies create problems that manifest
themselves in technical ways. For example, giving an analyst access to a new database, perhaps
one from the NSA, would take a mere ten minutes for an IT professional.37 However, some
employees report that the process takes months due to bureaucratic red tape.38 These problems
provide major disincentives for desirable employees to seek employment in the Intelligence
Community.
The NCTC is actually in a worse position than the rest of the IC because it pulls analysts
and data scientists from other agencies to work at the center. The CIA, NSA, and FBI are
required to provide employees to work at the NCTC. However, sending employees to the NCTC
does little to further the specific mission of that agency. Therefore, each agency is not
35 “Facebook Research Scientist Salaries.” Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/Facebook-Research-Scientist-Salaries-E40772_D_KO9,27.htm. 36 Nolan, 90. 37 Nolan, 29. 38 Nolan, 29.
incentivized to provide its best analysts, because it would lose them. Instead, agencies are
motivated to send expendable employees, those whose absence would not severely impact their
operations. Colloquially, the NCTC came to be known as the “dumping ground” for
underperforming employees, especially during its early days.39
Note that this section is not an attempt to denigrate the analytical and engineering
capabilities of the employees working at the NCTC. It simply is an observation that there is a
considerable source of talent in the private sector that is going untapped by the intelligence
community.
Analytic Collaboration and Culture
In order to provide the “alternative analyses” that are among the main goals of the NCTC, the
analysts at the center must work collaboratively on its final intelligence products. The collective
set of talents and areas of expertise of the analysts and data scientists are meant to combine into a
“critical mass” of competence, allowing them to achieve levels of clarity not possible at each of the
individual agencies.
These new analytical conclusions require the unselfish sharing of time, talents, and data
with other employees. Additionally, there must be a sense of mutual respect and camaraderie
when integrating data; otherwise collaboration efforts could be stymied by agency personality
differences. At the NCTC, the cultural bridges that allow for collaboration are shaky at best and
destructively counterproductive at worst. The ways in which analysts are trained at a specific
“home” agency and then sent to the NCTC as representatives of their own agencies can create us-
versus-them attitudes that can severely diminish opportunities for collaboration.40 The home-turf
39 Nolan, 80. 40 Nolan, 71.
loyalties exist for the majority of analysts and manifest themselves in the attitudes that they have
toward the “other” agencies. The differences mostly arise in interactions between the largest
agencies: NSA, CIA, FBI, and DIA.
When interviewed anonymously about their opinions of other analysts, representatives
from each agency have no shortage of caustic comments about their counterparts. CIA analysts
were considered by all to be relatively competent, but snobby and arrogant about their status.
One DIA agent went as far as claiming that all CIA analysts were “WASPy, Harvard-educated
and fluent in Yiddish or whatever.”41 There was a general sense of distrust of CIA analysts, due
to their perceived cutthroat nature. People considered CIA employees to be only out for
themselves.
FBI employees were colloquially known as “those idiots with the guns in the building.”42
They were considered traditional and generally inept at pursuing the mission. Other analysts
disliked working with them because they thought of them as constantly “a step behind, with an
inferiority complex.”43 NSA did not fare much better than FBI or CIA, having labels such as the
“idiot savants.”44 A common joke for NSA analysts is that while they are intelligent, they are so
socially inept that their most extroverted employees manage to look at other people’s shoes
instead of their own. DIA is considered bottom-rung by the other major agencies, operating as
the agency that is “always certain – never right.” 45
The negative attitudes that these analysts have toward each other can significantly
discourage attempts to collaborate and share data. Why waste time sharing information and
41 Nolan, 72. 42 Nolan, 73. 43 Nolan, 74. 44 Nolan, 73. 45 Nolan, 74.
methods with employees when they are either too stupid to deal with it or would possibly throw
you under the bus?
The collaboration issue is exacerbated by the incentive structure that exists around
performance metrics. Analysts are not evaluated by how well they can collaborate and create
innovative new solutions with their colleagues. They are instead evaluated almost purely on the
number of reports that they author, regardless of whose data they used or whom they worked
with.46 This means that a month-long collaborative effort to develop a new and innovative
computational model would put the analysts behind another analyst who simply authored ten 2-
page reports in the same time period.
Collaboration also requires a significant amount of trust between analysts. Some NCTC
employees report situations in which their work was effectively “stolen” by another analyst.47
This is often done by taking parts of the work and classifying them, cutting off one analyst from all of
the work that they had done. The permissibility of these practices creates an environment in
which collaboration is a dice-roll, not a necessity for good performance.
External Organizational Deficiencies of the NCTC
The NCTC’s internal structural and cultural flaws can be seen as microcosms of the
larger intelligence community. The NCTC’s role as the centralized organizer of the interagency
space means it must have the power to effect change in the community as it sees fit.
Unfortunately, it appears that the NCTC is still subordinate to many of the more powerful
agencies in the community. This subordination manifests itself in ways that contribute to the
center’s internal struggles.
46 Nolan, 99. 47 Nolan, 102.
Clearly Defined Roles and Capabilities
As elaborated in Chapter Two, a prerequisite to constructing a successful Big Data
analytic operation is a clearly defined goal and a detailed plan on how to get there. The
operational requirements of a new system, technical or organizational, must be very explicitly
laid out. If this is done correctly, then the process of moving toward the objective can survive
setbacks and changes in the operational environment. If the goals are poorly defined or the plan
is lackluster, then inevitable setbacks can derail the project as the goals are reevaluated and
adjusted. Furthermore, technical debt can be acquired that makes repairing damage extremely
costly. Building a system haphazardly is akin to building a jet engine while the airplane is in
mid-air: it might be possible, but is not advisable.
The NCTC has suffered in this regard since its conception. The trouble begins with the descriptions
of its functions in the Intelligence Reform and Terrorism Prevention Act of 2004. The center is
described as being the “primary organization” for analyzing and integrating all-source
intelligence. Furthermore, the act states that the NCTC should ensure that agencies have
“appropriate” access to the intelligence that is “needed” to accomplish their analytical goals. All
of this seems very reasonable. The community suffered from fragmentation and required
centralization to overcome it. But under closer examination, it becomes clearer that these
descriptions do not robustly define the roles and powers of the center. “Appropriate” access is
not an objective standard. Likewise, it can be impossible to know beforehand
whether or not data is “needed” to complete an assignment.
These weaknesses have repeatedly surfaced in reports on the NCTC’s
performance. One report claims that the NCTC’s planning apparatus is “rudimentary,” especially
given that the center has no authority to mandate the implementation of its plans.48 It
claims that the center takes on the role of a “non-confrontational think tank” instead of a
centralized authority that integrates information and leads the analytical work of the rest of the
community. Another report claims that the NCTC’s lack of formal authority meant that its
director “persuaded, embarrassed, created consensus, or invoked higher authorities” instead of
simply delegating what needed to be done.49
Information Sharing in the Wider Community
The NCTC’s lack of definitive authority in the IC can cause further information-sharing
problems. Because it cannot mandate the dissemination of data, it must ask for it instead.
While orders must be followed, requests can be denied.50 The tendency that many agencies have
of “protecting” their data makes this denial relatively common, especially when the data may
contain sensitive information that could compromise one of their collection sources.
This type of withholding is not directed only at the NCTC. The aforementioned caustic
attitudes held by many in the IC about other agencies can prevent sharing as well. The CIA, if it
does not trust the capabilities of the FBI, may choose to not make sensitive information available
to it. The NCTC, the supposed arbiter of information sharing problems, is powerless to stop this
type of behavior. The 2009 Christmas bombing debacle demonstrated that often the NCTC and
48 Col. Brian Reinwald. “Assessing the National Counterterrorism Center’s Effectiveness in the Global War on Terror.” Masters Thesis, Army War College, 2007, 9. 49 Kravinsky, Robert. “Toward Integrating Complex National Missions: Lessons From The National Counterterrorism Center’s Directorate of Strategic Operational Planning.” Project On National Security Reform, February 2010. http://0183896.netsolhost.com/site/wp-content/uploads/2011/12/pnsr_nctc_dsop_report.pdf. 50 Putbrese, 10.
other agencies are completely unaware that information is being withheld. The NCTC does not
have the authority to audit the collection activities of other agencies.
Feedback and Iteration Within the Community
Even if the authority existed for the National Counterterrorism Center to impose its
organizational will on other agencies, it might not matter. There is no clear directive on what
intermediate or final structures should look like. When is the restructuring “complete”? What Big
Data capabilities are required in order to say a certain milestone has been reached? Admittedly it
is a difficult question to answer, but there must be a definition of what a “working as intended”
NCTC looks like.
Making progress in an environment like this is exceedingly difficult. Furthermore,
measuring that progress based solely on outcomes may actually be impossible. A review of the
NCTC’s Directorate of Strategic Operational Planning has found that the center struggles with its “impact
assessment” reports.51 It is difficult to make credible correlations between NCTC actions and the
state of the war on terror. How easily can one prove that it is “making a difference”? In the 18
months leading up to 9/11 America experienced little in terms of terrorist attacks, and yet the IC
was failing catastrophically every day.
As mentioned earlier, a sense of creeping determinism can cloud judgment of past errors,
making them seem more egregious than they actually were. In this way, measuring outcomes can
be an inexact science and can give an inaccurate picture of the center’s performance.
These measurements are still important, but they should be combined with an analysis of the
function of the structures that the NCTC employs to effect change. Was the community put in a
position where it could identify threats and stop them? How well did it react to changes in the
operational environment? Did the changes mandated at top levels make their way down to an
operational context? This is just a small sample of possible feedback mechanisms to use in order
to evaluate the performance of the center. It remains difficult to do this when there are no structural
goals to evaluate.
51 Kravinsky, 18.
Conclusion
Before the creation of the NCTC, information fragmentation was rampant within the
counterterror community. With its creation, the NCTC brought some semblance of authority and process to
information sharing and collaborative intelligence generation. And yet, while the National
Counterterrorism Center has brought with it a number of improvements to the IC, many issues still
remain.
Failures to engage in the collaborative environment significantly reduced the
possibility of interdicting major threats. Events like the 2009 Christmas bomber showed that
sometimes the only thing preventing the deaths of hundreds of civilians is a large dose of luck.
Similarly, the foreign fighter threat continues to grow, and the intelligence community is attacking
the problem in a disjointed manner.
Other, less obvious problems still exist within the NCTC and continue to plague its place in
the interagency community. It is in a poor position to hire the best analysts and data scientists; it has
a toxic culture in which analysts are pitted against each other; and bureaucratic requirements cause
major technical headaches. Furthermore, its poorly defined mission and lack of concrete authority put
it in an often subordinate position within the larger community, meaning it is unable to offset the
failures that exist within other agencies. Overall, the National Counterterrorism Center is a step
forward in terms of what it represents in the IC. Its existence is a general acknowledgement that in a
modern data environment, modern data and analytic practices must be adopted. However, since its
inception it was set up to struggle with this mission. Major changes must be made to get it to work
effectively.
Chapter 4
Looking to the Future: Innovative Models for the NCTC
“Lots of companies don’t succeed over time. What do they fundamentally do wrong? They usually miss the future.”
– Larry Page, Founder of Google1
The data problems brought on by the Internet age can be intimidating. The scale of data
created every day is beyond human comprehension. Data sizes often have to be abstracted into
units of billions and trillions of bytes (giga- and terabytes). Currently, the size of the Internet is
measured in zettabytes, each of which is one sextillion (a one followed by twenty-one zeros) bytes.2 Even so,
companies exist that aim to discover, index, and expose nearly the entirety of this information.
Many business models depend solely on the ability of the company’s engineers and analysts to
manage these massive data volumes effectively. The most obvious examples of this phenomenon are Google
and Facebook, each presiding over their own massively interconnected data empires. They both
handle heterogeneous datasets that are similar to counterterrorism data in size and scope.
This chapter explores the way in which the Big Data analytical products that these
companies build and use are applicable to counterterror data analysis. The tools that consumers use can have
strong parallels with the main analytic workflows that are used in counterterror intelligence
operations. These commonalities make these companies prime candidates for comparison, and
sources of inspiration for improvements to the NCTC’s Big Data analytics structure.
1 “Computing Is Still Too Clunky: Charlie Rose and Larry Page in Conversation.” TED Blog, March 19, 2014. http://blog.ted.com/computing-is-still-too-clunky-charlie-rose-and-larry-page-in-conversation/. 2 “The Zettabyte Era—Trends and Analysis.” Cisco. Accessed April 22, 2016. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/VNI_Hyperconnectivity_WP.html.
These companies are optimized not only for the creation of these tools, but for their
deployment as well. Some of these companies are the original innovators in Big Data and have
the authority to claim that their structures are suited for its use. In fact, many of the modern
organizational theories on data analytics stem from the work that these companies have done.
The inspiration found in these companies’ structures may provide useful ideas on how to
improve the NCTC. However, the NCTC is not a private sector organization and has specific
limitations in its scope and capabilities that must first be addressed before attempting to
implement private sector structures in its environment. Even so, significant lessons can be
learned from these tech giants, providing a path forward for the National Counterterrorism
Center.
Modern Big Data Tools
Google truly is an “all source” company. It attempts to index every webpage, sound file,
video, and image on the Internet. It allows for efficient pattern matching for these objects on the
web. For example, if someone is looking for a specific phrase that appears in a text somewhere
on the Internet, Google can find that phrase quickly. Twenty years ago, search on this scale
would have been impossible, and yet Google has managed to build out a system where the average user
can navigate billions of pieces of information with ease.
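The core data structure behind fast text lookup at this scale is the inverted index, sketched below with a toy corpus; this is an illustration of the general technique, not Google's production system.

```python
# A minimal sketch of an inverted index: map each term to the documents that
# contain it, so a query touches only a few small lists instead of scanning
# every page. Illustrative only, with a toy three-document corpus.
from collections import defaultdict

documents = {
    "doc1": "foreign fighters travel to syria",
    "doc2": "fighters return to their home countries",
    "doc3": "analysts study travel patterns",
}

# Build the index once, ahead of query time
index = defaultdict(set)
for doc_id, text in documents.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Return the documents containing every term in the query."""
    postings = [index.get(term, set()) for term in query.split()]
    return set.intersection(*postings) if postings else set()

print(search("fighters travel"))  # {'doc1'}
```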
Contemporary Google search is actually far more ambitious than simply pattern
matching. It also attempts to build a “knowledge graph” that semantically connects knowledge
“entities” using links.3 This means that its algorithms no longer treat data simply as information
that can be matched with other information, but as knowable entities that exist in a contextual
fabric.
3 Sullivan, Danny. “Google Launches Knowledge Graph To Provide Answers, Not Just Links.” Search Engine Land, May 16, 2012. http://searchengineland.com/google-launches-knowledge-graph-121585.
Google built its search engine with an understanding that information does not exist in a
vacuum; each piece of data exists relative to every other one.
A possible workflow with the knowledge graph feature is as follows: users search the
name of a specific actor that they would like to know more about. They receive the usual
relevant blue links that Google is known for. However, they also receive possible continuations
of their search, such as movies that the actor has appeared in, other related actors, or events that
the actor is famous for participating in. The user is therefore more aware of the context that this
entity (in this case the actor) exists in.
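A toy version of this idea, assuming a made-up set of entities and typed relationships rather than Google's actual knowledge graph, illustrates how "entity plus link" storage supports the follow-on queries described in the workflow above.

```python
# A minimal knowledge-graph sketch with invented entities and relations.
# It is not Google's implementation, only an illustration of how storing
# typed links between entities enables contextual follow-on queries.
from collections import defaultdict

# Each edge is (subject, relation, object)
edges = [
    ("Actor X", "appeared_in", "Film A"),
    ("Actor X", "appeared_in", "Film B"),
    ("Actor Y", "appeared_in", "Film A"),
    ("Film A", "released_in", "1999"),
]

graph = defaultdict(list)
for subj, rel, obj in edges:
    graph[subj].append((rel, obj))

def related(entity):
    """Return everything directly linked to an entity, grouped by relation."""
    grouped = defaultdict(list)
    for rel, obj in graph[entity]:
        grouped[rel].append(obj)
    return dict(grouped)

# A search for "Actor X" can now surface films directly...
print(related("Actor X"))  # {'appeared_in': ['Film A', 'Film B']}

# ...and, one hop further, other entities that appeared in the same films
films = set(related("Actor X")["appeared_in"])
co_stars = {s for s, r, o in edges if r == "appeared_in" and o in films and s != "Actor X"}
print(co_stars)  # {'Actor Y'}
```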
The current knowledge graph is extremely extensive. When it first launched, it housed
close to 500 million entities, with 3.5 billion links between them.4 The overarching goal of the
feature is to allow users to better answer queries that require structured answers and context, not
simply relevant links. On the other hand, Google’s product is built with the average Internet user
in mind and is therefore not necessarily as useful for a task as specific as counterterror analysis.
A custom product or a modified version of Google’s would be necessary to be useful in practice.
Facebook’s monolithic platform also stores and curates large amounts of heterogeneous
data. Facebook’s engineers make this data nearly instantaneously available to each of its 1.7
billion active monthly users.5 Unlike Google, Facebook has the luxury of controlling the format
of the majority of the data that it uses. This is possible because most of the data that it
houses is generated by the users of the service, and these users can only interact with the
platform in predefined ways, with predefined input types.
4 Sullivan. 5 “Number of Facebook Users Worldwide 2008-2016 | Statistic.” Statista. Accessed April 30, 2016. http://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/.
The control that Facebook exercises over its environment allows it to produce even more
robust and accurate models of its burgeoning networks.6 It builds its own knowledge graph of the
Facebook data environment, producing a complex set of entities and links between them. Using
this data structure it is able to produce detailed models of the behaviors of its users. Based on
users’ friends, self-proclaimed interests, and browsing activities, Facebook can generate
personalized feeds that match their information needs. It can even go as far as guessing which
unconnected users might be in the specific social network of another user, just based on their
activities.
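One way to see how such guesses can be made is the classic common-neighbors heuristic sketched below; the users are invented and the scoring is deliberately simplistic, so this should be read as an illustration of the idea rather than Facebook's actual ranking model.

```python
# A minimal "people you may know" sketch with made-up users: unconnected users
# who share many mutual friends are likely to belong to the same social circle.
# This is a textbook common-neighbors heuristic, not Facebook's real system.

friends = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol"},
    "carol": {"alice", "bob", "erin"},
    "dave":  {"alice"},
    "erin":  {"carol"},
}

def suggestions(user):
    """Rank non-friends by the number of mutual friends they share with `user`."""
    scores = {}
    for other, their_friends in friends.items():
        if other == user or other in friends[user]:
            continue
        mutual = len(friends[user] & their_friends)
        if mutual:
            scores[other] = mutual
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(suggestions("dave"))  # [('bob', 1), ('carol', 1)] -- both know alice
```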
Facebook’s acumen in this field has produced a significant number of new discoveries
about the behaviors of social networks. For example, it recently conducted a study on its users’
networks, mapping the degrees of separation that existed within its user-base.7 It showed that
classical graph formulation techniques were useful in practice and paved the way for future
research into social network interconnectivity.
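The classical technique at the heart of such a study is a breadth-first search over the friendship graph. The short sketch below, which reuses the toy friends dictionary from the previous example, computes the number of hops separating two users.

```python
# Breadth-first search over a friendship graph: the shortest number of
# friendship hops between two users is their degree of separation.
from collections import deque

def degrees_of_separation(friends, source, target):
    """Shortest number of friendship hops from source to target, or None."""
    seen, queue = {source}, deque([(source, 0)])
    while queue:
        user, dist = queue.popleft()
        if user == target:
            return dist
        for nxt in friends.get(user, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# e.g. degrees_of_separation(friends, "dave", "erin") == 3 (dave-alice-carol-erin)
```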
The value of tools like these in counterterror cannot be overstated. One of the most
difficult aspects of counterterror intelligence is navigating the immensely rich context that each
investigation takes place in. It is easy to miss very important but subtle semantic links between
entities. For example, a shared hometown by the perpetrators of seemingly unrelated attacks
could signal an exploitable pattern to be used in future analysis. However, links like this are not
immediately obvious, especially given that human analysts cannot check every possible attribute
of each militant that they research. A computer, however, can. Automatically generated
knowledge graphs can provide invaluable clues to stumped analysts.
6 Novet, Jordan. “How Facebook Matured Its Data Structure and Stepped into the Graph World,” June 25, 2013. https://gigaom.com/2013/06/25/how-facebook-matured-its-data-structure-and-stepped-into-the-graph-world/. 7 Edunov, Sergey. “Three and a Half Degrees of Separation.” Research at Facebook. Accessed May 1, 2016. https://research.facebook.com/blog/three-and-a-half-degrees-of-separation/.
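A rough sketch of that kind of automated cross-referencing is shown below. The subjects and attributes are invented, but the mechanism is the point: a machine comparing every field of every profile will surface a shared hometown that a human reviewing case files separately might never notice.

```python
# A minimal sketch of automated cross-referencing over hypothetical subject
# profiles: enumerate every attribute value that any two subjects share.
from itertools import combinations

profiles = {
    "subject_1": {"hometown": "City A", "training_camp": "Camp X", "age": 24},
    "subject_2": {"hometown": "City A", "training_camp": "Camp Y", "age": 31},
    "subject_3": {"hometown": "City B", "training_camp": "Camp X", "age": 24},
}

def shared_attributes(profiles):
    """Yield ((name_a, name_b), attribute, value) for every attribute two subjects share."""
    for (name_a, a), (name_b, b) in combinations(profiles.items(), 2):
        for attr in a:
            if attr in b and a[attr] == b[attr]:
                yield (name_a, name_b), attr, a[attr]

for pair, attr, value in shared_attributes(profiles):
    print(pair, attr, value)
# ('subject_1', 'subject_2') hometown City A
# ('subject_1', 'subject_3') training_camp Camp X
# ('subject_1', 'subject_3') age 24
```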
The types of data analytic programs that are employed at Facebook and Google are not
foreign concepts to organizations in the intelligence community. The NSA has built its own
graphical database of the information that it collects.8 The agency began working on it in 2007
and by 2013 had an efficient and useful product. In 2016, the system is no doubt far more robust.
The NSA uses this tool in counterterror intelligence and claims that it derives value from it.
However, even the NSA accepts the limitations of its scope given the information sources that it
has access to. This shows that intelligence agencies are aware that these technologies are useful,
and are actively pursuing ways to integrate them into their analytic workflows.
Both the Internet giants and NCTC-integrated intelligence agencies have systems that can
structure heterogeneous data into context-aware graphical knowledge stores. Yet their successes
are markedly different. The previous chapter showed that the intelligence community exhibits major
procedural and structural shortcomings with respect to these methods. How do the Internet giants
structure themselves to take advantage of the technology that they build?
Successful Big Data Structures
The success of modern technology companies has spurred a large body of organizational research
over the past few decades.9 The challenges of computational technology and the advent of
massive datasets have produced companies that eschew the traditional models of hierarchy and
delegation. These companies embrace the complexity that modern information environments
provide. The complicated models that sufficed in a less globalized world are no longer tenable,
and organizational structures must reflect this shift. 8 Gallagher, Sean. “What the NSA Can Do with ‘Big Data.’” Ars Technica, June 12, 2013. http://arstechnica.com/information-technology/2013/06/what-the-nsa-can-do-with-big-data/. 9 Galbraith, Jay. “Organization Design Challenges Resulting From Big Data.” Journal of Organization Design 3, no. 1 (2014).
The technology industry is built on the littered corpses of failed technology companies. In
fact, it is thought that nearly 90% of all Silicon Valley startups fail.10 Only those that have a
combination of powerful technology and the ability to leverage it effectively can survive in the
market. As Google and Facebook emerged in the early 2000s, they became the poster children
(among a few others) for data-driven companies that could thrive in a globalized information
environment. When compared to the traditional intelligence agencies that have existed for
decades (in the FBI’s case, over a century), their structures appear almost diametrically opposed.
In order to effectively explore these differences, it is important to revisit the
organizational requirements for big data success. For ease of access, they are repeated here:
1. Commitment to Big Data analytics
2. Open information sharing environments
3. Feedback channels and iteration
4. Engineering talent and culture
These factors do not exist in a vacuum and are in fact tied to many of the lessons that were
learned during the creation of the very companies that are being examined in this section. It
therefore follows that these companies perform very well when evaluated using these categories.
Even so, it is important to investigate the ways in which these successful structures are built and
how they have changed.
The Internet Giants
Google is famous for its innovative approaches to hard information problems. Often, it
solves problems that many people do not even know exist and continues to invent products that
revolutionize the way users interact with data. Google search has always been the beating heart
10 Caroll, Rory. “Silicon Valley’s Culture of Failure … and ‘the Walking Dead’ It Leaves behind” The Guardian. June, 2014. https://www.theguardian.com/technology/2014/jun/28/silicon-valley-startup-failure-culture-success-myth.
of this pursuit. It continues to drive nearly 40% of all traffic on the Internet.11 The company’s
search platform is an extremely complex piece of software that exists almost organically in the
Internet, constantly self-adjusting and evolving as the information around it changes. Google
implemented this fantastically successful system by maintaining an almost militant commitment
to important organizational principles.
First, Google has a very specific mission for its search product: to satisfy the
information needs of its users as accurately and efficiently as possible.12 Google's
steadfast commitment to this mission informs its analytical tools. Implementing a
search engine is by its nature a Big Data problem. There is little doubt that Google is
firmly committed to its belief in the power of Big Data.
Google dedicates significant resources to building out its analytic capabilities to
capitalize on this mission. In fact, it pioneered MapReduce, a paradigm for large-scale
analytics that remains prevalent today.13 Since then, it has continued to improve its analytic
capabilities and has entire teams of research staff committed to making improvements to its
infrastructure and its processes.14 This research often results in novel tools that expand
analytical capability or simplify access to previously cumbersome workflows.
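The pattern named above can be sketched on a single machine in a few lines of Python, shown purely for illustration; the real system distributes the map, shuffle, and reduce phases across thousands of machines, but the logical structure of a simple word count is the same.

```python
from collections import defaultdict

documents = ["big data needs structure", "structure needs people", "people need data"]

# Map phase: emit (key, value) pairs from each input record.
def map_phase(doc):
    for word in doc.split():
        yield word, 1

# Shuffle phase: group all emitted values by key.
grouped = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        grouped[word].append(count)

# Reduce phase: combine the values for each key into a final result.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)  # e.g. {'big': 1, 'data': 2, 'needs': 2, ...}
```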
The search giant is uncompromising in its commitment to quality, not allowing anything
to subvert its core objective of democratically exposing the world’s information to Internet users.
For example, the company is extremely strict about the purity of its search results, not allowing
11 Worstall, Tim. “Fascinating Number: Google Is Now 40% Of The Internet.” Forbes. Accessed May 15, 2016. http://www.forbes.com/sites/timworstall/2013/08/17/fascinating-number-google-is-now-40-of-the-internet/. 12 “Ten Things We Know to Be True” – Google. Google https://www.google.com/about/company/philosophy/. This source is the proclamations of the core values of the company, among which is the dedication to an uncompromised search. 13 Dean, Jeff, and Sanjay Ghemawat. “MapReduce: Simplified Data Processing on Large Clusters.” Operating Systems Design and Implementation: Google, Inc., 2004. http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf. 14 Sato, Kazunori. “An Inside Look at Google BigQuery.” Google, 2012. https://cloud.google.com/files/BigQueryTechnicalWP.pdf.
any type of internal or external incentive to shape the way that these results are displayed to a
user.15 Not even ads, which generate the lion's share of its profits, are allowed to have any
influence on the search algorithms. Google is committed to getting search results as close as
possible to the information need of the user, not to its preferred short-term
profit model.
The way in which Google pursues its goal of relevant search results is a lesson in
collaborative information environments. There is an understanding that building out products as
expansive as Google Search is not possible without constant collaboration between teams of
engineers. It promotes this collaboration by concentrating on the quality of internal
communications.16 It emphasizes to its employees that they are not at the company simply to
perform a certain task that is handed to them. Instead they are encouraged to become “topic
experts” and to share their expertise with those around the company. The idea is that the
collective set of knowledge will produce solutions that were otherwise impossible to conceive of.
Decision-making is also a highly collaborative process at Google. Nick Leeder, the CEO
of Google France, claims that managers are not instructed to override employees; instead, they
are there to "encourage consensus" among their engineering and analytical teams.17
By requiring that employees collectively agree on forward steps, Google effectively incentivizes
collaboration, as each engineer’s ideas must pass muster with his or her fellow employees. This
is a significant departure from the strict hierarchical structure that has remained entrenched in
intelligence organizations for decades.
15 “Ten Things We Know to Be True”, Google. 16 Dubois, David. “Google, the Network Company: From Theory to Practice.” INSEAD Knowledge, September 11, 2013. http://knowledge.insead.edu/leadership-organisations/google-the-network-company-from-theory-to-practice-2602. 17 Dubois.
The openness of information at Google also plays a central role in its analytical strategy.
Google’s central code repository is thought to house nearly two billion lines of code.18 Nearly all
of it is available to any engineer working at the company. Some very sensitive code is only
available to specific engineers, the core search algorithm code being an example.19 The rest is
available for engineers to download, read, and adapt for their own purposes. The complex products
at Google actually share parts of code due to this structure, and one positive change made by an
engineer can improve the capabilities of everyone at the company. The openness allows
engineers and analysts to learn from the work previously done by others and avoid re-inventing
the wheel for many workflows. Udi Manber, the head of Google search products from 2010 to
2014, claimed that "if you need something you look around…If you don't like what's available
you build your own.”20
The open access to source code can be compared reasonably with the access to all-source
intelligence data at the NCTC. Each is a core product of its sector and represents the lifeblood
of its respective organization. Open access can significantly increase the
productivity of employees by allowing them to integrate the ideas of others into their own
workflows.
However, as Manber states, not all problems can be solved by using what someone else
has built. Sometimes completely new solutions must be developed. “Building your own tool” at
Google is not a one-step process. It requires teams of engineers to build a product, and teams of
data scientists to evaluate it. Manber describes an iterative relationship between the teams that
18 Metz, Cade. “Google Is 2 Billion Lines of Code—And It’s All in One Place.” WIRED, September 16, 2015. http://www.wired.com/2015/09/google-2-billion-lines-codeand-one-place/. 19 Even if the core algorithms are hidden, a large part of the code is most likely still exposed. Furthermore, the information openness can help the search engineers improve their product, which is by far the most important at the company. 20 Udi, Manber. Interview by Ben Mittelberger. Email, April 22, 2016.
develop the search algorithms and the teams that evaluate them. Google spends a significant
amount of time coming up with ways to evaluate its prized engine.21 In a constant push and pull,
the teams propel each other forward in discrete steps, giving and getting feedback, sometimes
discarding changes, and other times keeping them.
The feedback process at Google is not impulsive or driven by intuition; it is
deeply analytical in nature. It will almost never accept anecdotal evidence as sufficient proof of
the efficacy of a change. It instead relies on statistically significant figures to drive decisions.
This requires access to extensive logging information, something that Manber claims is
absolutely essential to providing any type of feedback.22 Even so, the engineers make plenty of
changes. In 2007 alone, Google launched more than 450 different modifications to its ranking
algorithms, more than one per day.23
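A hedged illustration of what "statistically significant figures" can look like in practice is given below; the click counts are fabricated and the two-proportion z-test is a generic statistical tool, not a description of Google's actual evaluation pipeline.

```python
from math import sqrt

# Fabricated A/B test logs: users who clicked a top result under each ranking.
control_clicks, control_users = 10_350, 50_000   # current ranking
variant_clicks, variant_users = 10_720, 50_000   # proposed change

p1 = control_clicks / control_users
p2 = variant_clicks / variant_users
pooled = (control_clicks + variant_clicks) / (control_users + variant_users)

# Two-proportion z-test: is the observed lift larger than chance would explain?
z = (p2 - p1) / sqrt(pooled * (1 - pooled) * (1 / control_users + 1 / variant_users))
print(f"lift = {p2 - p1:.4f}, z = {z:.2f}")  # |z| > 1.96 -> significant at the 5% level
```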
The consensus-driven management style at the company also improves the iterative
process. Nick Leeder calls Google a “quasi-flat organization,” in which collaborators can interact
more freely, improving organizational flexibility in the face of difficult problems. The flatter
structure also keeps managers closer to the working environment, equipping the ultimate
decision makers with the operational knowledge required to make good choices.
The problem of evaluating and iterating on a search engine may seem unconnected to
counterterror intelligence work; however, there are unmistakable parallels. In both fields there is
no correct answer, and there is no end goal in sight. The environment is constantly changing, and
the answers are not entirely obvious. The intelligence community does not own a perfectly
21 Mease, David, and Ya Xu. “Evaluating Web Search Using Task Completion Time.” Boston, MA: Google Research, 2009. http://static.googleusercontent.com/media/research.google.com/en//archive/dmease-sigir09-full.pdf. “Search Evaluation at Google.” Official Google Blog. Accessed May 2, 2016. https://googleblog.blogspot.com/2008/09/search-evaluation-at-google.html. 22 Manber, Udi. “Guest Lecture: CS276 - Information Retrieval and Web Search.” Stanford University, April 2016. 23 Mease.
accurate terrorism database that it can check its conclusions against. Likewise, there is no “gold
standard" answer that Google can refer to when evaluating its engine. Both operate in
degrees of certainty, with an understanding that their methods can always be improved.
Finally, the engineers and analysts who work at Google are the ones who back the
commitment to big data analytics, the open information sharing environments, and the robust
feedback mechanisms. The technical problems that they have solved would have stumped a large
majority of others working on the same issues. Their intelligence and work ethic cannot be
overstated. Google's success here rests on its ability to identify, recruit, and retain these
exceptionally talented employees.
Google's main draw for talented employees is its culture of innovation and
engineering freedom.24 Smart analysts enjoy being given the agency to build out their own
conceptions of progress for the company. Google has a famous though unwritten policy of “20%
time” for its employees.25 Engineers and analysts are encouraged to go outside of their day-to-
day responsibilities to explore possible avenues for improvement and innovation. This time is not
required, but the option exists and exhilarates many of the employees who work there.
The benefits that are offered to employees are also hard to beat. They eat for free at any
of a variety of cafes around Google’s many campuses. They get free childcare, dry-cleaning,
transportation, and 100% health and dental coverage for themselves and their families. They can even
bring their pets to work. These are not benefits that are generally offered by (aging)
24 Kuntze, Ronald, and Erika Matulich. “Google: Searching for Value.” Journal of Case Research in Business and Economics, 2009. http://www.aabri.com/manuscripts/09429.pdf, 4. 25 D’Onfro, Jillian, 2015 Apr. 17, “The Truth about Google’s Famous ‘20% Time’ Policy.” Business Insider. Accessed May 2, 2016. http://www.businessinsider.com/google-20-percent-time-policy-2015-4.
government organizations. These benefits come on top of better overall pay, meaning that savings
on everyday living expenses compound with already higher salaries.
Facebook also shares a large number of traits with Google. It has an unwavering
commitment to analytics and embeds its data science team with responsibilities across the entire
organization.26 It has a comparable information environment, with code being open to a majority
of engineers.27 It is continually taking iterative steps with the features of its product. Finally, it
provides an excellent work environment for its employees and compensates them well. Yet
while the two companies have similar operating environments, Facebook can provide
additional lessons applicable to the structure of the NCTC.
Facebook decided, unlike the majority of other technology companies, to remove the
barrier between development, testing, and deployment. The same group of engineers and data
scientists does all three in tandem. When things go right or wrong, the employees receive the
feedback directly and must take collective responsibility if fixes are required. The result is a
product suite that is built by people who understand the full context of the environment that they
are developing for. The idea is that when engineers and data scientists are directly exposed to the
impacts of their actions they are able to produce better work. This type of direct involvement is
unrealistic for the intelligence community – it simply does not make sense for an analyst to be
involved in the application of their conclusions. In fact, it is antithetical to the role of intelligence
to make the policy decisions. Still, giving lower-level analysts more visibility into how their work
is eventually used may provide perspective that ultimately improves their performance.
26 Zimmerman, Thomas. “The Emerging Role of Data Scientists on Software Development Teams.” Redmond, WA: Microsoft Research, April 2015. http://research.microsoft.com/pubs/242286/MSR-TR-2015-30.pdf. 27 Feitelson, Dror. “Development and Deployment at Facebook.” IEEE Internet Computing 17, no. 3 (July 2013).
The lessons that Google and Facebook have learned over the past decade of explosive
growth have not gone unnoticed by the rest of the industry. The success of their business
structures has propelled a large number of companies to mirror them.28 There is a general
consensus among modern information giants that a commitment to analytics is necessary to
make further progress in the field. The majority of successful tech companies embed analytics
teams within their organizations and dedicate significant resources to capturing and structuring
their performance and usage metrics. They are constantly seeking new directions that could lead
to innovative solutions. In doing so, they keep pace with the present and
prepare themselves for the future.
Bigger May Not Always Be Better
So far, comparisons have been made between the NCTC and similarly sized
organizations: ones with tens of thousands of employees. They are behemoths that have the
ability to throw massive numbers of people and resources at large problems and hopefully solve
them. However, not all organizations that succeed in tackling important data analytic issues have
comparable workforces or resources. In some situations, success is less a matter of resources
than of focus and agility.
The constantly shifting startup market in Silicon Valley is a manifestation of this idea.
Even Google and Facebook began as small enterprises focusing on specific and difficult
questions. The difference between the larger monolithic organizations and the nimble startups is
their level of focus and commitment to the mission. Larger organizations are constantly slowed
28 Dill, Kathryn. “‘It’s OK If They Copy Us’: Google’s HR Chief On The Upside Of Giving Away Staffing Secrets.” Forbes. March, 2015. http://www.forbes.com/sites/kathryndill/2015/03/25/its-ok-if-they-copy-us-googles-hr-chief-on-the-upside-of-giving-away-staffing-secrets/.
by their legacy commitments and pour significant resources into maintaining them.29 Smaller
teams are unencumbered by aging infrastructure and institutional inertia; they are free to work on
purely forward-thinking pursuits. They commit fully to big new ideas, building them out
and constantly improving them.
The dedication to innovative and untested projects is ideal for smaller and highly agile
teams. First, the nature of unexplored paths is that of uncertainty. Tim Junio, the CEO of a
successful network analytics startup called Qadium, claims, "creating new technology and
product categories is an unguided process almost by definition. You don’t just wake up and
decide ‘I’m going to go from A to B.’”30 It takes bold, unguided moves to determine what works
and what does not. These moves are often much less intentional than many would like to believe
and they often end in failure. Junio himself admits that Qadium could have originally gone in
five different directions, but only one of them proved to be feasible. The fact that nearly 90% of
startups either fail to grow or die out is proof that most of the time “innovative” ideas do not
work. The small scale of these projects is what makes this failure palatable; it would be
unacceptable if a billion-dollar company went belly-up after committing fully to a new
direction.
A startup’s ability to fail may be necessary to achieving truly new capabilities.31
Innovation rarely results from playing a safe game and focusing on short-term requirements for
the organization. Risks have to be taken, and long-term improvement must be placed on at least
equal footing with daily operations. Smaller organizations have the luxury of being able to make
29 Blank, Steve. “Why the Lean Start-Up Changes Everything.” Harvard Business Review, May 2013. http://www.vto.at/wp-content/uploads/2013/10/Why-the-Lean-Startup-Changes-Everything_S.Plank_HBR-052013.pdf. 30 Junio, Tim. San Francisco. In Person Interview, January 13, 2016. Qadium is a network sensing company that scans public facing devices on the “dark net” for a variety of customers. Their focus is on cyber-security. More information can be found at www.Qadium.com. 31 Blank, 5
these commitments, something that existing bureaucracies do not have. For example, the
growing analytics company Palantir allows its engineers and analysts to stop work (as they are
able) for an entire week during its annual “hack week.”32 Employees are encouraged to take this
time to come up with possible new directions for the company, even at the expense of short-term
objectives. These “hack weeks” have been wildly successful, and many of the innovative
analytical offerings of Palantir have come from them.
Large resource pools are not necessary to make these types of innovative leaps.
Smaller team sizes can make the innovation process more effective, not less. Junio claims that in
his experience, and the experience of many other entrepreneurs, larger teams are often
unnecessarily bloated, causing disorganization and a loss in quality.33 What he opts for instead is
a small, focused team of committed and skilled engineers. This view is not necessarily
revolutionary and has been studied for decades.34 However, lean technology startup
environments take it into a more modern context.
Small teams might actually be more likely to fulfill the structural requirements for proper
analytics and development. They can be fully committed to their end objectives because they
have few other processes to run. Their information environment is inherently open because the
team is not large enough to contain organizational information barriers. Iteration and feedback
are necessary because the team itself may be too small to take large steps. Feedback is easy to
get because each employee is embedded in the process of developing the product. The personnel
are compelled to stay and work hard by the ownership that they feel over what they are working
32 Trump, Whitney. “Palantir Hack Week 2015.” Palantir. August, 2015. https://palantir.com/2015/08/hack-week-2015. 33 Tim Junio, 2016. 34 Carmel, Erran, and Barbara J. Bird. “Small Is Beautiful: A Study of Packaged Software Development Teams.” The Journal of High Technology Management Research 8, no. 1 (1997). http://www.sciencedirect.com/science/article/pii/S1047831097900171.
on. While startups no doubt must work to maintain these traits, they likely do
not have to reshape their organizational structures to achieve them.
It is obviously not possible for the NCTC to mimic the structure and capabilities of an agile
startup: it is an already large organization operating in cooperation with over a dozen even larger
ones. However, this does not mean that the lessons learned from the failure-ridden culture of
Silicon Valley startups can be ignored. As the NCTC and the larger intelligence community are
faced with increasingly complex data analytic problems, they must begin to search for more
effective methods of developing innovative techniques.
The Comparability of the NCTC
The NCTC is not as effective as these organizations when it comes to handling big data,
from both a technical and an organizational perspective. It simply cannot compete in terms of
compute power, innovative potential, or analytic capability. However, the companies described
in this chapter are the gold standard for the world of big data analytics. They invented the entire
computational paradigm, and it makes sense that they can leverage it better. Additionally, the
NCTC is not a private company; it is a taxpayer-funded government organization that handles
highly classified information. Therefore, before enumerating recommendations for the NCTC in
the next chapter, it is first necessary to understand the limitations that restrict the NCTC’s
actions.
Operational Limitations of the NCTC
Even if the organizational inertia were surmountable, which it most likely is not, the
NCTC is not able to fully mimic a successful Internet giant or technology startup. To start, it
does not have the ability to dictate its own goals and measurable objectives.35 Directors of the
center were not responsible for drafting its role and operational goals; they instead inherited
an incomplete organization from its creators. NCTC directors and their executive staff
do not have the same power as a private sector Board of Directors and CEO. They are instead
beholden to the decisions of the legislative and executive branches of the federal government.
The political goals of these two bodies are not necessarily aligned with the analytical goals of the
center and may even hamstring it by stripping it of unpopular (though necessary) authority.36
Furthermore, political interests may force the center to focus on more short-term goals, instead of
investing in long-term strategy and innovation.
Politicians must appear to be making forward progress and making their
constituents safer in very concrete ways.37 They must communicate the narrative that they have
fixed the problematic processes with their reforms. An example is the Transportation Security
Administration (TSA), which was hastily thrown together within a month of 9/11. It was a quick
fix that seemed reasonable and timely, but proved to be ill-conceived and demonstrated
considerable shortcomings. It has not been shown to be effective at interdicting possible terrorists, and
a previous director, James Loy, has claimed that it is an “abominable failure”.38 However, in
November of 2001, Americans were terrified. Focusing on less visible and longer-term safety
and intelligence measures would not have been feasible politically; a new program had to be
developed, and it had to be developed quickly. While the NCTC’s creation was more measured,
it is another example of a highly visible security reform that has not had a stellar track record.
35 Refer to chapter one, especially the description of Amy Zegart's book Spying Blind, which describes the dozens of reports and hundreds of recommendations that went unimplemented in the 1990s. 36 Betts, Richard. Enemies of Intelligence. New York, NY: Columbia University Press, 2009, 135. 37 Betts, 136. 38 Lerner, Adam B. "TSA, an 'Abominable Failure.'" POLITICO. http://www.politico.com/story/2015/06/tsa-airport-security-failure-jeh-johnson-118557.html.
Nonetheless, some political goals can be well founded and affect important structural
elements of the organization. Protection of classified information is paramount politically and in
the intelligence generation process. Disseminating sensitive data to more groups inherently
makes the information less secure.39 More points of access mean more endpoints to protect. In a
computational environment, hacking becomes a serious concern. Each additional location where
data is available gives a potential hacker another opportunity. By contrast, a segmented
data store is much more secure, as a breach of one store is much less likely to propagate to
other protected databases. Unfortunately, outside threats are not the only concerns for
information security. With more internal access, the set of potential insiders leaking information
increases significantly, especially when they are not specifically cleared to see that information.
In the words of former CIA director James R. Woolsey: “sharing is fine if you’re not sharing
with the Walkers, Aldrich Ames, Robert Hanssen, or some idiot who just enjoys talking to the
press about how we are intercepting bin Laden’s satellite telephone calls.”40
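A toy calculation, using invented numbers, makes the intuition concrete: if each access point carries some independent probability of compromise, then adding endpoints raises the chance that at least one is breached, while segmentation caps how much data any single breach exposes.

```python
# Toy numbers, purely illustrative: probability that any single endpoint is
# compromised in a given year, and the number of endpoints with access.
p_breach = 0.02

def chance_of_any_breach(endpoints):
    """Probability that at least one endpoint is compromised (independence assumed)."""
    return 1 - (1 - p_breach) ** endpoints

for endpoints in (1, 5, 20):
    print(f"{endpoints:>2} endpoints -> {chance_of_any_breach(endpoints):.1%} chance of a breach")
#  1 endpoints ->  2.0% chance of a breach
#  5 endpoints ->  9.6% chance of a breach
# 20 endpoints -> 33.2% chance of a breach

# With a single central store, one breach exposes everything; with the data
# split across, say, 10 equal segments, one breach exposes only ~10% of it.
```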
Breaches in information security are much more serious in the intelligence community
than they are in the private sector. If, due to its open access policy, top-secret Google search code
is leaked, the company may lose a technological edge over competing search engines. If top-
secret information about CIA or FBI informants is leaked, people may lose their lives. The
political costs could be enormous. A concrete example of major political fallout is the 2013 NSA
leak perpetrated by Edward Snowden.41 He was given very open access to a large portion of the
39 Putbrese, Daniel. “Intelligence Sharing: Getting the National Counterterrorism Analysts on the Same Data Sheet,” Atlantic Council International Security Papers, 2006. http://www.atlanticcouncil.org/publications/reports/intelligence-sharing-getting-the-national-counterterrorism-analysts-on-the-same-data-sheet, 14. 40 Woolsey, R. James. R. James Woolsey Testimony To U.S. Senate Committee on Governmental Affairs, 2004. 41 Kerr, Orin. “Edward Snowden’s Impact.” The Washington Post, April 9, 2015. https://www.washingtonpost.com/news/volokh-conspiracy/wp/2015/04/09/edward-snowdens-impact/.
data stores at the NSA.42 The only barrier to him stealing thousands of sensitive documents was
his security clearance contract, which he broke. This is not a political commentary on the
revelations of Snowden, but instead an observation that those with universal access to data can
do serious damage to the organizations that they work for.
The inherent requirements of government also hinder innovation and
iterative feedback. It is impossible for the NCTC to simply drop its current workflows in favor of
developing long-term innovative models. It has pressing responsibilities that it cannot lose sight
of. Furthermore, it is difficult to determine the amount of resources that can reasonably be set
aside while maintaining an adequate focus on current terror threats. There is no definition of
what “reasonable” is; the environment is too complex. This is especially difficult when the
NCTC is under constant pressure from all sides to perform its defined mission. To what extent
can the NCTC de-emphasize its traditional workflows?
Feedback mechanisms in government make iteration even more difficult. The feedback
structures that might inform long-term strategy extend outside of the agency itself. If the
president makes a decision, his or her detailed thoughts on the quality or breadth of intelligence
may not effectively trickle back down to the analysts who generated the base intelligence.
Finally, the NCTC’s staffing issues may stem from broader organizational requirements
that extend to the rest of government. Hiring processes are strict, and they cannot ignore the risks
of making the hiring process too open. Budgeting also plays a large role in hiring, as the NCTC
often does not have the tools to make itself a more desirable employer. It is beholden to the
discretion of the Office of Management and Budget (OMB), where conceptions of employee value
42 Greenwald, Glenn, Ewen MacAskill, and Laura Poitras. “Edward Snowden: The Whistleblower behind the NSA Surveillance Revelations.” The Guardian, June 11, 2013. http://www.theguardian.com/world/2013/jun/09/edward-snowden-nsa-whistleblower-surveillance.
may not be aligned with those in the intelligence community.43 This makes it difficult to improve
perks and salaries for its most valued employees.
The NCTC struggles against these operational requirements. In some ways it has been
able to inch toward solving these problems and is evolving toward a better role in the
community.44 It has already come a long way. Still, it has yet to remedy many issues.
Fortunately, these limitations do not completely hamstring the NCTC. While the limitations
described above may make it impossible for the NCTC to fully replicate the structures of massive
Silicon Valley companies, there remain many steps the NCTC can take to improve its big
data analytic capabilities.
Conclusion
At a certain scale, many data problems begin to look alike. As the volume of data on the
Internet grows larger, and private companies become bolder in their attempts to capture and
analyze it, the tools that they build become more applicable to modern counterterror intelligence.
The exceedingly complex analytical products and models that Internet giants like Google and
Facebook have created are proof that it is possible to control the gushing, incoherent fire-
hose of data that is all-source intelligence. Furthermore, the harder analytical problems that have
not yet been addressed do not necessarily require massive teams or resources. The
lean startup model is becoming increasingly popular when it comes to solving hard data
problems, providing even more inspiration for the NCTC’s data needs.
43 Kravinsky, Robert. “Toward Integrating Complex National Missions: Lessons From The National Counterterrorism Center’s Directorate of Strategic Operational Planning.” Project On National Security Reform, February 2010. http://0183896.netsolhost.com/site/wp-content/uploads/2011/12/pnsr_nctc_dsop_report.pdf, 81. 44 ODNI Public Affairs. “NCTC 10 Years Later - A Decade of Service,” September 2014, 5.
These companies are not successful because of their technology alone. They were only
able to build and use their technology because of their structures and processes. These specific
organizational and cultural traits allow them to leverage data, employees, and technology in
effective ways. They are committed to their missions, they practice openness and collaboration,
they are constantly attempting to make changes to improve their models, and they all work hard
to recruit and retain extremely smart and motivated employees. It is unlikely that these
companies would have enjoyed the same level of success without a strict adherence to these
organizational principles.
One of the most important lessons to be learned from these companies is their acknowledgement
of the complexity that they face in their mission. They do not hold the illusion that they know
what the future holds, but their structures allow them to create flexible products and models that
can accommodate a continuously shifting information environment. In fact, an adherence to
traditional structures that favor older practices of business development can leave an
organization reeling in the face of modern information needs.
It would be convenient to simply export the successful traits from these companies to the
NCTC. Unfortunately, government intelligence agencies cannot simply realign themselves with
the same incentives as private organizations. The NCTC has its own strict requirements that
block the implementation of many of these structures. Furthermore, a realistic approach
recognizes the past attempts at major change and the organizational energy required to institute
even small changes in a structure as large as the interagency counterterror intelligence
community. Massive structural shifts are just not possible, and instead the NCTC must be
nudged in the right direction with a set of targeted changes.
The importance of learning these lessons now cannot be overstated. The NCTC has
done a huge amount of good in the intelligence community, but it could have done so much
more. Its directors and cadre have consistently struggled against its crippling bureaucratic
clumsiness and relative lack of authority. They rail against the forces that keep them in the dark
and other agencies that maintain an iron grip over their precious data.
Chapter 5
Conclusion: Looking Back and Moving Forward
Intelligence work is not what it used to be. For decades, as the Cold War simmered, the
United States developed a very specific set of intelligence capabilities geared toward monitoring
the Soviet threat. However, after the fall of the Soviet Union, both the nature of the world’s
threats and its informational structures began evolving rapidly. With the disappearance of its
adversary, the intelligence community’s informational mooring point had dissolved completely.
As the US attempted to maintain control in a unipolar world, potential threats became
increasingly global and decentralized.
The intelligence environment has become exceedingly complex in ways that the
community was not designed to tackle. The complexity is in large part due to the increasing
speed of information creation and dispersal. Computers have spawned an era of democratized
and instant communication, causing an explosion in the amount of available data. This massive
quantity generates noise that can obfuscate the true threats that intelligence analysts look for.
Thankfully, the technology that creates this cacophony is also capable of helping sift and sort this
data, giving valuable insights to modern computational analysts. The information revolution
represents a race between the growing complexity of information, and the attempts to control it.
Over the past several decades, few fields have not been significantly impacted
by the meteoric rise of information technology. Computers first began as useful extra tools that
could perform specific computationally intensive jobs very quickly, but were used only in
specialized contexts. As hardware and software evolved, computers began to take increasingly
central roles in offices where they were used.
Their storage and analytical power have completely transformed the way in which
information-based work is completed. Humans working with data understand it in ways that are
linked to the computational tools used to evaluate it. In-person and email conversations flow into
each other, each an essential tool in collaboration. Written products are malleable pieces of
data instead of finished, printed pages. Many analysts now conceive of information as it exists in
an Excel spreadsheet. Often, the work that people do cannot be divorced from the tools that they
use to do it.
This marriage between human intellect and computational power has its limits.
Contemporary datasets can become so complex that they are beyond the ability of human
analysts to comprehend, let alone manipulate. Furthermore, datasets can get so large that their
size is beyond the limits of conventional computational tools. In the Internet age, the data being
collected each day more than meets both criteria. These datasets, which require heavyweight
hardware and software infrastructure to manage properly, have come to be known as "Big Data."
While such data is notoriously difficult to handle properly, the breadth and depth of information
contained in these datasets can provide insights that would have otherwise remained hidden.
These opportunities for generating enhanced insights extend into the realm of counterterror
intelligence.
Over the past decade, the technical challenges of Big Data have mostly been overcome,
and the methods for doing so are becoming increasingly accessible. Dozens of companies develop
and sell data solutions, creating a competitive and open marketplace that drives down prices
and improves quality. However, despite these technical improvements, analytics remains a
complex, difficult task. Technology is no longer the main limiting factor; the challenges of
Big Data have shifted from technical to organizational ones.
Simply having the technology to store and analyze Big Data does not guarantee any
results. It is entirely possible for an organization to acquire a multi-billion-dollar Big Data
infrastructure and not derive value from it. The organization must be structured in a way that
supports the implementation and growth of computational analytics. The proper structures and
processes for a successful Big Data organization can be broken up into four main categories.
First, the organization must have a detailed and strong commitment to analytics. Without
this, integrating Big Data processes with traditional workflows will most likely flounder. Second,
there must be an open information culture that encourages data sharing and analytic
collaboration across the organization. Analysts must be incentivized to work with each other and
share their methods and conclusions. Third, there must be a distinct set of processes and structures
dedicated to feedback and iteration on analytical methods and conclusions. The process of taking
small, focused steps in analytics has been shown to produce better computational models and
conclusions. Fourth, an organization must build out a culture that recruits and retains the best
possible analysts and data scientists.
These characteristics are central when it comes to investigating the Big Data capabilities
of the National Counterterrorism Center. Though the NCTC is in a position to acquire high-end
Big Data technology from the private sector, it cannot outsource its internal structure and
analytical culture. It must therefore focus on these aspects as it looks to move forward.
The State of the National Counterterrorism Center
From its beginning, the NCTC had more than just data and analytic challenges to face. It
was established to fix the entrenched organizational problems that the rest of the Intelligence
Community had tried – and failed – to solve. Dozens of reports over the period of a decade had
attempted to push the IC in the correct direction. Only after the events of 9/11 was Congress able
to make the herculean effort to unite the agencies responsible for counterterror intelligence.
To its credit, the NCTC has made significant contributions to the quality and
comprehensiveness of counterterror intelligence. Its establishment propelled the entire
community forward in terms of collaboration and technical integration. Unfortunately, some
marked failures have exposed the institutional flaws that it has yet to resolve. Investigations have
found that the NCTC is not adequately centralizing intelligence and still has trouble providing
the necessary context to many counterterror analysts working there. When the NCTC is
examined through the lens of big data processes, the underlying causes of these failures become
clearer.
The NCTC exhibits shortcomings in several of the structural categories that are essential
for properly leveraging data-oriented intelligence:
1. It lacks the specific analytic objectives required for embedding analytics into the
organization. Without more specified objectives and detailed plans on how to reach them,
the center may drift back to its traditional workflows. This is especially important in the
case of the NCTC, as it operates in an interagency space and is responsible for
overcoming the traditionalist inertia of over a dozen other agencies.
2. The center also struggles with providing a truly centralized data store for the intelligence
community. Different political and bureaucratic requirements can limit the access to data for
many analysts, meaning they are unable to access the contextual information that exists in
a wide variety of databases. Furthermore, competing incentives between employees and
between organizations can push them to avoid good data sharing practices.
3. The isolation produced by these competing incentives can significantly reduce the
amount and quality of feedback that many analysts get, as they are often only interacting
with analysts of their own agencies. Additionally, adding the bureaucratic layer of the
NCTC creates even more hierarchy, further separating analysts from the end results of
their intelligence products. The lack of proper feedback structures may actually stifle
innovation, restricting newer, more effective forms of analysis.
4. The NCTC faces serious barriers in recruiting and retaining the best analysts and
engineers. Significant pay differences combined with a lackluster working environment
mean that it will lose many potentially elite people to the private sector. This is not to
say that the NCTC does not employ extremely talented analysts, but it does miss out on
many opportunities for top-tier talent.
As computational data becomes ever more central to analytical workflows, the NCTC
should focus on improving its capabilities by developing plans and strategies to address each of
these key issues. While it has limitations that are necessary and appropriately related to its
governmental role, it could benefit from further integration of successful structures found in the
private sector. There are concrete steps that the NCTC may take in order to improve its
capabilities and produce better analytical results.
Recommendations for the National Counterterrorism Center
These recommendations are given with a full understanding of the history of intelligence
“reform.” Many previous attempts have failed to produce any real change in the community. On
the other hand, some smaller, more incremental changes may have a higher possibility of being
implemented, and can push the NCTC in a better direction. As a further disclaimer, it is not
possible to outline specific steps that the NCTC should take, as this would require detailed
operational knowledge of the center. This is information that is not readily available. These
recommendations instead attempt to identify effective and realistic guidelines that the NCTC
should follow for meaningful improvement.
Operational Mission and Capabilities
The NCTC's stated missions should be reevaluated and rewritten in ways that
specifically designate its role and its desired capabilities. The original conception of the NCTC
was thrown into a fragmented intelligence environment with little regard as to how it would
build its competencies. Its capabilities and focuses grew organically, following paths of least
resistance and filling only the cracks that were easiest to fill. An updated conception of the NCTC
should include its specific access requirements to databases at other agencies in the counterterror
community. It should similarly include desired metrics for the integration of data.1 This would
replace the vague goal of “improved information sharing.” Similar to the way that Google
measures its search engine performance, the NCTC would have a benchmark on progress that it
is making in terms of information centralization. Granted, this will not immediately solve the
data access problems that exist in the community, but it gives a clearer target that the community
is working toward. While they might meet significant resistance, at least the NCTC officials
would not be aimlessly attempting to find a role for the center.
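As a purely illustrative sketch of what such a benchmark might look like, the snippet below computes an average "data-access rate" from hypothetical per-report sourcing logs. The field names and numbers are assumptions rather than any existing NCTC metric, but they follow the questions suggested in the footnote to this paragraph.

```python
# Hypothetical per-report sourcing logs: how many databases an analyst tried to
# consult and how many were inaccessible due to information barriers.
reports = [
    {"id": "rpt-001", "sources_checked": 8, "sources_blocked": 3},
    {"id": "rpt-002", "sources_checked": 5, "sources_blocked": 0},
    {"id": "rpt-003", "sources_checked": 6, "sources_blocked": 4},
]

def centralization_score(report):
    """Fraction of consulted sources the analyst could actually reach."""
    checked = report["sources_checked"]
    return (checked - report["sources_blocked"]) / checked

scores = [centralization_score(r) for r in reports]
print(f"average data-access rate: {sum(scores) / len(scores):.0%}")
# Tracking this figure over time would replace the vague goal of
# "improved information sharing" with a measurable benchmark.
```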
The NCTC can only do so much about this from within and requires congressional
assistance. The Congressional Intelligence Committees are responsible for the original language
of the NCTC, and have the power to change its dedicated mission. Though it may be politically
difficult, they should take incremental steps to improve both the specificity of the NCTC’s
mission and its authority to carry it out. They should work closely with both IC officials and data
scientists to understand the needs of modern intelligence work.
Internal Operating Structure and Information Procedures
It is impossible for the NCTC to make sweeping organizational changes without
legislative aid; however, its leadership can still have an impact on its future. The NCTC’s
leadership determines the internal procedures that back the analytical products of the center.
Though they may not fix every problem, high-ranking officials can remove small barriers that
have large impacts on analytical capability.
It may be the case that information-sharing problems in the intelligence community will
1 Possible questions to ask to get metrics include: What sources were eventually used in a report? How were they accessed? What sources were checked? What sources could not be accessed due to information barriers? Measuring the answers to these questions can give a sense of progress, and also reveal pain points in the intelligence generation process.
never be solved fully. Top-secret intelligence information has inherent sharing limitations.
However, the excessive hurdles for information sharing can be overcome. Understanding the
human aspect of analytical collaboration is the first step. Often the choices to over-classify
information are made by individuals (not the institutions themselves), and these people can be
convinced to share. The NCTC should work to avoid pitting these analysts against each other
through rigid performance measurement and draconian punitive structures. Instead, it should
work to produce metrics that incorporate collaboration as a goal, and reduce the regulations on
shared information. This has been addressed in the private sector by evaluating employees not
only through their direct contributions but also by the utility that they bring to their peers.
The incentive structures that exist within the center should also be modified to encourage
a focus on both short- and long-term goals. Innovation rarely occurs when employees are
focusing solely on routine intelligence reports. Analysts and data scientists should be required to
finish their urgent daily work while also taking time to look to the future. This can be accomplished
by further modifying incentive structures to reward analysts for valuable products that may not
be immediately useful but might be in the long term.
In order to accelerate its innovative capabilities, the NCTC can learn from the lean
startup model and begin instituting separate innovation-focused programs that move alongside
the daily operations at the center. A focus on small, prototypical products can produce amazing
dividends in the long run. These ideas are making their way to government, and are currently
being implemented at the Pentagon with Secretary Carter’s Innovation Initiative.2 The
Department of Defense (DoD) is putting together an innovation advisory board that sits outside
2 Carter, Ashton. “Drell Lecture: "Rewiring the Pentagon: Charting a New Path on Innovation.” Stanford University, April 2015. http://www.defense.gov/News/Speeches/Speech-View/Article/606666/drell-lecture-rewiring-the-pentagon-charting-a-new-path-on-innovation-and-cyber.
of the traditional military planning structure.3 The board’s role is to identify potential innovative
directions for many of the Pentagon’s departments.
In the context of the NCTC, smaller, agile teams like the DoD innovation advisory board
may operate outside of the structure that has already crystallized around the center and its sister
agencies. They would be dedicated to long-term technological and analytical models to facilitate
better counterterror intelligence products. They would also be unfettered by the current feedback
structures and pressures of daily counterterror work. When these smaller teams manage to
construct something useful, they can work to integrate the product into daily workflows. Using
this model, the NCTC can practice its traditional techniques, while improving its effectiveness in
parallel.
Currently, intelligence community leadership attitudes are trending in the direction of
modernization. Senior intelligence officials have noticed the issues and are acknowledging the
difficulties of a complex environment. As a result, they are beginning to make strong pushes for
a more computational approach to problem solving and analytics.4 Furthermore, policymakers
are beginning to understand the new information environment, and have begun imposing more
modern informational requirements on intelligence agencies.5 The institutional will exists to
make these necessary changes, but it must be translated into action.
3 "Pentagon to Establish Defense Innovation Advisory Board." US Department of Defense. http://www.defense.gov/News-Article-View/Article/684366/pentagon-to-establish-defense-innovation-advisory-board. 4 Kerbel, Josh. "The Complexity Challenge: The U.S. Government's Struggle to Keep Up with the Times." The National Interest, August 2015. 5 Savage, Charlie. "Obama Administration Set to Expand Sharing of Data That N.S.A. Intercepts." The New York Times, February 25, 2016. http://www.nytimes.com/2016/02/26/us/politics/obama-administration-set-to-expand-sharing-of-data-that-nsa-intercepts.html.
Personnel
The NCTC's personnel problem is difficult to solve, especially considering the size and
complexity of the intelligence community. However, the NCTC can still take steps to improve
the quality of its workforce, even with major budgetary constraints. As elaborated earlier, simply
throwing more analysts at the growing information base cannot solve many of these problems. In
fact, more may not be better: it may be the case that smaller numbers of focused analysts can
actually produce better results.
In fact, the counterterror community may simply be bloated with a large number of
analysts. Even employees working at the center acknowledge that there are far too many of
them.6 They claim that the number of counterterror analysts actually produces a negative impact
on intelligence work, as there are too many people competing with each other. The current
numbers are so high that many analysts are unable to produce separate work; they are only
stepping on each other’s toes.
The NCTC can address this problem by removing analysts who do not perform well
with modern analytic techniques. However, instead of re-hiring new analysts to compensate for
lost numbers, it should consider keeping the downsized teams. First, the smaller number of
employees simplifies operations and reduces the bloat that exists in the organization. Second,
the NCTC would also be dealing with freed resources. The strict budgeting requirements of
government mean personnel expenses are a zero-sum game. If the teams get smaller and the
budget remains the same, then the ratio of resources to employees goes up; a team trimmed by
20 percent on an unchanged budget, for example, has 25 percent more to spend per remaining
employee. This could result in higher salaries and better employee benefits. The improved
compensation and working
environment can lure top-tier talent away from the private sector. By engaging in employee
downsizing and resource redistribution, the NCTC can create a smaller, higher-quality, and
better-tuned workforce. 6 Nolan, Bridget. “Information Sharing And Collaboration in the United States Intelligence Community: An Ethnographic Study of the National Counterterrorism Center.” PhD Dissertation, University of Pennsylvania, 2013, 158.
The NCTC can retain skilled analysts and engineers by putting them in a structure where
they are given the freedom and responsibility to work on new and interesting problems. If the
center places high-value employees in teams that are focused on innovation, they are more likely
to stay and work toward improving the capabilities of the entire center. This means that creating
innovation-centric, lean-startup-inspired teams has an added benefit: it can help alleviate the
pain that the NCTC feels when it aims to recruit and keep the best that the industry has to offer.
Focusing on the needs of its employees is extremely important for the NCTC. In the end,
it is they who are producing the intelligence used at the top levels. If they suffer, the quality of
their work suffers. The analysts and engineers who do intelligence work are the beating heart of
the NCTC, and their needs must be prioritized.
Lessons Beyond the National Counterterrorism Center
The NCTC is not the first attempt by the federal government at centralizing intelligence.
As covered in Chapter One, the CIA was the original conception of an intelligence integration
center. It did not work out to be that simple. Lessons were learned from this failure, and the
intelligence community moved on and adapted to the Cold War information
environment. Since then, the world has globalized, and questions asked at intelligence agencies
have gotten much harder to answer. The NCTC was the first attempt to formally centralize
intelligence in this modern environment. The organizational and analytical lessons that have
been learned in the 21st century were not nearly as clear in 2004. However, given the difficulties
that the NCTC has experienced, some of the flaws of its original conception have been exposed.
It is imperative that the failures of this modern attempt at centralization are not repeated.
The creation of the National Counterterrorism Center was an exercise in solving a bureaucratic
problem with yet another layer of bureaucracy. The NCTC inherited the organizational inertia
and the technical debt of the other agencies. The plans, the structures, and even the employees at
the center fully mimicked those of the previously ineffective intelligence community.
What could it really change, if it was subject to the same rules as everyone else, except it had
more responsibility?
The case of the NCTC has implications beyond counterterror intelligence. The United States federal government will continue to require data centralization efforts across all of its branches and missions; the nature of an increasingly interconnected world dictates this trend. As it moves to take advantage of large-scale computational analytics in all of its forms, it must recognize that it can no longer afford to operate "business as usual." It must learn lessons from those who built the data-oriented present and apply them as it looks to the future.
Works Cited
"Alphabet: Number of Google Employees 2015 | Statistic." Statista. Accessed May 3, 2016. http://www.statista.com/statistics/273744/number-of-full-time-google-employees/.
"Apache Hadoop," n.d. http://hadoop.apache.org/.
"Apache Hive," n.d. http://hive.apache.org/.
Bergen, Peter. "Do NSA's Bulk Surveillance Programs Stop Terrorists?" New America Foundation, January 2014. https://static.newamerica.org/attachments/1311-do-nsas-bulk-surveillance-programs-stop-terrorists/IS_NSA_surveillance.pdf.
Berner, Martin, Enrico Graupner, and Alexander Maedche. "The Information Panopticon in the Big Data Era." Journal of Organization Design 3, no. 1 (2014).
Best, Richard. "Intelligence Information: Need-to-Know vs. Need-to-Share," June 2011. https://www.fas.org/sgp/crs/intel/R41848.pdf.
———. "The National Counterterrorism Center (NCTC)—Responsibilities and Potential Congressional Concerns." Congressional Research Service, December 2011. https://www.fas.org/sgp/crs/intel/R41022.pdf.
Betts, Richard. Enemies of Intelligence. New York, NY: Columbia University Press, 2009.
Blank, Steve. "Why the Lean Start-Up Changes Everything." Harvard Business Review, May 2013. http://www.vto.at/wp-content/uploads/2013/10/Why-the-Lean-Startup-Changes-Everything_S.Plank_HBR-052013.pdf.
Boschee, Elizabeth, and Natarajan Premkumar. "Automatic Extraction of Events from Open Source Text for Predictive Forecasting." In Handbook of Computational Approaches to Counterterrorism, 1st ed. Springer Science, 2013.
Brooks, Frederick. "No Silver Bullet -- Essence and Accident in Software Engineering." University of North Carolina at Chapel Hill, 1986. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1663532.
———. The Mythical Man Month. Addison-Wesley, 1974.
"Building a Scalable Big Data Infrastructure for Dynamic Workflows." EMC2, n.d. http://www.emc.com/collateral/solution-overview/h11761-building-scalable-big-data-infra-so.pdf.
Burr, Richard. "Unclassified Executive Summary of the Committee Report on the Attempted Terrorist Attack on Northwest Airlines Flight 253," May 2010. http://www.intelligence.senate.gov/publications/report-attempted-terrorist-attack-northwest-airlines-flight-253-may-24-2010.
Carabin, David. "Intelligence-Sharing Continuum: Next Generation Requirements for U.S. Counterterrorism Efforts." Naval Postgraduate School, 2011. https://www.hsdl.org/?abstract&did=691253.
Carmel, Erran, and Barbara J. Bird. "Small Is Beautiful: A Study of Packaged Software Development Teams." The Journal of High Technology Management Research 8, no. 1 (1997). http://www.sciencedirect.com/science/article/pii/S1047831097900171.
Carroll, Rory. "Silicon Valley's Culture of Failure … and 'the Walking Dead' It Leaves Behind." The Guardian, June 2014. https://www.theguardian.com/technology/2014/jun/28/silicon-valley-startup-failure-culture-success-myth.
Carter, Ashton. "Drell Lecture: 'Rewiring the Pentagon: Charting a New Path on Innovation.'" Stanford University, April 2015. http://www.defense.gov/News/Speeches/Speech-View/Article/606666/drell-lecture-rewiring-the-pentagon-charting-a-new-path-on-innovation-and-cyber.
Choucri, Nazli, Stuart Madnick, and Michael Siegel. "Improving National and Homeland Security Through Context Knowledge Representation and Reasoning Technologies." In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
"CIA Salaries." Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/CIA-Salaries-E41381.htm.
Clark, Jack. "5 Numbers That Illustrate the Mind-Bending Size of Amazon's Cloud." Bloomberg Business, November 2014. http://www.bloomberg.com/news/2014-11-14/5-numbers-that-illustrate-the-mind-bending-size-of-amazon-s-cloud.html.
Col. Brian Reinwald. "Assessing the National Counterterrorism Center's Effectiveness in the Global War on Terror." Master's Thesis, Army War College, 2007.
"Columbus, Ohio Man Charged with Providing Material Support to Terrorists." Department of Justice, April 2015. https://www.fbi.gov/cincinnati/press-releases/2015/columbus-ohio-man-charged-with-providing-material-support-to-terrorists.
"Computing Is Still Too Clunky: Charlie Rose and Larry Page in Conversation." TED Blog, March 19, 2014. http://blog.ted.com/computing-is-still-too-clunky-charlie-rose-and-larry-page-in-conversation/.
Davenport, Thomas, and D.J. Patil. "Data Scientist: The Sexiest Job of the 21st Century." Harvard Business Review, October 2012.
Dean, Jeff, and Sanjay Ghemawat. "MapReduce: Simplified Data Processing on Large Clusters." Operating Systems Design and Implementation: Google, Inc., 2004. http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf.
DeYoung, Karen, Dan Eggen, and Spencer S. Hsu. "Plane Suspect Was Listed in Terror Database after Father Alerted U.S. Officials." The Washington Post, December 27, 2009, sec. Nation. http://www.washingtonpost.com/wp-dyn/content/article/2009/12/25/AR2009122501355.html.
Dill, Kathryn. "'It's OK If They Copy Us': Google's HR Chief On The Upside Of Giving Away Staffing Secrets." Forbes, March 2015. http://www.forbes.com/sites/kathryndill/2015/03/25/its-ok-if-they-copy-us-googles-hr-chief-on-the-upside-of-giving-away-staffing-secrets/.
D'Onfro, Jillian. "The Truth about Google's Famous '20% Time' Policy." Business Insider, April 17, 2015. Accessed May 2, 2016. http://www.businessinsider.com/google-20-percent-time-policy-2015-4.
Dubois, David. "Google, the Network Company: From Theory to Practice." INSEAD Knowledge, September 11, 2013. http://knowledge.insead.edu/leadership-organisations/google-the-network-company-from-theory-to-practice-2602.
Eccles, Robert. "The Performance Measurement Manifesto." Harvard Business Review, February 1991.
Edunov, Sergey. "Three and a Half Degrees of Separation." Research at Facebook. Accessed May 1, 2016. https://research.facebook.com/blog/three-and-a-half-degrees-of-separation/.
"Edward Snowden's Impact - The Washington Post." Accessed May 4, 2016. https://www.washingtonpost.com/news/volokh-conspiracy/wp/2015/04/09/edward-snowdens-impact/.
Everton, Sean. Disrupting Dark Networks. Cambridge, UK: Cambridge University Press, 2012.
Executive Order 13354: National Counterterrorism Center, 2004. http://www.nctc.gov/docs/eo13354.pdf.
"Facebook Research Scientist Salaries." Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/Facebook-Research-Scientist-Salaries-E40772_D_KO9,27.htm.
Feintzeig, Rachel. "U.S. Struggles to Draw Young, Savvy Staff." Wall Street Journal, June 11, 2014, sec. Careers. http://www.wsj.com/articles/u-s-government-struggles-to-attract-young-savvy-staff-members-1402445198.
Feitelson, Dror. "Development and Deployment at Facebook." IEEE Internet Computing 17, no. 3 (July 2013).
"Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel." Homeland Security Committee, September 29, 2015. https://homeland.house.gov/wp-content/uploads/2015/09/TaskForceFinalReport.pdf.
Fire, Michael, and Rami Puzis. "Link Prediction in Highly Fractional Data Sets." In Handbook of Computational Approaches to Counterterrorism. Springer, 2013.
Fisher, Danyel. "Interactions With Big Data Analytics." Interactions, June 2012. http://dl.acm.org/citation.cfm?id=2168943.
Freytas-Tamura, Kimiko de, Aurelien Breeden, and Katrin Bennhold. "Call to Arms in France Amid Hunt for Belgian Suspect in Paris Attacks." The New York Times, November 16, 2015. http://www.nytimes.com/2015/11/17/world/europe/paris-terror-attack.html.
Galbraith, Jay. "Organization Design Challenges Resulting From Big Data." Journal of Organization Design 3, no. 1 (2014).
Gallagher, Sean. "What the NSA Can Do with 'Big Data.'" Ars Technica, June 12, 2013. http://arstechnica.com/information-technology/2013/06/what-the-nsa-can-do-with-big-data/.
Greenwald, Glenn, Ewen MacAskill, and Laura Poitras. "Edward Snowden: The Whistleblower behind the NSA Surveillance Revelations." The Guardian, June 11, 2013. http://www.theguardian.com/world/2013/jun/09/edward-snowden-nsa-whistleblower-surveillance.
Grossman, Robert. "Organizational Models for Big Data and Analytics." Journal of Organization Design 3, no. 1 (2014).
Hillis, Ken, Michael Petit, and Kylie Jarrett. Google and the Culture of Search. New York, NY: Routledge, 2013.
Ho, Kevin. "41 Up-to-Date Facebook Facts and Stats," April 2015. http://blog.wishpond.com/post/115675435109/40-up-to-date-facebook-facts-and-stats.
Inspector General Roth, John. TSA: Security Gaps: Statement of John Roth, Inspector General, Department of Homeland Security, Before the Committee on Oversight and Government Reform. US House of Representatives, 2015. https://oversight.house.gov/wp-content/uploads/2015/11/11-3-2015-Committee-Hearing-on-TSA-Roth-DHS-OIG-Testimony.pdf.
Johnson, Carrie. "Explosive in Detroit Terror Case Could Have Blown Hole in Airplane, Sources Say." The Washington Post, December 29, 2009, sec. Nation. http://www.washingtonpost.com/wp-dyn/content/article/2009/12/28/AR2009122800582.html.
Junio, Tim. Interview with Tim Junio. In person, January 13, 2016.
Kerbel, Josh. "The Complexity Challenge: The U.S. Government's Struggle to Keep Up with the Times." The National Interest, August 2015.
Kerr, Orin. "Edward Snowden's Impact." The Washington Post, April 9, 2015. https://www.washingtonpost.com/news/volokh-conspiracy/wp/2015/04/09/edward-snowdens-impact/.
Knorr, Eric. "Anatomy of an IT Disaster: How the FBI Blew It." InfoWorld, March 21, 2005. http://www.infoworld.com/article/2672020/application-development/anatomy-of-an-it-disaster--how-the-fbi-blew-it.html.
Koren, Marina. "How the San Bernardino Shooters Planned for Jihad." The Atlantic, December 9, 2015. http://www.theatlantic.com/national/archive/2015/12/san-bernardino-shooters-radicalization/419610/.
Kravinsky, Robert. "Toward Integrating Complex National Missions: Lessons From The National Counterterrorism Center's Directorate of Strategic Operational Planning." Project On National Security Reform, February 2010. http://0183896.netsolhost.com/site/wp-content/uploads/2011/12/pnsr_nctc_dsop_report.pdf.
Kuntze, Ronald, and Erika Matulich. "Google: Searching for Value." Journal of Case Research in Business and Economics, 2009. http://www.aabri.com/manuscripts/09429.pdf.
Lazaroff, Mark. "Anticipatory Models for Counter-Terrorism." In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
Lerner, Adam B. "TSA, an 'Abominable Failure.'" POLITICO. Accessed May 17, 2016. http://www.politico.com/story/2015/06/tsa-airport-security-failure-jeh-johnson-118557.html.
Lin, Herbert, and James McGroddy. "A Review of the FBI's Trilogy Information Technology Modernization Program." National Research Council, 2004.
Lunney, Kelly. "Public-Private Sector Pay Gap Remains at 35 Percent." Government Executive. Accessed April 15, 2016. http://www.govexec.com/pay-benefits/2014/10/public-private-sector-pay-gap-remains-35-percent/96830/.
Manber, Udi. "Guest Lecture: CS276 - Information Retrieval and Web Search." Stanford University, April 2016.
Mannes, Aaron. "Qualitative Analysis & Computational Techniques for the Counter-Terror Analyst." In Handbook of Computational Approaches to Counterterrorism. Springer, 2013.
Manyika, James, Michael Chui, and Brad Brown. "Big Data: The Next Frontier for Innovation, Competition, and Productivity." McKinsey Global Institute, June 2011.
Margolis, Gabriel. "The Lack of HUMINT: A Recurring Intelligence Problem." Global Security Studies 4, no. 2 (Spring 2013). http://globalsecuritystudies.com/Margolis%20Intelligence%20(ag%20edits).pdf.
McConnell, Steve. "Managing Technical Debt." International Conference on Software Engineering, 2013. http://2013.icse-conferences.org/documents/publicity/MTD-WS-McConnell-slides.pdf.
———. "Origins of 10X – How Valid Is the Underlying Research?," n.d. http://www.construx.com/10x_Software_Development/Origins_of_10X_%E2%80%93_How_Valid_is_the_Underlying_Research_/.
Mease, David, and Ya Xu. "Evaluating Web Search Using Task Completion Time." Boston, MA: Google Research, 2009. http://static.googleusercontent.com/media/research.google.com/en//archive/dmease-sigir09-full.pdf.
Metz, Cade. "Google Is 2 Billion Lines of Code—And It's All in One Place." WIRED, September 16, 2015. http://www.wired.com/2015/09/google-2-billion-lines-codeand-one-place/.
Meyer, Robinson. "The Unbelievable Power of Amazon's Cloud." The Atlantic, April 23, 2015. http://www.theatlantic.com/technology/archive/2015/04/the-unbelievable-power-of-amazon-web-services/391281/.
Mills, Steve, and Steve Lucas. "Demystifying Big Data: A Practical Guide To Transforming The Business of Government." IBM, 2012. https://www-304.ibm.com/industries/publicsector/fileserve?contentid=239170.
Intelligence Reform and Terrorism Prevention Act of 2004, 2004. http://www.nctc.gov/docs/pl108_458.pdf.
Newport, C.L., and D.G. Elms. "Effective Engineers." International Journal of Engineers 13, no. 5 (1997).
Nolan, Bridget. "Information Sharing And Collaboration in the United States Intelligence Community: An Ethnographic Study of the National Counterterrorism Center." PhD Dissertation, University of Pennsylvania, 2013.
Novet, Jordan. "How Facebook Matured Its Data Structure and Stepped into the Graph World," June 25, 2013. https://gigaom.com/2013/06/25/how-facebook-matured-its-data-structure-and-stepped-into-the-graph-world/.
NSA. "NSA 60th Anniversary Book," 2012. https://www.nsa.gov/about/cryptologic-heritage/historical-figures-publications/nsa-60th/.
"NSA Salaries." Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/NSA-Salaries-E41534.htm.
"Number of Facebook Users Worldwide 2008-2016 | Statistic." Statista. Accessed April 30, 2016. http://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/.
ODNI Public Affairs. "NCTC 10 Years Later - A Decade of Service," September 2014.
Owen-Smith, Jason. "Workplace Design, Collaboration, and Discovery," 2013. http://sites.nationalacademies.org/cs/groups/dbassesite/documents/webpage/dbasse_085437.pdf.
Pavlo, Andrew. "A Comparison of Approaches to Large-Scale Data Analysis." Paper presented at the ACM SIGMOD International Conference on Management of Data, New York, NY, 2009. http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf.
Peled, Alon. "Coerce, Consent, and Coax: A Review of U.S. Congressional Efforts to Improve Federal Counterterrorism Information Sharing." Terrorism and Political Violence 1, no. 18 (August 2014).
"Pentagon to Establish Defense Innovation Advisory Board." US Department of Defense. http://www.defense.gov/News-Article-View/Article/684366/pentagon-to-establish-defense-innovation-advisory-board.
Popp, Robert, and John Yen. Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
Putbrese, Daniel. "Intelligence Sharing: Getting the National Counterterrorism Analysts on the Same Data Sheet." Atlantic Council International Security Papers, 2006. http://www.atlanticcouncil.org/publications/reports/intelligence-sharing-getting-the-national-counterterrorism-analysts-on-the-same-data-sheet.
Rasmussen, Nicholas. Hearing before the House Committee on Homeland Security, "Countering Violent Islamist Extremism: The Urgent Threat of Foreign Fighters and Homegrown Terror," 2015.
Rumsfeld, Donald. "DoD News Briefing - Secretary Rumsfeld and Gen. Myers." February 12, 2002. http://archive.defense.gov/Transcripts/Transcript.aspx?TranscriptID=2636.
Sagan, Scott, and Kenneth Waltz. The Spread of Nuclear Weapons: An Enduring Debate. New York, NY: W. W. Norton, 1995.
Sato, Kazunori. "An Inside Look at Google BigQuery." Google, 2012. https://cloud.google.com/files/BigQueryTechnicalWP.pdf.
Savage, Charlie. "Obama Administration Set to Expand Sharing of Data That N.S.A. Intercepts." The New York Times, February 25, 2016. http://www.nytimes.com/2016/02/26/us/politics/obama-administration-set-to-expand-sharing-of-data-that-nsa-intercepts.html.
Schmidt, Eric. Remarks presented at the Techonomy Conference, 2010. http://techonomy.com/tag/eric-schmidt/.
Schrodt, Philip, and David Brackle. "Automated Coding of Political Event Data." In Handbook of Computational Approaches to Counterterrorism. Springer, 2013.
"Search Evaluation at Google." Official Google Blog. https://googleblog.blogspot.com/2008/09/search-evaluation-at-google.html.
Sharkey, Brian. "Information Processing at Very High Speed Data Ingestion Rates." In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
Silke, Andrew. "Research On Terrorism." In Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security. University of East London School of Law, 2008.
Singh, Arjun. "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network." Paper presented at ACM SIGCOMM, London, UK, 2015.
Sliva, Amy. "SOMA: Stochastic Opponent Modeling Agents for Forecasting Violent Behavior." In Handbook of Computational Approaches to Counterterrorism. Springer Science, 2013.
Stern, Jessica, and J.M. Berger. "ISIS and the Foreign-Fighter Phenomenon." The Atlantic, March 2015. http://www.theatlantic.com/international/archive/2015/03/isis-and-the-foreign-fighter-problem/387166/.
Subrahmanian, V.S. Handbook of Computational Approaches to Counterterrorism. Springer Science, 2013.
Sukumar, Sreenivas, and Regina Ferrell. "'Big Data' Collaboration: Exploring, Recording and Sharing Enterprise Knowledge." Journal of Information Services and Use 33, no. 3 (July 2013).
Sullivan, Danny. "Google Launches Knowledge Graph To Provide Answers, Not Just Links." Search Engine Land, May 16, 2012. http://searchengineland.com/google-launches-knowledge-graph-121585.
"Ten Things We Know to Be True." Google. Accessed May 1, 2016. https://www.google.com/about/company/philosophy/.
The New York Times. "Many Say U.S. Planned for Terror but Failed to Take Action." The New York Times, December 30, 2001, sec. National. http://www.nytimes.com/2001/12/30/national/30TERR.html.
"The Zettabyte Era—Trends and Analysis." Cisco. Accessed April 22, 2016. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/VNI_Hyperconnectivity_WP.html.
Trump, Whitney. "Palantir Hack Week 2015." Palantir, August 2015. https://palantir.com/2015/08/hack-week-2015.
Calamur, Krishnadev. "Some Clinton Emails Were Retroactively Classified." NPR.org. Accessed April 13, 2016. http://www.npr.org/sections/thetwo-way/2015/05/22/408774111/state-department-to-release-more-clinton-emails-today.
Manber, Udi. Interviewed by Ben Mittelberger. Email, April 2016.
Wiil, Uffe. Counterterrorism and Open Source Intelligence. Springer, 2011.
Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism (USA PATRIOT ACT) Act of 2001, 2001. http://www.gpo.gov/fdsys/pkg/PLAW-107publ56/pdf/PLAW-107publ56.pdf.
Valacich, Joseph, and Christoph Schneider. "Managing the Information Systems Infrastructure." In Information Systems Today: Managing in the Digital World, 2013.
Waltz, Edward. Knowledge Management in the Intelligence Enterprise. Artech House, 2003.
Warner, Michael. "Wanted: A Definition of 'Intelligence.'" Journal of the American Intelligence Professional 46, no. 3 (2002). https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol46no3/article02.html#rfn7.
Wegener, Rasmus. "The Value of Big Data: How Analytics Differentiates Winners." Bain & Company, 2013. http://www.bain.com/Images/BAIN%20_BRIEF_The_value_of_Big_Data.pdf.
Wood, Graeme. "What ISIS Really Wants." The Atlantic, March 2015. http://www.theatlantic.com/magazine/archive/2015/03/what-isis-really-wants/384980/.
Woolsey, R. James. R. James Woolsey Testimony to U.S. Senate Committee on Governmental Affairs, 2004.
Worstall, Tim. "Fascinating Number: Google Is Now 40% Of The Internet." Forbes. Accessed May 15, 2016. http://www.forbes.com/sites/timworstall/2013/08/17/fascinating-number-google-is-now-40-of-the-internet/.
Zegart, Amy. Spying Blind: The CIA, the FBI, and the Origins of 9/11. Princeton, NJ: Princeton University Press, 2007.
Zelikow, Phillip. "The 9/11 Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the United States," July 22, 2004. https://www.gpo.gov/fdsys/pkg/GPO-911REPORT/content-detail.html.
Zimmerman, Thomas. "The Emerging Role of Data Scientists on Software Development Teams." Redmond, WA: Microsoft Research, April 2015. http://research.microsoft.com/pubs/242286/MSR-TR-2015-30.pdf.