

In Data We Trust? The Big Data Capabilities of the National Counterterrorism Center

A THESIS

SUBMITTED TO THE

INTERSCHOOL HONORS PROGRAM IN INTERNATIONAL

SECURITY STUDIES

Center for International Security and Cooperation

Freeman Spogli Institute for International Studies

STANFORD UNIVERSITY

By

Benjamin Mittelberger

May 2016

Adviser:

Dr. Martha Crenshaw


Abstract

The National Counterterrorism Center (NCTC) sits as the central nexus for counterterror intelligence analysis and strategy for the entire American intelligence community (IC). It exists to offset the fragmentation issues that came to light in the aftermath of the 9/11 attacks. Its role is to integrate all types of intelligence information using traditional and computational Big Data techniques. Unfortunately, the NCTC is not in a position to take full advantage of contemporary Big Data analytics, severely reducing its analytical capabilities in the current information environment. These weaknesses stem from its organizational structure, not its technological capabilities.

The NCTC's deficiencies manifest themselves both in its internal structure and in its position within the community as a whole. Internally, the NCTC does not have the information-sharing environment that it should and provides poor incentives for collaboration and innovation. Externally, it has a poorly defined role within the community and lacks the power required to truly centralize Big Data analytics. The private sector is leaps and bounds ahead of the government when it comes to the implementation of Big Data analytics. Internet giants such as Google and Facebook may serve as models for future improvements to the NCTC. With full understanding that certain limitations apply to government agencies, this thesis provides recommendations on how the NCTC may improve its ability to take advantage of large-scale data analytics and provide a better service to the rest of the Intelligence Community.


Acknowledgements

This thesis would have been impossible without the support from my adviser, Professor Martha Crenshaw. I started the process completely clueless, and she guided me through it expertly. She began nudging me in the right direction six months before I even started writing. She has taught me so much, most notably how much I have left to learn. Without her direction, I would not have been able to formulate a reasonable question, let alone answer it.

I would also like to thank Professor Coit Blacker. His pointed questions often forced me to defend and sometimes reconsider my positions. They were not always convenient to struggle through but eventually helped me produce better work. It was nice to know he was always there to keep me honest. I also want to thank Shiri Krebs and Shelby Speer for the support that they gave me throughout the process.

To the rest of my CISAC cohort, I'm not sure that I could have finished this without all of you. From groggily meeting each other at six a.m. for a day at Gettysburg, to commiserating over tacos about the state of our theses, your presence has kept me grounded in this process. I won't soon forget the extent of our late night working sessions and long conversations. I've made some amazing friends that I hope to keep for a long time, regardless of where we all end up.

To my friends outside CISAC, thank you for keeping me sane as I worked through this project. I was not always the most fun to be around, but I appreciate that you kept me around anyway.

Finally, to my parents, I'm not sure I can really thank you in a paragraph in the acknowledgements section of my thesis. Just know that everything that I've been able to accomplish stems from the opportunities and guidance that you gave me.


Table of Contents

Abstract
Acknowledgements
Introduction
The Evolution of the Modern Counterterrorism Information Environment
    Defining "Intelligence": Evidence and the Knowledge Production Process
    Modern Counterterrorism Intelligence Paradigms
A Novel Intelligence Toolbox: Computational Analytics in Practice
    The Growth of "Big Data"
    Computational Methods in Counterterrorism
    Technical Requirements for Big Data Success
    Organizational Requirements for Big Data Success
    Conclusion
Organizational Successes and Failures of the NCTC
    Successful Components of the National Counterterrorism Center's Structure
    Concrete Points of Failure in the Counterterror Community
    Implicit Organizational Deficiencies Within the NCTC
    External Organizational Deficiencies of the NCTC
    Conclusion
Looking to the Future: Innovative Models for the NCTC
    Modern Big Data Tools
    Successful Big Data Structures
    The Comparability of the NCTC
    Conclusion
Conclusion
    Recommendations for the National Counterterrorism Center
    Lessons Beyond the National Counterterrorism Center
Works Cited


Introduction

Visiting the National Counterterrorism Center (NCTC) can seem like a trip to a

blockbuster movie set. The high levels of security expected of a clandestine intelligence agency

are all there: the armed guards, the metal detectors, even the radiation-sealing boxes for all

electronics. Suited analysts and uniformed armed forces personnel fill the hallways.

In the bowels of the building sits a cavernous central command center with rows of desks,

each equipped with a set of triple monitors, all facing an array of wall-mounted screens. These

desks are manned 24 hours per day by analysts who monitor the intelligence streaming in from

the entirety of the counterterrorism intelligence community. In theory, these analysts receive the

most time-critical and relevant information in counterterror intelligence.

The rest of the building also pulses with intelligence activity. Agencies from all over the

intelligence community are represented, working together in order to centralize American

counterterror efforts in the Global War on Terror. Everyone is trying to identify and prevent the

next inevitable terror plot.

The futuristic command space is a microcosm of the NCTC’s role in the counterterror

intelligence community. It is sleek and new, and it emphasizes the use of modern technology to perform intelligence work. It also represents one of the most significant recent organizational developments in intelligence: a move to centralize counterterror information in

the Office of the Director of National Intelligence (ODNI). The creation of the NCTC reflects the union of two ideas: that centralized organizational structures and modern computational technology, taken together, can offset intelligence failures.


It is apparent that technology alone has been and will be insufficient to successfully

combat terrorism. Originally, I thought the focus of this thesis should be on the specific

computational methods the NCTC should employ or the type of data and algorithms it might use.

As I began this project, I (an idealistic computer scientist) thought I could write my own

analytical tools in order to prove that these methods were feasible. Surely the Islamic State’s use

of social media, which often foreshadowed its intent, could be mined and analyzed using “Big

Data” techniques. Methods of analyzing datasets of this size have been around for a decade. Why

is the NCTC unable to tackle this threat? I thought all that was needed was someone competent

to unearth the required information.

But after weeks of fruitlessly searching through millions of social media posts mentioning

ISIS, I was stumped. I did not even know what I was looking for. Even if I did, I most likely

would have been unable to find it. I did not have the necessary tools to take advantage of the Big

Data that I had collected; I needed more robust hardware and software. However, the technology

was only one piece of the puzzle. I realized I needed topic experts to give me background on the

situation. I needed foreign language speakers to translate the dozens of languages I was

encountering. I needed tips on suspicious actors. I needed access to law enforcement to get

warrants for more information. I needed a dozen more resources. As it turns out, the process of

generating counterterror intelligence is extremely difficult and requires effort from people with a diverse set of skills.

The NCTC was created to provide these required resources. It is the organization that

theoretically combines all-source counterterror information in the Office of the Director of

National Intelligence. Given the tools and analysts it has at its disposal, the NCTC is in a

better position to solve Big Data problems. However, it is not clear that it is able to conclusively


solve these analytic puzzles. Counterterror intelligence failures have plagued its record and

continue to do so. Major blunders, leading to harsh investigations, have created some doubt

about its effectiveness. Additionally, it has been the subject of many other critical reports, further

exposing flaws in its structure and processes. Nevertheless, even given these assessments, it is

still not obvious how to fix these problems.

There is no doubt that hardworking analysts are producing quality work at the center.

Furthermore, there is no question that it has considerable Big Data analytic means. But its

concrete failures and implicit faults raise questions about the efficacy of its Big Data efforts. As

computational analytics are becoming increasingly central in a data-rich world, the NCTC must

be prepared to lead the charge to transform counterterror analytics. However, it must first be able

to effectively leverage the technology that it has acquired. Given the doubts about the NCTC’s

performance, this thesis seeks to answer the question: Does the NCTC possess the requisite

structures and processes to successfully conduct intelligence operations in a Big Data

environment?

To address this question, this thesis is split into five chapters. The first chapter

addresses counterterror intelligence in a modern context. Information environments have become

increasingly complex, and traditional analytic methods are unable to keep up. The second

chapter addresses the computational methods that have been developed in response to shifting

data needs. It explores the concept of Big Data and the organizational structures required to

leverage it effectively.

Chapter Three applies the ideas of modern Big Data analytics to the National

Counterterrorism Center’s capabilities. It assesses the performance of the center and examines

the ways in which it does not fulfill the organizational requirements of computational analytics.


It concludes that the NCTC has significantly improved the analytical capability of the

intelligence community, but it still has many structural and procedural flaws that must be

addressed.

Chapter Four looks to the private sector for inspiration on productive Big Data structures.

Through an analysis of the Internet giants Google and Facebook, alongside an analysis of

smaller analytic startups, a path forward may become clearer for the NCTC. The comparability

of these private sector organizations to the NCTC is also discussed.

The thesis concludes with a summary of the findings and a series of recommendations for

the NCTC. The recommendations include a call for the reorientation of the center’s goals, an

expanded approach to innovation, and an increased focus on employee quality. These

recommendations are given with an understanding that forward progress is difficult, but also

with the hope that some steps can be taken to better position the National Counterterrorism

Center in the complex world of Big Data analytics.


Chapter 1

The Evolution of the Modern Counterterrorism Information Environment

“There are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don't know we don't know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.”

-Donald Rumsfeld, 2002¹

The counterterror intelligence community has undergone major changes in recent

decades. Once relegated to obscure, understaffed, and underfunded departments in various

scattered agencies across the American intelligence community, it is now one of the main

focuses of American security. Significant resources and effort have gone into improving its

performance and reach. Understanding the motivations for this transformation is necessary to

explore the community’s goals, methods, strengths, and weaknesses.

This chapter first investigates the developmental history of the community from a

structural perspective. The community has suffered from cultural, ethnographic, and

organizational issues for decades. Internal and external fragmentation have plagued its data

collection and analysis abilities, creating a system that was effectively designed to miss modern

informational trends. The community has gone through a tortured reform process, undergoing major upheaval in the aftermath of the 9/11 attacks and culminating in the creation of the National

Counterterrorism Center (NCTC).

1 Rumsfeld, Donald. "DoD News Briefing - Secretary Rumsfeld and Gen. Myers." February 12, 2002. http://archive.defense.gov/Transcripts/Transcript.aspx?TranscriptID=2636.


The adjustments made to the structure and methods of the intelligence community have

come in response to a shifting informational environment, one that is far more decentralized and unstructured, and far larger in scale, than traditional methods could monitor and interpret. The explosion in computational technology has driven fundamental

changes in the ways that intelligence is collected and analyzed. The new security environment,

combined with the development of modern tools, has transformed the fundamental goals and

models of counterterror intelligence.

The Organizational Evolution of Counterterror Intelligence

A Brief History of Intelligence

American conceptions of intelligence only came to mature in the decades following

World War II. The CIA was created in 1947 to serve as a central nexus of intelligence for the

intelligence community (IC) as a whole. Other agencies, such as the NSA in 1952, joined the

fray as the Cold War escalated. In all, seventeen agencies work in conjunction to bring together

intelligence for policymakers in the White House, Capitol, and Pentagon.

Each agency has a specific role: to collect a certain type of intelligence, analyze it, and provide conclusions based on its collected data. These agencies have evolved in isolation,

building up extensive areas of expertise in their own specific fields. For example, the National

Geospatial Intelligence Agency (NGA) deals exclusively with aerial imagery as a source of

intelligence. Using a variety of camera systems and detection methods, it screens out what it considers unimportant images and propagates images of interest up the chain of command. It is not responsible for providing or analyzing any other sources of intelligence

information. This type of isolation creates self-contained agencies that have their own structure,


culture, and operating procedures.2 The intelligence produced by these compartmentalized

organizations is presented up the hierarchy where it must be combined.

The segmentation of the IC ran deep, even manifesting itself at the level of the individual.

The employees of an agency like the CIA were often hired, evaluated, and promoted directly

within the agency, rarely interacting professionally with anyone outside of their own cloistered

setting.3 The clandestine nature of their work further isolated them from the outside world, often

preventing them from speaking about their professional lives with almost anyone outside of their

“home turf.” What resulted were strong internal communities, with intense loyalty to one’s own

agency and little regard for the intelligence community as a whole.4 In fact, the environment

fostered negativity among many of the members of the community, where an “us versus them”

attitude dominated interagency discourse.

The resulting structure was one of competition instead of collaboration. Information

became an asset to be used for personal or institutional gain, not something to be shared with the

rest of the community. Strict processes for “protecting” information on a need-to-know basis

choked off flows of information between agencies. Only high-ranking members of the

community could see anything other than the distilled conclusions from a massive pool of

collected data. Instead of working together, agencies worked in parallel down to the individual

level, generating their own intelligence products.

For decades, the security environment that existed suited the segmented nature of these

agencies. The separate agencies of the IC had co-evolved with the actions of the Soviet Union

and therefore were structured in ways that matched the Soviet structure. This meant that each

2 Zegart, Amy. Spying Blind: The CIA, the FBI, and the Origins of 9/11. Princeton Press, 2007, 67.
3 Zegart, Spying Blind, 68.
4 Nolan, Bridget. "Information Sharing and Collaboration in the United States Intelligence Community: An Ethnographic Study of the National Counterterrorism Center." PhD Dissertation, University of Pennsylvania, 2013, 57.


agency could work independently and still provide sufficiently detailed intelligence on its target.

Additionally, the Cold War presented an adversary that was monolithic in nature. There was little

confusion about the location or the intent of the Soviet Union, meaning agile inter-agency

collaboration was rarely necessary. As a result, a combination of the separate intelligence

conclusions from each agency was enough for senior leadership to make coherent and very often correct decisions.

The fall of the Soviet Union marked a shift in the operational requirements as the

monolithic threat collapsed. The result for the intelligence community was a certain listlessness in

action. There simply was no imposing enemy force to motivate work by the community. It was

not immediately obvious what the new threat was, if it even existed. Clearly there had to be a

shift in strategy, and intelligence officials commissioned dozens of reports on the new

intelligence environment, searching for new improvements and directives for the modern

American intelligence apparatus.5 Zegart counted 514 recommendations in the reports relevant to

counterterror intelligence. The recommendations focused on issues ranging from intelligence

targets to organizational structure. Divisions ran deep within the community as officials grappled

with the findings of the new reports and how to implement their suggestions. In the end, few

recommendations were ever actually implemented. Only 10% of the recommendations were

successfully fulfilled, with another 11% being partially implemented.6 Fully 80% resulted in no action

at all.

Hints of new security threats started to appear in the decade preceding 9/11. The

destabilization of Eastern Europe and the Middle East began to produce weaker states and rogue

groups. In 1998 a middle-aged Saudi exile named Osama Bin Laden faxed an article from

5 Zegart, Spying Blind, 29.
6 Zegart, Spying Blind, 36.


Afghanistan to a London-based Islamic newspaper, declaring a fatwa in the name of a “World

Islamic Front”.7 A fatwa is a scholarly interpretation of Islamic law; Bin Laden used this one to call for the murder of any American, anywhere in the world. In the same article he

praised the mounting terror attacks against Americans, including the 1993 World Trade Center

bombing and the 1993 Somali firefights, which caused an American military pullout. He even

claimed that he would bring the fight to American soil in the near future. The intelligence

community noted his declarations, and a few analysts were assigned to track him. Al’Qaeda was

on the IC’s radar, and Clinton even authorized a plan to kill him.8

However, there was a general lack of concern about his ability to actually threaten the

United States beyond simple car bombings.9 There were no major shifts in intelligence focus, or

in the methods employed to conduct counterterrorism intelligence. It just did not seem that a man

relegated to one of the poorest and most remote regions on earth could be capable of anything

truly destructive.

The 9/11 Attacks and Their Implications

The attacks on September 11th caught the counterterror intelligence community

completely off-guard. The community simply had no idea that this attack was coming. However, in another

strange sense, there was little surprise that an attack had occurred.10 The Al ‘Qaeda threat had

been present for years, and those assigned to track it knew something was going to happen.

Furthermore, there had been a decade of ignored recommendations and non-adaptation to a new

7 Zelikow, Phillip. "The 9/11 Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the United States," July 22, 2004. https://www.gpo.gov/fdsys/pkg/GPO-911REPORT/content-detail.html, 47.
8 The New York Times. "Many Say U.S. Planned for Terror but Failed to Take Action." The New York Times, December 30, 2001, sec. National. http://www.nytimes.com/2001/12/30/national/30TERR.html.
9 Zelikow, 9/11 Commission Report, 74-85.
10 Zegart, Spying Blind, 86.


intelligence environment. Senior leadership at the national intelligence level acknowledged the

importance of counterterror intelligence, but was either unable or unwilling to overcome massive

organizational inertia to make major changes and accommodate better intelligence practices.

Analysts in many of the isolated agencies reported a sense of dread and helplessness as they tried

desperately to conceptualize what they considered to be a looming threat.11

9/11 spurred an unprecedentedly large investigation into the intelligence failures leading up to the attack. The federal government commissioned a massive report, and over three years its authors combed through logs and timelines to determine what had gone so wrong that there had been no

warning for an attack of this scale.12 The resulting 600-page report explored the history of the

intelligence community and its investigations of Al ‘Qaeda. Social scientists also rushed to

uncover the causes for the failure of such a large and powerful system. The result was a growing

body of scholarship focusing on the deficiencies of the intelligence community and how to fix

them.13 These investigations also uncovered a number of missed opportunities to detect the plans for 9/11

before it happened. No single agency was to blame; the failures were spread across the IC.

Unsurprisingly, the majority of the failures by the community can be attributed to poor

organizational management and cultural issues. The inter-agency nature of counterterror

intelligence simply was not constructed to deal with a well-funded clandestine group that

spanned multiple continents. The traditional setup that had dominated intelligence efforts broke

down targets by country or region and completely separated domestic and foreign intelligence

gathering. This resulted in both internal and inter-agency fragmentation. No single person or

11 Zelikow, 9/11 Commission Report, 259.
12 Zelikow, 9/11 Commission Report.
13 Silke, Andrew. "Research on Terrorism." In Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security. University of East London School of Law, 2008, 29.


agency was definitively in charge of monitoring Al 'Qaeda, resulting in disjointed and incomplete

monitoring.14

An illustrative example of internal fragmentation occurred in early 2000, as the CIA was

tracking suspected Qaeda operatives travelling through Southeast Asia. The CIA clandestine

officers believed (correctly) that the suspects were planning something big. However, the way in

which they were tracking them was disorganized and improperly segmented. The CIA had

distinct sets of personnel focused on each nation-state, meaning that which officers conducted the surveillance depended on the location of the targets. As the Qaeda operatives

travelled from Malaysia to Thailand, the jurisdiction for their surveillance changed hands. The

Malaysian CIA outpost desperately sent messages to the Thailand-based officers, but by the

time the messages were received, the suspects had disappeared.15 In a post-9/11 investigation, it

was found that two of these suspects continued on to California, where they planned the 9/11

attacks for 18 months. They had both managed to obtain US visas – despite their high-risk

designation by the CIA.

The lack of inter-agency communication allowed the Qaeda operatives to enter

the country and make arrangements without detection. The FBI, the agency in charge of

domestic counterterror intelligence, had poor procedures for procuring necessary information

from other sources, namely the CIA. In mid-2001, flight school employees tipped off the FBI

about a group of suspicious men who had requested Boeing 747 flight simulator lessons.16 The

FBI was even in possession of their names and addresses. However, because they did not

regularly communicate with the CIA about their operations, they did not know that the men that

they were following were known Al ‘Qaeda agents who had eluded surveillance and capture

14 Zegart, Spying Blind, 103.
15 Zegart, Spying Blind, 100-104.
16 Zelikow, 9/11 Commission Report, 273.


abroad. The pieces of the puzzle were all there: the CIA knew that these men were members of a

militant group dedicated to the slaughter of Americans and the FBI knew where they were and

what they were doing. Unfortunately, the fragmentation within and between the agencies

prevented the puzzle pieces from being put together.

Movements Toward Reform

The 9/11 attacks provided the necessary impetus to begin instituting substantive

organizational and procedural change in the intelligence community. The bureaucratic inertia

that had plagued the community for a decade only became surmountable after a tragedy of this

scale. Within two months, major legislation was drafted to improve US counterterror intelligence

efforts.17 The PATRIOT Act of 2001 significantly increased both manpower and funding for

counterterror activities, including intelligence efforts. It created new laws and requirements for

border crossing and money transfers and set more relaxed limits on government surveillance. It

also made attempts to remedy some of the fragmentation problems that existed within the

counterterror intelligence community. Information sharing requirements were laid out,

incentivizing the dissemination of collected data.

Three years after the signing of the PATRIOT Act, Congress drafted a more

comprehensive bill to fix the major weaknesses within the US intelligence community.18 The

National Security Intelligence Reform Act of 2004 introduced sweeping changes in the process

and structure of counterterrorism intelligence. Most notably, it commissioned the creation of a

National Counterterrorism Center to serve as the central nexus for American counterterror

17 Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism (USA PATRIOT ACT) Act of 2001, 2001. http://www.gpo.gov/fdsys/pkg/PLAW-107publ56/pdf/PLAW-107publ56.pdf.
18 Intelligence Reform and Terrorism Prevention Act of 2004, 2004. http://www.nctc.gov/docs/pl108_458.pdf.


efforts. The stated mission of the center is: “To serve as the primary organization in the United

States Government for analyzing and integrating all intelligence possessed or acquired by the

United States Government pertaining to terrorism and counterterrorism.” It has the additional

responsibility of coordinating overall counterterror strategic operations for the United States by

“integrating all instruments of power, including diplomatic, financial, military, intelligence,

homeland security, and law enforcement.” However, this is in a purely managerial sense, as the

center is not responsible for any execution of counterterror operations.

The NCTC also integrates the staff of the 16 agencies that conduct counterterror

intelligence operations.19 Its staff is close to 50% NCTC-specific employees, including analysts, managers, and other personnel. The other 50% are veterans of other parts of the IC who are at the

NCTC on loan for a few years at a time. This is meant to foster better inter-agency relationships

and give different personnel the opportunity to broaden the way in which they conceive of

counterterror intelligence operations. The hope is that these employees will no longer experience

cultural barriers between them and employees at other agencies, fostering better collaborative

efforts.

Overall, the NCTC is an agency that was created to offset the entrenched structural

deficiencies that have existed within the intelligence community for decades. Instead of tearing

down the existing structure and rebuilding an entirely new system, lawmakers created a central

nexus to which all relevant information must flow and where all long-term strategy decisions

must be made. It takes hard lessons – learned in the years picking up the pieces from 9/11 – and

turns them into policies that can be applied to the community as a whole. In essence, the NCTC

is a modern agency, meant to respond to the requirements of a modern intelligence environment.

19 Nolan, 4.


Defining “Intelligence”: Evidence and the Knowledge Production Process

Defining intelligence in a rapidly shifting information environment is difficult. On a basic

level, intelligence in this context refers to knowledge that is integral to effective tactical and

strategic decision-making at many levels of government.20 Though there is significant debate

among scholars about the formal definition of intelligence, the CIA weighed in with this

relatively concise description: “Reduced to its simplest terms, intelligence is knowledge and

foreknowledge of the world around us—the prelude to decision and action by U.S.

policymakers.”21 However, it is important to note that intelligence is not simply information

about a specific problem that needs to be addressed. It is knowledge that is produced as the end

result of an arduous intelligence creation process. The knowledge produced is a prerequisite to

understanding the operational environment in which policy decisions will be made.

Intelligence production requires three main steps: evidence collection, analysis, and

synthesis.22 Evidence collection refers to the creation of data points for analysts to work with,

through a variety of means. The result of evidence collection is a pool of unstructured and

unlinked snippets of data that require further refinement. The next step, analysis, refers to the

process of resolving data points into fundamental parts in order to begin building up a contextual

model of the data. This step is important for determining both what the data can reveal and

what it cannot. Analysis is also essential to establish what data points are missing from the

previously collected evidence. Finally, synthesis is the process of constructing feasible

20 Waltz, Edward. Knowledge Management in the Intelligence Enterprise. Artech House, 2003, 16-18.
21 Warner, Michael. "Wanted: A Definition of 'Intelligence.'" Journal of the American Intelligence Professional 46, no. 3 (2002). https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol46no3/article02.html#rfn7.
22 Lazaroff, Mark. "Anticipatory Models for Counter-Terrorism." In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006, 52-53.


explanations or solutions from the fundamental components that were produced from analysis.

This step creates intelligence, as it provides knowledge of a situation and its context, not simply

factoids describing it. Without this intelligence it is often impossible to produce sound policy

decisions.

Evidence Collection

Data is the raw material from which intelligence is formed.23 In order to construct

comprehensive intelligence reports, agencies must first collect as much relevant data as possible.

Failure to collect enough meaningful data can render a final analysis impossible or incomplete.

American counterterrorism intelligence agencies rely on many different methods for data

collection. Perhaps the most traditional method still in use today is the process of human

intelligence (HUMINT). The CIA is the primary producer of human intelligence in a foreign

setting, while the FBI conducts the majority of human intelligence domestically.24 Gathering

information through HUMINT entails human espionage through the use of spies. Traditionally,

the CIA would commission intelligence officers: spies who infiltrate foreign soil and make

contact with foreign government officials in order to extract information. While traditional

intelligence officers no doubt still work for American agencies, modern counterterrorism

intelligence requires a much more diverse set of agents spanning domestic and foreign, state and non-state actors.

Human intelligence is often pre-structured by its source and provides data that is useful

for its specificity and context-aware origins. For example, an informant in a possibly radicalized

group could provide detailed and relevant information on the future actions of that group. These

23 Waltz, 19.
24 Margolis, Gabriel. "The Lack of HUMINT: A Recurring Intelligence Problem." Global Security Studies 4, no. 2 (Spring 2013). http://globalsecuritystudies.com/Margolis%20Intelligence%20(ag%20edits).pdf, 45.


hints come from a person who is cognizant of the relevant factors that shape analytical

conclusions, and therefore the information requires less structuring by intelligence analysts. On

the other hand, human intelligence is difficult to implement and maintain, requiring extensive

planning, training and resources. It is also risky for the agents providing the information,

meaning the source can be cut off without warning. This makes HUMINT an exceedingly

valuable but often unreliable source of basic evidence for intelligence production.

Geospatial intelligence, commonly known as GEOINT, is based upon collecting images from various optical, infrared, and radar sensors. The most commonly used sources are high-

resolution satellite images and footage from unmanned aerial vehicles. Though it is primarily

used as tactical information for military operations, GEOINT is still useful for modern

counterterrorism intelligence purposes.25 It can provide nearly real-time tracking of uncovered

targets, giving analysts better indications of terrorist capabilities. It is also useful for determining

physical, not simply conceptual, relationships between different intelligence targets. The

downside of GEOINT is its subjective nature, requiring extensive inference from a viewer to

draw any conclusions. The rooftop of a suspicious building or a group of vehicles moving in a

line can be helpful as supplementary pieces of evidence but are not necessarily noteworthy in a

contextual vacuum. Judgments based on imagery must therefore be made in conjunction with

conclusions from other sources of evidence.

Signals Intelligence (SIGINT) is the act of intercepting and decoding adversarial

communications, generally through electronic means. Dating back to the days of World War II,

this mostly involved capturing radio waves sent between enemy military commanders.

Contemporary SIGINT now encompasses a significantly larger set of communications.26

25 Margolis, 47.
26 Margolis, 48.


Everything from satellite messages to electronic information being sent on the open Internet can

theoretically be captured and treated as SIGINT information. The technical prowess required to

implement successful SIGINT operations is significant, and the National Security Agency (NSA)

employs an estimated 35,000 to 40,000 people to devise systems of interception for the world’s

communications.27 As intelligence targets shift away from traditional military targets to

accommodate counterterror efforts, signals intelligence probes must be more diverse and

involved than they have ever been before. Capabilities have evolved to the point where major

constitutional objections have been raised about the extent of American signals collection.28

The value of signals intelligence lies in its unfiltered nature. Parties that are unaware of

any eavesdropping are likely to communicate candidly, often providing significant insights into

their intentions. A classic example of SIGINT interception is the capture of Osama Bin Laden’s

satellite phone communications in the late 1990s.29 Foundational knowledge about Al 'Qaeda

was generated from the content of his conversations with other ranking operatives.

Unfortunately, when he discovered that his calls were being monitored, he immediately dropped

his use of satellite phones, closing off an invaluable source of information. This example exposes

a weakness in SIGINT operations: an adversary must be using an electronically interceptable

method of communication for it to be of any value.

The aforementioned types of information collection are all methods that procure data from sources that are not openly accessible. However, not all evidence is

intentionally hidden from view. Often, valuable pieces of information are openly available

27 NSA. "NSA 60th Anniversary Book," 2012. https://www.nsa.gov/about/cryptologic-heritage/historical-figures-publications/nsa-60th/, 3.
28 Greenwald, Glenn, Ewen MacAskill, and Laura Poitras. "Edward Snowden: The Whistleblower behind the NSA Surveillance Revelations." The Guardian, June 11, 2013. http://www.theguardian.com/world/2013/jun/09/edward-snowden-nsa-whistleblower-surveillance.
29 Margolis, 50.


through a variety of public sources. Using widely accessible information for intelligence

purposes is known as open source intelligence (OSINT).30 Through the compilation of data from

publications, databases, and other large unstructured datasets, counterterror analysts can create

detailed and comprehensive overviews of emergent targets and situations.

Potential open sources for intelligence data are plentiful. Analysts can read newspapers

and magazines from regions of interest, or access public information repositories that store

aggregate socio-economic statistics and records. They can even go as far as data-mining financial

transaction logs from the Internet, creating real-time tools to monitor suspicious activity and

provide supplementary evidence to other investigations. Complex interactions between separate

sources may reveal insights that were not previously discernable. However, it is essential to note

that OSINT is not a replacement for other types of classified “procurement” methods of data

collection. Instead it provides a backdrop upon which more targeted information sources can be

placed.

The strength of open source data gathering, namely its breadth and contextual value, is

also one of its weaknesses.31 The open source world is an incoherent stream of data. If it is not

mined with skill and specificity, it can confound an analytical team, rendering them unable to

produce any meaningful conclusions from the information that they have collected. However, the

variety and breadth of collection must be large and diverse enough to ensure that enough relevant

information is being collected to adequately back analytical conclusions. Furthermore, the

structure of the data is often extremely heterogeneous. Intelligence departments must work

tirelessly to extract the information from whatever medium they are pursuing and make it

30 Wiil, Uffe. Counterterrorism and Open Source Intelligence. Springer, 2011, 1.
31 Wiil, 3.


consumable by a larger audience base. This includes acts of translation, content tagging, and

extensive summarization.

Information Technology and Knowledge Production

The information revolution of recent decades has precipitated major changes in the ways

that intelligence collection and analysis are undertaken. More specifically, the combination of

two separate but related trends – increased globalization and technological requirements of many

activities – is shaping modern approaches to counterterror intelligence.32

First, agencies can significantly improve the rate at which they collect vast amounts of

data from sources that vary in context, content, language, and geographic locality. With the aid

of computational techniques, agencies can accrue information at dizzying rates. The extreme

breadth of the Internet reveals massive opportunities for open source and signals intelligence.

Retrieval of an effectively unlimited amount of open data from nearly any geographic region

becomes simple and easy. Online communications and public Internet posts also have become

ubiquitous, even among clandestine organizations, allowing for extremely comprehensive signals

interception. The constitutionality of these actions, while important and requiring further

consideration, is not the focus of this thesis.

The recent technological acceleration even applies to more traditional forms of

intelligence gathering, such as human intelligence. Take, for example, a 16-gigabyte flash drive

(as of now, considered to be fairly small), which can fit in the pocket of any informant. This

32 Choucri, Nazli, Stuart Madnick, and Michael Siegel. "Improving National and Homeland Security Through Context Knowledge Representation and Reasoning Technologies." In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.


flash drive can store slightly fewer than 10 million pages of text, significantly more than can be

jammed into a briefcase.
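That figure follows from a quick back-of-the-envelope calculation; assuming, as a rough working number not drawn from the sources cited here, that a page of plain text holds on the order of 2,000 characters (about 2 kilobytes uncompressed):

$$\frac{16\ \text{GB}}{\approx 2\ \text{KB per page}} = \frac{16 \times 10^{9}\ \text{bytes}}{2 \times 10^{3}\ \text{bytes}} = 8 \times 10^{6}\ \text{pages},$$

or roughly eight million pages, in line with the "slightly fewer than 10 million" estimate above.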

The truly difficult task in modern knowledge production comes after the collection phase.

Open source and signals intelligence have been established as extremely unstructured and

difficult to make sense of. However, the real challenge may come from integrating all of the

aforementioned intelligence sources into a single knowledge repository. Human intelligence may

not be easily integrated with geospatial intelligence, and may differ greatly in structure from collected signals

intelligence. And yet, it is imperative that they be integrated in some way so as to build a

complete picture of an operating environment. Without doing so, intelligence operations would

simply create siloed data stores that cannot be cross-referenced.

To integrate intelligence data after it is collected, its meaning must first be resolved, then

structured appropriately, then collated, and finally integrated into an already existing knowledge

store.33 Given the size of the data being collected in a modern intelligence context, it is

impossible to expect analysts to be able to perform all of these steps on each piece of

intelligence. The tasks must be done at least semi-autonomously, using computational tools to

infer content, proper structure, and relevance to existing data.34 Semi-autonomous here means that a human verifies that the computational algorithms are behaving correctly.
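As a purely illustrative sketch of what such a semi-autonomous pipeline could look like, consider the outline below. The `Record` fields, the stage names, and the 0.8 review threshold are hypothetical, invented for this example rather than drawn from any actual NCTC or IC system.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """One collected item of intelligence, e.g. a SIGINT intercept or an OSINT post."""
    source: str                                   # "HUMINT", "SIGINT", "GEOINT", or "OSINT"
    raw: str                                      # unprocessed content
    entities: dict = field(default_factory=dict)  # resolved meaning (who, where, when)
    confidence: float = 0.0                       # how certain the automated stages are
    needs_review: bool = True                     # flag for human verification

def resolve(record: Record) -> Record:
    """Stage 1: infer what the record means (language, entities, topic).
    A real system would call translation and entity-extraction models here."""
    record.entities["text"] = record.raw.strip()
    record.confidence = 0.6  # placeholder score standing in for model output
    return record

def structure(record: Record) -> Record:
    """Stage 2: map the resolved content onto a common schema."""
    record.entities.setdefault("locations", [])
    record.entities.setdefault("persons", [])
    return record

def integrate(records: list[Record], store: list[Record]) -> list[Record]:
    """Stages 3 and 4: collate the new records and fold them into the existing
    knowledge store. Low-confidence items stay flagged for an analyst, which is
    what makes the pipeline semi-autonomous rather than fully automatic."""
    for record in sorted(records, key=lambda r: r.source):
        record.needs_review = record.confidence < 0.8
        store.append(record)
    return store

# Example: push two raw reports through all of the stages.
knowledge_store: list[Record] = []
incoming = [Record("SIGINT", "intercepted message ..."), Record("OSINT", "public post ...")]
integrate([structure(resolve(r)) for r in incoming], knowledge_store)
```

The essential design point is the `needs_review` flag: the machine performs the bulk of the resolution and structuring, while human analysts verify the cases the algorithms are least confident about.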

There are major complications to writing these algorithms, and they must be at least

somewhat context-aware as they resolve the meaning of diverse documents.35 Take, for example,

the challenges of language differences. An American human intelligence informant may submit a

comprehensive report on suspicious activity that is occurring in “Brussels.” An intercepted text

message (SIGINT) between two suspected French terrorists might mention a possible future

33 Choucri, 145.
34 Choucri, 141.
35 Choucri, 141.


attack in “Bruxelles.” Surveillance video (GEOINT) from Belgian police could capture

suspicious activity occurring in front of a specific location in the city of “Brussel.” Taken

together, these pieces of information could be critical for foiling a possible terrorist plot in the

city. And yet, the key piece of information that links them all is distinct for each source. A

human analyst could see this with ease: understanding the language context, the analyst could infer that all three reports refer to the same city. However, a computer is not this smart, and

must be specifically programmed to make these types of inferences on its own. Computational

methods already exist for scenarios exactly like this, but making them function well in the general

case is an extremely difficult engineering challenge.
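A minimal sketch of the alias resolution this example calls for might look like the following. The gazetteer, the canonical identifier "BE-BRU", and the report snippets are invented for illustration; a production system would rely on large geographic databases, language identification, and fuzzy matching rather than a hand-built table.

```python
# A tiny multilingual gazetteer: surface forms mapped to one canonical entity ID.
# Real systems use large geographic databases rather than a hand-built dictionary.
GAZETTEER = {
    "brussels": "BE-BRU",    # English
    "bruxelles": "BE-BRU",   # French
    "brussel": "BE-BRU",     # Dutch
}

def resolve_place(mention: str) -> str | None:
    """Map a place name, in whatever language it was reported, to a canonical ID."""
    return GAZETTEER.get(mention.strip().lower())

# Three reports from different collection disciplines, each naming the city differently.
reports = [
    ("HUMINT", "suspicious activity reported in Brussels"),
    ("SIGINT", "possible future attack mentioned in Bruxelles"),
    ("GEOINT", "surveillance footage of a location in Brussel"),
]

# Link reports that refer to the same canonical place.
linked: dict[str, list[str]] = {}
for source, text in reports:
    for word in text.split():
        place = resolve_place(word)
        if place:
            linked.setdefault(place, []).append(source)

print(linked)  # {'BE-BRU': ['HUMINT', 'SIGINT', 'GEOINT']}
```

Even this toy version hints at why the general case is hard: real mentions arrive embedded in noisy, multilingual text, are often misspelled or transliterated, and frequently refer to entities that no gazetteer has ever recorded.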

It is important to note that what requires integration in the previous example is not the

data itself. Video footage cannot be “integrated” with an intercepted text message. What is

instead being integrated is the context. This is the required step for intelligence data to become

intelligence knowledge. Only once the data has been collected, resolved into its constituent parts,

and integrated with real-world context can it be of any real use for making policy decisions.

A complex skillset is required in order to actually perform analysis on these massive and

heterogeneous datasets. Analysts must be capable of using computational techniques for

managing, structuring, and integrating complex data. This requires a culture of competence,

where analysts iteratively improve their abilities in order to complete increasingly complex tasks.

They must continually evaluate both themselves and their methods in order to stay ahead in an

intelligence environment where the amount of valuable data is skyrocketing while the proportion

of useful information to total information collected is shrinking rapidly.


Modern Counterterrorism Intelligence Paradigms

The Limits of Intelligence

There is a common expectation by the public that intelligence collection and synthesis

should result in a certain level of omniscience. Many assume that given the afforded resources

and manpower, counterterror agencies should not be surprised by anything. However, it is not

possible to siphon up every available piece of data, build a one-to-one symbolic map of the

universe, then bring retribution to every terrorist and militant, and prevent all future terror

attacks. Intelligence production must instead manage the complications of the modern

information environment by creating intelligent models to improve situational awareness and

reduce the probability of successful attacks.

It is therefore imperative to temper expectations looking back as well as going forward.

There is a tendency to blame intelligence failures for many of the attacks that Americans have

experienced, from Pearl Harbor to 9/11.36 Even though it may seem obvious in retrospect that many opportunities to interdict the 9/11 plot were missed, this may simply be a case of hindsight bias, also known as creeping determinism. Though intelligence reform in the 1990s might have made it more likely that the plot would have been stopped, it is impossible to prove either way. Many convince themselves in the aftermath that there were objectively enough clues to “connect the dots” and stop the worst terror attack in American history. To do so is to assume the omniscience of intelligence efforts and to misunderstand their true purpose.

Traditional Intelligence Paradigms

36 Lazaroff, 59.


Intelligence is gathered in order to produce better contextual knowledge with which to

make policy decisions. Decision makers with knowledge of relevant factors and possible

consequences have a distinct advantage over those who operate in relative darkness. Their

strategic knowledge is broader and deeper, and their tactical tools are more refined. However, in

order to make hard choices, there have to be criteria upon which to judge the knowledge that is

generated during intelligence production. Models must be developed that can define the process

of transforming contextual knowledge into action.

Traditional intelligence paradigms developed during the Cold War era were based upon

the actions of Soviet counterintelligence and military divisions.37 These paradigms were formed

in response to observational data that had been collected over decades of experience with Soviet

grand strategy and tactics. The models focused on large-scale changes. Major events such as tank

division deployment, naval fleet movement, ICBM silo construction, rocket launches and trade

embargoes were used to make large-scale decisions. Once evidence of these events was found,

there was a defined process for moving forward with the investigation.

A classic example of the traditional intelligence model is the Cuban Missile Crisis of

1962. Geospatial intelligence from U-2 spy planes gave very clear indications of what the

intelligence community calls “signatures” of Soviet long-range missile sites.38 First they noticed

the extremely wide roads, with massive circular turns designed to accommodate large missile

trucks. Next they noticed the classic triple fence perimeter that was instituted at every other

known Soviet long-range missile silo. From these signatures, they inferred that the Soviet Union

had constructed a missile-launching platform on the island of Cuba. They were right.

37 Zegart, Spying Blind, 69.
38 Sagan, Scott, and Kenneth Waltz. The Spread of Nuclear Weapons: An Enduring Debate. New York, NY: W. W. Norton, 1995, 62.


In this example, intelligence analysts received geospatial data to analyze. They did not

have to integrate this information significantly with other forms of intelligence. Instead, they

simply applied the data that they had received to an empirically proven model. This model was

based upon the past observation of known missiles sites. Extensive knowledge of other parts of

the Soviet military or political structure was not necessary to come to a correct conclusion. The

knowledge production process was very straightforward: source → data → model → conclusion.

The overarching model that guided these intelligence efforts against the Soviet Union is

known as a reductionist model.39 This model decomposes a problem (or adversary) into its

constituent parts, where solving each part solves the “problem.” In order to function, a

reductionist model requires strict causal relationships through every part of the process, creating

a chain of cause-and-effect relationships that fully explain a conclusion. This model assumes that

the final synthesis of intelligence is equal to the sum of its parts. An advantage to a reductionist

model is the awareness that it gives to gaps in information; it is often obvious what pieces are

missing from the puzzle. On the other hand, the strict causal nature of reductionism requires a

pre-defined model where data can be plugged in to produce relevant conclusions.

The attacks of 9/11 were an indication that the reductionist model no longer worked as

well as it had in the past, and that a major re-imagining of intelligence was required in order to

tackle a threat as decentralized and unpredictable as terrorism. Both the methods used to collect

intelligence and the models to which they were applied needed to be rebuilt for the modern era.

39 Lazaroff, 55.


Modern Counterterror Intelligence Paradigms

Modern intelligence paradigms have begun to move away from static reductionist

models and toward dynamic emergent models.40 Emergent models eschew the assumption

that future outcomes are predictable simply by matching observable causes with their associated

effects. Reductionist models, in their aggressive decomposition of problems, isolate potential

causal properties from each other, eliminating the possibility for interaction. During the

combination phase, where the sub-problems are recombined back into the whole, the model no

longer matches reality. Because there is no room for interaction between the collected data

points, the overall conclusion can be interpreted incorrectly, or may be completely incoherent.

The shift from static to dynamic models is the result of a security environment that has

transformed from a complicated one to a complex one. Complicated systems can be modeled in

siloed manners, with many analysts working extremely deeply in one subject or another,

funneling their conclusions up a hierarchy to a central command. Complex systems have

properties that exhibit extensive co-evolution, where changes in one part of a suspected causal

chain can cause ripples through an entire security environment. For example, 20th century

intelligence analysts looking for possible expansions of communism into the third world did not

have to communicate extensively with analysts charged with tracking Soviet weapons

developments. These problems, while both connected to a single monolithic entity, were distinct.

On the other hand, modern counterterror analysts studying the border permeability between

Afghanistan and Pakistan must also monitor the weapons capabilities of militant groups in the

area, or at least be in constant contact with analysts who do. Changes in one area of interest can

cause major changes in the other.

40 Lazaroff, 56.


The shift from complicated to complex security problems requires emergent intelligence

paradigms because often no pre-defined pattern exists to which analysts can apply their data

points. They must instead dynamically monitor the situation, and synthesize an emerging pattern

from the observation of the interacting elements of the environment. This marks a move away

from traditional conceptions of intelligence predicting the future, to a model where intelligence

efforts are made to anticipate what future patterns might look like, based on data collected today.

It is no longer as simple as saying that past patterns will take on the same form as future ones.

Instead, the anticipatory paradigm avoids producing models that attribute cause where it may not

exist. Lazaroff refers to the tendency to over-rely on a familiar pattern as “pattern entrainment,” a habit that leads to unpleasant surprises down the road.41

Applying this model to the 9/11 attacks can give insight into how predictions about the capabilities of Osama bin Laden and al-Qaeda were so wrong. Counterterror agencies were

extremely focused on previously detected patterns of terrorist plots, namely bombings and

shootings.42 They collected extensive evidence and attempted to map it to previously attempted

plots in the United States and abroad. They were not spending nearly enough time looking for

types of attack that diverged from the traditional methods. There is no guarantee that this type of

analysis will stop an attack, but it can increase the probability of doing so.

In practice, implementing a system that handles emergent intelligence models is difficult.

It requires constant dynamism and iteration in the realms of collection, analysis, and synthesis.

Furthermore, the complexity of interaction between intelligence data points requires extensive

integration efforts of both information and analysis. Collaboration between all types of data

collection specialists and analysts is therefore required to conduct successful counterterror

41 Lazaroff, 57.
42 Zelikow, 9/11 Commission Report, 73.


intelligence operations. However, simply sharing data and conclusions is not enough. As

Lazaroff puts it: “the real value of collaboration is about sharing context (thinking), not data.”43

True collaboration reveals that intelligence conclusions are more than simply a sum of the data

points that support it.

Conclusion

Modern paradigms for counterterror intelligence modeling are no longer as focused on

pure prediction of the future. They exist more abstractly and dynamically, ready to adapt to a

dizzyingly large set of complex interactions of intelligence targets. Major strides have been taken

to develop methods and structures to support the new models of intelligence. Computational

advances in collection have allowed for larger datasets to be captured. Information technology

has bolstered capabilities in managing and structuring collected data. Though still in their

infancy, computer algorithms have also provided value in semi-autonomously analyzing and

synthesizing new knowledge out of structured information. These methods will be explored in

the following chapter.

The organizational problems that have plagued the intelligence community for decades are obstacles to the proper implementation of new intelligence generation methods. The traditional structure cannot meet the technical requirements of the new environment. Similarly, the collaborative elements of knowledge integration have proven untenable in the multi-

agency setup. The National Counterterrorism Center seeks to remedy these issues, but given the

entrenched problems that continue to face the community, it faces quite a challenge

implementing a dynamic, modern paradigm of counterterror intelligence.

43 Lazaroff, 61.


Chapter 2

A Novel Intelligence Toolbox: Computational Analytics in Practice

“There were five exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days”

– Eric Schmidt, CEO of Google, 20101

The new information environment as encountered by contemporary intelligence analysts

has precipitated a new wave of computational tools. These tools are designed to accommodate

the massive amounts of data that are now collected on a daily basis. This chapter explains the

function and basic implementation of these relevant new technologies. Though they are

perpetually in development, computational methods already provide value in the intelligence

production process. Modern algorithms allow machines to ingest and understand both structured

and unstructured information sources. They allow for enhanced investigative breadth and depth

in the form of data management and comprehensive visualization tools. Finally, initial forays are

being made into predictive analytics, in which computers forecast changes in real-world models

that they construct from collected data.

The evolution of big data technology has significantly increased the feasibility of many

of these analytical techniques. The computational power that was out of reach for most

organizations ten years ago is becoming a reality today. Big data analytics are becoming

ubiquitous in both private sector and government agencies. There are especially major pushes in

1 Schmidt, Eric. Presented at the Techonomy Conference, 2010, http://techonomy.com/tag/eric-schmidt/.


the public sector to become more technologically advanced, and the barrier to entry is constantly

lowering.

However, as more organizations attempt to jump on the Big Data bandwagon, they may

be outpacing their ability to actually use it to its full potential. Relying on Big Data analysis is a

large departure from traditional forms of intelligence collection, counterterror or otherwise.

Tried-and-true systems of communication, hierarchy, and even analytical culture may no longer make sense in a data-oriented organization. While the first half of the chapter describes the computational methods, the second half details the structural and organizational steps required to take full advantage of Big Data analytics. Its unique qualities often flip classical conceptions of management, meaning that major departures from standard practice may be needed. While the observations about organizational shifts are presented in a more abstract manner, they are still relevant to multi-agency organizations such as the National Counterterrorism Center and

will be applied directly to its structure in Chapter Three.

The growth of “Big Data”

As the information revolution accelerated through the 1990’s, organizations around the

world began to capture increasing amounts of data. The ever-growing speed of computation

coupled with plummeting data storage prices fueled an entirely new practice of data collection.2 It seemed as if everything could be stored: financial transactions, medical records, historical data, entertainment media, user profiles; effectively the output of any task completed with a computer.

Nothing was too inconsequential to be captured for analysis, as it might be useful later.

2 Manyika, James, Michael Chui, and Brad Brown. “Big Data: The next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute, June 2011, l2.


The datasets that many private and public organizations had captured quickly ballooned

in volume. By the late 1990s, these datasets had already surpassed the abilities of human comprehension in both size and scope. By the early 2000s, some massive data repositories had surpassed the processing ability of most commodity hardware. Enterprises began to develop tools to deal with a phenomenon that was just beginning to have a name: “Big Data.” The data was so big, in fact, that traditional methods of storage and analysis were simply unable to manage it; it instead required complex custom datacenters. The term Big Data does not (and likely never will) have a

specific meaning. Due to constantly improving computational capabilities, there is no strictly

defined threshold on size and no precise measure of required information complexity. However,

there are three factors that are inherent to any Big Data problem: volume, velocity, and variety.3

Volume corresponds to the amount of data that must be stored. It is generally agreed

upon that if it can fit into a single rack of servers (around 100 terabytes as of 2016), then it most

likely cannot be classified as Big Data.4 The rest of the chapter will use this definition because it

means that the data can only be stored using technologically heavyweight solutions. For

example, the modern datacenters of the Internet giants are massive. As far back as 2014, Amazon

was reported to have over two million separate servers, and though Google remains evasive on

providing a hard number, it is rumored to have far more.5 Velocity refers to the pace at which

more data is collected, requiring constant upgrades in storage capacity and sophisticated methods

to configure new data warehouses. The amount of new information created on the Internet each day is enormous. For instance, as of 2015, Facebook logs over 4.75 billion posts per

3 Mills, Steve, and Steve Lucas. “Demystifying Big Data: A Practical Guide To Transforming The Business of Government.” IBM, 2012. https://www-304.ibm.com/industries/publicsector/fileserve?contentid=239170, 2.
4 Mills, 4.
5 Clark, Jack. “5 Numbers That Illustrate the Mind-Bending Size of Amazon’s Cloud.” Bloomberg Business, November 2014. http://www.bloomberg.com/news/2014-11-14/5-numbers-that-illustrate-the-mind-bending-size-of-amazon-s-cloud.html.


day.6 Each of these posts is logged with a specific user, time, content, and context. That is on top

of the 300 million photos, 4.5 billion “likes,” and 10 billion messages that Facebook also

processes daily. Finally, variety corresponds to the heterogeneous nature of data that is stored by

a typical data-oriented organization. On a basic level, there is the difference between tabular data (i.e., in rows and columns, such as in an Excel spreadsheet) and unstructured data. Unstructured data

could be anything: a news article, an image, or even a sound recording. The combination of these

three “V” factors makes Big Data extremely hard to store efficiently.

The three V’s of Big Data, while proving difficult to manage, can provide significant

advantages to organizations that can harness them.7 Large-scale storage and analytics can help

uncover patterns that are hidden in the data. Economic trends, unpredictable correlations, and

unforeseen interactions all come into focus when a larger snapshot of the information

environment is analyzed. In counterterrorism intelligence, analyzing massive datasets can

provide value that is found nowhere else.

Computational Methods in Counterterrorism

The first step in knowledge production is the collection and storage of information.

Counterterrorism intelligence can benefit from analyzing the vast amounts of data that come in

through its various intelligence channels. Unfortunately, the volume, velocity, and variety of the

data make it impossible for analysts to structure by hand.8 This is an especially difficult task

given the real-time nature of counterterrorism. There is simply no way that an analyst or a group

6 Ho, Kevin. “41 Up-to-Date Facebook Facts and Stats,” April 2015. http://blog.wishpond.com/post/115675435109/40-up-to-date-facebook-facts-and-stats.
7 Wegener, Rasmus. “The Value of Big Data: How Analytics Differentiates Winners.” Bain & Company, 2013. http://www.bain.com/Images/BAIN%20BRIEFThevalueofBigData.pdf.
8 Mills, 3.


of analysts can stay on top of a data pipeline that comprises even a tiny fraction of a modern

system’s data collection capacity. In order to process the flood of data coming in, computational

methods for tagging and structuring are required.

A large amount of raw data without basic structuring or information retrieval tagging is

effectively useless.9 Having a database filled exclusively with text files (articles, books,

communications, etc.) is only marginally better than having hundreds of reams of papers sitting

in boxes. All the information that one might require is technically present but is not easily

knowable. Finding any specific piece of information in an unstructured pool of data requires

searching through everything, hoping to find a specific keyword or sentence. Relating one

snippet of an article to another is an arduous process and has to be done using multiple passes

through millions (or possibly billions) of files. Performing high-quality analytics on data stored

in this manner means slow and incomplete information retrieval, leading to lower quality

conclusions.

Incoming information must therefore be processed in ways that determine its meaning,

relate it to other pieces of data in the database, and then store it in an easily searchable format.

There are various techniques that are used to accomplish this, and all of them involve adding

significant amounts of metadata to each incoming piece of evidence.10 Analysts working with

properly structured data can quickly find relevant pieces of evidence using a variety of searching

methods. They can also discover new data, as relevant information can be presented that the

analyst did not necessarily know existed.
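As a simplified illustration of what this tagging and structuring buys, the following Python sketch attaches basic metadata to two invented documents and builds a small inverted index; the field names are placeholders rather than any real schema, and production systems would use far richer tags and dedicated search infrastructure.

# Sketch of metadata tagging plus a simple inverted index. The documents and
# field names are invented; the point is that retrieval becomes a lookup
# instead of a scan through every file.
from collections import defaultdict
from datetime import datetime, timezone

documents = [
    {"id": 1, "source": "OSINT", "text": "Protest planned near the embassy"},
    {"id": 2, "source": "SIGINT", "text": "Meeting near the embassy on Friday"},
]

# Enrich each document with metadata at ingestion time.
for doc in documents:
    doc["ingested_at"] = datetime.now(timezone.utc).isoformat()
    doc["tokens"] = doc["text"].lower().split()

# Build an inverted index mapping each token to the documents containing it.
index = defaultdict(set)
for doc in documents:
    for token in doc["tokens"]:
        index[token].add(doc["id"])

def search(*terms):
    """Return documents containing every search term."""
    ids = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [d for d in documents if d["id"] in ids]

print([d["id"] for d in search("embassy")])            # [1, 2]
print([d["id"] for d in search("embassy", "friday")])  # [2]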

9 Boschee, Elizabeth, and Natarajan Premkumar. “Automatic Extraction of Events from Open Source Text for Predictive Forecasting.” In Handbook of Computational Approaches to Counterterrorism, 1st ed. Springer Science, 2013, 51.
10 Schrodt, Philip, and David Brackle. “Automated Coding of Political Event Data.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013, 32.


However, algorithms that can accomplish accurate tagging and information extraction

are exceedingly difficult to implement correctly. No group of engineers can think of every

possible way in which a certain idea may be expressed in natural language. There are too many

variables to consider and too many exceptions to rules. And yet, this impossible task is what they

are assigned to do. Many clever methods have been developed to build these meaning extractors.

The most popular approach is to use algorithms that teach themselves, or train, on how to

interpret vast amounts of information and extract meaning correctly.11 This process involves

running algorithms on a training dataset to learn the patterns and rules required for real-world

information extraction. The learned rules and patterns are then evaluated against a previously

unseen test dataset. Performance is gauged by measuring correct and incorrect extractions and

information tags. Generally, many iterations of training must occur for these tools to be deemed

adequate.12 In fact, knowledge extraction algorithms are often built with a combination of engineering talent and pure trial and error. Additionally, these algorithms are continually

evolving as data changes and new improvements are found. The resulting software products must

be powerful, highly mutable, and extremely fast.
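The train-and-evaluate cycle described above can be sketched with the open-source scikit-learn library. The toy sentences and labels below are fabricated, and a real extraction system would train on a large annotated corpus and be judged on per-tag precision and recall rather than simple accuracy.

# Sketch of the train/test cycle for a text model using scikit-learn.
# The sentences and labels are fabricated for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

sentences = [
    "attack planned against the station", "bomb threat at the airport",
    "explosives found near the border", "family picnic at the park",
    "soccer match this weekend", "birthday party on saturday",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = security-relevant, 0 = benign

# Hold out a test set that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    sentences, labels, test_size=0.33, random_state=0, stratify=labels)

vectorizer = CountVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(X_train), y_train)

# Performance is gauged on the unseen test data; in practice this cycle is
# repeated many times as the data and the extraction rules evolve.
predictions = model.predict(vectorizer.transform(X_test))
print("accuracy:", accuracy_score(y_test, predictions))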

Speed becomes a high-value objective because the information revolution has also given

rise to “real-time” data streaming. Real-time information is ingested from the outside world at

such rates that analysts can see extremely up-to-date versions of the data that exist.

Computational methods that provide rich information extraction and tagging at near-real-time rates can give employees knowledge stores that allow them to significantly outperform those who lack them.13

11 Boschee, 54.
12 Sharkey, Brian. “Information Processing at Very High Speed Data Ingestion Rates.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006, 3.
13 Mills, 7.


However, even the most powerful information preprocessor cannot fully bridge the gap

between humans and important data. The methods used for large-scale data storage use formats

that are completely incomprehensible to humans. Analysts cannot sift through and organize data

that is indexed (sorted) in machine-optimized ways. They need special programs that are able to

traverse and present the massive information sets that are stored within databases. Classic

programs used for data analysis include R and Stata; however, these programs begin to lose

effectiveness as data sizes get extremely large.14 More advanced and specialized programs are

used to deal with massive and constantly changing big data projects.

These programs represent the second step in the computational intelligence generation

process. They allow the current information in the database to be knowable as opposed to simply

being present. They offer the ability to sort and aggregate, opening up opportunities for data

transformation and summarization. Analysts can visualize and reshape the data, finding links that

the automatic preprocessors may have missed. They can also concretize the abstract patterns that

exist in the collected information store.

A classic example of this type of concretization is computational network analysis.15 In

an abstract sense, social networks are easy to conceive of, but difficult to visualize in their

entirety. Relationships between entities vary in type, strength, and direction, and as more entities

are added, complexity grows quickly.16 Formal analysis of these networks has existed for

decades, but was often limited in scope and dynamism by older methods. Networks of

thousands of people were stored as matrices or even using pen and paper. Modern software

packages allow for these networks to be mapped and presented to the user in more complete and

14 Pavlo, Andrew. “A Comparison of Approaches to Large-Scale Data Analysis.” Paper presented at the ACM SIGMOD International Conference on Management of Data, New York, NY, 2009. http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf, 2.
15 Subrahmanian, V.S. Handbook of Computational Approaches to Counterterrorism. Springer Science, 2013, 3.
16 Everton, Sean. Disrupting Dark Networks. Cambridge, UK: Cambridge University Press, 2012, 3.


coherent ways. The networks are displayed in two and three-dimensional spaces. Links are easily

created, hidden, changed, and destroyed. Entities can have helpful metadata attached to them,

and various graphing algorithms can even produce structure in networks that seem chaotic. These

tools enable the mapping of previously incomprehensible networks, providing useful context on

various terrorist groups.17
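A small sketch with the open-source networkx library illustrates the kind of structure these tools provide; the people, roles, and ties below are fictional, and the centrality score and layout stand in for the much richer graphing features of dedicated analysis software.

# Sketch of computational social-network analysis with networkx.
# All entities, roles, and ties are fictional.
import networkx as nx

G = nx.Graph()
# Nodes and edges can carry metadata such as role, tie type, and strength.
G.add_node("A", role="financier")
G.add_node("B", role="recruiter")
G.add_node("C", role="courier")
G.add_node("D", role="unknown")
G.add_edge("A", "B", tie="phone", weight=3)
G.add_edge("B", "C", tie="travel", weight=1)
G.add_edge("B", "D", tie="phone", weight=2)

# Centrality scores suggest which members hold the network together.
print(nx.betweenness_centrality(G))

# A layout algorithm imposes visual structure on an otherwise chaotic network.
positions = nx.spring_layout(G, seed=42)
print(positions["B"])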

Powerful software can even provide tools that find “hidden” links within networks, given

a set of other relevant factors.18 Similar to the way in which a company like Facebook might

present a list of “suggested friends,” a counterterror approach may present a list of possible

associates. This is necessary because data that allows for the modeling of terrorist social

networks is notoriously lacking in breadth and consistency, leading to incomplete representations

of reality. In the past, using existing social ties to determine possible missing links was

cumbersome and impossible to do at scale. New software that uses machine learning to train on

network patterns can perform prediction on thousands of links simultaneously.19 While not

perfectly accurate, these approaches can give clues to analysts on which directions to further

investigate.
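In the same hypothetical setting, a simple neighborhood-based heuristic can rank candidate hidden links, much like a “suggested friends” list; the Jaccard coefficient used here is only a stand-in for the trained link-prediction models described in the literature.

# Sketch of link prediction over an incomplete, fictional network.
# A common-neighbors heuristic ranks node pairs that are not yet connected.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"),
                  ("B", "D"), ("C", "D"), ("D", "E")])

# jaccard_coefficient scores every non-connected pair by shared neighbors.
candidates = sorted(nx.jaccard_coefficient(G), key=lambda t: t[2], reverse=True)
for u, v, score in candidates:
    print(f"possible hidden link {u}-{v}: {score:.2f}")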

Network analysis is only one example of an entire suite of computational tools that

analysts might have at their disposal. There are a number of ways in which software developers

have enabled the dissection, investigation, and visualization of data. These tools can help

analysts make better investigatory decisions.

There is room for more involvement from algorithmic analysis in the intelligence

community. There are movements that are pushing for a far more automated process for the

17 Everton, 12.
18 Fire, Michael, and Rami Puzis. “Link Prediction in Highly Fractional Data Sets.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013, 3.
19 Fire and Puzis, 292.


entire knowledge production process.20 These take the form of end-to-end prediction algorithms.

Currently, terrorist prediction methods are still in their infancy. The operational environment is

exceedingly complex, and the methods of collection and analysis have not yet reached a level

where definitive structure can be created. Nevertheless, some researchers are performing

experimental research in this field and are producing computational models that can allow for

automated event prediction.21 These models involve structuring text and events into data chunks

that can be processed quickly by computers. The algorithms then train on the processed events

and produce models that theoretically allow for future event prediction.
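Purely as an illustration of the general idea (and not of the SOMA model cited above), the sketch below codes weekly event counts into numeric features and fits a simple classifier to forecast whether violence follows; every number is fabricated.

# Toy illustration of event-coded forecasting. Weekly counts of coded event
# types become features; a simple model forecasts whether violence occurs
# the following week. All numbers are fabricated.
from sklearn.linear_model import LogisticRegression

# Each row: [protests, arrests, weapons seizures] observed in one week.
weekly_events = [
    [5, 1, 0], [2, 0, 0], [8, 3, 1], [1, 0, 0],
    [9, 4, 2], [3, 1, 0], [7, 2, 1], [0, 0, 0],
]
# Label: did a violent incident occur in the following week?
violence_next_week = [1, 0, 1, 0, 1, 0, 1, 0]

model = LogisticRegression().fit(weekly_events, violence_next_week)

# Forecast the coming week from this week's coded events.
print(model.predict_proba([[6, 2, 1]])[0][1])  # estimated probability of violence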

The effectiveness of experimental techniques notwithstanding, computational methods

provide the ability to accommodate the growing amounts of data that counterterror analysts face.

Properly applied, they can significantly improve the investigative capabilities of employees.

However, the technical systems that can store the required volume of data and run the algorithms are

not exactly straightforward to set up.22

Technical Requirements for Big Data Success

The foundation of any computational platform is the hardware that it runs on. Big Data

systems are assembled using a combination of many different hardware building blocks. The

most fundamental block of these systems is the server (computer). It provides the storage,

20 Mannes, Aaron. “Qualitative Analysis & Computational Techniques for the Counter-Terror Analyst.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013, 84.
21 Sliva, Amy. “SOMA: Stochastic Opponent Modeling Agents for Forecasting Violent Behavior.” In Handbook of Computational Approaches to Counterterrorism. Springer Science, 2013.
22 Mills, 8.


networking, and processing power that houses and parses the data for the system.23 In large-scale

data clusters, these servers can be multiple times more powerful than the average personal

computer. However, in the context of massive datasets, a single machine is a drop in the bucket.

As mentioned earlier, it takes millions of machines to provide the computational power required

of many large Internet companies. This method of utilizing multiple computers to execute tasks

in parallel is known as “distributed computing.”24

The creation of a modern data cluster from these fundamental building blocks is far from

an easy task. It requires herculean efforts from experts in many different fields. First, datacenters

use an enormous amount of power and produce huge amounts of heat. They require specialized

infrastructure that can handle the electricity requirements of both running computers and cooling

them. Furthermore, given the real-time nature of modern intelligence, there must be multiple

redundant systems to ensure that the datacenters do not fail for any reason. For example, most

datacenters have backup generators to ensure service even when the power goes out. Second,

these centers are far more than simply stacks of computers on shelves. They are highly complex

networks of machines, all interconnected in ways that are optimized for the reliability of

networks and the laws of physics. The science of building datacenters has been developed over

decades, and it becomes more complex every year.25

The market for computers evolves extremely quickly. According to Moore’s law,

compute power in new processors doubles every 18 months.26 This is great news for data

scientists, but a headache for datacenter architects. While data scientists reap the rewards of

23 Singh, Arjun. “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network.” Paper presented at ACM SIGCOMM, London, UK, 2015. http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43837.pdf, 14.
24 Valacich, Joseph, and Christoph Schneider. “Managing the Information Systems Infrastructure.” In Information Systems Today: Managing in the Digital World, 2013, 129.
25 Singh, 3.
26 Valacich, 134.


faster processors, datacenter architects must deal with computer clusters that are perpetually in

the process of becoming obsolete. Clusters must therefore be built in modular ways that facilitate

constant replacement of machines to keep up with contemporary processing speeds.

Given these difficulties, maintaining a modern datacenter would be out of reach for all

but the richest and most tech-savvy organizations. Recently, however, large strides have been

made in the private sector to provide data center solutions for less capable enterprises (including

the government). The data-management industry has gotten so large that open-source software

has been created to manage compute clusters, making it much easier both to get started and to keep a cluster running.27 This software is created as a collaborative effort in the community and is

completely free to use, even in a commercial context. While the technical capabilities required to

actually build a center remain exceedingly high, access to these skills has increased dramatically, putting big data analytics within the reach of many more organizations and significantly reducing the barrier to entry for creating big data infrastructure.
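The core pattern behind such cluster software can be shown in miniature. The sketch below simulates, in ordinary Python and on three invented documents, the map-shuffle-reduce flow that open-source frameworks such as Apache Hadoop execute in parallel across thousands of machines.

# Local, single-process simulation of the map-shuffle-reduce pattern that
# distributed frameworks run across entire clusters. Documents are invented.
from collections import defaultdict

documents = ["attack planned in city", "city officials meet", "attack foiled"]

# Map: each worker turns its slice of the data into (key, value) pairs.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: pairs are grouped by key so one reducer sees all values for a word.
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: each group collapses into a final result, here a word count.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)  # e.g. {'attack': 2, 'city': 2, 'planned': 1, ...}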

These services have become so comprehensive that some massive companies are moving

the entirety of their operations to cloud services provided by other enterprises. For example,

Netflix, which at times can account for thirty percent of American Internet traffic, delegates the

entirety of its hardware infrastructure to Amazon’s cloud services.28 Netflix’s delegation of

responsibility to Amazon for its cluster management allows it to focus on its core services as

opposed to putting significant resources into its own hardware infrastructure.

Government agencies have the same access to these types of private sector contractors,

meaning that even though they do not have comparable technical capabilities, they are still able

27 “Apache Hadoop,” n.d. http://hadoop.apache.org/.
28 Meyer, Robinson. “The Unbelievable Power of Amazon’s Cloud.” The Atlantic, April 23, 2015. http://www.theatlantic.com/technology/archive/2015/04/the-unbelievable-power-of-amazon-web-services/391281/.


to field large compute clusters for their needs. The question is no longer whether they can build a

datacenter effectively, but instead whether they are able to use it.

Organizational Requirements for Big Data Success

Simply possessing the requisite technical capabilities of running a functional big data

operation is not enough to generate valuable, data-driven insights. The fastest, most powerful,

and most intuitive tools are of no use if an organization is not set up in a way to take advantage

of them. The machines and algorithms on their own will not produce any meaningful

intelligence; knowledge is still only relevant in the context of people and the decisions that they

make. Big data analytics therefore cannot be integrated without a focus on the humans that it

affects at every level of the organization. There have to be structures and processes in place that

allow for the seamless use of big data with other activities. Without them, there may be

fragmented and inconsistent application of analytic capabilities, leading to mismatched goals and

conclusions.29

There are innumerable organizational factors that are relevant when considering the

overall effectiveness of an organization. The majority of these will not be addressed. However,

the organizational requirements specific to deploying big data operations can be separated

into four main categories:

1. Commitment to big data analytics
2. Open information sharing environments
3. Feedback channels and iteration
4. Engineering talent and culture

29 Galbraith, Jay. “Organization Design Challenges Resulting From Big Data.” Journal of Organization Design 3, no. 1 (2014), 3.


Failure to sufficiently meet the requirements of these factors can significantly reduce the

effectiveness of big data infrastructure. Each step of the intelligence generation process relies on

specific organizational conditions to ensure that they are being completed correctly. Poor

organizational structure can result in decreased knowledge caliber and scope. In fact, the

expensive computational architecture can have a negative impact on the organization by diverting money and human resources into fruitless pursuits. Furthermore, none of these factors exist

in a vacuum and the implementation of one requires consideration of the others as well.

Organization-Wide Commitment to Big Data

The first step to creating a big data optimized structure is to understand and frame the

goals of the organization in the context of large-scale analytics.30 Implementing a massive scale

big data infrastructure is a huge undertaking, and the design process requires significant

forethought. Operational requirements must be strictly defined, and each module of software and

hardware must be built for a purpose. Building complex software products is notoriously

difficult, and having poorly defined capabilities can lead to massive headaches down the road. In

fact, it has been acknowledged for decades that the planning phase should actually take up the

largest proportion of time for software development, even over the actual implementation and

testing of the software.31

Lack of adequate planning has proved disastrous for government software projects in the

past. For example, the FBI’s “Trilogy” information technology modernization efforts went

catastrophically wrong throughout the entire process. In 2004, after years of development and

nearly half a billion dollars invested, the FBI’s “modernized” virtual case file (VCF) system was

30 Mills, 7.
31 Brooks, Frederick. The Mythical Man-Month. Addison-Wesley, 1974, 20.


still mostly non-functional and had almost no buy-in from analysts.32 The undertaking was such

a mess that the FBI requested the assistance of the National Research Council (NRC) to

determine the cause of the failure.33 After a thorough investigation, the NRC determined that

among the many faults, the major problem was a lack of understanding of the operational

requirements of the VCF system. When the project hit the difficulties inevitable in any large undertaking and the system became more complicated, there was no adequate plan to keep the project moving

forward. As the project became more derailed, implementation became more haphazard until it

came apart at the seams.

Even when the software project does not completely fail, bad decisions made in the

planning and early implementation phases can have extremely negative consequences down the

road. Inconsistent design choices often cause mismatches between system components and

provide major barriers to the extensibility of the product. This is a well-explored concept in the

software industry and is known as “technical debt.”34 Opportunities to cut corners to save on

time or money incur debt that must be repaid later in the form of extra labor. Features that were

hastily implemented may need to be modified to connect properly with new components.

Complex documentation that was inadequately compiled has to be edited or even rewritten.

These debts often come with interest, meaning they can take longer to rectify than it would have

taken to simply do it correctly the first time. Sometimes architectures are so poorly conceived

that they cannot be modified and must be completely redesigned.

32 Knorr, Eric. “Anatomy of an IT Disaster: How the FBI Blew It.” InfoWorld, March 21, 2005. http://www.infoworld.com/article/2672020/application-development/anatomy-of-an-it-disaster--how-the-fbi-blew-it.html.
33 Lin, Herbert, and James McGroddy. “A Review of the FBI’s Trilogy Information Technology Modernization Program.” National Research Council, 2004.
34 McConnell, Steve. “Managing Technical Debt.” International Conference on Software Engineering, 2013. http://2013.icse-conferences.org/documents/publicity/MTD-WS-McConnell-slides.pdf.


Once the big data system is operational, it must still be integrated effectively into daily

activities. In order to do this, the organization as a whole must be committed to the integration of

computational analytics at every relevant level.35 Data-driven analytics must be seen as one of

the essential functions of the organization, equal to the many other essential functions, such as

intelligence collection or administrative support. However, there is a key difference between

these traditional functions and data science. Data amplifies the effectiveness of other functions of

the enterprise while being of significantly less utility in a vacuum.36 Without organization-wide

commitment, data-driven analytics can be sidelined in a bureaucracy whose inertia pulls its daily

workflows in directions that do not include data science.

There are many ways to accomplish this, but nearly all of them involve enhancing the

bureaucratic influence of data analytics. This extends all the way up to the executive level (the C-suite).37 Some organizational theorists posit that a truly committed big data organization must

have a Chief Digital Officer (CDO), or some managerial equivalent.38 Just as Chief Financial

Officers (CFOs) are responsible for organization-wide finances and Chief Operating Officers (COOs) are responsible for organization-wide operations, CDOs should oversee digital

analytics at every level. This gives data science a representative at higher levels of management,

bolstering the integration process. Without a high-ranking advocate, data scientists may find

themselves unable to get the resources and credibility that they need to effect real change.

However, the shift in influence should not be relegated to the top levels of the

organization. Those who are making important decisions are often not the ones generating the

insights from big data. A function of committing to big data analytics is imparting additional

35 Galbraith, 3.
36 Galbraith, 12.
37 Grossman, Robert. “Organizational Models for Big Data and Analytics.” Journal of Organization Design 3, no. 1 (2014), 21.
38 Galbraith, 4.


agency to data scientists to make their own analytical choices and letting them influence larger

organizational decisions. Just as data scientists need an executive championing their insights,

they also need their own expanded powers in order to be truly embedded in an organization.39

Integrating large-scale data analytics into well-established bureaucratic workflows is a

challenging and involved process. It requires comprehensive and precise planning, an overall

commitment to the use of computational analytics, and a delegation of power to lower-level

analysts. Without making these changes, the existing bureaucratic inertia can push data science

to the fringes of usefulness, all but guaranteeing that it remains a highly specialized and only

marginally useful tool.

Collaborative Information Environments

The modern information environment is changing rapidly, and new analytic processes

must be used to tackle it. As explored in Chapter One, the heterogeneity and size of modern data

means that any single piece may be relevant for many different analytical investigations. The

growth of open source and signals intelligence sources dictates that the complexity of analytical

workflows also grows.40 The interconnectedness of information and increasing globalization

mean that many analytical conclusions must also draw on a larger body of data in order to be

relevant. Furthermore, as mentioned in the previous section, when analytics becomes embedded

in traditionally non-technical departments, data scientists no longer work in a single “big data”

group, but instead with the teams in which they are integrated. This has the potential to isolate

data scientists from each other.

39 Grossman, 23.
40 Choucri, Nazli, Stuart Madnick, and Michael Siegel. “Improving National and Homeland Security Through Context Knowledge Representation and Reasoning Technologies.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006, 155.


These broadening information sets and changing organizational structures must be offset

by increased collaboration between departments and agencies.41 Traditionally, teams produce the

work that they are assigned and then report it up the chain of command. However, instead of

simply reporting their conclusions vertically in the hierarchy, they should be sharing their

processes horizontally with other teams. This allows them to hone both their datasets and their

methods. The collaborative effects go both ways, and teams that help others also often find they

are performing at a higher level due to increased horizontal access.

The improved performance is brought about by what Grossman terms a “critical mass” of

data scientists.42 He defines critical mass as the required number of employees whose combined

skills encompass what is needed to derive insights. A large number of employees becomes

necessary because each individual data scientist and analyst might only possess a small subset of

the domain knowledge required to create a full picture from the provided set of data. In the

intelligence fields, these domains exist in two categories: analytical and technical. There must be

enough analysts and data scientists who are able to adequately manage big datasets (technical) while at the same time providing meaningful insights (analytical). These two categories have their own

domain spaces in which employees have even more specific expertise. An analytical problem’s

required skillset can span the specialties of analysts from multiple teams or even agencies, and

isolating employees from each other shrinks the available talent pool.

Avoiding isolation involves creating an information environment that is based around

openness and collaboration. To the extent that it is possible (private and proprietary data can

cause issues), data should be made available to all analysts and data scientists. Beyond simply

data, organizational incentives must exist for teams to collaborate with each other. Pitting teams

41 Sukumar, Sreenivas, and Regina Ferrell. “‘Big Data’ Collaboration: Exploring, Recording and Sharing Enterprise Knowledge.” Journal of Information Services and Use 33, no. 3 (July 2013), 259.
42 Grossman, 21.


against each other, or preferring one piece of analysis over another can produce competitive

effects that reduce collaboration. There should be systems to encourage camaraderie across

teams, even in an informal sense. In fact, a meta-analysis of organizational studies has found that

employees often perform better when they are regularly in contact with people outside of their team, thanks to constant exposure to new ideas.43

Beyond just making data available, concrete steps must be taken in order to produce a

structure that facilitates effective collaboration. The first focuses on the structure of the data itself. Each

team may collect and structure its data in a unique way, especially if the nature of the

information is distinct from that of other teams. Having knowledge of one’s own data structure

and content is what Sukumar calls “Domain Knowledge.”44 Lack of domain knowledge about a

specific data store can stupefy even the best analyst. It is therefore necessary to impose structural

requirements for each team to document the format of its own data and expose this

documentation to the rest of the organization. This is done in two ways. First, a team must

explain how its data is stored. What technology is used? What are the names of different

categories in the store? What are the sources for this data? Second, a team must explain why the

data is stored the way it is. Without this crucial information, other teams may have issues

integrating it into their own analytic workflows and run into problems that have already been

solved by other teams.

Domain knowledge also extends past the syntactic representation of the data and into the

semantics of the data. The data semantics are the actual knowledge content of the data, not just

the way in which it is formatted. What context does the data exist in? What conclusions have

already been found? What conclusions have not been found? Requirements for a general

43 Owen-Smith, Jason. “Workplace Design, Collaboration, and Discovery,” 2013. http://sites.nationalacademies.org/cs/groups/dbassesite/documents/webpage/dbasse_085437.pdf.
44 Sukumar, 260.


semantic description of a dataset can be exceedingly helpful for outside teams. Otherwise, they

may have to re-do analytical work that has already been done. Requiring semantic

documentation can improve collaborative efforts and also help teams understand their own data.
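One lightweight way to impose these documentation requirements is to have every team publish a machine-readable descriptor next to its data store. The Python sketch below, with invented field names and contents, records both the syntactic domain knowledge (how and why the data is stored) and the semantic domain knowledge (what it covers and what has already been concluded from it).

# Sketch of a dataset descriptor a team might publish alongside its data.
# The field names and contents are invented, not a standard.
import json

dataset_descriptor = {
    # Syntactic domain knowledge: how the data is stored, and why.
    "name": "border_crossing_reports",
    "storage": "columnar warehouse, partitioned by report_date",
    "fields": {
        "report_date": "ISO 8601 date",
        "crossing_point": "string, canonical place name",
        "group_size": "integer",
    },
    "sources": ["liaison reporting", "open source press"],
    "design_rationale": "partitioned by date because most queries are time-bounded",

    # Semantic domain knowledge: context, prior conclusions, known gaps.
    "coverage": "2012-2015, northwestern crossings only",
    "conclusions_drawn": ["seasonal spike each spring"],
    "known_gaps": ["no coverage of maritime routes"],
}

print(json.dumps(dataset_descriptor, indent=2))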

There must be strict requirements about effective horizontal communication, data

sharing, and documentation. The system for collaboration and sharing should be clear, and

analysts should be encouraged to reach out horizontally across the organization. Left without

specific processes and structures that abet communication, analytic teams may flounder as they

find themselves up against problems that they either do not have the data or the expertise to

solve.

Feedback Loops and Iteration

Building software and analytical models is difficult. It is an attempt to map an

exceedingly complex real world onto a digital representation that can be used to find novel

insights. Even with clearly defined goals of operation, creating well-functioning tools and

accurate models is an undertaking that cannot be done all at once and must be executed in many

iterative steps. The goals of a big data enterprise must take into account the gradual nature of

software development and define checkpoints along the way.45

In his seminal work, No Silver Bullet, Frederick Brooks (a Turing Award recipient and

former Director of Engineering at IBM)46 likens large projects to organic processes – they grow

and evolve over time and should be functional at nearly every stage of the process. As tools are

45 Mills, 28.
46 Frederick Brooks is considered one of the fathers of modern software development. His works on the organizational theory of software development earned him the Turing Award in 1999. The Turing Award is the most prestigious award in the field of computer science. Brooks, Frederick. “No Silver Bullet -- Essence and Accident in Software Engineering.” University of North Carolina at Chapel Hill, 1986. Available at: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1663532.


developed, they must be evaluated, then augmented, and eventually expanded in scope. Attempts

to reach end goals in a single iteration will often lead to overly complex solutions that tend to

collapse under their own weight.

Analytical goals function in a similar way. Models start with smaller, more specific

focuses, which then broaden as the methods are honed and improved. However, big data

analytics solutions must be even more dynamic and capable of change than infrastructural

software. While traditional product development has to respond to market trends, analytical

focuses can change on a daily basis, with new types of data, techniques, and questions coming in

constantly.47 Any data analytics team that is unable to adapt quickly to changing information

environments will not perform well.

In a traditional hierarchical structure, achieving effective iteration becomes much more

difficult. When managers set tasks for lower level employees, they expect their goals to be

achieved as they conceived of them. Unfortunately, in an iterative environment it is impractical

to have a one-way channel of communication for task delegation.48 Managers do not have a

monopoly on knowledge of the models being deployed and should therefore solicit feedback

from lower-level data scientists on possible next steps. Having a structure in which

analysts and data scientists simply execute tasks that are assigned to them from above silences a

major source of domain knowledge and hobbles the iterative process. Querying for feedback

from all levels of the organization about problems and possible new directions allows for more

effective iterations.

On the other hand, hierarchy does have its uses, and eventually information does have to

be funneled up to important decision-makers at higher levels. Director-level choices must be

47 Fisher, Danyel. “Interactions With Big Data Analytics.” Interactions, June 2012, 52. http://dl.acm.org/citation.cfm?id=2168943.
48 Galbraith, 7.


made after a careful combination of lower level analytical conclusions. Even with open

information sharing environments, most analytical teams are not able to see the entire picture put

together by all teams in the organization and are therefore unable to see how they are performing

in their tasks. Are their areas of focus relevant to the organization’s goals? Are their models

providing actionable insights? Are their conclusions promoting correct decisions? Data scientists

require constant feedback on their performance from above. Without it, the next step in the

iterative process is difficult to determine.

The analytical requirements of an organization are constantly evolving and must

therefore be approached with a structure that abets dynamic and additive changes. Two-way

vertical communication between different levels of the organizational hierarchy must be

implemented. Otherwise, neither managers nor data scientists will get the feedback that they

need in order to determine the next steps for their work.

Analytic/Engineering Talent and Culture

Regardless of the powerful technology that is in place and the expertly crafted

organizational processes that have been constructed, in the end, an organization is only as

effective as its employees. They will be the ones leveraging the technology to create actionable

insights. A good structure can help employees do their jobs better, but it cannot do their

jobs for them. It can, however, improve the quality of the work of analysts and data scientists

in the organization. Strategies that attract talent, leverage it correctly, and retain it long-term will

pay dividends, even if they are difficult and expensive to implement.49

49 Davenport, Thomas, and D.J. Patil. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review, October 2012.


It seems obvious that better data scientists will produce higher-quality work in less

time. Those who have superior training, more experience, and are generally more

intelligent can respond to evolving scenarios in more effective ways. They can see paths to

insights where others might have missed them and can reveal patterns in data that seemed

impossible to find. One factor that still separates data scientists from other types of analysts is

their ability to code. They must still be engineers and should be evaluated as such. However,

what is not obvious is how much better one engineer can be than another.

Effectiveness is difficult to quantify. It is still a debated topic of study,

but current research suggests that so-called “great” engineers can be an order of magnitude

better than merely average ones on various metrics.50 Studies have found that the best

programmers can code nearly twenty times faster than average, debug twenty-five times faster,

and write programs that execute ten times faster. They are also capable of building solutions that

average engineers are simply unable to conceive of.51 On the flip side, incompetent engineers can

actually produce negative progress in the form of technical debt from poorly written code.

Unfortunately, simply adding more engineers to a team cannot be a replacement for a

lack of “great” engineers. While having more team-members can help speed up projects and

improve their quality (especially when attempting to reach a critical mass), simply adding

employees can have diminishing returns. In fact, in his famous book, The Mythical Man-Month,

Brooks claims that adding more engineers to an already late project can often make it even later,

more disorganized, and of lower quality.52 The structural challenges that arise from this

phenomenon thus become: how can an organization attract and retain “great” employees?

50 McConnell, Steve. “Origins of 10X – How Valid Is the Underlying Research?,” n.d. http://www.construx.com/10x_Software_Development/Origins_of_10X_%E2%80%93_How_Valid_is_the_Underlying_Research_/.
51 Brooks, The Mythical Man-Month, 32.
52 Brooks, The Mythical Man-Month, 25.


Furthermore, how can it provide an environment in which these employees will realize their

potential?

First, there must be incentives for data scientists to join an organization at all.

In the still nascent world of big data analytics, experienced talent is hard to find, and companies

compete ruthlessly in the big data space.53 Basic rules of supply and demand drive salaries up

significantly, and most data scientists start well above six figures, even in entry-level positions.

With so many options, data scientists are not simply looking for a job; they are looking for the

highest-paying offers doing the most meaningful and interesting work. Furthermore, given

that they have a high degree of choice, they can afford to demand an influential role on the team

that they are joining. Will their suggestions be listened to? Will they be given opportunities to be

independent and take control of their tasks? No one likes being a second-class employee.

Once hired, engineers must be incentivized to work hard and excel at what they do.54

Clear paths of advancement in both pay and responsibility must exist. Without these paths,

engineers may not feel the desire to exceed the demands of their position, causing the quality of

their work to stagnate. At best, a lack of motivation can cause employees to produce sub-par

results. At worst, it can create incentives for employees to look for opportunities at other

organizations.

Retaining motivated engineers for longer periods of time also improves their overall

effectiveness. Not only do they gain domain experience in their specific field, they also gain

institutional knowledge: an understanding of the processes of an organization.55 This improves

the efficiency of their work as they have an easier time navigating the idiosyncrasies of their

respective team that may stymie a junior engineer.

53 Davenport, 3.
54 Eccles, Robert. “The Performance Measurement Manifesto.” Harvard Business Review, February 1991, 135.
55 Sukumar, 262.


Finally, as outlined in the section on organizational commitment to big data, data

scientists need the freedom to be entrepreneurial in their pursuits.56 Micromanagement and strict

delegation of tasks and workflows are detrimental both to the working environment and to the

overall effectiveness of the data scientists. Obviously there must be an overarching strategic

theme, but engineers should have the freedom to follow their own leads in service of that goal.

As they explore, they may find novel solutions that would not have been discovered had they stuck to

the beaten path of established protocol.

Overall, engineers, and analytical employees in general, are resources that must be

recruited and retained through organizational decisions. Poor structural and procedural layouts

can provide disincentives for talent to join the organization, or stifle the talent that already works

there. Agencies that are attempting to answer some of the world’s most difficult questions,

working with some of the most obfuscated data sets, need quality analysts. Without them, they

will no doubt fail in their mission.

Conclusion

The growth of big data has fundamentally changed the way in which many organizations,

including counterterror agencies, approach analytics. Over time, the techniques used to structure

and analyze these massive datasets have been refined. Novel approaches to information tagging,

visualization, and even prediction have been developed. The technical requirements for

implementing these algorithms in practice remain high. Datacenters are still extremely difficult

and expensive to build, maintain, and upgrade. However, the evolving big data market has

56 Newport, C.L., and D.G. Elms. “Effective Engineers.” International Journal of Engineers 13, no. 5 (1997), 331.


provided solutions that significantly lower the barrier to entry for big data analytics, shifting the

challenges from technical to organizational ones.

In an emerging field, the correct ways to integrate large-scale data analytics into

organizations, let alone multi-agency government ones, remain on the cutting edge of

organizational theory. Traditional views on organizational structure must be challenged in order

to fully leverage the power of big data. Strict hierarchies and top-heavy power structures need to

be eschewed in favor of more distributed systems of influence. Cross-team informational barriers

need to be brought down and incentives to share and collaborate need to be put in place.

Similarly, communication between different levels of the structural hierarchy should be

prioritized as a necessary objective. Finally, recruiting and retaining top engineering and

analytical talent should become a top goal for any organization hoping to derive meaning from

big data.

These organizational structures are much easier to describe in theory than to implement

in practice. Most organizations that require this level of structure have huge amounts of

organizational inertia that stifles change, especially changes as large as those proposed in this chapter.

Viewing these organizational requirements through the lens of intelligence agencies makes the

changes even more difficult. The 9/11 Commission found that the Intelligence community’s

structural reform attempts floundered for a decade before 9/11. Though the National

Counterterrorism Center was established as a new organization in order to develop these

structures from scratch, it remains to be seen if it has been successful in its endeavor.


Chapter 3

Organizational Successes and Failures of the NCTC

New data environments and the information revolution have brought new operational

requirements for the Intelligence Community. The National Counterterrorism Center was

established as a way to create a structure that was better equipped to produce intelligence in this

modern, complex environment. However, the NCTC’s existence does not necessarily signal the end

of the community’s information woes. Integrating data and generating actionable insights using

modern information technology are daunting organizational tasks, and the traditional structures

of the Intelligence Community are deep-seated.

This chapter first explores the positive contributions that the NCTC has made to the

community. It has brought about many constructive changes to the way that intelligence is

generated, especially with respect to interagency communication and information sharing.

Additionally, its existence has solidified the modern paradigm of counterterror intelligence,

which embraces complexity and deep cross-referential analysis.

However, despite the positive changes that it has made, the NCTC still exhibits many

organizational flaws, some being nearly identical to those that motivated its creation. This

chapter investigates these issues in two distinct ways. First, it highlights specific instances in

which the NCTC, in coordination with the larger Intelligence Community, has been found to fail

in its mission. It walks through the specific failures and relates them to the organizational issues

that they either cause or reveal. The two chosen points of failure are the 2009

Christmas Bomber and the currently unfolding foreign fighter crisis.


Finally, the investigation moves toward more implicit flaws that have not necessarily

contributed directly to the concrete failures outlined above. These flaws are found both within

the internal structure of the NCTC and in the relationships that it maintains with other agencies

in the IC. Furthermore, in the context of Big Data analytics, the NCTC provides many barriers to

the implementation and use of computational techniques, choking off sources of innovation for

the modern era. The existence of these flaws demonstrates that the NCTC is far from fulfilling its

mission of fully integrating all-source intelligence.

Successful components of the National Counterterrorism Center’s Structure

Immediately following the events of 9/11, a major investigation was launched in order to

determine why the attack had come without warning. How had the United States intelligence

community (IC), the most capable in the world, completely missed a plot this ambitious? After

two and a half years of dedicated effort, a team of nearly fifty people produced The 9/11

Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the

United States.1 Through exhaustive work, the team managed to trace a path through

millions of documents and decisions in multiple agencies, pinpointing the exact moments at

which major failures occurred.

The report allowed for specific structural components of the IC to be analyzed. For years,

leaders in the community had been pushing for reform.2 Dozens of reports produced hundreds of

structural and analytical suggestions to advance the IC into the modern world. The

recommendations had gone largely ignored for a decade. However, the result of the report

1 Zelikow, Phillip. “The 9/11 Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the United States,” July 22, 2004. https://www.gpo.gov/fdsys/pkg/GPO-911REPORT/content-detail.html.
2 Zegart, Amy. Spying Blind: The CIA, the FBI, and the Origins of 9/11. Princeton, NJ: Princeton Press, 2007.


proved that the proposed changes were now credibly grounded in reality. The inter-agency

fragmentation that had been identified now came to the forefront. The information that could

have potentially stopped the 9/11 terrorists had been collected but not analyzed by the agencies

or shared with the appropriate parties. Members of the IC were coming to realize that the modern

counterterror intelligence environment was far different from what they had previously thought. The

work done in this period laid the foundations for the modern conceptions of effective intelligence

and agency organization.

The most concrete effect of the report was the establishment of the National

Counterterrorism Center by the Intelligence Reform and Terrorism Prevention Act of 2004

(IRTPA).3 Though the bill does not specifically mention the intelligence failures leading up to

9/11, the events are clear motivators for the center. The legislation makes specific mentions of

access to “all-source intelligence” and “independent, alternative analyses” as main goals for the

new agency. With these objectives in mind, the government set about building the new center.

Improved information access and dissemination

The most logical goal of creating a central hub for the counterterror intelligence

community was the centralization of information. Instead of sitting locked away in various

separate agency databases, it would also reside in a single repository, where those with access to

the NCTC data-stores would have access to the databases of the interagency community. This

would, it was hoped, prevent a repeat of the missed opportunities that preceded 9/11.

The classic example is the fundamental break between FBI and CIA data-stores. In their

own information stores, each housed ominous hints of what might come, but together they

3 Intelligence Reform and Terrorism Prevention Act of 2004, 2004. http://www.nctc.gov/docs/pl108_458.pdf.


provided a relatively clear picture of what was going on: foreign Al’Qaeda agents had entered

the United States, were taking flying lessons, and should at the very least have been monitored if

not apprehended. Even without specific monitoring, the FBI was in possession of the full names,

addresses, bank information, and telephone numbers of some of the terrorists.4 However, they

simply did not know that these men were dangerous.

The NCTC’s information centralization was aimed at alleviating this problem. First, it

houses the data of more than thirty different intelligence and law-enforcement networks, making

them available to the interagency community.5 Furthermore, it maintains a data-store known as

the Terrorist Identities Datamart Environment (TIDE), which stores information on international

terrorists. This data is compiled from a variety of domestic and international sources, providing

real-time information for analysts in many agencies.

The successful combination of these databases is a major technical, political, and

bureaucratic feat that should not be overlooked. It is exceedingly hard to convince agencies to

part with any of their precious data and difficult to implement such sharing organizationally.6

Furthermore, the technical requirements are massive, due to the fact that many agencies use

custom technologies, causing headaches with integration. The success that it has had here proves

that the NCTC is capable of achieving very technically challenging goals. Future federal

information centralization can be built upon the foundation laid by the NCTC’s data integration

efforts.

4 Zegart, 156.
5 Best, Richard. “The National Counterterrorism Center (NCTC)—Responsibilities and Potential Congressional Concerns.” Congressional Research Service, December 2011. https://www.fas.org/sgp/crs/intel/R41022.pdf, 5.
6 Peled, Alon. “Coerce, Consent, and Coax: A Review of U.S. Congressional Efforts to Improve Federal Counterterrorism Information Sharing.” Terrorism and Political Violence 1, no. 18 (August 2014).

Interagency Communication and Collaboration


The creation of the NCTC has done more than simply house more data under a single

roof. It has also produced a better environment for actual collaboration with the data. Before the

creation of the center, there was a dearth of opportunities to communicate directly with other

agencies, deepening the siloed nature of the IC. Each department simply collected its own data,

analyzed it in its own informational context, and reported it up the chain of command. Final

decision-makers relied on extensive contextual aggregation at top levels to piece together the

varied reports into a single counterterror intelligence product.

Another of the NCTC’s stated goals is to provide a more comprehensive layer of

integration below the Director of National Intelligence level.7 The language of its defining

legislation suggests that information be funneled through the NCTC before it is presented to

policymakers. Reports are therefore available to more members in the intelligence community,

broadening overall understanding of the issues and allowing for alternative analyses.

Furthermore, the NCTC claims to improve the “situational awareness” of the community

as a whole. It does so by hosting three secure video conference calls each day. Some of these

calls happen in the early hours of the morning – regular schedules do not apply to counterterror

intelligence. The meetings occur between NCTC employees and employees of various agencies,

ensuring that they are constantly in contact. This type of contact no doubt fosters better

collaboration between the agencies.

The efforts put forth by the center put the IC leaps and bounds ahead of where it was

when the World Trade Center came down in 2001. The IC had not yet adapted to the modern

counterterror environment, and the NCTC brings a more modern structural element. However, it

is not entirely clear how much better equipped the IC is in its battle against terrorism. Even with

7 Intelligence Reform and Terrorism Prevention Act of 2004.


the massive changes instituted in 2004, major intelligence failures still happen, sometimes for

reasons similar to those of September 11, 2001.

Concrete Points of Failure in the Counterterror Community

Despite the massive budgetary and personnel increases in the past decade and a half, the

counterterror intelligence community still experiences unacceptable failures. First, as noted in

Chapter 1, it is unreasonable to expect any type of intelligence agency to be 100% accurate on

every piece of information. This type of “creeping determinism” produces expectations of

analysts that are impossible to satisfy and eventually counterproductive.

On the other hand, the members of the IC must learn from their mistakes and take

responsibility when they fail to do so. When major intelligence signals are missed due to

structural deficiencies that the NCTC was supposed to solve, there must be some accountability.

Therefore, while this is not an attempt to condemn the employees working at the National

Counterterrorism Center and its sister agencies, it is a condemnation of the structural weaknesses

that reduce their chances of success. Sometimes these weaknesses make it impossible for

analysts to do their jobs correctly. The two examples presented below demonstrate the deficiencies that

reduce the effectiveness of the NCTC.

The 2009 Christmas Bomber

On Christmas Day in 2009, Umar Farouk Abdulmutallab, a 23-year-old Nigerian man,

boarded a Detroit-bound plane in Amsterdam. In his underwear he had hidden a non-metallic

pouch filled with the chemical pentaerythritol tetranitrate, a major ingredient in some plastic


explosives.8 As the flight began its descent into Detroit Metropolitan Airport, Abdulmutallab

went into the bathroom, where he attempted to inject the pouch with another liquid as a reagent

to begin an explosive reaction. Thankfully, he somehow botched the injection and only managed

to start a fire instead of setting off an explosion. Though there were no air marshals on the flight,

other passengers managed to subdue him and put out the fire that he had started.

Chemists posit that the amount of explosive that he had was more than enough to

puncture the hull of a commercial airliner and would most likely have done serious damage to

the airplane.9 It is impossible to know whether or not he could have brought down the aircraft,

but if he had, 278 lives would have been lost. Airline security is supposedly extremely robust,

especially following the events of 9/11. Furthermore, the flight was entering the US, meaning it

should have met American security standards. This raises the question: how did this kind of attack

get through? What failures occurred that allowed this to happen?

The Senate Intelligence Committee put together an extensive report on the failures of the

intelligence community in this instance, and the findings were not favorable for any agency. The

National Counterterrorism Center sits at the center of these agencies and is intended to be the

glue that holds together the analytical efforts of the entire community.

First, the report identifies the Department of State as having made mistakes with

Abdulmutallab’s multiple-reentry visa to the United States. He had originally applied for it in

2008 while in a Master’s program in mechanical engineering in London. However, after he

abandoned his family to attend an extremist training camp in Yemen, his father went to the US

8 Burr, Richard. “Unclassified Executive Summary of the Committee Report on the Attempted Terrorist Attack on Northwest Airlines Flight 253,” May 2010. http://www.intelligence.senate.gov/publications/report-attempted-terrorist-attack-northwest-airlines-flight-253-may-24-2010.
9 Johnson, Carrie. “Explosive in Detroit Terror Case Could Have Blown Hole in Airplane, Sources Say.” The Washington Post, December 29, 2009, sec. Nation. http://www.washingtonpost.com/wp-dyn/content/article/2009/12/28/AR2009122800582.html.


embassy in Nigeria to report that his son had likely been radicalized.10 This information went to

both the State Department and the CIA. The State Department opted not to revoke

Abdulmutallab’s visa, despite the credible accusation made by his own father. However, even

had his visa been revoked, there was no automated electronic way for the State Department to

notify related parties (such as the NCTC), or even the airlines that would be processing him.11 It

is the responsibility of the NCTC to ensure that information like this gets disseminated to the

entire community, instead of being ignored.

The failures did not end with the State Department. The CIA had previously generated

reports on Abdulmutallab due to his involvement in Yemeni extremist camps. However, the

existence of this information was not reported widely, and many CIA offices had no idea that it

existed.12 As a result, regional divisions that were not focused on Yemeni extremism did not

search databases that contained the reports related to Abdulmutallab. The information was spread

across too many different sources for them to find anything coherent. Finally, the

information that was collected by the CIA was not given to the NCTC until after the attempted

attack had occurred. This means that the CIA was in possession of highly relevant counterterror intelligence

that it had opted to withhold from the rest of the community. Had the CIA shared the reports, it is

more likely that other agencies would have been able to identify the threat that Abdulmutallab

posed. This exact type of information centralization is one of the stated goals of the NCTC.

Investigations of the FBI revealed more failures. Even if all of the CIA information had

been centralized correctly in the NCTC, it might not have mattered. In the aftermath of the failed

attack, investigators found that critical FBI analysts did not have access to the CIA data streams

10 DeYoung, Karen, Dan Eggen, and Spencer S. Hsu. “Plane Suspect Was Listed in Terror Database after Father Alerted U.S. Officials.” The Washington Post, December 27, 2009, sec. Nation. http://www.washingtonpost.com/wp-dyn/content/article/2009/12/25/AR2009122501355.html.
11 Burr, 4.
12 Burr, 5.


coming through the NCTC.13 Due to a basic technical misconfiguration of security profiles, the

information was blocked from showing up in database searches, meaning the FBI personnel were

not even aware that they were unable to access critical information. Had the terror attempt not

occurred, it is likely that this misconfiguration would have persisted for a long time.

Finally, the rigid and incomplete standards set up between the agencies prevented

Abdulmutallab from being nominated for the no-fly list.14 Because each agency held only small

pieces of the puzzle, none had the entire picture of his extremism. This initial information

fragmentation in itself is a consequence of the complex intelligence environment and cannot be

avoided: no agency can expect to find all of the puzzle pieces on its own. However, there was no

specific mechanism to begin the process of integrating the information. No agency was tasked

with seeding an initial, though incomplete, profile that could be built upon by other

organizations. Overly complicated or rigorous standards for establishing new entries hindered the

ability of the NCTC to even begin the integration process.

The withholding of information and lack of collaboration (or any type of communication,

really) caused Abdulmutallab to be excluded from any type of potential terrorist database,

despite the wealth of evidence against him. Technical glitches added further problems

on top of the basic organizational ones, creating an intelligence environment that removed the

possibility of predicting the threat that Abdulmutallab presented. He was therefore able to get on

a plane to Detroit with explosives sewn into his clothing, and only luck prevented the deaths of

hundreds of civilians. These intelligence failures occurred in spite of the NCTC’s specifically

stated goals to prevent them.

13 Burr, 8.
14 Burr, 11.


The Foreign Fighter Phenomenon

The Syrian civil war has raged since 2011. The clashes between Bashar Al-Assad’s

government forces and the opposition groups have seriously destabilized the region. The rebel

fighters splintered into subgroups and reformed in new ways. Eventually a new power emerged

from the chaos: the Islamic State of Iraq and Syria (ISIS).15

From the outset of the war, civilians from western countries began to leave their homes to

fight in the struggle against what they considered Assad’s “oppressive” regime. Huge numbers of

civilian deaths and the alleged use of chemical weapons further fueled the influx of what

eventually became known as “foreign fighters.” As the dynamic of the conflict changed and ISIS

gained ground and manpower, the foreign fighters began joining its ranks almost exclusively.

Massive recruitment campaigns began to flood the Internet as ISIS recruiters took to the web in

order to bolster their numbers.16

As its forces swelled with local and foreign fighters alike, ISIS became bolder, declaring

itself a renewed caliphate in mid-2014 and capturing major Iraqi cities such as Ramadi, Falluja,

and Mosul. With its newfound influence, it accelerated its recruitment apparatus and began to

engage heavily with the western world, convincing uncommitted extremists to join the nascent

caliphate. Recruitment often begins in publicly accessible channels on

open social media sites such as Facebook, Twitter, or Tumblr. Within months of the declaration

of the caliphate, an estimated ten thousand fighters arrived in Syria from nearly 80 countries.17 It

is estimated that hundreds of the recruited fighters are American citizens.

15 Wood, Graeme. “What ISIS Really Wants.” The Atlantic, March 2015. http://www.theatlantic.com/magazine/archive/2015/03/what-isis-really-wants/384980/.
16 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel.” Homeland Security Committee, September 29, 2015. https://homeland.house.gov/wp-content/uploads/2015/09/TaskForceFinalReport.pdf.
17 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel,” 10.


The foreign fighters who make their way to Iraq and Syria pose grave threats to the

American homeland. These threats manifest themselves in two main ways. First, they provide

English-speaking recruiters to ISIS, increasing its outreach in western spheres of influence.

These recruiters are capable not only of convincing more civilians to join the war in Syria and

Iraq but also of inspiring local acts of violence. In fact, as more foreign fighters have joined

ISIS, the number of ISIS-inspired terror attacks in the west has gone up dramatically. In 2015,

the number of attempted and successful attacks was nearly double that of 2014 (37 compared to

20).18

Concrete examples of ISIS’s increased influence continue to manifest themselves.

Perhaps one of the most notable is the San Bernardino attack, in which 14 people were killed and

21 people were injured. The couple responsible for the violence had pledged their allegiance to

the Islamic State.19

A possibly more dangerous consequence of the foreign fighter phenomenon is that of

fighters who return to their home countries. These are men and women who have received training

and combat experience in a brutal civil war. Their expertise could prove deadly upon their return.

One of the most pertinent examples of the threats of returning foreign fighters is the November

2015 Paris terror attack. The mastermind of the attacks was a Belgian citizen named Abdelhamid

Abaaoud.20 He had previously made his way to Syria to fight for the Islamic State and received

combat training and experience. He eventually returned to Paris, where he planned the attacks

that took the lives of 130 people and injured nearly 400 more.

18 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel,” 15.
19 Koren, Marina. “How the San Bernardino Shooters Planned for Jihad.” The Atlantic, December 9, 2015. http://www.theatlantic.com/national/archive/2015/12/san-bernardino-shooters-radicalization/419610/.
20 Freytas-Tamura, Kimiko De, Aurelien Breeden, and Katrin Bennhold. “Call to Arms in France Amid Hunt for Belgian Suspect in Paris Attacks.” The New York Times, November 16, 2015. http://www.nytimes.com/2015/11/17/world/europe/paris-terror-attack.html.


Out of the hundreds of Americans who have attempted to get to Syria and Iraq, only 28

have been successfully interdicted.21 Though many have died or opted to stay in the region, 40

American foreign fighters are thought to have made it back into the United States. Each

represents a possible threat. One has already been apprehended for plotting an attack against a

US military base.22 An extensive report conducted by the Department of Homeland Security

(DHS) has concluded that the counterterror community is not reacting effectively to this growing

threat and must improve its interdiction capability. Furthermore, the NCTC has acknowledged

the central role that it is playing in the identity resolution of potential foreign fighters leaving or

entering the country.23 This is not to say that the foreign fighter threat is the fault of the NCTC. It

is not expected to solve the problem on its own. However, it is not fulfilling its role, and

exploring this failure provides a useful way to investigate how the NCTC’s

structural flaws manifest themselves.

The NCTC has not been able to prove that its watchlisting capabilities have improved

since failures in 2009 (Christmas) and 2013 (Boston). There have been no independent reviews

of its progress. Even so, what has been reported by DHS does not seem promising. In fact,

despite efforts to truly centralize information, the IC still relies on two separate terrorist watch

list databases. The NCTC manages the aforementioned TIDE data instance, while the FBI

maintains its own “Terrorist Screening Database” (TSDB).24 Both databases often contain only

partial information on hundreds of thousands of suspected terrorists. There is no guarantee that

the databases do not contain partially overlapping information, causing further information

21 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel,” 23.
22 “Columbus, Ohio Man Charged with Providing Material Support to Terrorists.” Department of Justice, April 2015. https://www.fbi.gov/cincinnati/press-releases/2015/columbus-ohio-man-charged-with-providing-material-support-to-terrorists.
23 Rasmussen, Nicholas. Hearing before the House Committee on Homeland Security, “Countering Violent Islamist Extremism: The Urgent Threat of Foreign Fighters and Homegrown Terror,” February 12, 2015, 3.
24 “Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel,” 26.


fragmentation. Additionally, these databases are difficult to access, even for appropriate analysts

outside of the center. For example, the TSA reported that when it submitted a pattern-matching

query to the NCTC to run against its TIDE instance, the results came back from the NCTC

eighteen months after the initial request.25 By that time, the results were mostly meaningless. The

delay effectively signaled that the TSA does not have access to this database.
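
To illustrate what partially overlapping watchlist records imply analytically, the sketch below shows a toy record-linkage pass that flags entries in two lists that likely refer to the same person. The field names, the similarity threshold, and the example entries are illustrative assumptions; they are not a description of how TIDE, the TSDB, or the NCTC’s identity-resolution process actually work.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

def name_similarity(a: str, b: str) -> float:
    """Rough similarity between two transliterated names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_same_person(rec_a: Dict[str, str], rec_b: Dict[str, str],
                       threshold: float = 0.85) -> bool:
    """Flag two partial records as a probable match.

    A real identity-resolution pipeline would weigh many more attributes
    (aliases, travel documents, biometrics); this compares only name and
    date of birth as a minimal illustration.
    """
    dob_a, dob_b = rec_a.get("dob", ""), rec_b.get("dob", "")
    if dob_a and dob_b and dob_a != dob_b:
        return False  # a known, conflicting date of birth rules out a match
    return name_similarity(rec_a["name"], rec_b["name"]) >= threshold

def cross_reference(list_a: List[Dict[str, str]],
                    list_b: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return name pairs from the two lists that probably describe one person."""
    return [(a["name"], b["name"])
            for a in list_a for b in list_b
            if likely_same_person(a, b)]

# Illustrative partial entries: two variant transliterations of the same name,
# each list holding a different fragment of the overall picture.
watchlist_one = [{"name": "Umar Farouk Abdulmutallab", "dob": ""}]
watchlist_two = [{"name": "Umar Faruk Abdul Mutallab", "dob": ""}]

if __name__ == "__main__":
    print(cross_reference(watchlist_one, watchlist_two))
```

Even this simplified comparison shows why fragmentation is costly: deciding whether two partial records describe the same individual requires the lists to be examined together, something the current two-database arrangement makes difficult.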

Additionally, the interagency community, including the NCTC, continues to disregard the

vital role that state and local fusion centers can serve in the intelligence process.26 Federal

intelligence services often fail to query local fusion centers for data, alienating a potentially

critical source of counterterror information. They rarely share their own data with the state and

local centers, meaning local investigations are often working partially blind. These state and

local fusion centers are important because they are often the closest geographically to suspected

terrorists and have a better understanding of the local environment.

Finally, and perhaps ironically, the DHS report itself represents an organizational schism

that exists in the community. Among the many recommendations found in the report, DHS posits

that it should be given more responsibility to coordinate terrorist watch-list databases, a task that

aligns better with the stated goals of the National Counterterrorism Center. Having both parties

attempt to centralize information in different locations is confusing and counterproductive.

These conflicts of interest in terms of responsibility only add to the fragmentation of data in the

intelligence community. This position taken by the DHS exacerbates the problems with the

power dynamics that exist within the IC.

25 Roth, John. TSA: Security Gaps: Statement of John Roth, Inspector General, Department of Homeland Security, Before the Committee on Oversight and Government Reform. US House of Representatives, November 2015. https://oversight.house.gov/wp-content/uploads/2015/11/11-3-2015-Committee-Hearing-on-TSA-Roth-DHS-OIG-Testimony.pdf, 5.
26 Best, 2.


Implicit Organizational Deficiencies Within the NCTC

While the National Counterterrorism Center has exhibited concrete technical and

organizational deficiencies, it still suffers from a number of other issues that have not manifested

themselves directly in post-event reports. This does not mean that these structural problems have

not contributed to the aforementioned intelligence failures. It is likely that each of these has

exacerbated the reported issues in its own way.

Organizationally, the National Counterterrorism Center is in a unique position. It operates

in two distinct spheres of influence and responsibility. The first is inherent to any organization:

its internal processes. Any organization has multiple interacting components that must function

on their own and integrate smoothly with all other components. As discussed in Chapter Two,

this is not an easy feat to achieve. The second sphere is external: the NCTC’s position in the

interagency community. The NCTC is tasked with making a multi-agency group emulate a

centralized organization. The abstract concept of interacting components remains relevant, where

each agency in the space represents a structural component of the larger organization.

Ultimately, each component should integrate successfully with all others in order to achieve the

goals of the community.

This puts the NCTC in an exceedingly difficult position, as it must worry not only about

the function of its own complex inner workings but also about the inner workings and interactions of

the rest of the agencies in the counterterror intelligence community. Because it is responsible for

the goals of the community as a whole, the internal processes that exist in agencies outside of its

explicit control directly affect its ability to achieve those goals. Therefore, the failures of these

agencies as they interact in the interagency space can be seen as structural flaws that the NCTC

is responsible for.


Information Sharing and Access

The NCTC is meant to be a melting pot of analysts from a variety of agencies, a place

where they step out of their isolated information siloes and collaborate on important intelligence

issues. The first step to this collaboration is the sharing of “proprietary” data from the

intelligence collection pipelines of the individual agencies. As this is one of the main goals of the

center, it should have a well-defined set of processes for its analysts to follow when engaging in

information sharing. Likewise, it should have a robust and intuitive information-sharing platform

for the analysts to use.

Unfortunately, this is not the case. The information environment is set up in such a way

that it delegates responsibility to data-collectors to define who is allowed to access the

information.27 These data-collectors are wildly inconsistent in their decisions and do not have a

strictly defined process to determine what classification the information should have. Often, this

information is only disseminated to those that have a very clear and documented “need-to-know”

reason for accessing the data.

The data collectors are held responsible for their choice of classification long after they

make their decision. It is a naïve assumption that these collectors will err on the side of

openness, especially given the current climate in classification matters. For example, in the

Hillary Clinton email case, there has been talk of indictments over emails that were retroactively

classified.28 This practice is not uncommon and creates incentive structures against sharing. This

means that any information that could possibly be sensitive in any way will be largely

27 Putbrese, Daniel. “Intelligence Sharing: Getting the National Counterterrorism Analysts on the Same Data Sheet.” Atlantic Council International Security Papers, 2006. http://www.atlanticcouncil.org/publications/reports/intelligence-sharing-getting-the-national-counterterrorism-analysts-on-the-same-data-sheet, 13.
28 Calamur, Krishnadev. “Some Clinton Emails Were Retroactively Classified.” NPR.org. Accessed April 13, 2016. http://www.npr.org/sections/thetwo-way/2015/05/22/408774111/state-department-to-release-more-clinton-emails-today.


inaccessible. Only analysts with an inarguable “need” to know the contents of the data will have

access.29 As a result, even though shared databases do exist, not everything that could be in them

is actually stored. Finally, in a modern data environment, where a massive amount of

information is being collected each day, having humans read and classify all information by hand

creates an unacceptable bottleneck. This bottleneck significantly slows the process of

information dissemination.

Furthermore, the data that does manage to get past the need-to-know filter may not be

easily navigable. As mentioned in the previous chapter, shared information requires significant

syntactic and semantic documentation to truly be of use; otherwise, analysts must orient

themselves on their own. This process is arduous and has a steep learning curve. In descriptions

of the database structures at the NCTC, it seems that these data stores are simply made available

to analysts with little documentation.30

Finally, while the centralization of this data is technically impressive, it still falls short of

what is necessary. In fact, it still manifests many of the problems that the NCTC was supposed to

solve. In the late 1990s, a large technological problem came to the fore: each organization stored

its data in its own format and structure. This came to be known as “stove-piping,” where each

organization would funnel its data through its own specific pipeline. These pipelines were

inaccessible and inscrutable to others. Unfortunately, while analysts at the NCTC technically

now have access to more of these “stove pipes,” as of 2013, they still could not access them all in

a single search. The difficulty compounds when they attempt to switch databases, as the security

measures prompt them for passwords upon entry to each agency’s data-space.

29 Putbrese, 14.
30 Nolan, Bridget. “Information Sharing And Collaboration in the United States Intelligence Community: An Ethnographic Study of the National Counterterrorism Center.” PhD Dissertation, University of Pennsylvania, 2013.


These informational barriers can make many forms of computational analysis impossible,

especially when an analysis is attempting to synthesize information from multiple contexts.

Instead of being able to take advantage of modern advances in computational speed and data

access, data scientists might need to feed in one data point at a time, effectively removing the

advantage of even having a high-speed computational tool. Even if the analysts are able to access

all data points in an entire database at once, the fact that information is spread across more than

thirty different password-protected barriers makes the task an arduous one and difficult to iterate

on.
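
A brief sketch illustrates the computational cost of this arrangement. The store names and the authentication and search functions below are hypothetical placeholders; the point is only the contrast between one query against a consolidated index and thirty separately authenticated queries against siloed stores.

```python
from typing import Dict, List

# Hypothetical stand-ins for thirty separately protected agency data-spaces.
AGENCY_STORES: Dict[str, List[str]] = {
    f"agency_{i:02d}": [f"record about subject {i % 7}"] for i in range(30)
}

def authenticate(store_name: str) -> None:
    """Placeholder for a per-store login prompt (the step that interrupts analysis)."""
    pass  # in practice: a password prompt, an access request, possibly a waiting period

def search_store(store_name: str, term: str) -> List[str]:
    """Placeholder for a single store's own search interface."""
    return [r for r in AGENCY_STORES[store_name] if term in r]

def fragmented_search(term: str) -> List[str]:
    """One authentication and one query per store: thirty round trips per question."""
    hits: List[str] = []
    for store in AGENCY_STORES:
        authenticate(store)        # repeated for every store, on every iteration
        hits.extend(search_store(store, term))
    return hits

def integrated_search(index: Dict[str, List[str]], term: str) -> List[str]:
    """A single query against one consolidated index: what centralization would allow."""
    return [r for records in index.values() for r in records if term in r]

if __name__ == "__main__":
    print(len(fragmented_search("subject 3")), "hits via thirty separate stores")
    consolidated = AGENCY_STORES  # imagine this were a single shared index
    print(len(integrated_search(consolidated, "subject 3")), "hits via one query")
```

In the fragmented case, every new question multiplies the authentication and query overhead by the number of stores, which is precisely what makes iterative, cross-context analysis so difficult.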

The flaws that exist within the NCTC’s data sharing model cause serious issues that must

be addressed. The process for determining what information to share is poorly defined. The

incentives for sharing are inverted, limiting the opportunities for the spread of information.

Finally, the information that is shared is not centralized in a way that promotes contextual

integration. It instead simply makes disjointed data stores available for individual queries. What

exists at the NCTC is much better than no access and no sharing, but the system must improve in

order to facilitate truly big-data oriented analytic practices.

Personnel Quality

The driving forces behind any organization are the employees who work there. The

NCTC’s final product, centralized and cross-referenced analysis, is a direct result of its analytic

teams. The quality of this final product is dependent on the quality of the analysts

and data scientists doing the analysis. The employees working in counterterror must have

expertise in a wide range of topics and tools. They must also be willing to share their specific


talents with the rest of the community to get the best analysis. Finally, these employees must be

incentivized to work hard toward a mission and must have attainable objectives to achieve.

There are few signals coming directly from the NCTC about the quality of its analysts

and data scientists. However, the majority of its analytical and engineering staff comes from

other agencies, meaning the quality of its staff depends on the quality coming from CIA, NSA,

FBI, and others.31 In this space, there are plenty of signals that federal agencies are having issues

hiring quality new employees, especially among those under thirty.32 The intelligence and law

enforcement sectors are hit especially hard given the higher analytical and technical capabilities

that are often required of their employees.

Numerous factors affect the desirability of the intelligence community for talented

engineers, analysts, and data scientists. Two important ones that surface constantly are salary and

operational freedom. Operational freedom refers to an engineer or analyst’s ability to make some

of his or her own choices on what work to pursue and what tools to use.

First, the salaries offered in government positions are simply not competitive with

salaries in the private sector.33 In a sample of self-reported salaries from the company-rating site

Glassdoor.com, NSA and CIA positions for analysts and engineers reported an average salary of

close to $80,000.34 On the other hand, Facebook’s reported average base salary for data scientists

31 Nolan, 71.
32 Feintzeig, Rachel. “U.S. Struggles to Draw Young, Savvy Staff.” Wall Street Journal, June 11, 2014, sec. Careers. http://www.wsj.com/articles/u-s-government-struggles-to-attract-young-savvy-staff-members-1402445198.
33 Lunney, Kelly. “Public-Private Sector Pay Gap Remains at 35 Percent.” Government Executive. http://www.govexec.com/pay-benefits/2014/10/public-private-sector-pay-gap-remains-35-percent/96830/.
34 “CIA Salaries.” Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/CIA-Salaries-E41381.htm. “NSA Salaries.” Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/NSA-Salaries-E41534.htm. Glassdoor is a website dedicated to housing employee reviews, salary reports, and interview reviews. The content is self-reported, but with enough corroborating stories, can be deemed mostly trustworthy.


is closer to $140,000, with $40,000 in stock options and a $100,000 signing bonus.35 Many people are willing to take a slight pay cut to work for a cause that they believe in, but

when the difference is this colossal, only the most dedicated will choose federal intelligence

work. Unfortunately, dedication to the cause is not a replacement for talent and ability.

The working environment presented at the National Counterterrorism Center can deter

even those who are dedicated to service. Making an impact as an individual can be extremely

difficult. The working culture is notoriously rigid, with analysts and data scientists given little

freedom to pursue their own leads.36 "Tasking," in which a higher-level manager or policymaker assigns a very specific question to a lower-level analyst, is commonplace, meaning analysts have far less agency in choosing important problems to tackle.

The restriction of essential tools can also serve as a deterrent to would-be employees.

Security and structural requirements of the respective agencies create problems that manifest

themselves in technical ways. For example, giving an analyst access to a new database, perhaps

one from the NSA, would take a mere ten minutes for an IT professional.37 However, some

employees report that the process takes months due to bureaucratic red tape.38 These problems

provide major disincentives for desirable employees to seek employment in the Intelligence

Community.

The NCTC is actually in a worse position than the rest of the IC because it pulls analysts

and data scientists from other agencies to work at the center. The CIA, NSA, and FBI are

required to provide employees to work at the NCTC. However, sending employees to the NCTC does little to further the sending agency's own mission. Therefore, agencies are not

35 "Facebook Research Scientist Salaries." Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/Facebook-Research-Scientist-Salaries-E40772_D_KO9,27.htm.
36 Nolan, 90.
37 Nolan, 29.
38 Nolan, 29.


incentivized to provide their best analysts, because they would lose them. Instead, they are

motivated to send expendable employees, those whose absence would not severely impact the

operations of the agency. Colloquially, the NCTC has been known as the "dumping ground" for underperforming employees, especially during its early days.39

Note that this section is not an attempt to denigrate the analytical and engineering

capabilities of the employees working at the NCTC. It is simply an observation that there is a

considerable source of talent in the private sector that is going untapped by the intelligence

community.

Analytic Collaboration and Culture

In order to provide the "alternative analyses" that are among the main goals of the NCTC, the analysts at the center must work collaboratively on its final intelligence products. The collective talents and areas of expertise of the analysts and data scientists are meant to combine into a "critical mass" of competence, allowing them to achieve levels of clarity not possible at each of the

individual agencies.

These new analytical conclusions require the unselfish sharing of time, talents, and data

with other employees. Additionally, there must be a sense of mutual respect and camaraderie

when integrating data; otherwise collaboration efforts could be stymied by agency personality

differences. At the NCTC, the cultural bridges that allow for collaboration are shaky at best and

destructively counterproductive at worst. Analysts are trained at a specific "home" agency and then sent to the NCTC as representatives of that agency, which can create us-versus-them attitudes that severely diminish opportunities for collaboration.40 The home-turf

39 Nolan, 80.
40 Nolan, 71.


loyalties exist for the majority of analysts and manifest themselves in the attitudes that they have

toward the “other” agencies. The differences mostly arise in interactions between the largest

agencies: NSA, CIA, FBI, and DIA.

When interviewed anonymously about their opinions of other analysts, representatives

from each agency have no shortage of caustic comments about their counterparts. CIA analysts

were considered by all to be relatively competent, but snobby and arrogant about their status.

One DIA agent went as far as claiming that all CIA analysts were “WASPy, Harvard-educated

and fluent in Yiddish or whatever.”41 There was a general sense of distrust of CIA analysts, due

to their perceived cutthroat nature. People considered CIA employees to be only out for

themselves.

FBI employees were colloquially known as “those idiots with the guns in the building.”42

They were considered traditional and generally inept at pursuing the mission. Other analysts

disliked working with them because they thought of them as constantly “a step behind, with an

inferiority complex.”43 NSA did not fare much better than FBI or CIA, having labels such as the

“idiot savants.”44 A common joke for NSA analysts is that while they are intelligent, they are so

socially inept that their most extroverted employees manage to look at other people’s shoes

instead of their own. DIA is considered bottom-rung by the other major agencies, operating as

the agency that is “always certain – never right.” 45

The negative attitudes that these analysts have toward each other can significantly

discourage attempts to collaborate and share data. Why waste time sharing information and

41 Nolan, 72.
42 Nolan, 73.
43 Nolan, 74.
44 Nolan, 73.
45 Nolan, 74.


methods with employees when they are either too stupid to deal with it or would possibly throw

you under the bus?

The collaboration issue is exacerbated by the incentive structure that exists around

performance metrics. Analysts are not evaluated by how well they can collaborate and create

innovative new solutions with their colleagues. They are instead evaluated almost purely on the

number of reports that they author, regardless of whose data they used or whom they worked

with.46 This means that a month-long collaborative effort to develop a new and innovative computational model would put those analysts behind a colleague who simply authored ten two-page reports in the same time period.

Collaboration also requires a significant amount of trust between analysts. Some NCTC

employees report situations in which their work was effectively “stolen” by another analyst.47

This is often done by taking parts of an analyst's work and classifying them, cutting the original analyst off from all of the work that they had done. The permissibility of these practices creates an environment in

which collaboration is a dice-roll, not a necessity for good performance.

External Organizational Deficiencies of the NCTC

The NCTC’s internal structural and cultural flaws can be seen as microcosms of the

larger intelligence community. The NCTC’s role as the centralized organizer of the interagency

space means it must have the power to effect change in the community as it sees fit.

Unfortunately, it appears that the NCTC is still subordinate to many of the more powerful

agencies in the community. This subordination manifests itself in ways that contribute to the

center's internal struggles.
46 Nolan, 99.
47 Nolan, 102.


Clearly Defined Roles and Capabilities

As elaborated in Chapter Two, a prerequisite to constructing a successful Big Data

analytic operation is a clearly defined goal and a detailed plan on how to get there. The

operational requirements of a new system, technical or organizational, must be very explicitly

laid out. If this is done correctly, then the process of moving toward the objective can survive

setbacks and changes in the operational environment. If the goals are poorly defined or the plan

is lackluster, then inevitable setbacks can derail the project as the goals are reevaluated and

adjusted. Furthermore, technical debt can be acquired that makes repairing damage extremely

costly. Building a system haphazardly is akin to building a jet engine while the airplane is in

mid-air: it might be possible, but is not advisable.

The NCTC has suffered in this regard since its inception. The trouble begins with the descriptions of its functions in the Intelligence Reform and Terrorism Prevention Act of 2004. The center is

described as being the “primary organization” for analyzing and integrating all-source

intelligence. Furthermore, the act states that the NCTC should ensure that agencies have

“appropriate” access to the intelligence that is “needed” to accomplish their analytical goals. All

of this seems very reasonable. The community suffered from fragmentation and required

centralization to overcome it. But under closer examination, it becomes clearer that these

descriptions do not robustly define the roles and powers of the center. "Appropriate" access is not an objective standard. Likewise, it can be impossible to know beforehand whether or not data is "needed" to complete an assignment.


These weaknesses have repeatedly surfaced in reports on the NCTC's

performance. One report claims that the NCTC’s planning apparatus is “rudimentary,” especially

given the fact that the center has no authority to mandate the implementation of its plans.48 It

claims that the center takes on the role of a “non-confrontational think tank” instead of a

centralized authority that integrates information and leads the analytical work of the rest of the

community. Another report claims that the NCTC's lack of formal authority meant that its director "persuaded, embarrassed, created consensus, or invoked higher authorities" instead of simply delegating what needed to be done.49

 

Information Sharing in the Wider Community

The NCTC’s lack of definitive authority in the IC can cause further information-sharing

problems. Because it cannot mandate the dissemination of data, it must ask for it.

While orders must be followed, requests can be denied.50 The tendency that many agencies have

of “protecting” their data makes this denial relatively common, especially when the data may

contain sensitive information that could compromise one of their collection sources.

This type of prejudice does not exist only for the NCTC. The aforementioned caustic

attitudes held by many in the IC about other agencies can prevent sharing as well. The CIA, if it

does not trust the capabilities of the FBI, may choose to not make sensitive information available

to it. The NCTC, the supposed arbiter of information sharing problems, is powerless to stop this

type of behavior. The 2009 Christmas bombing debacle demonstrated that often the NCTC and

48 Col. Brian Reinwald. "Assessing the National Counterterrorism Center's Effectiveness in the Global War on Terror." Masters Thesis, Army War College, 2007, 9.
49 Kravinsky, Robert. "Toward Integrating Complex National Missions: Lessons From The National Counterterrorism Center's Directorate of Strategic Operational Planning." Project On National Security Reform, February 2010. http://0183896.netsolhost.com/site/wp-content/uploads/2011/12/pnsr_nctc_dsop_report.pdf.
50 Putbrese, 10.


other agencies are completely unaware that information is being withheld. The NCTC does not

have the authority to audit all of the collection of any other agency.

Feedback and Iteration Within the Community

Even if the authority existed for the National Counterterrorism Center to impose its

organizational will on other agencies, it might not matter. There is no directive on what intermediate or final structures should look like. When is the restructuring "complete"? What Big Data capabilities are required in order to say a certain milestone has been reached? Admittedly these are difficult questions to answer, but there must be a definition of what a "working as intended"

NCTC looks like.

Making progress in an environment like this is exceedingly difficult. Furthermore,

measuring that progress based solely on outcomes may actually be impossible. A review of the

NCTC's Directorate of Strategic Operational Planning found that the center struggles with its "impact assessment" reports.51 It is difficult to make credible correlations between NCTC actions and the

state of the war on terror. How easily can one prove that it is "making a difference"? In the 18 months leading up to 9/11, America experienced few terrorist attacks, and yet the IC

was failing catastrophically every day.

As mentioned earlier, a sense of creeping determinism can cloud judgment of past errors,

making them seem more egregious than they actually were. In this way, measuring outcomes can

be an inexact science and give an inaccurate picture of the center's performance. These measurements are still important, but they should be combined with an analysis of the function of the structures that the NCTC employs to effect change. Was the community put in a

position where it could identify threats and stop them? How well did it react to changes in the
51 Kravinsky, 18.


operational environment? Did the changes mandated at top levels make their way down to an operational context? This is just a small sample of possible feedback mechanisms for evaluating the center's performance. It remains difficult to do this when there are no structural

goals to evaluate.

Conclusion

Before the creation of the NCTC, information fragmentation was rampant within the counterterror community. The center brought some semblance of authority and process to information sharing and collaborative intelligence generation. And yet, while the National

Counterterrorism Center has brought with it a number of improvements to the IC, many issues still

remain.

Failures to engage in the collaborative environment have significantly reduced the possibility of interdicting major threats. Events like the 2009 Christmas bomber showed that

sometimes the only thing preventing the deaths of hundreds of civilians is a large dose of luck.

Similarly, the foreign fighter threat continues to grow, and the intelligence community is attacking

the problem in a disjointed manner.

Other, less obvious problems still exist within the NCTC and continue to plague its place in

the interagency community. It is in a poor position to hire the best analysts and data scientists, it has a toxic culture in which analysts are pitted against each other, and its bureaucratic requirements cause major technical headaches. Furthermore, its poorly defined mission and lack of concrete authority put

it in an often subordinate position within the larger community, meaning it is unable to offset the

failures that exist within other agencies. Overall, the National Counterterrorism Center is a step

forward in terms of what it represents in the IC. Its existence is a general acknowledgement that in a

modern data environment, modern data and analytic practices must be adopted. However, since its


inception it was set up to struggle with this mission. Major changes must be made to get it to work

effectively.


Chapter 4

Looking to the Future: Innovative Models for the NCTC

“Lots of companies don’t succeed over time. What do they fundamentally do wrong? They usually miss the future.”

– Larry Page, Founder of Google1

The data problems brought on by the Internet age can be intimidating. The scale of data

created every day is beyond human comprehension. Data sizes often have to be abstracted into

units of billions and trillions of bytes (gigabytes and terabytes). Currently, the size of the Internet is measured in zettabytes, each of which represents 10^21 bytes (a one followed by twenty-one zeros).2 Even so,

companies exist that aim to discover, index, and expose nearly the entirety of this information.

Many business models depend solely on the ability of the company’s engineers and analysts to

control this massive data effectively. The most obvious examples of this phenomenon are Google

and Facebook, each presiding over their own massively interconnected data empires. They both

handle heterogeneous datasets that are similar to counterterrorism data in size and scope.

This chapter explores the way in which the Big Data analytical products that these companies build and use are applicable to counterterrorism analysis. The tools that consumers use can have

strong parallels with the main analytic workflows that are used in counterterror intelligence

operations. These commonalities make these companies prime candidates for comparison, and

sources of inspiration for improvements to the NCTC’s Big Data analytics structure.

1 "Computing Is Still Too Clunky: Charlie Rose and Larry Page in Conversation." TED Blog, March 19, 2014. http://blog.ted.com/computing-is-still-too-clunky-charlie-rose-and-larry-page-in-conversation/.
2 "The Zettabyte Era—Trends and Analysis." Cisco. Accessed April 22, 2016. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/VNI_Hyperconnectivity_WP.html.


These companies are optimized not only for the creation of these tools, but for their deployment as well. Some of these companies are the original innovators in Big Data and have

the authority to claim that their structures are suited for its use. In fact, many of the modern

organizational theories on data analytics stem from the work that these companies have done.

The inspiration found in these companies' structures may provide useful ideas on how to

improve the NCTC. However, the NCTC is not a private sector organization and has specific

limitations in its scope and capabilities that must first be addressed before attempting to

implement private sector structures in its environment. Even so, significant lessons can be

learned from these tech giants and provide a path forward for the National Counterterrorism

Center.

Modern Big Data Tools

Google truly is an “all source” company. It attempts to index every webpage, sound file,

video, and image on the Internet. It allows for efficient pattern matching for these objects on the

web. For example, if someone is looking for a specific phrase that appears in a text somewhere

on the Internet, Google can find that phrase quickly. Twenty years ago, search on this scale would have been impossible, and yet Google has managed to build out a system where the average user

can navigate billions of pieces of information with ease.
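The data structure at the core of this kind of fast lookup is the inverted index. The sketch below, written in Python with invented pages, is only a textbook illustration of the idea, not a description of Google's actual engine; its "search" is a crude all-words match rather than true phrase search.

```python
from collections import defaultdict

def build_index(pages):
    """Inverted index: map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return pages containing every word of the query (a crude conjunctive match,
    not true phrase search)."""
    words = query.lower().split()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Invented pages for illustration.
pages = {
    "example.org/a": "counterterrorism requires integrated data",
    "example.org/b": "integrated data analysis at scale",
}
index = build_index(pages)
print(search(index, "integrated data"))  # {'example.org/a', 'example.org/b'}
```

Because the index is built once and consulted on every query, lookup cost depends on the query, not on the total number of pages indexed, which is what makes search at this scale tractable.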

Contemporary Google search is actually far more ambitious than simply pattern

matching. It also attempts to build a “knowledge graph” that semantically connects knowledge

“entities” using links.3 This means that its algorithms no longer treat data simply as information

that can be matched with other information, but as knowable entities that exist in a contextual
3 Sullivan, Danny. "Google Launches Knowledge Graph To Provide Answers, Not Just Links." Search Engine Land, May 16, 2012. http://searchengineland.com/google-launches-knowledge-graph-121585.


fabric. Google built its search engine with an understanding that information does not exist in a

vacuum; each piece of data exists relative to every other one.

A possible workflow with the knowledge graph feature is as follows: users search the

name of a specific actor that they would like to know more about. They receive the usual

relevant blue links that Google is known for. However, they also receive possible continuations

of their search, such as movies that the actor has appeared in, other related actors, or events that

the actor is famous for participating in. The user is therefore more aware of the context that this

entity (in this case the actor) exists in.
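To make the idea concrete, the sketch below models a tiny knowledge graph in Python: entities are nodes, typed links are edges, and a query for one entity returns its immediate semantic context. The entities and relation names are invented for illustration; Google's actual graph is vastly larger and richer.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Toy knowledge graph: entities as nodes, typed links as directed edges."""

    def __init__(self):
        # edges[entity] -> list of (relation, target_entity) pairs
        self.edges = defaultdict(list)

    def add_link(self, source, relation, target):
        self.edges[source].append((relation, target))

    def context(self, entity):
        """Return the immediate semantic context of an entity: its outgoing links."""
        return self.edges.get(entity, [])

# Invented entities and relations for illustration.
kg = KnowledgeGraph()
kg.add_link("Actor A", "appeared_in", "Film X")
kg.add_link("Actor A", "appeared_in", "Film Y")
kg.add_link("Film X", "also_stars", "Actor B")

# A query for "Actor A" can now surface related entities, not just matching documents.
for relation, target in kg.context("Actor A"):
    print(f"Actor A --{relation}--> {target}")
```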

The current knowledge graph is extremely extensive. When it first launched, it housed

close to 500 million entities, with 3.5 billion links between them.4 The overarching goal of the feature is to answer user queries that require structured answers and context, not

simply relevant links. On the other hand, Google’s product is built with the average Internet user

in mind and is therefore not necessarily as useful for a task as specific as counterterror analysis.

A custom product or a modified version of Google’s would be necessary to be useful in practice.

Facebook’s monolithic platform also stores and curates large amounts of heterogeneous

data. Facebook’s engineers make this data nearly instantaneously available to each of its 1.7

billion active monthly users.5 Unlike Google, Facebook has the luxury of controlling the format

of the majority of the data that it uses. This is possible because the majority of the data that it

houses is generated by the users of the service, and these users can only interact with the

platform in predefined ways, with predefined input types.

4 Sullivan.
5 "Number of Facebook Users Worldwide 2008-2016 | Statistic." Statista. Accessed April 30, 2016. http://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/.


The control that Facebook exercises over its environment allows it to produce even more

robust and accurate models of its burgeoning networks.6 It builds its own knowledge graph of the

Facebook data environment, producing a complex set of entities and links between them. Using

this data structure it is able to produce detailed models of the behaviors of its users. Based on

users’ friends, self-proclaimed interests, and browsing activities, Facebook can generate

personalized feeds that match their information needs. It can even go as far as guessing which

unconnected users might be in the specific social network of another user, just based on their

activities.

Facebook’s acumen in this field has produced a significant number of new discoveries

about the behaviors of social networks. For example, it recently conducted a study on its users’

networks, mapping the degrees of separation that existed within its user-base.7 It showed that

classical graph formulation techniques were useful in practice and paved the way for future

research into social network interconnectivity.
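The "degrees of separation" measurement itself rests on a classical graph technique: breadth-first search over the friendship graph. The sketch below, with made-up users, shows the basic computation; Facebook's production methods operate at billions-of-nodes scale and rely on statistical approximations, but the underlying idea is the same.

```python
from collections import deque

def degrees_of_separation(friends, start, goal):
    """Shortest number of friendship hops between two users, or None if unconnected.

    friends: dict mapping each user to the set of that user's friends (undirected).
    """
    if start == goal:
        return 0
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        user, dist = queue.popleft()
        for friend in friends.get(user, set()):
            if friend == goal:
                return dist + 1
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, dist + 1))
    return None

# Invented miniature network.
friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
}
print(degrees_of_separation(friends, "alice", "dave"))  # 3
```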

The value of tools like these in counterterror cannot be overstated. One of the most

difficult aspects of counterterror intelligence is navigating the immensely rich context that each

investigation takes place in. It is easy to miss very important but subtle semantic links between

entities. For example, a shared hometown by the perpetrators of seemingly unrelated attacks

could signal an exploitable pattern to be used in future analysis. Links like this are not immediately obvious, however, especially given that human analysts cannot check every possible attribute of each militant that they research. A computer can. Automatically generated

knowledge graphs can provide invaluable clues to stumped analysts.
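A machine, by contrast, can exhaustively compare entity attributes. The sketch below, using invented records, illustrates the kind of brute-force cross-referencing described here: it flags pairs of subjects from seemingly unrelated cases who share any attribute value, such as a hometown. A real system would operate over a knowledge graph with far richer semantics; this only conveys the principle.

```python
from itertools import combinations

# Invented records from seemingly unrelated cases.
subjects = {
    "subject_1": {"hometown": "City A", "travel": "Country X"},
    "subject_2": {"hometown": "City B", "travel": "Country Y"},
    "subject_3": {"hometown": "City A", "travel": "Country Z"},
}

def shared_attributes(records):
    """Yield (subject, subject, attribute, value) for every attribute value shared
    by a pair of subjects."""
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(records.items(), 2):
        for key in attrs_a.keys() & attrs_b.keys():
            if attrs_a[key] == attrs_b[key]:
                yield name_a, name_b, key, attrs_a[key]

for a, b, key, value in shared_attributes(subjects):
    print(f"{a} and {b} share {key} = {value}")
# subject_1 and subject_3 share hometown = City A
```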

6 Novet, Jordan. "How Facebook Matured Its Data Structure and Stepped into the Graph World," June 25, 2013. https://gigaom.com/2013/06/25/how-facebook-matured-its-data-structure-and-stepped-into-the-graph-world/.
7 Edunov, Sergey. "Three and a Half Degrees of Separation." Research at Facebook. Accessed May 1, 2016. https://research.facebook.com/blog/three-and-a-half-degrees-of-separation/.


The types of data analytic programs that are employed at Facebook and Google are not

foreign concepts to organizations in the intelligence community. The NSA has built its own

graphical database of the information that it collects.8 The agency began working on it in 2007

and by 2013 had an efficient and useful product. In 2016, the system is no doubt far more robust.

The NSA uses this tool in counterterror intelligence and claims that it derives value from it.

However, even the NSA accepts the limitations of its scope given the information sources that it

has access to. This shows that intelligence agencies are aware that these technologies are useful,

and are actively pursuing ways to integrate them into their analytic workflows.

Both the Internet giants and NCTC-integrated intelligence agencies have systems that can

structure heterogeneous data into context-aware graphical knowledge stores. Yet their successes

are markedly different. As the previous chapter showed, the intelligence community exhibits major procedural and structural shortcomings with respect to these methods. How do the Internet giants

structure themselves to take advantage of the technology that they build?

Successful Big Data Structures

The success of modern technology companies has spurred massive organizational studies

over the past few decades.9 The challenges of computational technology and the advent of

massive datasets have produced companies that eschew the traditional models of hierarchy and

delegation. These companies embrace the complexity that modern information environments

provide. The complicated models that sufficed in a less globalized world are no longer tenable,

and organizational structures must reflect this shift.
8 Gallagher, Sean. "What the NSA Can Do with 'Big Data.'" Ars Technica, June 12, 2013. http://arstechnica.com/information-technology/2013/06/what-the-nsa-can-do-with-big-data/.
9 Galbraith, Jay. "Organization Design Challenges Resulting From Big Data." Journal of Organization Design 3, no. 1 (2014).


The technology industry is built on the littered corpses of failed technology companies. In

fact, it is thought that nearly 90% of all Silicon Valley startups fail.10 Only those that have a

combination of powerful technology and the ability to leverage it effectively can survive in the

market. As Google and Facebook emerged in the early 2000s, they became the poster children

(among a few others) for data-driven companies that could thrive in a globalized information

environment. When compared to the traditional intelligence agencies that have existed for

decades (in the FBI’s case, over a century), their structures appear almost diametrically opposed.

In order to effectively explore these differences, it is important to revisit the

organizational requirements for big data success. For ease of access, they are repeated here:

1. Commitment to Big Data analytics
2. Open information sharing environments
3. Feedback channels and iteration
4. Engineering talent and culture

These factors do not exist in a vacuum and are in fact tied to many of the lessons that were

learned during the creation of the very companies that are being examined in this section. It

therefore follows that these companies perform very well when evaluated using these categories.

Even so, it is important to investigate the ways in which these successful structures are built and

how they have changed.

The Internet Giants

Google is famous for its innovative approaches to hard information problems. Often, it

solves problems that many people do not even know exist and continues to invent products that

revolutionize the way users interact with data. Google search has always been the beating heart

10 Caroll, Rory. "Silicon Valley's Culture of Failure … and 'the Walking Dead' It Leaves Behind." The Guardian, June 2014. https://www.theguardian.com/technology/2014/jun/28/silicon-valley-startup-failure-culture-success-myth.


of this pursuit. It continues to drive nearly 40% of all traffic on the Internet.11 The company’s

search platform is an extremely complex piece of software that exists almost organically in the

Internet, constantly self-adjusting and evolving as the information around it changes. Google

implemented this fantastically successful system by maintaining an almost militant commitment

to important organizational principles.

First, Google has a very specific mission with its search product. It aims to meet the information needs of its users as accurately and efficiently as possible.12 Google's

steadfast commitment to this mission informs its analytical tools. The process of implementing a

search engine is entrenched in Big Data ideals by nature. There is little doubt that Google is

firmly committed to its belief in the power of Big Data.

Google dedicates significant resources to building out its analytic capabilities to

capitalize on this mission. In fact, it pioneered MapReduce, a paradigm for large-scale analytics that remains prevalent today.13 Since then, it has continued to improve its analytic capabilities and has entire teams of research staff committed to making improvements in its infrastructure and processes.14 This research often results in novel tools that improve analytical capability or

improve access to previously cumbersome tools.
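To illustrate the paradigm, the sketch below runs the two MapReduce phases in memory on a word-count example. Production systems distribute the same map and reduce steps across thousands of machines and handle fault tolerance; this toy version, with invented documents, only shows the programming model.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

docs = ["big data needs structure", "structure enables big data analysis"]
print(reduce_phase(map_phase(docs)))
# {'big': 2, 'data': 2, 'needs': 1, 'structure': 2, 'enables': 1, 'analysis': 1}
```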

The search giant is uncompromising in its commitment to quality, not allowing anything

to subvert its core objective of democratically exposing the world’s information to Internet users.

For example, the company is extremely strict about the purity of its search results, not allowing

11 Worstall, Tim. "Fascinating Number: Google Is Now 40% Of The Internet." Forbes. Accessed May 15, 2016. http://www.forbes.com/sites/timworstall/2013/08/17/fascinating-number-google-is-now-40-of-the-internet/.
12 "Ten Things We Know to Be True." Google. https://www.google.com/about/company/philosophy/. This source states the core values of the company, among which is the dedication to an uncompromised search.
13 Dean, Jeff, and Sanjay Ghemawat. "MapReduce: Simplified Data Processing on Large Clusters." Operating Systems Design and Implementation: Google, Inc., 2004. http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf.
14 Sato, Kazunori. "An Inside Look at Google BigQuery." Google, 2012. https://cloud.google.com/files/BigQueryTechnicalWP.pdf.


any type of internal or external incentive to shape the way that these results are displayed to a

user.15 Not even ads, which generate the lion’s share of its profits, are allowed to have any

influence on the search algorithms. Google is committed to getting search results as close as

possible to the information need of the user, not necessarily as close to its preferred short-term

profit model.

The way in which Google pursues its goals of relevant search results is a lesson in

collaborative information environments. There is an understanding that building out products as

expansive as Google Search is not possible without constant collaboration between teams of

engineers. It promotes this collaboration by concentrating on the quality of internal

communications.16 It emphasizes to its employees that they are not at the company simply to

perform a certain task that is handed to them. Instead they are encouraged to become “topic

experts” and to share their expertise with those around the company. The idea is that the

collective set of knowledge will produce solutions that were otherwise impossible to conceive of.

Decision-making is also a highly collaborative process at Google. Nick Leeder, the CEO

of Google France, claims that managers are not instructed to override employees. Instead, they are there to "encourage consensus" among their engineering and analytical teams.17

By requiring that employees collectively agree on forward steps, Google effectively incentivizes

collaboration, as each engineer’s ideas must pass muster with his or her fellow employees. This

is a significant departure from the strict hierarchical structure that has remained entrenched in

intelligence organizations for decades.

15 "Ten Things We Know to Be True." Google.
16 Dubois, David. "Google, the Network Company: From Theory to Practice." INSEAD Knowledge, September 11, 2013. http://knowledge.insead.edu/leadership-organisations/google-the-network-company-from-theory-to-practice-2602.
17 Dubois.


The openness of information at Google also plays a central role in its analytical strategy.

Google’s central code repository is thought to house nearly two billion lines of code.18 Nearly all

of it is available to any engineer working at the company. Some very sensitive code is only

available to specific engineers, the core search algorithm code being an example.19 The rest is

available for engineers to download, read, and use for their own purposes. The complex products

at Google actually share parts of code due to this structure, and one positive change made by an

engineer can improve the capabilities of everyone at the company. The openness allows

engineers and analysts to learn from the work previously done by others and avoid re-inventing

the wheel for many workflows. Udi Manber, the head of Google search products from 2010 to 2014, claimed that "if you need something you look around… If you don't like what's available

you build your own.”20

The open access to source code can be compared reasonably with the access to all-source

intelligence data at the NCTC. They are each core products of the sectors that they operate in and

represent the lifeblood of the respective organizations. Open access can significantly increase the

productivity of employees by allowing them to integrate the ideas of others into their own

workflows.

However, as Manber states, not all problems can be solved by using what someone else

has built. Sometimes completely new solutions must be developed. “Building your own tool” at

Google is not a one-step process. It requires teams of engineers to build a product, and teams of

data scientists to evaluate it. Manber describes an iterative relationship between the teams that

18 Metz, Cade. "Google Is 2 Billion Lines of Code—And It's All in One Place." WIRED, September 16, 2015. http://www.wired.com/2015/09/google-2-billion-lines-codeand-one-place/.
19 Even if the core algorithms are hidden, a large part of the code is most likely still exposed. Furthermore, the information openness can help the search engineers improve their product, which is by far the most important at the company.
20 Manber, Udi. Interview by Ben Mittelberger. Email, April 22, 2016.


develop the search algorithms and the teams that evaluate them. Google spends a significant

amount of time coming up with ways to evaluate its prized engine.21 In a constant push and pull,

the teams propel each other forward in discrete steps, giving and getting feedback, sometimes

discarding changes, and other times keeping them.

The feedback process at Google is not impulsive or driven by intuition; it is

deeply analytical in nature. It will almost never accept anecdotal evidence as sufficient proof of

the efficacy of a change. It instead relies on statistically significant figures to drive decisions.

This requires access to extensive logging information, something that Manber claims is

absolutely essential to providing any type of feedback.22 Even so, the engineers make plenty of

changes. In 2007 alone, Google launched more than 450 different modifications to its ranking

algorithms, more than one per day.23
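One common way to turn logged user behavior into a statistically grounded launch decision is a two-sample proportion test on a controlled experiment's control and treatment arms. The sketch below uses invented click counts and is only an illustration of this style of evaluation, not Google's actual methodology.

```python
from math import sqrt

def two_proportion_z(clicks_a, users_a, clicks_b, users_b):
    """z-statistic comparing the click-through rates of a control arm and a test arm."""
    p_a = clicks_a / users_a
    p_b = clicks_b / users_b
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    se = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    return (p_b - p_a) / se

# Invented logs: current ranking vs. a proposed modification.
z = two_proportion_z(clicks_a=4100, users_a=50000, clicks_b=4400, users_b=50000)
print(f"z = {z:.2f}")  # roughly, |z| > 1.96 suggests significance at the 5% level
```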

The consensus-driven management style at the company also improves the iterative

process. Nick Leeder calls Google a “quasi-flat organization,” in which collaborators can interact

more freely, improving organizational flexibility in the face of difficult problems. The flatter

structure also gives managers better proximity to the working environment, giving the ultimate

decision makers the operational knowledge required to make good choices.

The problem of evaluating and iterating on a search engine may seem unconnected to

counterterror intelligence work; however, there are unmistakable parallels. In both fields there is

no correct answer, and there is no end goal in sight. The environment is constantly changing, and

the answers are not entirely obvious. The intelligence community does not own a perfectly

21 Mease, David, and Ya Xu. "Evaluating Web Search Using Task Completion Time." Boston, MA: Google Research, 2009. http://static.googleusercontent.com/media/research.google.com/en//archive/dmease-sigir09-full.pdf; "Search Evaluation at Google." Official Google Blog. Accessed May 2, 2016. https://googleblog.blogspot.com/2008/09/search-evaluation-at-google.html.
22 Manber, Udi. "Guest Lecture: CS276 - Information Retrieval and Web Search." Stanford University, April 2016.
23 Mease.


accurate terrorism database that it can check its conclusions against. Likewise, there is no “gold

standard” answer that Google can refer to when evaluating its engine either. They both exist in

degrees of certainty, and there is an understanding that there can always be improvements to

their methods.

Finally, the engineers and analysts who work at Google are the ones who back the

commitment to big data analytics, the open information sharing environments, and the robust

feedback mechanisms. The technical problems that they have solved would have stumped a large

majority of others working on the same issues. Their intelligence and work ethic cannot be overstated. Google's success in this case is its ability to identify, recruit, and retain these

exceptionally talented employees.

Google’s main attractor for talented employees is the company culture of innovation and

engineering freedom.24 Smart analysts enjoy being given the agency to build out their own

conceptions of progress for the company. Google has a famous though unwritten policy of “20%

time” for its employees.25 Engineers and analysts are encouraged to go outside of their day-to-

day responsibilities to explore possible avenues for improvement and innovation. This time is not required, but the option exists, and it exhilarates many of the employees who work there.

The benefits that are offered to employees are also hard to beat. They eat for free at any

of a variety of cafes around Google’s many campuses. They get free childcare, dry-cleaning,

transportation, and 100% health and dental coverage for themselves and their families. They can even

bring in their pets to work. These are not benefits that are generally offered by (aging)

24 Kuntze, Ronald, and Erika Matulich. "Google: Searching for Value." Journal of Case Research in Business and Economics, 2009. http://www.aabri.com/manuscripts/09429.pdf, 4.
25 D'Onfro, Jillian. "The Truth about Google's Famous '20% Time' Policy." Business Insider, April 17, 2015. Accessed May 2, 2016. http://www.businessinsider.com/google-20-percent-time-policy-2015-4.


government organizations. These benefits come in addition to better overall pay, meaning savings on everyday living expenses compound with higher salaries.

Facebook also shares a large number of traits with Google. It has an unwavering

commitment to analytics and embeds its data science team with responsibilities across the entire

organization.26 It has a comparable information environment, with code being open to a majority

of engineers.27 It is continually taking iterative steps with the features of its product. Finally, it

provides an excellent work environment for its employees and compensates them well. Yet,

while both Google and Facebook have similar operating environments, Facebook can provide

additional lessons that are also applicable to the structure of the NCTC.

Facebook decided, unlike the majority of other technology companies, to remove the

barrier between development, testing, and deployment. The same group of engineers and data

scientists does all three in tandem. When things go right or wrong, the employees receive the

feedback directly and must take collective responsibility if fixes are required. The result is a

product suite that is built by people who understand the full context of the environment that they

are developing for. The idea is that when engineers and data scientists are directly exposed to the

impacts of their actions they are able to produce better work. This type of direct involvement is

unrealistic for the intelligence community – it simply does not make sense for an analyst to be

involved in the application of their conclusions. In fact, it is antithetical to the role of intelligence

to make the policy decisions. Still, giving lower-level analysts more involvement in the eventual use of their work may give them perspective that ultimately helps improve their performance.

26 Zimmerman, Thomas. "The Emerging Role of Data Scientists on Software Development Teams." Redmond, WA: Microsoft Research, April 2015. http://research.microsoft.com/pubs/242286/MSR-TR-2015-30.pdf.
27 Feitelson, Dror. "Development and Deployment at Facebook." IEEE Internet Computing 17, no. 3 (July 2013).


The lessons that Google and Facebook have learned over the past decade of explosive

growth have not gone unnoticed by the rest of the industry. The success of their business

structures has propelled a large number of companies to mirror them.28 There is a general

consensus among modern information giants that a commitment to analytics is necessary to

make further progress in the field. The majority of successful tech companies embed analytics

teams within their organizations and dedicate significant resources to capturing and structuring

their performance and usage metrics. They are constantly seeking new directions that could lead to innovative solutions. In doing so, they keep pace with the present and

prepare themselves for the future.

Bigger May Not Always Be Better

So far, comparisons have been made between the NCTC and similarly sized

organizations: ones with tens of thousands of employees. They are behemoths that have the

ability to throw massive numbers of people and resources at large problems and hopefully solve

them. However, not all organizations that succeed in tackling important data analytic issues have

comparable workforces or resources. In some situations, success is less a matter of resources than of focus and agility.

The constantly shifting startup market in Silicon Valley is a manifestation of this idea.

Even Google and Facebook began as small enterprises focusing on specific and difficult

questions. The difference between the larger monolithic organizations and the nimble startups is

their level of focus and commitment to the mission. Larger organizations are constantly slowed

28 Dill, Kathryn. "'It's OK If They Copy Us': Google's HR Chief On The Upside Of Giving Away Staffing Secrets." Forbes, March 2015. http://www.forbes.com/sites/kathryndill/2015/03/25/its-ok-if-they-copy-us-googles-hr-chief-on-the-upside-of-giving-away-staffing-secrets/.


by their legacy commitments and pour significant resources into maintaining them.29 Smaller

teams are unencumbered by aging infrastructure and institutional inertia; they are free to work on

purely forward-thinking pursuits. They commit fully to new and big ideas, building them out,

and constantly improving them.

The dedication to innovative and untested projects is ideal for smaller and highly agile

teams. First, the nature of unexplored paths is that of uncertainty. Tim Junio, the CEO of a successful network analytics startup called Qadium, claims, "creating new technology and

product categories is an unguided process almost by definition. You don’t just wake up and

decide ‘I’m going to go from A to B.’”30 It takes bold, unguided moves to determine what works

and what does not. These moves are often much less intentional than many would like to believe

and they often end in failure. Junio himself admits that Qadium could have originally gone in

five different directions, but only one of them proved to be feasible. The fact that nearly 90% of

startups either fail to grow or die out is proof that most of the time “innovative” ideas do not

work. The small scale of these projects is what makes this failure palatable; it would be unacceptable if a billion-dollar company went belly-up after committing fully to a new direction.

A startup's ability to fail may be necessary for achieving truly new capabilities.31

Innovation rarely results from playing a safe game and focusing on short-term requirements for

the organization. Risks have to be taken, and long-term improvement must be placed on at least equal footing with daily operations. Smaller organizations have the luxury of being able to make

29 Blank, Steve. "Why the Lean Start-Up Changes Everything." Harvard Business Review, May 2013. http://www.vto.at/wp-content/uploads/2013/10/Why-the-Lean-Startup-Changes-Everything_S.Plank_HBR-052013.pdf.
30 Junio, Tim. San Francisco. In-person interview, January 13, 2016. Qadium is a network sensing company that scans public-facing devices on the "dark net" for a variety of customers. Their focus is on cyber-security. More information can be found at www.Qadium.com.
31 Blank, 5.


these commitments, something that existing bureaucracies do not have. For example, the

growing analytics company Palantir allows its engineers and analysts to stop work (as they are

able) for an entire week during its annual “hack week.”32 Employees are encouraged to take this

time to come up with possible new directions for the company, even at the expense of short-term

objectives. These “hack weeks” have been wildly successful, and many of the innovative

analytical offerings of Palantir have come from them.

Large resource pools are not necessary to make these types of innovative leaps. The

smaller team sizes can make the innovative process more effective, not less. Junio claims that in

his experience, and the experience of many other entrepreneurs, larger teams are often

unnecessarily bloated, causing disorganization and a loss in quality.33 What he opts for instead is

a small, focused team of committed and skilled engineers. This view is not necessarily

revolutionary and has been studied for decades.34 However, lean technology startup environments bring it into a more modern context.

Small teams might actually be more likely to fulfill the structural requirements for proper

analytics and development. They can be fully committed to their end objectives because they

have few other processes to run. Their information environment is inherently open because the

team is not large enough to contain organizational information barriers. Iteration and feedback

are necessary because the team itself may be too small to take large steps. Feedback is easy to

get because each employee is embedded in the process of developing the product. The personnel

are compelled to stay and work hard by the ownership that they feel over what they are working

32 Trump, Whitney. "Palantir Hack Week 2015." Palantir, August 2015. https://palantir.com/2015/08/hack-week-2015.
33 Tim Junio, 2016.
34 Carmel, Erran, and Barbara J. Bird. "Small Is Beautiful: A Study of Packaged Software Development Teams." The Journal of High Technology Management Research 8, no. 1 (1997). http://www.sciencedirect.com/science/article/pii/S1047831097900171.


on. While it is no doubt required that startups focus on maintaining these concepts, they likely do

not have to reshape their organizational structure to fulfill them.

It is obviously not possible for the NCTC to mimic the structure and capabilities of an agile

startup: it is an already large organization operating in cooperation with over a dozen even larger

ones. However, this does not mean that the lessons learned from the failure-ridden culture of

Silicon Valley startups can be ignored. As the NCTC and the larger intelligence community are

faced with increasingly complex data analytic problems, they must begin to search for more

effective methods of developing innovative techniques.

The Comparability of the NCTC

The NCTC is not as effective as these organizations when it comes to handling big data,

both from a technical and an organizational perspective. It simply cannot compete in terms of

compute power, innovative potential, or analytic capability. However, the companies described

in this chapter are the gold standard for the world of big data analytics. They invented the entire

computational paradigm, and it makes sense that they can leverage it better. Additionally, the

NCTC is not a private company; it is a taxpayer-funded government organization that handles

highly classified information. Therefore, before enumerating recommendations for the NCTC in

the next chapter, it is first necessary to understand the limitations that restrict the NCTC’s

actions.

Operational Limitations of the NCTC

Even if the organizational inertia were surmountable, which it most likely is not, the

NCTC is not able to fully mimic a successful Internet giant or technology startup. To start, it


does not have the ability to dictate its own goals and measurable objectives.35 Directors of the

center were not responsible for drafting its role and operational goals; they instead inherited an incomplete organization from its creators. NCTC directors and their executive staff

do not have the same power as a private sector Board of Directors and CEO. They are instead

beholden to the decisions of the legislative and executive branches of the federal government.

The political goals of these two bodies are not necessarily aligned with the analytical goals of the

center and may even hamstring it by stripping it of unpopular (though necessary) authority.36

Furthermore, political interests may force the center to focus on more short-term goals, instead of

investing in long-term strategy and innovation.

Politicians must appear to be making forward progress and making their constituents safer in a very concrete way.37 They must communicate the narrative that they have

fixed the problematic processes with their reforms. An example is the Transportation Security

Administration (TSA), which was hastily thrown together within a month of 9/11. It was a quick

fix that seemed reasonable and timely, but proved to be ill-conceived and demonstrated considerable shortcomings. It has not been shown to be effective at interdicting possible terrorists, and a previous director, James Loy, has called it an “abominable failure.”38 However, in

November of 2001, Americans were terrified. Focusing on less visible and longer-term safety

and intelligence measures would not have been feasible politically; a new program had to be

developed, and it had to be developed quickly. While the NCTC’s creation was more measured,

it is another example of a highly visible security reform that has not had a stellar track record.

35 Refer to chapter one, especially the description of Amy Zegart’s book Spying Blind, which describes the dozens of reports and hundreds of recommendations that went unimplemented in the 1990s.
36 Betts, Richard. Enemies of Intelligence. New York, NY: Columbia University Press, 2009, 135.
37 Betts, 136.
38 Lerner, Adam B. “TSA, an ‘Abominable Failure.’” POLITICO. http://www.politico.com/story/2015/06/tsa-airport-security-failure-jeh-johnson-118557.html.


Nonetheless, some political goals can be well founded and affect important structural

elements of the organization. Protection of classified information is paramount politically and in

the intelligence generation process. Disseminating sensitive data to more groups inherently

makes the information less secure.39 More points of access mean more endpoints to protect. In a

computational environment, hacking becomes a serious concern. Each additional location where data is available gives another opportunity to a potential hacker. In reality, a segmented data store is much more secure, as a breach of one store is much less likely to propagate across to

other protected databases. Unfortunately, outside threats are not the only concerns for

information security. With more internal access, the set of potential insiders leaking information

increases significantly, especially when they are not specifically cleared to see that information.

In the words of former CIA director James R. Woolsey: “sharing is fine if you’re not sharing

with the Walkers, Aldrich Ames, Robert Hanssen, or some idiot who just enjoys talking to the

press about how we are intercepting bin Laden’s satellite telephone calls.”40
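To make the architectural point concrete, a minimal sketch of a compartmented store is given below. The compartment names, keys, and records are hypothetical and the access model is deliberately simplified; the point is only that a single compromised credential exposes one compartment rather than the entire holdings.

```python
# Minimal sketch (hypothetical data and credentials): each compartment holds its
# own records and honors only its own key, so one stolen key exposes a single
# compartment instead of the entire data store.

class Compartment:
    def __init__(self, name, key, records):
        self.name = name
        self._key = key
        self._records = records

    def read(self, key):
        # A breach of one key does not propagate to other compartments.
        if key != self._key:
            raise PermissionError(f"access to {self.name} denied")
        return list(self._records)


segmented_store = [
    Compartment("sigint_reports", key="key-A", records=["report-1", "report-2"]),
    Compartment("humint_reports", key="key-B", records=["report-3"]),
]

stolen_key = "key-A"  # attacker compromises a single credential
exposed = []
for compartment in segmented_store:
    try:
        exposed.extend(compartment.read(stolen_key))
    except PermissionError:
        pass  # the other compartments remain protected

print(f"Records exposed by one stolen key: {exposed}")
```

The trade-off this sketch leaves out is exactly the one the NCTC faces: the same segmentation that limits the blast radius of a breach also blocks the contextual access that integrated analysis requires.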

Breaches in information security are much more serious in the intelligence community

than they are in the private sector. If, due to its open access policy, top-secret Google search code

is leaked, the company may lose a technological edge over competing search engines. If top-

secret information about CIA or FBI informants is leaked, people may lose their lives. The

political costs could be enormous. A concrete example of major political fallout is the 2013 NSA leak perpetrated by Edward Snowden.41 He was given very open access to a large portion of the

39 Putbrese, Daniel. “Intelligence Sharing: Getting the National Counterterrorism Analysts on the Same Data Sheet,” Atlantic Council International Security Papers, 2006. http://www.atlanticcouncil.org/publications/reports/intelligence-sharing-getting-the-national-counterterrorism-analysts-on-the-same-data-sheet, 14.
40 Woolsey, R. James. R. James Woolsey Testimony To U.S. Senate Committee on Governmental Affairs, 2004.
41 Kerr, Orin. “Edward Snowden’s Impact.” The Washington Post, April 9, 2015. https://www.washingtonpost.com/news/volokh-conspiracy/wp/2015/04/09/edward-snowdens-impact/.


data stores at the NSA.42 The only barrier to him stealing thousands of sensitive documents was

his security clearance contract, which he broke. This is not a political commentary on the

revelations of Snowden, but instead an observation that those with universal access to data can

do serious damage to the organizations that they work for.

The inherent requirements of government also hinder innovation and iterative feedback. It is impossible for the NCTC to simply drop its current workflows in favor of developing long-term innovative models. It has pressing responsibilities that it cannot lose sight of. Furthermore, it is difficult to determine the amount of resources that can reasonably be set aside while maintaining an adequate focus on current terror threats. There is no definition of

what “reasonable” is; the environment is too complex. This is especially difficult when the

NCTC is under constant pressure from all sides to perform its defined mission. To what extent

can the NCTC de-emphasize its traditional workflows?

Feedback mechanisms in government make iteration even more difficult. The feedback

structures that might inform long-term strategy extend outside of the agency itself. If the

president makes a decision, his or her detailed thoughts on the quality or breadth of intelligence

may not effectively trickle back down to the analysts who generated the base intelligence.

Finally, the NCTC’s staffing issues may stem from broader organizational requirements

that extend to the rest of government. Hiring processes are strict, and they cannot ignore the risks

of making the hiring process too open. Budgeting also plays a large role in hiring, as the NCTC

often does not have the tools to make itself a more desirable employer. It is beholden to the discretion of the Office of Management and Budget (OMB), where conceptions of employee value

42 Greenwald, Glenn, Ewen MacAskill, and Laura Poitras. “Edward Snowden: The Whistleblower behind the NSA Surveillance Revelations.” The Guardian, June 11, 2013. http://www.theguardian.com/world/2013/jun/09/edward-snowden-nsa-whistleblower-surveillance.


may not be aligned with those in the intelligence community.43 This makes it difficult to improve

perks and salaries for its most valued employees.

The NCTC struggles against these operational requirements. In some ways it has been

able to inch toward solving these problems and is evolving toward a better role in the

community.44 It has already come a long way. Still, it has yet to remedy many issues.

Fortunately, these limitations do not completely hamstring the NCTC. While the limitations described above may make it impossible for the NCTC to fully replicate the structures of massive Silicon Valley companies, there remain many practical steps the NCTC can take to improve its big data analytic capabilities.

Conclusion

At a certain scale, many data problems begin to look alike. As the volume of data on the

Internet grows larger, and private companies become bolder in their attempts to capture and

analyze it, the tools that they build become more applicable to modern counterterror intelligence.

The exceedingly complex analytical products and models that Internet giants like Google and

Facebook have created are proof that it is possible to control the gushing, incoherent fire-

hose of data that is all-source intelligence. Furthermore, the harder analytical problems that have

not yet been addressed do not necessarily need massive teams or resources addressing them. The

lean startup model is becoming increasingly popular when it comes to solving hard data

problems, providing even more inspiration for the NCTC’s data needs.

43 Kravinsky, Robert. “Toward Integrating Complex National Missions: Lessons From The National Counterterrorism Center’s Directorate of Strategic Operational Planning.” Project On National Security Reform, February 2010. http://0183896.netsolhost.com/site/wp-content/uploads/2011/12/pnsr_nctc_dsop_report.pdf, 81.
44 ODNI Public Affairs. “NCTC 10 Years Later - A Decade of Service,” September 2014, 5.


These companies are not successful because of their technology alone. They were only

able to build and use their technology because of their structures and processes. These specific

organizational and cultural traits allow them to leverage data, employees, and technology in

effective ways. They are committed to their missions, they practice openness and collaboration,

they are constantly attempting to make changes to improve their models, and they all work hard

to recruit and retain extremely smart and motivated employees. It is unlikely that these

companies would have enjoyed the same level of success without a strict adherence to these

organizational principles.

One of the largest lessons to be learned from these companies is their acknowledgement

of the complexity that they face in their mission. They do not hold the illusion that they know

what the future holds, but their structures allow them to create flexible products and models that

can accommodate a continuously shifting information environment. In fact, an adherence to

traditional structures that favor older practices of business development can leave an

organization reeling in the face of modern information needs.

It would be convenient to simply export the successful traits from these companies to the

NCTC. Unfortunately, government intelligence agencies cannot simply realign themselves with

the same incentives as private organizations. The NCTC has its own strict requirements that

block the implementation of many of these structures. Furthermore, a realistic approach

recognizes the past attempts at major change and the organizational energy required to institute

even small changes in a structure as large as the interagency counterterror intelligence

community. Massive structural shifts are just not possible, and instead the NCTC must be

nudged in the right direction with a set of targeted changes.

The importance of learning these lessons now cannot be overstated. The NCTC has


done a huge amount of good in the intelligence community, but it could have done so much

more. Its directors and cadre have consistently struggled against its crippling bureaucratic

clumsiness and relative lack of authority. They rail against the forces that keep them in the dark

and other agencies that maintain an iron grip over their precious data.


Chapter 5

Conclusion: Looking Back and Moving Forward

Intelligence work is not what it used to be. For decades, as the Cold War simmered, the

United States developed a very specific set of intelligence capabilities geared toward monitoring

the Soviet threat. However, after the fall of the Soviet Union, both the nature of the world’s

threats and its informational structures began evolving rapidly. With the disappearance of its

adversary, the intelligence community’s informational mooring point had dissolved completely.

As the US attempted to maintain control in a unipolar world, potential threats became

increasingly global and decentralized.

The intelligence environment has become exceedingly complex in ways that the

community was not designed to tackle. The complexity is in large part due to the increasing

speed of information creation and dispersal. Computers have spawned an era of democratized

and instant communication, causing an explosion in the amount of available data. This massive

quantity generates noise that can obfuscate the true threats that intelligence analysts look for.

Thankfully, the technology that creates this cacophony is also capable of helping sift and sort this

data, giving valuable insights to modern computational analysts. The information revolution

represents a race between the growing complexity of information, and the attempts to control it.

Over the past several decades, few fields have not been significantly impacted by the meteoric rise of information technology. Computers began as useful auxiliary tools that

could perform specific computationally intensive jobs very quickly, but were used only in


specialized contexts. As hardware and software evolved, computers began to take increasingly

central roles in offices where they were used.

Their storage and analytical power have completely transformed the way in which

information-based work is completed. Humans working with data understand it in ways that are linked to the computational tools used to evaluate it. In-person and email conversations flow into each other, each an essential tool of collaboration. Written products are malleable pieces of data instead of finished, printed pages. Many analysts now conceive of information as it exists in an Excel spreadsheet. Often, the work that people do cannot be divorced from the tools that they

used to do it.

This marriage between human intellect and computational power has its limits.

Contemporary datasets can become so complex that they are beyond the ability of human

analysts to comprehend, let alone manipulate. Furthermore, datasets can get so large that their

size is beyond the limits of conventional computational tools. In the Internet age, the data being collected each day more than meets both of these criteria. These datasets, which require heavyweight hardware and software infrastructure to manage properly, have come to be known as “Big Data.”

While it is notoriously difficult to handle properly, the breadth and depth of information

contained in these datasets can provide insights that would have otherwise remained hidden.

These opportunities for generating enhanced insights extend into the realm of counterterror

intelligence.
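A toy sketch of the split-apply-combine pattern that underlies most Big Data frameworks (popularized by approaches such as MapReduce) may help illustrate why these datasets demand different tooling: the data is divided into shards, each shard is processed independently, and the partial results are merged. The shards and the term being counted below are hypothetical.

```python
# Toy illustration of the "map" and "reduce" steps behind large-scale analytics:
# shards are processed in parallel and the partial tallies are merged at the end.

from collections import Counter
from multiprocessing import Pool

shards = [
    "report alpha mentions courier travel courier",
    "report beta mentions financing and courier networks",
    "report gamma mentions travel documents",
]

def map_count(shard: str) -> Counter:
    # "Map" step: count term occurrences within a single shard.
    return Counter(shard.split())

def reduce_counts(partials) -> Counter:
    # "Reduce" step: merge the per-shard counts into a global tally.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        partial_counts = pool.map(map_count, shards)
    totals = reduce_counts(partial_counts)
    print(totals["courier"])  # -> 3
```

At production scale the shards live on thousands of machines rather than in a small list, but the organizational implication is the same: the hard part is no longer running the computation, it is deciding what to count and acting on the answer.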

Over the past decade, the technical challenges of Big Data have mostly been overcome,

and the methods to do so are becoming increasingly accessible. Dozens of companies develop

and sell their data solutions, creating a competitive and open marketplace, driving down prices

and improving quality. However, despite the technical improvements, analytics remains a


complex, difficult task. But technology is no longer the main limiting factor; the challenges of

Big Data have shifted from technical to organizational ones.

Simply having the technology to store and analyze Big Data does not guarantee any

results. It is entirely possible for an organization to acquire a multi-billion dollar Big Data

infrastructure, and not derive value from it. The organization must be structured in a way that

supports the implementation and growth of computational analytics. The proper structures and

processes for a successful Big Data organization can be broken up into four main categories.

First, the organization must have a detailed and strong commitment to analytics. Without

this, integrating Big Data processes with traditional workflows will most likely flounder. Second,

there must be an open information culture that encourages data sharing and analytic

collaboration across the organization. Analysts must be incentivized to work with each other, and

share their methods and conclusions. Third, there must be a distinct set of processes and structures

dedicated to feedback and iteration on analytical methods and conclusions. The process of taking

small, focused steps in analytics has been shown to produce better computational models and

conclusions. Fourth, an organization must build out a culture that recruits and retains the best

possible analysts and data scientists.

These characteristics are central when it comes to investigating the Big Data capabilities

of the National Counterterrorism Center. Though the NCTC is in a position to acquire high-end

Big Data technology from the private sector, it cannot outsource its internal structure and

analytical culture. It must therefore focus on these aspects as it looks to move forward.


The State of the National Counterterrorism Center

From its beginning, the NCTC had more than just data and analytic challenges to face. It

was established to fix the entrenched organizational problems that the rest of the Intelligence

Community had tried – and failed – to solve. Dozens of reports over the period of a decade had

attempted to push the IC in the correct direction. Only after the events of 9/11 was Congress able

to make the herculean effort to unite the agencies responsible for counterterror intelligence.

To its credit, the NCTC has made significant contributions to the quality and

comprehensiveness of intelligence in counterterror. Its establishment propelled the entire

community forward in terms of collaboration and technical integration. Unfortunately, some

marked failures have exposed the institutional flaws that it has yet to resolve. Investigations have

found that the NCTC is not adequately centralizing intelligence and still has trouble providing

the necessary context to many counterterror analysts working there. When the NCTC is

examined through the lens of big data processes, the underlying causes of these failures become

clearer.

The NCTC exhibits shortcomings in several of the structural categories that are essential

for properly leveraging data-oriented intelligence:

1. It lacks the specific analytic objectives required for embedding analytics into the

organization. Without more specified objectives and detailed plans on how to reach them,

the center may drift back to its traditional workflows. This is especially important in the

case of the NCTC, as it operates in an interagency space and is responsible for

overcoming the traditionalist inertia of over a dozen other agencies.


2. The center also struggles with providing a truly centralized data store for the intelligence community. Different political and bureaucratic requirements can limit access to data for

many analysts, meaning they are unable to access the contextual information that exists in

a wide variety of databases. Furthermore, competing incentives between employees and

between organizations can push them to avoid good data sharing practices.

3. The isolation produced by these competing incentives can significantly reduce the

amount and quality of feedback that many analysts get, as they are often only interacting

with analysts of their own agencies. Additionally, adding the bureaucratic layer of the

NCTC creates even more hierarchy, further separating analysts from the end results of

their intelligence products. The lack of proper feedback structures may actually stifle

innovation, restricting newer, more effective forms of analysis.

4. The NCTC faces serious barriers in recruiting and retaining the best analysts and

engineers. Significant pay differences combined with a lackluster working environment

means that it will lose many potentially elite people to the private sector. This is not to

say that the NCTC does not employ extremely talented analysts, but it does miss out on

many opportunities for top-tier talent.

As computational data becomes ever more central to analytical workflows, the NCTC

should focus on improving its capabilities by developing plans and strategies to address each of

these key issues. While it has limitations that are necessary and appropriately related to its

governmental role, it could benefit from further integration of successful structures found in the


private sector. There are concrete steps that the NCTC may take in order to improve its

capabilities and produce better analytical results.

Recommendations for the National Counterterrorism Center

These recommendations are given with a full understanding of the history of intelligence

“reform.” Many previous attempts have failed to produce any real change in the community. On

the other hand, some smaller, more incremental changes may have a higher possibility of being

implemented, and can push the NCTC in a better direction. As a further disclaimer, it is not

possible to outline specific steps that the NCTC should take, as this would require detailed

operational knowledge of the center. This is information that is not readily available. These

recommendations instead attempt to identify effective and realistic guidelines that the NCTC

should follow for meaningful improvement.

Operational Mission and Capabilities

The NCTC’s stated missions should be reevaluated and rewritten in ways that specifically designate its role and its desired capabilities. The original conception of the NCTC

was thrown into a fragmented intelligence environment with little regard as to how it would

build its competencies. Its capabilities and focuses grew organically, following paths of least

resistance, and filling in only cracks that were easy to fill. An updated conception of the NCTC

should include its specific access requirements to databases at other agencies in the counterterror


community. It should similarly include desired metrics for the integration of data.1 This would

replace the vague goal of “improved information sharing.” Similar to the way that Google

measures its search engine performance, the NCTC would have a benchmark for the progress it is making toward information centralization. Granted, this will not immediately solve the

data access problems that exist in the community, but it gives a clearer target that the community

is working toward. While they might meet significant resistance, at least the NCTC officials

would not be aimlessly attempting to find a role for the center.
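As a hypothetical illustration of what such a benchmark could look like, the sketch below turns per-report access logs into a single number that could be tracked over time. The report records and field names are invented for the example; they are not drawn from any actual NCTC system.

```python
# A minimal sketch of how a data-integration benchmark might be computed from
# per-report access logs. The records are hypothetical; the point is only that
# "improved information sharing" can be replaced with a number tracked over time.

reports = [
    {"id": "CT-2016-001", "sources_checked": 12, "sources_blocked": 4},
    {"id": "CT-2016-002", "sources_checked": 8,  "sources_blocked": 1},
    {"id": "CT-2016-003", "sources_checked": 20, "sources_blocked": 9},
]

def centralization_score(report: dict) -> float:
    # Fraction of the sources an analyst tried to consult that were actually
    # reachable; blocked sources indicate remaining information barriers.
    checked = report["sources_checked"]
    blocked = report["sources_blocked"]
    return (checked - blocked) / checked if checked else 0.0

scores = {r["id"]: round(centralization_score(r), 2) for r in reports}
overall = round(sum(scores.values()) / len(scores), 2)

print(scores)   # per-report access ratios, highlighting where barriers remain
print(overall)  # a single benchmark the center could monitor quarter to quarter
```

The particular ratio is less important than the discipline of measuring at all: the questions listed in the note below suggest several other quantities that could be logged as intelligence products are written.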

The NCTC can only do so much about this from within and requires congressional

assistance. The Congressional Intelligence Committees are responsible for the original language that established the NCTC, and they have the power to change its dedicated mission. Though it may be politically

difficult, they should take incremental steps to improve both the specificity of the NCTC’s

mission and its authority to carry it out. They should work closely with both IC officials and data

scientists to understand the needs of modern intelligence work.

Internal Operating Structure and Information Procedures

It is impossible for the NCTC to make sweeping organizational changes without

legislative aid; however, its leadership can still have an impact on its future. The NCTC’s

leadership determines the internal procedures that back the analytical products of the center.

Though they may not fix every problem, high-ranking officials can remove small barriers that

have large impacts on analytical capability.

It may be the case that information-sharing problems in the intelligence community will

1 Possible questions to ask to get metrics include: What sources were eventually used in a report? How were they accessed? What sources were checked? What sources could not be accessed due to information barriers? Measuring the answers to these questions can give a sense of progress, and also reveal pain points in the intelligence generation process.


never be solved fully. Top-secret intelligence information has inherent sharing limitations.

However, the excessive hurdles for information sharing can be overcome. Understanding the

human aspect of analytical collaboration is the first step. Often the choices to over-classify

information are made by individuals (not the institutions themselves), and these people can be

convinced to share. The NCTC should work to avoid pitting these analysts against each other

through rigid performance measurement and draconian punitive structures. Instead, it should

work to produce metrics that incorporate collaboration as a goal, and reduce the regulations on

shared information. This has been addressed in the private sector by evaluating employees not

only through their direct contributions but also by the utility that they bring to their peers.
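A hypothetical sketch of such a metric is shown below: an analyst's evaluation blends the products they authored with how often peers reused material they shared. The records and the weighting are illustrative assumptions, not any actual evaluation formula in use in the private sector or at the NCTC.

```python
# Hypothetical evaluation metric that rewards collaboration as well as direct
# output, so that hoarding information is no longer the individually rational
# strategy. Records and weights are illustrative only.

analysts = {
    "analyst_a": {"products_authored": 10, "peer_reuses_of_shared_work": 2},
    "analyst_b": {"products_authored": 6,  "peer_reuses_of_shared_work": 9},
}

def evaluation_score(record: dict, collaboration_weight: float = 0.5) -> float:
    direct = record["products_authored"]
    shared = record["peer_reuses_of_shared_work"]
    # Weighting peer reuse credits the analyst whose shared work made
    # other people's products better, not just their own.
    return direct + collaboration_weight * shared

for name, record in analysts.items():
    print(name, evaluation_score(record))
```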

The incentive structures that exist within the center should also be modified to encourage a focus on both short- and long-term goals. Innovation rarely occurs when employees are focusing solely on routine intelligence reports. Analysts and data scientists should be required to finish their urgent daily work, while also taking time to look to the future. This can be accomplished by further modifying incentive structures to reward analysts for valuable products that may not be immediately useful but might be in the long term.

In order to accelerate its innovative capabilities, the NCTC can learn from the lean

startup model and begin instituting separate innovation-focused programs that move alongside

the daily operations at the center. A focus on small, prototypical products can produce amazing

dividends in the long run. These ideas are making their way to government, and are currently

being implemented at the Pentagon with Secretary Carter’s Innovation Initiative.2 The

Department of Defense (DoD) is putting together an innovation advisory board that sits outside

2 Carter, Ashton. “Drell Lecture: ‘Rewiring the Pentagon: Charting a New Path on Innovation.’” Stanford University, April 2015. http://www.defense.gov/News/Speeches/Speech-View/Article/606666/drell-lecture-rewiring-the-pentagon-charting-a-new-path-on-innovation-and-cyber.


of the traditional military planning structure.3 The board’s role is to identify potential innovative

directions for many of the Pentagon’s departments.

In the context of the NCTC, smaller, agile teams like the DoD innovation advisory board

may operate outside of the structure that has already crystallized around the center and its sister

agencies. They would be dedicated to long-term technological and analytical models to facilitate

better counterterror intelligence products. They would also be unfettered by the current feedback

structures and pressures of daily counterterror work. When these smaller teams manage to

construct something useful, they can work to integrate the product into daily workflows. Using

this model, the NCTC can practice its traditional techniques, while improving its effectiveness in

parallel.

Currently, intelligence community leadership attitudes are trending in the direction of

modernization. Senior intelligence officials have noticed the issues and are acknowledging the

difficulties of a complex environment. As a result, they are beginning to make strong pushes for

a more computational approach to problem solving and analytics.4 Furthermore, policymakers

are beginning to understand the new information environment, and have begun imposing more

modern informational requirements on intelligence agencies.5 The institutional will exists to

make these necessary changes, but it must be translated into action.

Personnel

The NCTC’s personnel problem is difficult to solve, especially considering the size and
3 “Pentagon to Establish Defense Innovation Advisory Board.” US Department of Defense. http://www.defense.gov/News-Article-View/Article/684366/pentagon-to-establish-defense-innovation-advisory-board.
4 Kerbel, Josh. “The Complexity Challenge: The U.S. Government’s Struggle to Keep Up with the Times.” The National Interest, August 2015.
5 Savage, Charlie. “Obama Administration Set to Expand Sharing of Data That N.S.A. Intercepts.” The New York Times, February 25, 2016. http://www.nytimes.com/2016/02/26/us/politics/obama-administration-set-to-expand-sharing-of-data-that-nsa-intercepts.html.


complexity of the intelligence community. However, the NCTC can still take steps to improve

the quality of its workforce, even with major budgetary constraints. As elaborated earlier, simply

throwing more analysts at the growing information base cannot solve many of these problems. In

fact, more may not be better: it may be the case that smaller numbers of focused analysts can

actually produce better results.

Indeed, the counterterror community may simply be bloated with a large number of analysts. Even employees working at the center acknowledge that there are far too many of them.6 They claim that the number of counterterror analysts actually produces a negative impact on intelligence work, as there are too many people competing with each other. The current numbers are so high that many analysts are unable to produce separate work; they are only stepping on each other’s toes.

The NCTC can address this problem by removing analysts who do not perform well with modern analytic techniques. However, instead of hiring new analysts to compensate for lost numbers, it should consider keeping the downsized teams. First, the smaller number of employees simplifies operations and reduces the bloat that exists in the organization. Second, the NCTC would also be left with freed resources. The strict budgeting requirements of government mean personnel expenses are a zero-sum game. If the teams get smaller and the budget remains the same, then the ratio of resources to employees goes up. This could result in

higher salaries and better employee benefits. The improved compensation and working

environment can lure top-tier talent away from the private sector. By engaging in employee

downsizing and resource redistribution, the NCTC can create a smaller, higher quality, and

better-tuned workforce.
6 Nolan, Bridget. “Information Sharing And Collaboration in the United States Intelligence Community: An Ethnographic Study of the National Counterterrorism Center.” PhD Dissertation, University of Pennsylvania, 2013, 158.


The NCTC can retain skilled analysts and engineers by putting them in a structure where

they are given the freedom and responsibility to work on new and interesting problems. If the

center places high-value employees in teams that are focused on innovation, they are more likely

to stay and work toward improving the capabilities of the entire center. This means that creating

innovation centric, lean-startup inspired teams has another added benefit: it can help alleviate the

pain that the NCTC feels when it aims to recruit and keep the best that the industry has to offer.

Focusing on the needs of its employees is extremely important for the NCTC. In the end,

it is they who are producing the intelligence used at the top levels. If they suffer, the quality of

their work suffers. The analysts and engineers who do intelligence work are the beating heart of

the NCTC, and their needs must be prioritized.

Lessons Beyond the National Counterterrorism Center

The NCTC is not the first attempt by the federal government at centralizing intelligence.

As covered in Chapter One, the CIA was the original conception of an intelligence integration center. It did not work out to be that simple. Lessons were learned from this failure, and the intelligence community moved on, adapting to the Cold War information

environment. Since then, the world has globalized, and questions asked at intelligence agencies

have gotten much harder to answer. The NCTC was the first attempt to formally centralize

intelligence in this modern environment. The organizational and analytical lessons that have

been learned in the 21st century were not nearly as clear in 2004. However, given the difficulties

that the NCTC has experienced, some of the flaws of its original conception have been exposed.

It is imperative that the failures of this modern attempt at centralization are not repeated.

The creation of the National Counterterrorism Center was an exercise in solving a bureaucratic


problem with yet another layer of bureaucracy. The NCTC inherited the organizational inertia

and the technical debt of the other agencies. The plans, the structures, and even the employees at

the center fully mimicked the structures of the previously ineffective intelligence community.

What could it really change, if it was subject to the same rules as everyone else, except it had

more responsibility?

The case of the NCTC has implications beyond counterterror intelligence. The United

States Federal Government is going to continue to require data centralization efforts across all of

its branches, and all of its missions. The nature of an increasingly interconnected world dictates

this trend. As it moves to take advantage of large-scale computational analytics in all of its

forms, it must learn that it can no longer afford to continue operating “business as usual.” It must

learn lessons from those that have managed to build the data-oriented present and use them as it

looks to the future.


Works Cited

“Alphabet: Number of Google Employees 2015 | Statistic.” Statista. Accessed May 3, 2016. http://www.statista.com/statistics/273744/number-of-full-time-google-employees/.
“Apache Hadoop,” n.d. http://hadoop.apache.org/.
“Apache Hive,” n.d. http://hive.apache.org/.
Bergen, Peter. “Do NSA’s Bulk Surveillance Programs Stop Terrorists?” New American Foundation, January 2014. https://static.newamerica.org/attachments/1311-do-nsas-bulk-surveillance-programs-stop-terrorists/IS_NSA_surveillance.pdf.
Berner, Martin, Enrico Graupner, and Alexander Maedche. “The Information Panopticon in the Big Data Era.” Journal of Organization Design 3, no. 1 (2014).
Best, Richard. “Intelligence Information: Need-to-Know vs. Need-to-Share,” June 2011. https://www.fas.org/sgp/crs/intel/R41848.pdf.
———. “The National Counterterrorism Center (NCTC)—Responsibilities and Potential Congressional Concerns.” Congressional Research Service, December 2011. https://www.fas.org/sgp/crs/intel/R41022.pdf.
Betts, Richard. Enemies of Intelligence. New York, NY: Columbia University Press, 2009.
Blank, Steve. “Why the Lean Start-Up Changes Everything.” Harvard Business Review, May 2013. http://www.vto.at/wp-content/uploads/2013/10/Why-the-Lean-Startup-Changes-Everything_S.Plank_HBR-052013.pdf.
Boschee, Elizabeth, and Natarajan Premkumar. “Automatic Extraction of Events from Open Source Text for Predictive Forecasting.” In Handbook of Computational Approaches to Counterterrorism, 1st ed. Springer Science, 2013.
Brooks, Frederick. “No Silver Bullet -- Essence and Accident in Software Engineering.” University of North Carolina at Chapel Hill, 1986. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1663532.
———. The Mythical Man Month. Addison-Wesley, 1974.
“Building a Scalable Big Data Infrastructure for Dynamic Workflows.” EMC2, n.d. http://www.emc.com/collateral/solution-overview/h11761-building-scalable-big-data-infra-so.pdf.
Burr, Richard. “Unclassified Executive Summary of the Committee Report on the Attempted Terrorist Attack on Northwest Airlines Flight 253,” May 2010. http://www.intelligence.senate.gov/publications/report-attempted-terrorist-attack-northwest-airlines-flight-253-may-24-2010.
Carabin, David. “Intelligence-Sharing Continuum: Next Generation Requirements for U.S. Counterterrorism Efforts.” Naval Postgraduate School, 2011. https://www.hsdl.org/?abstract&did=691253.
Carmel, Erran, and Barbara J. Bird. “Small Is Beautiful: A Study of Packaged Software Development Teams.” The Journal of High Technology Management Research 8, no. 1 (1997). http://www.sciencedirect.com/science/article/pii/S1047831097900171.
Carroll, Rory. “Silicon Valley’s Culture of Failure … and ‘the Walking Dead’ It Leaves Behind.” The Guardian, June 2014. https://www.theguardian.com/technology/2014/jun/28/silicon-valley-startup-failure-culture-success-myth.
Carter, Ashton. “Drell Lecture: ‘Rewiring the Pentagon: Charting a New Path on Innovation.’” Stanford University, April 2015. http://www.defense.gov/News/Speeches/Speech-View/Article/606666/drell-lecture-rewiring-the-pentagon-charting-a-new-path-on-innovation-and-cyber.


Choucri, Nazli, Stuart Madnick, and Michael Siegel. “Improving National and Homeland Security Through Context Knowledge Representation and Reasoning Technologies.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
“CIA Salaries.” Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/CIA-Salaries-E41381.htm.
Clark, Jack. “5 Numbers That Illustrate the Mind-Bending Size of Amazon’s Cloud.” Bloomberg Business, November 2014. http://www.bloomberg.com/news/2014-11-14/5-numbers-that-illustrate-the-mind-bending-size-of-amazon-s-cloud.html.
Col. Brian Reinwald. “Assessing the National Counterterrorism Center’s Effectiveness in the Global War on Terror.” Masters Thesis, Army War College, 2007.
“Columbus, Ohio Man Charged with Providing Material Support to Terrorists.” Department of Justice, April 2015. https://www.fbi.gov/cincinnati/press-releases/2015/columbus-ohio-man-charged-with-providing-material-support-to-terrorists.
“Computing Is Still Too Clunky: Charlie Rose and Larry Page in Conversation.” TED Blog, March 19, 2014. http://blog.ted.com/computing-is-still-too-clunky-charlie-rose-and-larry-page-in-conversation/.
Davenport, Thomas, and D.J. Patil. “Data Scientist: The Sexiest Job of the 21st Century.” Harvard Business Review, October 2012.
Dean, Jeff, and Sanjay Ghemawat. “MapReduce: Simplified Data Processing on Large Clusters.” Operating Systems Design and Implementation: Google, Inc., 2004. http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf.
DeYoung, Karen, Dan Eggen, and Spencer S. Hsu. “Plane Suspect Was Listed in Terror Database after Father Alerted U.S. Officials.” The Washington Post, December 27, 2009, sec. Nation. http://www.washingtonpost.com/wp-dyn/content/article/2009/12/25/AR2009122501355.html.
Dill, Kathryn. “‘It’s OK If They Copy Us’: Google’s HR Chief On The Upside Of Giving Away Staffing Secrets.” Forbes, March 2015. http://www.forbes.com/sites/kathryndill/2015/03/25/its-ok-if-they-copy-us-googles-hr-chief-on-the-upside-of-giving-away-staffing-secrets/.
D’Onfro, Jillian. “The Truth about Google’s Famous ‘20% Time’ Policy.” Business Insider, April 17, 2015. Accessed May 2, 2016. http://www.businessinsider.com/google-20-percent-time-policy-2015-4.
Dubois, David. “Google, the Network Company: From Theory to Practice.” INSEAD Knowledge, September 11, 2013. http://knowledge.insead.edu/leadership-organisations/google-the-network-company-from-theory-to-practice-2602.
Eccles, Robert. “The Performance Measurement Manifesto.” Harvard Business Review, February 1991.
Edunov, Sergey. “Three and a Half Degrees of Separation.” Research at Facebook. Accessed May 1, 2016. https://research.facebook.com/blog/three-and-a-half-degrees-of-separation/.
“Edward Snowden’s Impact - The Washington Post.” Accessed May 4, 2016. https://www.washingtonpost.com/news/volokh-conspiracy/wp/2015/04/09/edward-snowdens-impact/.
Everton, Sean. Disrupting Dark Networks. Cambridge, UK: Cambridge University Press, 2012.
Executive Order 13354: National Counterterrorism Center, 2004. http://www.nctc.gov/docs/eo13354.pdf.
“Facebook Research Scientist Salaries.” Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/Facebook-Research-Scientist-Salaries-E40772_D_KO9,27.htm.


Feintzeig, Rachel. “U.S. Struggles to Draw Young, Savvy Staff.” Wall Street Journal, June 11, 2014, sec. Careers. http://www.wsj.com/articles/u-s-government-struggles-to-attract-young-savvy-staff-members-1402445198.
Feitelson, Dror. “Development and Deployment at Facebook.” IEEE Internet Computing 17, no. 3 (July 2013).
“Final Report of the Task Force on Combating Terrorist and Foreign Fighter Travel.” Homeland Security Committee, September 29, 2015. https://homeland.house.gov/wp-content/uploads/2015/09/TaskForceFinalReport.pdf.
Fire, Michael, and Rami Puzis. “Link Prediction in Highly Fractional Data Sets.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013.
Fisher, Danyel. “Interactions With Big Data Analytics.” Interactions, June 2012. http://dl.acm.org/citation.cfm?id=2168943.
Freytas-Tamura, Kimiko De, Aurelien Breeden, and Katrin Bennhold. “Call to Arms in France Amid Hunt for Belgian Suspect in Paris Attacks.” The New York Times, November 16, 2015. http://www.nytimes.com/2015/11/17/world/europe/paris-terror-attack.html.
Galbraith, Jay. “Organization Design Challenges Resulting From Big Data.” Journal of Organization Design 3, no. 1 (2014).
Gallagher, Sean. “What the NSA Can Do with ‘big Data.’” Ars Technica, June 12, 2013. http://arstechnica.com/information-technology/2013/06/what-the-nsa-can-do-with-big-data/.
Greenwald, Glenn, Ewen MacAskill, and Laura Poitras. “Edward Snowden: The Whistleblower behind the NSA Surveillance Revelations.” The Guardian, June 11, 2013. http://www.theguardian.com/world/2013/jun/09/edward-snowden-nsa-whistleblower-surveillance.
Grossman, Robert. “Organizational Models for Big Data and Analytics.” Journal of Organization Design 3, no. 1 (2014).
Hillis, Ken, Michael Petit, and Kylie Jarrett. Google and the Culture of Search. New York, NY: Routledge, 2013.
Ho, Kevin. “41 Up-to-Date Facebook Facts and Stats,” April 2015. http://blog.wishpond.com/post/115675435109/40-up-to-date-facebook-facts-and-stats.
Inspector General Roth, John. TSA: Security Gaps: Statement of John Roth, Inspector General, Department of Homeland Security, Before the Committee on Oversight and Government Reform. US House of Representatives, 2015. https://oversight.house.gov/wp-content/uploads/2015/11/11-3-2015-Committee-Hearing-on-TSA-Roth-DHS-OIG-Testimony.pdf.
Johnson, Carrie. “Explosive in Detroit Terror Case Could Have Blown Hole in Airplane, Sources Say.” The Washington Post, December 29, 2009, sec. Nation. http://www.washingtonpost.com/wp-dyn/content/article/2009/12/28/AR2009122800582.html.
Junio, Tim. Interview With Tim Junio. In Person, January 13, 2016.
Kerbel, Josh. “The Complexity Challenge: The U.S. Government’s Struggle to Keep Up with the Times.” The National Interest, August 2015.
Kerr, Orin. “Edward Snowden’s Impact.” The Washington Post, April 9, 2015. https://www.washingtonpost.com/news/volokh-conspiracy/wp/2015/04/09/edward-snowdens-impact/.
Knorr, Eric. “Anatomy of an IT Disaster: How the FBI Blew It.” InfoWorld, March 21, 2005. http://www.infoworld.com/article/2672020/application-development/anatomy-of-an-it-disaster--how-the-fbi-blew-it.html.


Koren, Marina. “How the San Bernardino Shooters Planned for Jihad.” The Atlantic, December 9, 2015. http://www.theatlantic.com/national/archive/2015/12/san-bernardino-shooters-radicalization/419610/.
Kravinsky, Robert. “Toward Integrating Complex National Missions: Lessons From The National Counterterrorism Center’s Directorate of Strategic Operational Planning.” Project On National Security Reform, February 2010. http://0183896.netsolhost.com/site/wp-content/uploads/2011/12/pnsr_nctc_dsop_report.pdf.
Kuntze, Ronald, and Erika Matulich. “Google: Searching for Value.” Journal of Case Research in Business and Economics, 2009. http://www.aabri.com/manuscripts/09429.pdf.
Lazaroff, Mark. “Anticipatory Models for Counter-Terrorism.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
Lerner, Adam B. “TSA, an ‘Abominable Failure.’” POLITICO. Accessed May 17, 2016. http://www.politico.com/story/2015/06/tsa-airport-security-failure-jeh-johnson-118557.html.
Lin, Herbert, and James McGroddy. “A Review of the FBI’s Trilogy Information Technology Modernization Program.” National Research Council, 2004.
Lunney, Kelly. “Public-Private Sector Pay Gap Remains at 35 Percent.” Government Executive. Accessed April 15, 2016. http://www.govexec.com/pay-benefits/2014/10/public-private-sector-pay-gap-remains-35-percent/96830/.
Manber, Udi. “Guest Lecture: CS276 - Information Retrieval and Web Search.” Stanford University, April 2016.
Mannes, Aaron. “Qualitative Analysis & Computational Techniques for the Counter-Terror Analyst.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013.
Manyika, James, Michael Chui, and Brad Brown. “Big Data: The next Frontier for Innovation, Competition, and Productivity.” McKinsey Global Institute, June 2011.
Margolis, Gabriel. “The Lack of HUMINT: A Recurring Intelligence Problem.” Global Security Studies 4, no. 2 (Spring 2013). http://globalsecuritystudies.com/Margolis%20Intelligence%20(ag%20edits).pdf.
McConnell, Steve. “Managing Technical Debt.” International Conference on Software Engineering, 2013. http://2013.icse-conferences.org/documents/publicity/MTD-WS-McConnell-slides.pdf.
———. “Origins of 10X – How Valid Is the Underlying Research?,” n.d. http://www.construx.com/10x_Software_Development/Origins_of_10X_%E2%80%93_How_Valid_is_the_Underlying_Research_/.
Mease, David, and Ya Xu. “Evaluating Web Search Using Task Completion Time.” Boston, MA: Google Research, 2009. http://static.googleusercontent.com/media/research.google.com/en//archive/dmease-sigir09-full.pdf.
Metz, Cade. “Google Is 2 Billion Lines of Code—And It’s All in One Place.” WIRED, September 16, 2015. http://www.wired.com/2015/09/google-2-billion-lines-codeand-one-place/.
Meyer, Robinson. “The Unbelievable Power of Amazon’s Cloud.” The Atlantic, April 23, 2015. http://www.theatlantic.com/technology/archive/2015/04/the-unbelievable-power-of-amazon-web-services/391281/.
Mills, Steve, and Steve Lucas. “Demystifying Big Data: A Practical Guide To Transforming The Business of Government.” IBM, 2012. https://www-304.ibm.com/industries/publicsector/fileserve?contentid=239170.
Intelligence Reform and Terrorism Prevention Act of 2004, 2004. http://www.nctc.gov/docs/pl108_458.pdf.
Newport, C.L., and D.G. Elms. “Effective Engineers.” International Journal of Engineers 13, no. 5 (1997).


Nolan, Bridget. “Information Sharing And Collaboration in the United States Intelligence Community: An Ethnographic Study of the National Counterterrorism Center.” PhD Dissertation, University of Pennsylvania, 2013.
Novet, Jordan. “How Facebook Matured Its Data Structure and Stepped into the Graph World,” June 25, 2013. https://gigaom.com/2013/06/25/how-facebook-matured-its-data-structure-and-stepped-into-the-graph-world/.
NSA. “NSA 60th Anniversary Book,” 2012. https://www.nsa.gov/about/cryptologic-heritage/historical-figures-publications/nsa-60th/.
“NSA Salaries.” Glassdoor. Accessed April 15, 2016. https://www.glassdoor.com/Salary/NSA-Salaries-E41534.htm.
“Number of Facebook Users Worldwide 2008-2016 | Statistic.” Statista. Accessed April 30, 2016. http://www.statista.com/statistics/264810/number-of-monthly-active-facebook-users-worldwide/.
ODNI Public Affairs. “NCTC 10 Years Later - A Decade of Service,” September 2014.
Owen-Smith, Jason. “Workplace Design, Collaboration, and Discovery,” 2013. http://sites.nationalacademies.org/cs/groups/dbassesite/documents/webpage/dbasse_085437.pdf.
Pavlo, Andrew. “A Comparison of Approaches to Large-Scale Data Analysis.” Paper presented at the ACM SIGMOD International Conference on Management of Data, New York, NY, 2009. http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf.
Peled, Alon. “Coerce, Consent, and Coax: A Review of U.S. Congressional Efforts to Improve Federal Counterterrorism Information Sharing.” Terrorism and Political Violence 1, no. 18 (August 2014).
“Pentagon to Establish Defense Innovation Advisory Board.” US Department of Defense. http://www.defense.gov/News-Article-View/Article/684366/pentagon-to-establish-defense-innovation-advisory-board.
Popp, Robert, and John Yen. Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
Putbrese, Daniel. “Intelligence Sharing: Getting the National Counterterrorism Analysts on the Same Data Sheet.” Atlantic Council International Security Papers, 2006. http://www.atlanticcouncil.org/publications/reports/intelligence-sharing-getting-the-national-counterterrorism-analysts-on-the-same-data-sheet.
Rasmussen, Nicholas. Hearing before the House Committee on Homeland Security, “Countering Violent Islamist Extremism: The Urgent Threat of Foreign Fighters and Homegrown Terror,” 2015.
Rumsfeld, Donald. “DoD News Briefing - Secretary Rumsfeld and Gen. Myers.” February 12, 2002. http://archive.defense.gov/Transcripts/Transcript.aspx?TranscriptID=2636.
Sagan, Scott, and Kenneth Waltz. The Spread of Nuclear Weapons: An Enduring Debate. New York, NY: W. W. Norton, 1995.
Sato, Kazunori. “An Inside Look at Google BigQuery.” Google, 2012. https://cloud.google.com/files/BigQueryTechnicalWP.pdf.
Savage, Charlie. “Obama Administration Set to Expand Sharing of Data That N.S.A. Intercepts.” The New York Times, February 25, 2016. http://www.nytimes.com/2016/02/26/us/politics/obama-administration-set-to-expand-sharing-of-data-that-nsa-intercepts.html.
Schmidt, Eric. Remarks presented at the Techonomy Conference, 2010. http://techonomy.com/tag/eric-schmidt/.
Schrodt, Philip, and David Brackle. “Automated Coding of Political Event Data.” In Handbook of Computational Approaches to Counterterrorism. Springer, 2013.


“Search Evaluation at Google.” Official Google Blog. https://googleblog.blogspot.com/2008/09/search-evaluation-at-google.html.
Sharkey, Brian. “Information Processing at Very High Speed Data Ingestion Rates.” In Emergent Information Technologies and Enabling Policies for Counter-Terrorism. IEEE Press, 2006.
Silke, Andrew. “Research On Terrorism.” In Terrorism Informatics: Knowledge Management and Data Mining for Homeland Security. University of East London School of Law, 2008.
Singh, Arjun. “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network.” Paper presented at ACM SIGCOMM, London, UK, 2015.
Sliva, Amy. “SOMA: Stochastic Opponent Modeling Agents for Forecasting Violent Behavior.” In Handbook of Computational Approaches to Counterterrorism. Springer Science, 2013.
Stern, Jessica, and J.M. Berger. “ISIS and the Foreign-Fighter Phenomenon.” The Atlantic, March 2015. http://www.theatlantic.com/international/archive/2015/03/isis-and-the-foreign-fighter-problem/387166/.
Subrahmanian, V.S. Handbook of Computational Approaches to Counterterrorism. Springer Science, 2013.
Sukumar, Sreenivas, and Regina Ferrell. “‘Big Data’ Collaboration: Exploring, Recording and Sharing Enterprise Knowledge.” Journal of Information Services and Use 33, no. 3 (July 2013).
Sullivan, Danny. “Google Launches Knowledge Graph To Provide Answers, Not Just Links.” Search Engine Land, May 16, 2012. http://searchengineland.com/google-launches-knowledge-graph-121585.
“Ten Things We Know to Be True.” Google. Accessed May 1, 2016. https://www.google.com/about/company/philosophy/.
The New York Times. “Many Say U.S. Planned for Terror but Failed to Take Action.” The New York Times, December 30, 2001, sec. National. http://www.nytimes.com/2001/12/30/national/30TERR.html.
“The Zettabyte Era—Trends and Analysis.” Cisco. Accessed April 22, 2016. http://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/VNI_Hyperconnectivity_WP.html.
Trump, Whitney. “Palantir Hack Week 2015.” Palantir, August 2015. https://palantir.com/2015/08/hack-week-2015.
Calamur, Krishnadev. “Some Clinton Emails Were Retroactively Classified.” NPR.org. Accessed April 13, 2016. http://www.npr.org/sections/thetwo-way/2015/05/22/408774111/state-department-to-release-more-clinton-emails-today.
Manber, Udi. Interviewed by Ben Mittelberger. Email, April 2016.
Wiil, Uffe. Counterterrorism and Open Source Intelligence. Springer, 2011.
Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism (USA PATRIOT ACT) Act of 2001, 2001. http://www.gpo.gov/fdsys/pkg/PLAW-107publ56/pdf/PLAW-107publ56.pdf.
Valacich, Joseph, and Christoph Schneider. “Managing the Information Systems Infrastructure.” In Information Systems Today: Managing in the Digital World, 2013.
Waltz, Edward. Knowledge Management in the Intelligence Enterprise. Artech House, 2003.


Warner, Michael. “Wanted: A Definition of ‘Intelligence.’” Journal of the American Intelligence Professional 46, no. 3 (2002). https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol46no3/article02.html#rfn7.
Wegener, Rasmus. “The Value of Big Data: How Analytics Differentiates Winners.” Bain & Company, 2013. http://www.bain.com/Images/BAIN%20_BRIEF_The_value_of_Big_Data.pdf.
Wood, Graeme. “What ISIS Really Wants.” The Atlantic, March 2015. http://www.theatlantic.com/magazine/archive/2015/03/what-isis-really-wants/384980/.
Woolsey, R. James. R. James Woolsey Testimony To U.S. Senate Committee on Governmental Affairs, 2004.
Worstall, Tim. “Fascinating Number: Google Is Now 40% Of The Internet.” Forbes. Accessed May 15, 2016. http://www.forbes.com/sites/timworstall/2013/08/17/fascinating-number-google-is-now-40-of-the-internet/.
Zegart, Amy. Spying Blind: The CIA, the FBI, and the Origins of 9/11. Princeton, NJ: Princeton University Press, 2007.
Zelikow, Phillip. “The 9/11 Commission Report: Final Report of the National Commission on Terrorist Attacks Upon the United States,” July 22, 2004. https://www.gpo.gov/fdsys/pkg/GPO-911REPORT/content-detail.html.
Zimmerman, Thomas. “The Emerging Role of Data Scientists on Software Development Teams.” Redmond, WA: Microsoft Research, April 2015. http://research.microsoft.com/pubs/242286/MSR-TR-2015-30.pdf.
