IS6600-11 Big Data, Intelligence & Surveillance 1

Upload: leonard-daniel

Post on 19-Jan-2016

Page 1: IS6600-11 Big Data, Intelligence & Surveillance 1

IS6600-11

Big Data, Intelligence & Surveillance

Page 2: IS6600-11 Big Data, Intelligence & Surveillance 1

Hype, Reality or …?

Page 3: IS6600-11 Big Data, Intelligence & Surveillance 1

Purpose

• The purpose of this class is to introduce the concept of Big Data, examine its potential and value for organisations and governments, as well as the downside effects on privacy

• There is also a lot of hype about Big Data

• I also hope to stimulate your own thinking about Big Data and how it affects you

Page 4: IS6600-11 Big Data, Intelligence & Surveillance 1

Basics

• Big Data refers to the vast quantities of data that businesses and governments gather

• This data is *believed* to contain useful, actionable intelligence that could lead to:
  – Process efficiencies
  – Lower costs
  – Higher profits
  – Identification of terrorism threats/plans

• What is needed is the will and expertise to perform the relevant analysis.

Page 5: IS6600-11 Big Data, Intelligence & Surveillance 1

How Big is Big?

• It depends on how quickly you can access and process data (with normal database management tools)

• For a small company, hundreds of gigabytes could be big. For a larger company, it may take hundreds of terabytes.
  – 1 terabyte = 1000 gigabytes
  – 1 petabyte = 1000 terabytes
  – 1 exabyte = 1000 petabytes

• Beyond that: zettabyte (1000 exabytes) and yottabyte (1000 zettabytes)
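
As a rough illustration of the unit ladder on this slide, the Python sketch below converts between storage units. The names UNITS, to_bytes and convert are invented for this example and do not come from the slides. It assumes the decimal (factor of 1000) definitions used here, not the binary (1024) convention.

```python
# Decimal (SI) storage units: each step up the list is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value, unit):
    """Convert a quantity expressed in `unit` to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert a quantity from one decimal unit to another."""
    return to_bytes(value, from_unit) / 1000 ** UNITS.index(to_unit)


if __name__ == "__main__":
    print(convert(1, "TB", "GB"))    # terabytes expressed in gigabytes
    print(convert(500, "EB", "PB"))  # exabytes expressed in petabytes
```

Whether a given volume counts as "big" still depends on how quickly an organisation can access and process it, so the numbers only give a sense of scale.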

Page 6: IS6600-11 Big Data, Intelligence & Surveillance 1

Size Contexts

• Some areas of science generate huge amounts of data:
  – Meteorology (weather forecasting) & Remote Sensing
  – Genomics (genome sequencing)
  – Physics, e.g. CERN
    • 150 million sensors each deliver data 40 million times per second
    • Working with only 0.001% of the data collected, 25 petabytes a year is still collected
    • If all the data were used, it would be 500 exabytes a day, 200 times more than all other global data sources combined
  – Social data, RFID data
  – Surveillance: NSA & GCHQ

Page 7: IS6600-11 Big Data, Intelligence & Surveillance 1

The History

• Big Data is not a new topic
  – Data has been getting bigger continually ever since the first byte was created
  – It is related to storage capacity and processing power, which also keep growing continually

• Over the last 25 years, many governments have attempted to consolidate data holdings into single databases controlled by single parties
  – National ID Schemes
  – National Health Records Management

Page 8: IS6600-11 Big Data, Intelligence & Surveillance 1

Corporate Examples

• Amazon handles millions of back-end operations every day, as well as queries from more than half a million third-party sellers.

• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes (about 2,500 terabytes) of data

• Facebook handles 50 billion photos.

• TaoBao & Alibaba: again, billions of transactions

• Consumer profile databases, Loyalty Cards, Octopus

• Park’n Shop’s Money Back Card is the same thing

Page 9: IS6600-11 Big Data, Intelligence & Surveillance 1

Ford

• http://www.datanami.com/datanami/2013-03-16/how_ford_is_putting_hadoop_pedal_to_the_metal.html

• Ford’s modern hybrid Fusion model generates up to 25 gigabytes of data per hour
  – Data that is a potential goldmine for Ford, as long as it can find the right analytical tools for the job.

• The data can be used to
  – understand driving behaviors and reduce accidents
  – understand wear and tear
  – identify issues in order to lower maintenance costs
  – avoid collisions

• But who should own the data? Ford? The car owner?

Page 10: IS6600-11 Big Data, Intelligence & Surveillance 1

Needles & Haystacks

• The volume of data is huge, beyond imagination, and the consultants and software firms want us to believe that somewhere, if you can find them, there may be some needles – pieces of actionable intelligence

Page 11: IS6600-11 Big Data, Intelligence & Surveillance 1

Who is Pushing Big Data?

• IBM!
  – Because they want to sell you their software that (they claim) will help you to analyse the data and find the needles

• Consultants stand to make millions, by panicking their clients into spending on software solutions

• Globally, this is a US$100 billion industry, growing 10% a year

Page 12: IS6600-11 Big Data, Intelligence & Surveillance 1

Is Everyone Happy?

• The consultants suggest not. Accenture:
  – 22% of companies are very satisfied
  – 35% are quite satisfied
  – 34% are dissatisfied
  – 39% say that they have data that is relevant to their business strategy

• Big data can be useful, if you know what to look for and how to get that ‘intelligence’ to the people who can use it

Page 13: IS6600-11 Big Data, Intelligence & Surveillance 1

Consultant Perspectives

• Companies have lots of data, but “most organisations measure too many things that don’t matter and don’t put sufficient focus onto the things that do” (Accenture).

• “Companies are buried in information” and are struggling to use it (McKinsey)

• The more data they have, the less they seem to know!
  – The more you know, the more you don’t know?!

Page 14: IS6600-11 Big Data, Intelligence & Surveillance 1

Then What Should the Companies Do?

• Spend more money (say the consultants)
  – “a large investment in new data capabilities” (McKinsey)
  – “embed analytics into business processes” (Accenture)

• Alternatively
  – Go and ask people what they think is happening!
  – Ask your lost customers why they got lost!
    • A survey or big data analytics won’t tell you why.

Page 15: IS6600-11 Big Data, Intelligence & Surveillance 1

Gartner’s Hype Cycle

Page 16: IS6600-11 Big Data, Intelligence & Surveillance 1

Big Data and Intelligence

• One of the highest impact news stories since June 2013 has concerned the secret surveillance activities of the NSA and GCHQ agencies – as revealed by Edward Snowden

• These surveillance activities are fundamentally about big data and analytics, just as they are also about privacy and security, espionage and politics

Page 17: IS6600-11 Big Data, Intelligence & Surveillance 1

Key Terms

• NSA – National Security Agency (US) (www.nsa.gov)

• GCHQ – Government Communications Headquarters (UK) (www.gchq.gov.uk)

• Prism, Tempora, XKeyscore, Bullrun
  – Systems that store, retrieve and analyze the data

• The Guardian (http://www.theguardian.com/international) – UK newspaper that first published the stories

• Patriot Act – US Act for homeland security, passed after the attacks of 11 September 2001
  – http://en.wikipedia.org/wiki/Patriot_Act

Page 18: IS6600-11 Big Data, Intelligence & Surveillance 1

The Government’s Perspective

• Looking for needles in the metadata
  – Phone numbers, call duration & frequency
  – Global patterns that may involve terrorism
  – If a bombing in India can be matched to a sudden increase of calls in another country, that might be of interest
  – To be effective, they need as much data as possible: in short, everything.
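
The idea of matching an event to a sudden increase in calls can be sketched as a simple spike detector over daily call counts. This only illustrates the general idea and does not describe how any agency's system actually works. The function name spike_days, the 7-day window, the threshold of 3 standard deviations and the call counts are all made-up assumptions.

```python
from statistics import mean, stdev


def spike_days(daily_calls, window=7, threshold=3.0):
    """Return the indices of days whose call count exceeds the mean of the
    preceding `window` days by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(daily_calls)):
        baseline = daily_calls[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip perfectly flat baselines to avoid dividing by zero.
        if sigma > 0 and (daily_calls[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Invented daily call counts between two hypothetical regions (not real data).
    calls = [100, 104, 98, 101, 97, 103, 99, 102, 100, 180, 101]
    print(spike_days(calls))  # indices of days that stand out from the baseline
```

A detector like this only flags that something unusual happened. It says nothing about why, which is part of the reason such analysis is said to need as much data as possible.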

Page 19: IS6600-11 Big Data, Intelligence & Surveillance 1

The Surveillance Picture

• Edward Snowden has leaked a LOT of information

• The stories are still coming. We have learned a LOT about what governments do, both with their own citizens’ data and with data from other countries

• You may recall stories about data being captured in Hong Kong and China from the Chinese University and Tsinghua University Internet hubs
  – http://www.reuters.com/article/2013/06/24/us-usa-security-tsinghua-idUSBRE95N0M220130624

• This is a series of events of global proportion

• We should not be surprised at anything any more
  – If they want to collect it, anything, then they can and will.

Page 20: IS6600-11 Big Data, Intelligence & Surveillance 1

Selected Events

• Publication of a top-secret court order against Verizon requiring it to hand over the call records of all its customers
  • http://www.theguardian.com/world/2013/jul/19/nsa-extended-verizon-trawl-through-court-order

• Orders for all other telecoms firms also existed

• Large-scale collection of data without individual warrants
  – Prism
    • http://en.wikipedia.org/wiki/PRISM_(surveillance_program)

Page 21: IS6600-11 Big Data, Intelligence & Surveillance 1

Prism

• A system that gives the NSA access to the personal information of non-US people from US Internet companies
  – Apple, Facebook, Google, Microsoft, Skype, Yahoo, …

• These companies always claimed that they protected individual privacy, but it seems that this was not the case

• However, they were legally required to say nothing: the court orders prohibited them from saying anything about their data sharing with the NSA

• Data obtained by cable tapping
  – Metadata & content from 4 US telecoms providers’ cables

Page 22: IS6600-11 Big Data, Intelligence & Surveillance 1

Facebook

• During Jan-June 2013, governments requested info on 38,000 Facebook users
  – 11,000+ from the US (79% compliance)
  – 4000+ from India (50% compliance)
  – 170 from Turkey (47% compliance)
  – 11 from Egypt (0% compliance)
  – http://www.theguardian.com/technology/2013/aug/27/facebook-government-user-requests

Page 23: IS6600-11 Big Data, Intelligence & Surveillance 1

XKeyscore

• This is the data retrieval system used to collect, process and search the data

• http://en.wikipedia.org/wiki/XKeyscore

• It allows an NSA analyst to query “nearly everything a typical user does on the Internet” in near-real time, including:
  – Email content
  – Websites visited and searches
  – Metadata

• In theory these systems were designed to analyse data about foreigners, but many Americans were also included in the databases

Page 24: IS6600-11 Big Data, Intelligence & Surveillance 1

GCHQ

• This is the UK government agency that deals with telecommunications and signals intelligence

• http://www.gchq.gov.uk

• http://en.wikipedia.org/wiki/Government_Communications_Headquarters

• Access to Prism since 2010

• Operates Tempora, similar to Prism, for collecting data from the Internet and telecoms

Page 25: IS6600-11 Big Data, Intelligence & Surveillance 1

GCHQ

• In 2009, GCHQ spied on foreign politicians visiting the UK for a G20 summit
  – Eavesdropping on phone calls and emails
  – Monitoring computers
  – Installing keyloggers and then tracking activities post-summit
  – Turkish Finance Minister (Simsek)
  – Russian leader (Medvedev)

• Purpose – Economic/Political Intelligence

Page 26: IS6600-11 Big Data, Intelligence & Surveillance 1

Tempora

• Much of the data is harvested from Internet cables that enter the UK (GBs-TBs per second)
  – 300 GCHQ and 250 NSA analysts are involved

• Telephone calls, email messages, Facebook entries, personal Internet history, IM chats, passwords
  – Cooperation with private telecoms companies
  – Data held for 3 days, metadata for 30 days

• http://en.wikipedia.org/wiki/Tempora

• http://www.theguardian.com/uk/2013/jun/21/gchq-cables-secret-world-communications-nsa
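
To get a feel for what a 3-day content buffer implies, the sketch below multiplies a capture rate by a retention period. The rate of 100 GB per second is purely an assumption chosen for illustration (the slides only say "GBs-TBs per second"), and the names buffer_bytes and ASSUMED_RATE are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def buffer_bytes(rate_bytes_per_second, retention_days):
    """Storage needed to keep everything captured at a constant rate
    for the given number of days."""
    return rate_bytes_per_second * SECONDS_PER_DAY * retention_days


# Assumed capture rate for illustration only: 100 GB per second (decimal units).
ASSUMED_RATE = 100 * 10**9

if __name__ == "__main__":
    content = buffer_bytes(ASSUMED_RATE, 3)
    print(content / 10**15, "petabytes for a 3-day content buffer")
```

Under that assumed rate the buffer runs to tens of petabytes. That gives one plausible reason why full content would be kept only briefly while the much smaller metadata is kept for longer.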

Page 27: IS6600-11 Big Data, Intelligence & Surveillance 1

Bullrun

• NSA and GCHQ spend millions developing programmes that can break Internet security (cryptography) protocols such as HTTPS and SSL.

• They also work directly with the telecom providers to ensure that they have backdoors that help them to access data that clients think is private/secret (AT&T and the UN)

• There are no Secrets!
  – http://www.theguardian.com/world/2013/sep/05/nsa-gchq-encryption-codes-security

Page 28: IS6600-11 Big Data, Intelligence & Surveillance 1

Collusion or Legal Obligation?

• One defence offered by the private companies that hold the data is that they are required to obey the law of the countries in which they operate
  – They have no choice: they must hand over the data, or cooperate with the security agencies
  – Also, they cannot reveal that they are cooperating: they are gagged from revealing the existence of the Prism/Tempora/Bullrun systems

Page 29: IS6600-11 Big Data, Intelligence & Surveillance 1

Payouts

• GCHQ and NSA are working with each other, sharing each other’s data

• NSA subsidizes GCHQ’s costs by millions of pounds (GBP) annually

• http://www.theguardian.com/uk-news/2013/aug/01/nsa-paid-gchq-spying-edward-snowden

• NSA benefits from GCHQ operating under less strict operational and oversight rules

• NSA expects returns… reports, intelligence.

Page 30: IS6600-11 Big Data, Intelligence & Surveillance 1

Problems

• Big data is HUGE: there is simply too much data to collect and analyse
  – GCHQ may collect up to 20% of the actual data flow

• Big data is getting bigger
  – Cables that carry hundreds of GBs/second make that task harder still

• As always, 99.999% of the data is not useful.
  – Can you find the 0.001% that might be?
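
The needle-to-haystack ratio above can be made concrete with a little arithmetic. The sketch below uses the 0.001% figure from the slide. The function name useful_bytes and the example volume of one petabyte are assumptions made for illustration.

```python
from fractions import Fraction

# 0.001% expressed as an exact fraction (1 part in 100,000).
USEFUL_SHARE = Fraction(1, 100_000)


def useful_bytes(total_bytes, share=USEFUL_SHARE):
    """Amount of data that would be 'useful' if only `share` of it matters."""
    return total_bytes * share


if __name__ == "__main__":
    one_petabyte = 10**15
    needles = useful_bytes(one_petabyte)
    print(int(needles) // 10**9, "GB of needles per petabyte of haystack")
```

Even at that tiny share the useful portion of a large collection is itself sizeable, and the arithmetic says nothing about where in the haystack it sits.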

Page 31: IS6600-11 Big Data, Intelligence & Surveillance 1

Reactions

• There have been attempts to stop media organizations from reporting on the surveillance programmes

• Computers owned by the Guardian newspaper were physically destroyed in an attempt to remove the data & prevent further publication
  – Additional copies are held in Brazil and the US
  – http://www.wired.com/threatlevel/2013/08/guardian-snowden-files-destroyed/

Page 32: IS6600-11 Big Data, Intelligence & Surveillance 1

Implications for Individuals

• Is your data being harvested?
  – It seems likely.

• Are your private communications, including online purchases, secure? Private?
  – Not very.

• Are you protected by data privacy laws?
  – Not against governments.
  – Perhaps against private companies.

• http://www.pcpd.org.hk/

Page 33: IS6600-11 Big Data, Intelligence & Surveillance 1

Questions

• What kind of data is being collected?
  – Where, by whom, and for what purposes?
  – Can we see/find (some of) the data anywhere?
  – Are you personally at risk?
    • That depends on who you are, what you do, who you talk to and what about.
  – Should we be concerned?

• Is there anything we can do as individuals, as decision makers, as companies?
  – http://www.theguardian.com/world/2013/sep/05/nsa-how-to-remain-secure-surveillance

• Or is it more sensible just to get on with our lives?

• Do some Internet research now and try to answer some of these questions.