Big Challenges for Big Data Security

Prof. Oliver Popov

PhD Student: Irvin Homem

PhD Student: Spyridon Dossis

• Two major research areas

– Natural Sciences

– Humanities and Social Sciences

• >60000 students

• 2000 postgraduate students

• 5000 faculty and employees

• 4 Nobel Prize winners

• Participation in international networks

– EUA, IMHE, NCI, UNICA etc.

2013-10-10 Department of Computer and Systems Sciences/SU

• Established in 1965 – the first and largest CS department in Sweden

• Part of the Faculty of Social Sciences

• Located in Kista Science City, the third-largest ICT cluster in the world

• 5400 students annually

• 300 employees

• Major Profile Areas

– e-Government, TEL, ICT4D, RATS

Agenda

• Everything BIG

– Data

– Opportunities

– Challenges

– Security Problems

What makes Big Data…BIG?

• NIST: “The set of technical capabilities and management processes for converting vast, fast and varied data into useful knowledge”. (one of the myriad definitions)

• The point at which traditional data management, tools for analysis and practices no longer apply.

“We are drowning in information, while starving for wisdom” E.O. Wilson (Harvard University)

Characteristics of Big Data

• Large scale analytics

• Distributed redundant data storage

• Parallel task processing

• Fast data insertion

• Central management and orchestration

• Hardware agnostic

• Accessible

• Extensible

• Cost effective

The Three V’s of Big Data

• Volume

– Humankind produced 5 exabytes of data up to 2003

– Today, 5 exabytes are produced every 10 min

– 90% of all data was produced in the last 2 years

• Variety

– Structured (data warehouses)

– Semi-structured (XML, graph, pdf)

– Unstructured (natural text, video)

• Velocity

– Batch processing

– Stream processing
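
The two velocity modes can be contrasted with a minimal sketch, assuming a hypothetical feed of numeric readings. The batch function needs the complete data set before it can answer, while the streaming function keeps only a running count and mean. The function names are illustrative and do not come from the talk.

```python
from typing import Iterable, Iterator, List


def batch_mean(readings: List[float]) -> float:
    """Batch processing: the complete data set is available before computing."""
    return sum(readings) / len(readings)


def stream_mean(readings: Iterable[float]) -> Iterator[float]:
    """Stream processing: update the mean incrementally as each value arrives."""
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        # Incremental update; earlier readings do not need to be stored.
        mean += (value - mean) / count
        yield mean
```

Both functions agree on the final value once the stream has been consumed, but only the streaming version can report an up-to-date answer after every new reading.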

Current Limitations in Large Scale Analytics

• Computing and storage: Insufficient CPU power and disk I/O random access speeds.

• Data traffic jam: Difficulties in mass network transfers.

• Scarcity and potential: Available resources are not enough for current power and space needs.

“Smarter, not faster is the future of computing research” E. Lazowska (University of Washington)

Sources of Big Data

• Scientific experimental measurements

– SETI, LHC, SKTA, Genome projects

• Computer simulations

• E-commerce and search engines

• Social media

– Facebook, Twitter, Google

• Internet of Services / Things

Big Data Impact Areas

• Natural Sciences

• Biological and Medical Research

• Telecommunications and Networking

• Social Network Analysis

• National (Cyber-)Intelligence, and counting…

Big Data Opportunities

• Completeness

• Personalization

• Real-time

• Data Relationships

• Exploration

Big Data Building Blocks

• Hardware parallelism

• Scalable and elastic computing and storage infrastructures (such as cloud computing, cluster-based systems)

• Parallel programming frameworks (such as MapReduce, workflows)

• Service-oriented architectures

• Distributed database systems

• Federated security mechanisms

• Models for information representation (ontologies) and data mining
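
As an illustration of the MapReduce programming model named among the building blocks, here is a minimal single-process sketch of a word count. It assumes plain strings as input splits. A real framework would run the map calls on different nodes and perform the shuffle over the network; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # In a cluster, each map_phase call could run on a different node.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))
```

Because each map call only looks at its own split and each reduce call only looks at one key, both phases can be parallelised without shared state.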

Harnessing Big Data

• (Semi)automated data-driven decision making

• Better planning and forecasting

• Risk quantification (thus avoiding elusiveness)

• Consolidation of government data

• Adaptive e-services

Big Challenges for BIG DATA

• Highly distributed sources

• Authenticity & provenance

• Velocity and heterogeneity

• Systems diversity

• Parallel, distributed scalable algorithms

• Security and integrity

• Sharing and integration

• Massive visualization

• Governance & curation

• Privacy, retention & compliance
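
The authenticity and integrity challenges can be made concrete with a small sketch, assuming the data producer and the consumer share a secret key. Each record carries an HMAC-SHA256 tag, so that tampering in transit or at rest is detectable. The record layout and function names are hypothetical. Provenance across organisations would more likely need asymmetric signatures, which this sketch does not cover.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, tag: str, key: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign_record(record, key), tag)
```

Sorting the keys before hashing matters here: two nodes that serialise the same record in a different field order would otherwise compute different tags.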

Big Data Concerns

• Legality

– Collection, disclosure, consolidation/correlation

– Data ownership, control and rights

• Data quality

– Accuracy, relevance, timeliness

• Disparate data meanings

– Semantic coherence

• Overconfidence in data and models

– Consistent justification, analytical integrity

• Privacy

– Generalization in science vs. Particularization in business

Mind the Big-Data – InfoSec Gap

• Lack of a clear definition of Big Data, and of maturity / awareness of related products

• Knowledge gap among security practitioners on the value Big Data can provide

• Adoption by organizations is currently slow, although Big Data is regarded as a strategic priority for IT

• Need for speed and prioritization of high-risk, low-frequency security events

Adding Value to InfoSec

Source: Gartner (March 2012)

Big Data Security Myths

• Big data security is no different from traditional security

– Storage, query and processing models are different

• Big data is not used in production systems

– A similar notion was held about the Internet in the late 90’s

• Existing security tools work with big data

– Security products may affect deployment, scalability and communication protocols, limiting big data capabilities

• No sensitive data is stored in big data clusters

– Correlation may result in sensitive data

• Security is only needed on the back-end

– Big data has a lot of links to all ends
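
The point that correlation may produce sensitive data can be sketched as a join on quasi-identifiers. The example assumes two hypothetical data sets: an "anonymised" table without names and a public register with names. Neither is sensitive in isolation, yet rows that match on postcode, birth year and sex for exactly one person are re-identified. All field names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

QuasiId = Tuple[str, int, str]  # (postcode, birth year, sex)


def link_records(anonymised: List[dict], public: List[dict]) -> Dict[str, str]:
    """Map names to diagnoses wherever the quasi-identifiers match exactly one person."""
    by_quasi_id: Dict[QuasiId, List[str]] = {}
    for person in public:
        key = (person["postcode"], person["birth_year"], person["sex"])
        by_quasi_id.setdefault(key, []).append(person["name"])

    linked: Dict[str, str] = {}
    for row in anonymised:
        key = (row["postcode"], row["birth_year"], row["sex"])
        names = by_quasi_id.get(key, [])
        if len(names) == 1:  # a unique match re-identifies the row
            linked[names[0]] = row["diagnosis"]
    return linked
```

Rows whose quasi-identifiers match several people stay ambiguous, which is why coarsening such attributes before release reduces the linkage risk.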

• Security must be built in rather than added as an afterthought.

• Plugging commonplace security mechanisms into big data applications is usually non-trivial, sometimes intractable, and hence not sufficient.

• Separation of concerns / duties for administration and management

• Need for strong federated identification / authentication solutions

• Distributed nodes create a complicated environment without the traditional security “choke-point”, which would impede scalability

• Data “sharding” cancels the traditional data security model

• Granular Authentication, Authorization & Accountability on inter-node communication

• Current tools have not undergone thorough security review (e.g. OWASP)

• Security on metadata and transaction logs

• Data mining & Analytics ignore privacy
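
Why sharding undermines a perimeter-style data security model can be seen in a minimal sketch of hash-based placement, assuming string record keys and a fixed number of shards. Records of the same logical table end up on different nodes, so each node needs its own access control and no single host holds, or guards, the whole data set. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the record key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys: List[str], num_shards: int) -> Dict[int, List[str]]:
    """Group record keys by the shard that would store them."""
    placement: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        placement[shard_for(key, num_shards)].append(key)
    return dict(placement)
```

A cryptographic hash is used only because it is stable across processes and machines; Python's built-in `hash()` is randomised per process for strings and would place the same key differently on each node.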

• File layer encryption and key management

• Automated configuration and patch management

• Monitoring & filtering (Distributed & Real time)

– Avoid introducing a single point of failure

• Audit and logging (meta-Big-Data)

• Harden the infrastructure

– Node authentication (e.g. Kerberos)

– Traffic encryption (e.g. TLS)

– Protect the management plane
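
For the traffic encryption item, the following is a minimal sketch using Python's standard `ssl` module. It builds a server-side context for inter-node traffic that refuses peers without a valid certificate. This is mutual TLS, shown here as one possible way to combine node authentication with encryption. It is not the Kerberos setup mentioned in the list, and the certificate paths are hypothetical parameters.

```python
import ssl
from typing import Optional


def make_node_server_context(
    certfile: Optional[str] = None,
    keyfile: Optional[str] = None,
    cafile: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that insists on a client certificate."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject peers that do not present a certificate signed by a trusted CA.
    context.verify_mode = ssl.CERT_REQUIRED
    if cafile is not None:
        # Assumed to be a cluster-internal CA, not the system trust store.
        context.load_verify_locations(cafile=cafile)
    if certfile is not None:
        # This node's own identity, presented to connecting peers.
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return context
```

In a deployment, all three paths would be supplied and the context passed to the listening socket of each node. Without them the context is only a configuration skeleton.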

• Performance gains over traditional SIEM tools for log/network event aggregation, correlation and search

• Network flow and packet analysis for anomaly detection (e.g. botnets, e-crime syndicates)

• Behavior profiling for detecting “low-noise” Advanced Persistent Threats

• Community-based reputation scoring and malware detection

• Identity and access intelligence

• Threat-intelligence networks
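
Flow-based anomaly detection can be sketched with a robust outlier test, assuming the only input is a hypothetical mapping from host address to bytes transferred in one time window. The sketch uses the modified z-score, based on the median and the median absolute deviation, with the commonly used cut-off of 3.5. Production systems would use far richer features and models.

```python
from statistics import median
from typing import Dict, List


def flag_anomalous_hosts(bytes_per_host: Dict[str, float], threshold: float = 3.5) -> List[str]:
    """Return hosts whose traffic volume deviates strongly from the median."""
    values = list(bytes_per_host.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against
    return sorted(
        host
        for host, v in bytes_per_host.items()
        if 0.6745 * abs(v - centre) / mad > threshold
    )
```

The median-based score is chosen over mean and standard deviation because a single very large flow inflates the standard deviation and can hide itself in small samples.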

• Need for transparency in data collection, tools and techniques

• Abuse of the “wealth of the data” to influence, manipulate and restrict the “Quantified Self”

• Power balance between data producers and inference/decision makers

• Development of “Big Data Ethics”

– Data-driven but not data-ruled

Systems Analysis & Security (SAS) Unit

Key activities

• decision and risk analysis

• big data, innovation and eGov services

• data mining

• simulation of complex systems.

• security, privacy and trust

• digital and cyber forensics

Projects:

• ICT NG (Formas), EnRiMa (EU), STORK 2 (EU), IRIS (Vinnova), DEDAL (VR), iMENTORS (EU), Multimodal communication, SSL (SU), e-SENS (EU), SENS4US (EU) and DFET (EU)

• eGov lab including the Cyber Systems Security (CS2) lab with a platform for simulation of security, privacy, events and forensics analysis

Cooperation

• SE government (local and national), EU, Sida, UAS, UCL, IASSA, leading universities in Europe, North America and Asia

Systems Analysis & Security (SAS) Unit

• STORK 2.0

– Federated, cross-border authentication and authorization

• eSENS

– Secure infrastructure for interoperable public services in Europe

• DFET

– Cloud-based cybercrime training environment to include real life simulation and scenario analysis

• Scalable and Automated Aggregation of Forensic Evidence from the Internet of Things

• Semantic Integration and Analysis of Digital Security & Forensic Data

Open for collaboration and cooperation

• Analyzing the Digital Past for Improving the Digital Future (such as e-Discovery for Business Intelligence)

• Digital services that preserve security, privacy and integrity of access and usage, while being aware of accountability and responsibility, the impact of surveillance and the issue of data retention; put simply, building digital trust and trustworthiness

Thank you

International networks

Contact: Oliver Popov ([email protected])