Download - Digital Forensics - Iowa State Universitys2erc.iastate.edu/wp-content/uploads/guan_2014_0710...Digital Forensics Cyber Crimes: The Transformation of Crime in Information Age Yong Guan

Digital Forensics

Cyber Crimes: The Transformation of Crime in Information Age

Yong Guan

Associate Director for Research, Information Assurance Center

Associate Professor of Electrical and Computer Engineering

Iowa State University

July 10, 2014

S2ERC Ames 2014

Our Efforts

Cyber Crimes: A painful side-effect of the innovations of Computer and Internet technologies

Increasing criminal activities online

Almost all physical crimes involve digital evidence

Low percentage of cases reported to law enforcement

Our Research Foci:

Digital Forensics: Investigative and Build Accountability

Network and System Security

Assurance Modeling and Risk Analysis

One of the first 7 NSA-designated IA CAEs

Increased data size & urgency

Data centers, Storage, Time

Increased use of encryption

Child pornography case

Increased complexity

ISP, device diversity

Anti-Forensics

Anonymity

Steganography

“Co-Space”

Growing sophistication and stealthiness of cyber criminals!

Massive base of installed infrastructure with insufficient support for security

Paradigm Shift of Incident Response

Identify the cause of problem vs. Fix the problem - Priority?

Legal Influence

Evidence presented, examined, and challenged by the jury and the judges in the courtroom

Social Impacts

Concern of negative publicity

Low percentage of cases reported to Law Enforcement

The Uses of Digital Forensic Solutions

Understand the root causes and impacts of incidents and misbehaviors

Internal and 3rd party auditing

Incident response (IT, insurance, healthcare, e-business)

Electronic evidence discovery and recovery

Network and security monitoring

Attack and inside threat attribution

Data analytics

Physical security (device fingerprinting, biometric security)

Countering Anti-Forensics

Evidence principles (legal, social, technical)

Hidden, Low-Profile

Coordinated

Geographically-distributed

Financial loss and societal impacts

Botnets

Malware

Click Frauds

DDoS

Spam/Phishing

Privacy-violation

… …

Recent Trends - Cyber Attacks & Crimes

Problem Definition

Abnormal/malicious activities and patterns thereof are often the meaningful signs for many security problems.


Super-spreaders

DDoS attack

Spam emails

Worm spreading

Botnet takeover


[Figure: monitoring points, each keeping logs and sketches of the traffic it observes]

Process each packet at wire speed

In-memory sketches capture traffic status

Error-bounded measurements enable low-profile attack detection

Time-decaying window model is used to detect on-going attacks

Scalable to process network-wide measurements

Mergeable from multiple monitoring points

Reversible for identification of problem sources

Research Problems on Security Monitoring and Attribution

Requirements of the Designed Solutions


Network-wide traffic view

Duplicate removal

Mergeable measurements

Super-spreader identification

Space & time limitation

Our Idea - Sketch Design

Group Testing

Cardinality Estimation

Error-correcting Code

Our Sketch


• Sketches – Give (ε,δ)-approximations of the cardinalities of super-spreaders in each data stream using bounded space and time.

– Mergeable: merging two sketches is equivalent to merging the two data streams.

– Reversible: the identities of the super-spreaders can be recovered from the sketch.
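
The mergeable property can be illustrated with a far simpler structure than the sketch described in these slides. The following is a minimal sketch, assuming a linear-counting bitmap as the cardinality estimator; the slides do not say which estimator is used, and the names `BitmapCounter`, `m`, and `_slot` are hypothetical.

```python
import hashlib
import math


class BitmapCounter:
    """Linear-counting bitmap (an assumed stand-in, not the slides' sketch)."""

    def __init__(self, m=1024):
        self.m = m              # number of bits; hypothetical default
        self.bits = [0] * m

    def _slot(self, item):
        # Deterministic hash of the item into one of m bit positions.
        digest = hashlib.sha256(str(item).encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        self.bits[self._slot(item)] = 1

    def merge(self, other):
        # Bitwise OR: the result is the bitmap of the combined stream.
        if self.m != other.m:
            raise ValueError("bitmaps must have the same size")
        out = BitmapCounter(self.m)
        out.bits = [a | b for a, b in zip(self.bits, other.bits)]
        return out

    def estimate(self):
        # Linear-counting estimate of the number of distinct items.
        zeros = self.bits.count(0)
        if zeros == 0:
            return float("inf")
        return -self.m * math.log(zeros / self.m)
```

Two monitoring points can each fill their own `BitmapCounter` and call `merge` afterwards; the merged bitmap is bit-for-bit the one a single monitor would have built from both streams.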

L layers (groups).

ᶯ subgroups for group testing, plus 1 subgroup for false-positive (FP) removal.

Counters used in cardinality estimation for each subgroup.

1. Each packet is independently hashed into multiple groups according to the source s. Hash functions are based on the quotient and remainder of s divided by L.

2. In each group, (s,d) is mapped into multiple subgroups according to the 1-bits of w(q), where q is the quotient of s divided by L and w(q) is q encoded with an error-correcting code before mapping.

3. Each subgroup that (s,d) is mapped to updates its cardinality using the destination d.
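
The three update steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the hash family in `layers_for`, the identity "code" in `encode`, the use of exact Python sets in place of cardinality counters, and the constants `L`, `Q_BITS`, and `NUM_HASHES` are all hypothetical choices made only to keep the example runnable.

```python
L = 11           # number of layers (groups); hypothetical value
Q_BITS = 4       # bits used for the quotient q; hypothetical value
NUM_HASHES = 2   # layers each source is hashed into; hypothetical value


def encode(q):
    """Placeholder for w(q): plain binary digits of q, with no redundancy.

    A real error-correcting code would add check bits here.
    """
    return [(q >> i) & 1 for i in reversed(range(Q_BITS))]


def layers_for(s):
    """Assumed hash family built from the quotient and remainder of s / L."""
    q, r = divmod(s, L)
    return [(r + i * q) % L for i in range(NUM_HASHES)]


def new_sketch():
    # C[a][b] holds the destinations seen by subgroup b of layer a.
    # Index Q_BITS is the extra subgroup used for false-positive removal.
    return [[set() for _ in range(Q_BITS + 1)] for _ in range(L)]


def update(C, s, d):
    """Process one packet with source s and destination d."""
    q, _ = divmod(s, L)
    if q >= 2 ** Q_BITS:
        raise ValueError("source id too large for this toy configuration")
    w = encode(q)
    for a in layers_for(s):              # step 1: pick the layers
        for b, bit in enumerate(w):      # step 2: subgroups at the 1-bits of w(q)
            if bit:
                C[a][b].add(d)           # step 3: update with destination d
        C[a][Q_BITS].add(d)              # extra subgroup sees every packet in the layer
```

Exact sets are used only for clarity; a space-bounded design would replace each set with a small cardinality estimator.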

Proposed Approach


[Figure: decoding a super-spreader candidate x from the binary matrix Bt[*,*]. Rows are layers (groups); columns are subgroups.]

Bt[*,*] =
1 1 0 0 0 1 1 0 0
0 1 0 0 0 0 0 0 1
0 0 1 0 1 0 0 0 0
0 0 1 0 0 0 0 0 0
1 0 0 1 1 0 1 0 0
1 1 0 1 1 1 1 0 0
0 0 0 1 1 0 0 0 0
0 0 0 0 0 1 0 1 0
0 0 1 1 0 0 0 0 0

8th layer: W(y) = 000001010.

Decoding gives y = 0010. y is the quotient of the super-spreader in this group with high probability.

a = 1000. The layer number is also used to recover the super-spreader’s ID.

Try each of the hash functions on the decoded y to obtain the super-spreader candidate x.

Two further bit strings appear in the figure without a recoverable label: 1 0 0 0 1 0 0 0 1 0 and 0 0 0 1 0 1 0 1 0.

Create a 2D binary matrix from C[*,*,*]: test each subgroup C[a,b,*] in each layer/group to see if its cardinality is larger than the threshold. If yes, set B[a,b]=1, else set B[a,b]=0.
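
The thresholding step can be sketched in a few lines. This is a minimal illustration that assumes the per-subgroup cardinality estimates are already available as a list of lists; the function name `to_binary_matrix` and the sample numbers are hypothetical.

```python
def to_binary_matrix(cardinality, threshold):
    """Return B with B[a][b] = 1 where cardinality[a][b] exceeds the threshold."""
    return [[1 if c > threshold else 0 for c in row] for row in cardinality]


# Hypothetical estimates for 2 layers with 3 subgroups each.
estimates = [
    [3, 120, 5],
    [200, 2, 150],
]
B = to_binary_matrix(estimates, threshold=100)
rows_as_bits = ["".join(str(bit) for bit in row) for row in B]
```

Each string in `rows_as_bits` plays the role of one row of the binary matrix, which is then passed to the error-correcting decoder.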

A Snapshot of Our Recent and On-going Efforts

Reversible Sketches

Coding Theory

Dynamic Membership Query

Time-decaying Window

Hash Functions: Bloom Filters, Hash Tables

Super-spreader Detection

PCA-based Traffic Anomaly

Click frauds

Traffic Activity Graph Analysis

Using the low-rank properties

Low-rank Matrix Approximation

Persistent attacks

Botnet C&C Communication

Heavy-Change Detection

Entropy and Distribution Property Tests

Linear Algebra for Matrix Approximation

Social Graph Analysis

E-evidence Imaging and Recovery

File and FS analysis

Imaging (SSD, mobile platforms)

File Carving

Family & Friends Businesses Activists Media Military & Law Enforcement

Anonymity <--> Accountability

Anonymous systems: the ring of Gyges in cyber world.

Well-known online services

Tor

Anonymizer

Use of Tor

Wikileaks

Threatening emails/phone calls

German Child Porn, 2006

Darknet – Silk Road, summer of 2011

The Design of Accountable Anonymity

https://www.torproject.org/about/torusers.html.en#normalusers

https://www.torproject.org/about/torusers.html.en#executives

https://www.torproject.org/about/torusers.html.en#activists

https://www.torproject.org/about/torusers.html.en#journalist

https://www.torproject.org/about/torusers.html.en#military

A More Recent Effort

Security & Privacy, and Forensics of Medical Devices

Shelby Kobes

Thanks Q&A

Yong Guan [email protected]

Iowa State University

Challenges in Security Monitoring and Forensic Analytics

Internet/cellular service providers have to measure and analyze network traffic:

Maintenance: Equipment Failures, Vendor Implementation Errors, Software Bugs

Usage monitoring: Flash Crowds, Large File Transfers, Terms-of-service Abuse

Security: Online Fraud Activities, Malware Spreading, DDoS Attacks

Network-wide Traffic Anomaly

Our algorithm provides theoretical bounds for the PCA-based traffic anomaly detection.

The space requirement, the communication cost, and other resources can be optimized over a distributed network monitoring environment.

Yang LIU, Linfeng ZHANG, and Yong GUAN. A Distributed Data Streaming Algorithm for Network-wide Traffic Anomaly Detection. SIGMETRICS Perform. Eval. Rev. 37, 2, July 2009.

Yang LIU, Linfeng ZHANG, and Yong GUAN. Sketch-based Streaming PCA Algorithm for Network-wide Traffic Anomaly Detection. ICDCS 2010.

Super Spreader (malware spreading)

A new reversible sketch to aggregate traffic information for super spreader detection

Running time for the sketch updating is near-optimal

The number of aggregated flows achieves the lower bound.

Yang LIU, Wenji CHEN, and Yong GUAN. A Fast Sketch for Aggregate Queries over High-Speed Network Traffic. INFOCOM 2012.

Long Duration Flow of Botnet C&Cs

We propose a data streaming algorithm for tracking long duration flows (LDFs) in a high-speed network, which can detect LDFs with only a few false negatives and no false positives.

Our algorithm provides the strongest error bound for flow duration estimation, which is optimal for this problem.

The running time to process each packet in our algorithm is constant, regardless of the error bound.


Yang LIU, Wenji CHEN, and Yong GUAN. False Positive or False Negative: Data Streaming Algorithms for Tracking Long Duration Flows. Submitted to IEEE Transactions on Parallel and Distributed Systems (TPDS).

Duplicate Detection for DoS and Attack Pattern Analysis

We propose a novel data structure using Cuckoo hashing in a time-decaying window model.

We introduce a new algorithm to maintain time information for each item.

Our data structure is near-optimal in both space and running time.
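
To make the problem statement concrete, here is an exact baseline for duplicate detection over a time window. It is not the Cuckoo-hashing structure from the paper: it simply stores a last-seen timestamp per item in a dictionary, so its memory grows with the number of distinct items, which is the cost an approximate structure is designed to avoid. The class name `WindowedDuplicateDetector` and its methods are hypothetical.

```python
class WindowedDuplicateDetector:
    """Exact duplicate detection within a window (baseline, not the paper's design)."""

    def __init__(self, window):
        self.window = window      # window length, in the same units as `now`
        self.last_seen = {}       # item -> timestamp of its latest occurrence

    def seen_recently(self, item, now):
        """Report whether `item` occurred within the window, then record it."""
        previous = self.last_seen.get(item)
        self.last_seen[item] = now
        return previous is not None and now - previous <= self.window

    def expire(self, now):
        """Drop items whose latest occurrence has fallen out of the window."""
        stale = [k for k, t in self.last_seen.items() if now - t > self.window]
        for k in stale:
            del self.last_seen[k]
```

Calling `expire` periodically keeps the dictionary limited to items that are still inside the window.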


Yang LIU, Wenji CHEN, and Yong GUAN. Near-optimal Approximate Membership Query over Time-decaying Windows. INFOCOM 2013.

Reversible Sketches

Motivation: Change Detection, Super Spreaders, etc.

Problem: Aggregate Queries, but difficult to identify the root causes of the alarm

[Figure: a packet stream of (f_i, s_i) pairs, e.g. (f_1, s_1), (f_2, s_2), hashed into a sketch with dimensions ℓ and 1+log(n/ℓ); each updated counter is incremented by s_i.]

Each packet is hashed into multiple rows.

At each row, we update multiple counters to maintain enough information to recover keys later.

Running time for the sketch updating is near-optimal.

The number of aggregated flows achieves the lower bound of the heavy-change problem.

Can be implemented with other aggregate queries, and improve their efficiency and reliability, e.g. super spreader detection.

Yang LIU, Wenji CHEN, and Yong GUAN. A Fast Sketch for Aggregate Queries over High-Speed Network Traffic. INFOCOM 2012.

Yang LIU, Wenji CHEN, and Yong GUAN. Identifying High-Cardinality Hosts from Network-wide Traffic Measurements. IEEE CNS 2013.

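[Figure: lattice of event sets; each node shows f(event set), its KPI, and a third value]

f({e2}) = 25; KPI: 93; 0.16
f({e2,e4}) = 3; KPI: 62; 1
f({e2,e9}) = 8; KPI: 88; 1
f({e2,e11}) = 75; KPI: 91; 1
f({e3,e11}) = 12; KPI: 80; 1
f({e2,e3}) = 15; KPI: 81; 0.35
f({e2,e3,e9}) = 7; KPI: 79; 0.47
f({e2,e3,e11}) = 13; KPI: 78; 1
f({e2,e3,e9,e11}) = 8; KPI: 74; 1
f({e3}) = 0; 0
f({e3,e9}) = 0; 0
f({e9,e11}) = 0; 0
f({e2,e9,e11}) = 0; 0
f({e3,e9,e11}) = 0; 0

Node labels in the figure: D2, F1, F2, G1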

Forecast for a subset of events. Suppose one event has been observed, and you are interested in knowing if another event will occur by the end of the process instance. Suppose you have observed that event e2 has occurred. What is the probability that event e3 will occur by the end of the process instance?

P(e3 is in end event log | e2 is in end event log) = (15+7+13+8) / (25+3+43+8+75)
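
The arithmetic can be checked directly from the instance counts listed in the slides. The following is a small sketch; the dictionary layout and the helper name `conditional` are illustrative choices, and event sets with a count of zero are omitted.

```python
from fractions import Fraction

# Instance counts f(E) for each final event set, taken from the slides.
f = {
    frozenset({"e2"}): 25,
    frozenset({"e2", "e4"}): 3,
    frozenset({"e2", "e9"}): 8,
    frozenset({"e2", "e11"}): 75,
    frozenset({"e2", "e3"}): 15,
    frozenset({"e2", "e3", "e9"}): 7,
    frozenset({"e2", "e3", "e11"}): 13,
    frozenset({"e2", "e3", "e9", "e11"}): 8,
    frozenset({"e3", "e11"}): 12,
}


def conditional(counts, target, given):
    """P(target in final event set | given in final event set)."""
    given_total = sum(n for events, n in counts.items() if given in events)
    both_total = sum(
        n for events, n in counts.items() if given in events and target in events
    )
    return Fraction(both_total, given_total)


p_e3_given_e2 = conditional(f, "e3", "e2")   # Fraction(43, 154)
```

The numerator is 15+7+13+8 = 43 and the denominator is 154, so the probability is 43/154, roughly 0.28.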

A manager might ask herself: what is the probability that a car is made with a bad weld knowing that robot access control was blocked? The manager may have a second production line model based on the KPI of dollar cost, and knows that a bad weld is very expensive to fix after the car is produced, and is curious to know how blocked robot access is associated with bad welds. She can check the probability that e3 occurs with other events as well to determine any high-level probabilities, and make adjustments in the production line that reduce her dollar cost in the second model.

P(D2 is final event set) = 15 / (15+7+13+8)

P(F1 is final event set) = 7 / (15+7+13+8)

P(F2 is final event set) = 13 / (15+7+13+8)

P(G1 is final event set) = 8 / (15+7+13+8)

POMM. The POMM allows us to model the conditional probabilities on the nodes. We give a short example here. One property of POMMs is that local conditional probabilities can be multiplied to produce other conditional probabilities. This can reduce computation time for computing marginal probabilities.

P(F1|D2) = (7+8) / (15+7+8+13)

P(G1|D2) = 8 / (15+7+8+13)

P(G1|F1) = 8 / (7+8)

P(G1|F1) * P(F1|D2) = P(G1|D2).
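
The multiplication property can be verified with exact fractions. This is a short check using only the four counts given in the slides; the variable names are illustrative.

```python
from fractions import Fraction

# Number of process instances whose final event set is each node (from the slides).
ends_at = {"D2": 15, "F1": 7, "F2": 13, "G1": 8}

through_d2 = sum(ends_at.values())            # every instance here passes through D2
through_f1 = ends_at["F1"] + ends_at["G1"]    # instances that reach F1, including those ending at G1

p_f1_given_d2 = Fraction(through_f1, through_d2)       # (7+8) / (15+7+8+13)
p_g1_given_f1 = Fraction(ends_at["G1"], through_f1)    # 8 / (7+8)
p_g1_given_d2 = Fraction(ends_at["G1"], through_d2)    # 8 / (15+7+8+13)

chained = p_g1_given_f1 * p_f1_given_d2
```

Here `chained` equals `p_g1_given_d2`, both 8/43, because the factor 7+8 cancels between the two local probabilities.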

Attack Impact Analysis and Assurance Modeling

Given G(V,E), with |V| = n, |E| = m, m >> n.

Edge sparsification: create an approximation G’ of G such that:

G’ has fewer edges than G (sparsity),

while guaranteeing that G’ preserves certain properties of G

and computing on G’ is much cheaper than on G.

[Figure: properties of G: cut-value, max-flow, k-connectivity, conductance, shortest paths, spectrum of Laplacian, component-based structural properties, distributions]

Cuts: Cut-value C = sum of weights of edges with one end in S and another in V\S.

Conductance: For a cut (S,T), T=V\S,

For the graph G,

Conductance of S measures how hard it is to leave S when taking a random walk on the graph.
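
The cut value defined above is easy to compute directly. The conductance formulas did not survive in the slide text, so the sketch below assumes the common volume-based definition, cut(S,T) / min(vol(S), vol(T)), where vol is the sum of weighted degrees; that choice, the function names, and the toy graph are assumptions.

```python
def cut_value(weights, S):
    """Sum of weights of edges with exactly one endpoint in S."""
    return sum(w for (u, v), w in weights.items() if (u in S) != (v in S))


def volume(weights, S):
    """Sum of weighted degrees of the vertices in S (no self-loops assumed)."""
    return sum(w * ((u in S) + (v in S)) for (u, v), w in weights.items())


def conductance(weights, vertices, S):
    """Assumed definition: cut(S, T) / min(vol(S), vol(T)) with T = V \\ S."""
    T = set(vertices) - set(S)
    return cut_value(weights, S) / min(volume(weights, S), volume(weights, T))


# Toy graph: two triangles joined by a single edge, all weights 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
weights = {e: 1 for e in edges}
S = {0, 1, 2}
phi = conductance(weights, range(6), S)
```

For this graph the cut between the two triangles has value 1 and each side has volume 7, so a random walk started inside one triangle rarely leaves it.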

Spectrum of Laplacian matrix: the normalized Laplacian matrix is $\mathcal{L} = D^{-1/2} L D^{-1/2}$, where $L = D - A$. Let the eigenvalues (spectrum) of $\mathcal{L}$ be $\lambda_1, \lambda_2, \ldots, \lambda_n$; then $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$.

Multiplicity of eigenvalue zero = # of connected components.

When G is connected, $\lambda_2 > 0$.
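
The link between zero eigenvalues and connected components can be seen without an eigenvalue solver. The sketch below uses the unnormalized Laplacian L = D - A (for graphs without isolated vertices its zero eigenvalue has the same multiplicity as that of the normalized one) and exhibits one zero-eigenvalue eigenvector per component: the indicator vector of that component. The function names and the example graph are hypothetical.

```python
def laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1
        L[v][v] += 1
        L[u][v] -= 1
        L[v][u] -= 1
    return L


def components(n, edges):
    """Connected components found by depth-first search."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]


# Example: a path 0-1-2 and a separate edge 3-4, so two components.
n, edges = 5, [(0, 1), (1, 2), (3, 4)]
lap = laplacian(n, edges)
comps = components(n, edges)
indicators = [[1 if v in comp else 0 for v in range(n)] for comp in comps]
products = [matvec(lap, x) for x in indicators]   # each is the zero vector
```

Because the indicator vectors have disjoint supports they are linearly independent, so the zero eigenvalue has multiplicity at least the number of components; the slide's statement is that it is exactly that number.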

Analyzing G’ instead of G: saves a lot of time and allows run-time analysis

Graph Sparsification/Streaming for Complex System Analysis

What information is at risk in medical records and devices

Some of the data stored in hospital computers that could be used by a hacker

Surgical history

Obstetric history

Medications

Allergies

Family history

Social history

Habits

Immunization history

Prescriptions

Test results

Current Health

Device Information

Insurance Information

Medical X-rays

Personal Device Function
