multi-level observation and understanding of program behaviors

34
Multi-level Observation and Understanding of Program Behaviors Liang Zhenkai 梁振凯

Upload: others

Post on 16-Oct-2021

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Multi-level Observation and Understanding of Program Behaviors

Multi-level Observation and Understanding of Program Behaviors

Liang Zhenkai 梁振凯

Page 2: Multi-level Observation and Understanding of Program Behaviors

Security Incidents Are on The Rise

1

What happened? Who is affected? How to prevent?

Page 3: Multi-level Observation and Understanding of Program Behaviors

Binary-Level View

00 48 2d e9 04 b0 8d e20c 00 9f e5 a5 ff ff eb00 30 a0 e3 03 00 a0 e100 88 bd e8 d0 04 01 00

Binary code

myfunc:push {fp, lr}add fp, sp, #4ldr r0, helloworldbl <puts> mov r3, #0mov r0, r3pop {fp, pc}

Assembly code

void myfunc (){printf(“hello world”);

}

Source codeControl-flow Graph

Instruction Trace

Page 4: Multi-level Observation and Understanding of Program Behaviors

Binary-Leve Vulnerability, Attacks and Defenses• Code injection • Code reuse • return-to-libc • return-oriented programming (ROP)

• Data-oriented Programming (DOP)

3

Memory space

Code

DataData w/ DEP

Data Execution Prevention

Control Flow Integrity

CFI

CFG

Page 5: Multi-level Observation and Understanding of Program Behaviors

4

Audit-Log-Level Viewl User-space utilities (e.g., Auditd) collect system call records from kernel space

through Netlink and write them to a log file under /var/log/auditl An Example of a read log entry in Auditd

l An Example of a read log entry in Auditbeat

Page 6: Multi-level Observation and Understanding of Program Behaviors

Enterprise Network View

Page 7: Multi-level Observation and Understanding of Program Behaviors

High-level Report

6

Page 8: Multi-level Observation and Understanding of Program Behaviors

Understanding of Cyber Security Events

Page 9: Multi-level Observation and Understanding of Program Behaviors

Endpoint Monitoring Solutions

8

Endpoint monitoring solutions record audit logs for attack investigation

Audit logs:l A history of events representing OS-level activities l Provide visibility into security incidents with data provenance

type=SYSCALL msg=audit(30/09/19 20:34:53.383:98866813) : arch=x86_64 syscall=read exit=25 a0=0x3 ppid=15757 pid=30204 auid=junzeng sess=6309

Provenance Analysis

PID: 30204FD: 0x3read

Page 10: Multi-level Observation and Understanding of Program Behaviors

Audit Log Analysis• Starting from a detection point, Backtracker does:

• Events & objects identification related detection point• Generate dependency graph• Use rules to prune unrelated nodes in the dependency graph

Backtracker (King & Chen, 2003)

Suspicious file or process

Dependency explosion!

Page 11: Multi-level Observation and Understanding of Program Behaviors

Audit Log Analysis• Resolve dependency explosion problem in a long running application • Fine-grained provenance tracing technique• Identifying unit boundaries & dependences• Partition into individual unit• Code instrumentation

BEEP (Lee et al., 2013)

Suspicious file or process

Page 12: Multi-level Observation and Understanding of Program Behaviors

Audit Log Analysis• Address threat alert fatigue during threat investigation• Sssign anomaly scores to every edge in dependency graph• Based on frequency of events that have occured (historical & contextual

information)• Propagated score through edges in the graph• Generate aggregated anomaly score for triaging

NoDoze (Hassan et al., 2019)

High frequency suspicious event

42

Page 13: Multi-level Observation and Understanding of Program Behaviors

Audit Log Analysis• Generate high-level graph during threat investigation• Develop robust & reliable detection signal• Correlate between suspicious information flow

HOLMES (Milajerdi et al., 2019)

Higher level

Page 14: Multi-level Observation and Understanding of Program Behaviors

Related Work

13

l Scale up provenance analysis:l Data reduction [NDSS’16, 18 …] & Query system [Security’18, ATC’18 …]l Recognizing behaviors of interest requires intensive manual efforts

A semantic gap between low-level events and high-level behaviors

l Apply expert-defined specifications to bridge the gapl Match audit events against domain rules that describe behaviorsl Query graph [VLDB’15, CCS’19], Tactics Techniques Procedures (TTPs)

specification [SP’19,20], and Tag policy [Security’17,18]

Behavior-specific rules heavily rely on domain knowledge (time-consuming)

d

Can we automatically abstract high-level behaviors from low-level audit logs and cluster semantically similar behaviors before human inspection?

Page 15: Multi-level Observation and Understanding of Program Behaviors

bash

lsvim gcc

cc1 collect2as

ld

Pro1.c

a.out

a.out

sudo

vim tar

Eva.doc Eva.tar

bash

git add

git commit

git push 13.250.X.X

apt

sudo apt

sh

dpkg

gpgv

dpkg

update-motd-upd

http

http

apt-config dpkg

rm find

apt-key

bash ssh

shrun-parts

sshd

gcc

cc1

git add git commit git push

ssh13.250.X.X

catls cp

secret.txt a.c

bash

Motivating ExampleAttack Scenario: A software tester exfiltrates sensitive data that he has access to

Data Exfiltration Steps

13.250.X.Xsecret.txt

a.c a.out

cp

gcc

github

14

Motivating Example Logs

Page 16: Multi-level Observation and Understanding of Program Behaviors

bash

lsvim gcc

cc1 collect2as

ld

Pro1.c

a.out

a.out

sudo

vim tar

Eva.doc Eva.tar

bash

git add

git commit

git push 13.250.X.X

Data Exfiltration

Program Compilation and Upload Github Submission

apt

sudo apt

sh

dpkg

gpgv

dpkg

update-motd-upd

http

http

apt-config dpkg

rm find

apt-key

Package Installation

Package List Update

bash ssh

shrun-parts

sshd

gcc

cc1

git add git commit git push

ssh13.250.X.X

catls cp

secret.txt a.c

bash

Motivating Example

Data Exfiltration Steps

13.250.X.Xsecret.txt

a.c a.out

cp

gcc

github

15

Motivating Example Logs

Data Exfiltration

Program Compiling and Upload (cluster)

Attack Scenario: A software tester exfiltrates sensitive data that he has access to

Page 17: Multi-level Observation and Understanding of Program Behaviors

bash

lsvim gcc

cc1 collect2as

ld

Pro1.c

a.out

a.out

sudo

vim tar

Eva.doc Eva.tar

bash

git add

git commit

git push 13.250.X.X

Data Exfiltration

Program Compilation and Upload Github Submission

apt

sudo apt

sh

dpkg

gpgv

dpkg

update-motd-upd

http

http

apt-config dpkg

rm find

apt-key

Package Installation

Package List Update

bash ssh

shrun-parts

sshd

gcc

cc1

git add git commit git push

ssh13.250.X.X

catls cp

secret.txt a.c

bash

Challenges for Behavior Abstraction

Event Semantics Inference:l Logs record general-purpose system

activities but lack knowledge of high-level semantics

Individual Behavior Identification:l The volume of audit logs is overwhelmingl Audit events are highly interleaving

16

Data Exfiltration

Program Compiling and Upload

Package Installation Events > 50,000

Page 18: Multi-level Observation and Understanding of Program Behaviors

Our Insights

17

a.c

a.c

cc1

cc1

as

as

ld

ld

/tmp/ccCdWCyH.s a.out

/tmp/cc5vRkEk.s a.out

Similar context --> Similar semantics

X.o

X.o

Compiling program using GCC

How do analysts manually interpret the semantics of audit events?

Page 19: Multi-level Observation and Understanding of Program Behaviors

Our Insights

Reveal the semantics of audit events from their usage contexts in logs

18

a.c

cc1

cc1

as ld

/tmp/ccCdWCyH.s a.out

/tmp/cc5vRkEk.s

Different context --> Different semantics

X.o

Compiling program using GCC

How do analysts manually interpret the semantics of audit events?

secret.txt

Data Exfiltration

Page 20: Multi-level Observation and Understanding of Program Behaviors

Our Insights

Summarize behaviors by tracking information flows rooted at data objects19

How do analysts manually identify behaviors from audit events?

Forward tracking on secret.txt

secret.txt cat cp a.ccc1

git add git commit git push ssh

Data Exfiltration Behavior

Page 21: Multi-level Observation and Understanding of Program Behaviors

WATSON

An automated behavior abstraction approach that aggregates the semantics of audit logs to model behavioral patterns

l Input: audit logs (e.g., Linux Audit[1]) l Output: representative behaviors

20[1] Linux Kernel Audit Subsystem. https://github.com/linux-audit/ audit-kernel.

Page 22: Multi-level Observation and Understanding of Program Behaviors

KG = {(h, r, t)|h, t ∈ {Process, F ile, Socket}, r ∈ {Syscall}}

Knowledge Graph ConstructionWe propose to use a knowledge graph (KG) to represent audit logs:

l KG is a directed acyclic graph built upon triples

l Each triple, corresponding to an audit event, consists of three elements (head, relation, and tail):

l KG unifies heterogeneous events in a homogeneous manner

21

Page 23: Multi-level Observation and Understanding of Program Behaviors

Event Semantics Inferencel Suitable granularity to capture contextual semantics

l Prior work [CCS’17] studies log semantics using events as basic units. l Lose contextual information within eventsl Working on Elements (head, relation, and tail) preserves more contexts

l Employ an embedding model to extract contextsl Map elements into a vector spacel Spatial distance represents semantic similaritiesl TransE: a translation-based embedding modell Head + Relation ≈ Tail à Context decides semantics

22

Page 24: Multi-level Observation and Understanding of Program Behaviors

Event Semantics ExplicabilityUse t-SNE to project the embedding space (64 dimensional in our case) into a 2D-plane, giving us an intuition of embedding distribution

Semantically similar system entities are clustered in the embedding space

(a) 53 Program embeddings (b) 25 data object embeddings

23

Socket (port 22)

Socket (port 80)

Files

Page 25: Multi-level Observation and Understanding of Program Behaviors

bash

lsvim gcc

cc1 collect2as

ld

Pro1.c

a.out

a.out

bash

git add

git commit

git push 13.250.X.Xssh

Behavior Summarization

24

Individual behavior identification: Apply an adapted depth-first search (DFS) to track information flows rooted at a data object:

l Perform the DFS on every data object except librariesl Two behaviors are merged if one is the subset of another

Data Exfiltration

Program Compiling and Upload

Different!

Page 26: Multi-level Observation and Understanding of Program Behaviors

Behavior Semantics Aggregationl How to aggregate event semantics to represent behavior semantics?

l Naïve approach: Add up the semantics of a behavior’s constituent events l Assumption: audit events equally contribute to behavior semantics

l Relative event importancel Observation: behavior-related events are common across behaviors, while

behavior-unrelated events the oppositel Apply frequency as a metric to define event importancel Quantify the frequency: Inverse Document Frequency (IDF)

l The presence of noisy eventsl Redundant events [CCS’16] & Mundane events

25

Page 27: Multi-level Observation and Understanding of Program Behaviors

Representative Behavior Identificationl Cluster semantically similar behaviors: Agglomerative Hierarchical

Clustering analysis (HCA)

l Extract the most representative behaviorsl Representativeness: Behavior’s average similarity with other behaviors in a clusterl Analysis workload reduction: Do not go through the whole behavior space

26

Page 28: Multi-level Observation and Understanding of Program Behaviors

Efficacy in Attack InvestigationMeasure the analysis workload reduction of APT attack investigation in the DARPA TRACE dataset:

l Analysis workload: the number of events to recognize all behaviors

Two orders of magnitude reduction in analysis workload and behaviors27

Page 29: Multi-level Observation and Understanding of Program Behaviors

Functionality, Flexibility, and Security• Security is about “nothing

else”• Specified functionality and only

specified functionality

• Flexibility is the root of many security problems

SpecifiedFunctionalit

y

F(x) = x + 1

PossibleSoftwareBehaviors

system()Socket()

Page 30: Multi-level Observation and Understanding of Program Behaviors

• KISS (Keep It Simple, Stupid)

• KICS (Keep It Complex & Smart)

Simplicity in System Design

Page 31: Multi-level Observation and Understanding of Program Behaviors

Dimensions of System Research

30

Human factor,social

engineering

Isolation, memory exploit,

provenance analysis, …

Law, policy, politics

Psychology, cognition,

responsibilitySystem

System

Human

Human

Page 32: Multi-level Observation and Understanding of Program Behaviors

Objective for Graduate Education• Deep knowledge and skills in technology domain• Abstraction and presentation of thoughts

• Ability to think and analyze broadly• Especially in challenging times, calling for independent minds

• Understanding systems better• Order, flexibility, force, …

• Form and live with a philosophy• Embrace trust• Life with minimal dependency

Page 33: Multi-level Observation and Understanding of Program Behaviors

Research Areas and Range of Development

System and security:National Cybersecurity R&D Lab

Economics, business, and technologyFintech Institute

Behavior/Psychology and security:CFPR in Arts and Social Science

Top-ranked degree program and open cultureTop researchers in various fields

Views from different perspectivesCulture, social system, etc.

Binary and System AnalysisAttack diagnosis and attributionTrusted environment in Web/Mobile Human behavior in cyber experimentation

Page 34: Multi-level Observation and Understanding of Program Behaviors

Understand the movement of the sun and moon from traces of shadows under the roof.

审堂下之阴,而知日月之行,阴阳之变也 Thank you!

l Multiple-level views of cyber incidents

l Our Insights in log analysisl Infer audit event semantics by usage

contextsl Identify behaviors with information

flows rooted at data objectsl On system research

Understanding systems 理解系统Abstracting knowledge 提炼知识

Connecting facts 参悟规律