exploiting diverse observation perspectives to get insights on the malware landscape



Page 1: Exploiting diverse observation perspectives to get insights on the malware landscape

EXPLOITING DIVERSE OBSERVATION PERSPECTIVES TO GET INSIGHTS ON THE MALWARE LANDSCAPE

Corrado Leita, Symantec Research Labs
Ulrich Bayer, Technical University Vienna
Engin Kirda, Institute Eurecom @ iSecLab


ADLab Meeting 2

Outline

- Introduction
- Related Work
- SGNET and EPM Clustering
- Results
- Conclusion

2010/7/20


INTRODUCTION

Introduction

- About 30,000 samples per day are submitted to the VirusTotal website
  - On the order of millions of samples per month
- Malware writers can generate new code from existing code bases, or by re-packing binaries with code obfuscation tools
  - e.g., Allaple worms

Introduction

- A complete picture of the complexity of the malware landscape is possible only by distinguishing polymorphic instances from new variants
- Get quantitative insights into the interrelations among the different families, and into the extent to which malware writers share code and produce patches to known variants

Introduction

- SGNET dataset
- Combine clustering techniques based on either static or behavioral characteristics of the malware samples


RELATED WORK

Related Work

- Gheorghescu, 2005
  - Disassembles the samples and compares their basic blocks
- Kolter and Maloof, 2006
  - Compare hex dumps of the samples' code segments
- Wicherski, 2009: peHash
  - Polymorphic binaries receive the same hash value
  - The hash is computed from the portions of the PE header that are not mutated

Related Work

- Lee and Mody, 2006
  - Based on system call traces
  - First attempts to cluster malware according to its behavior
- Bailey et al., 2007
  - The first to build a clustering system that describes a sample’s behavior in more abstract terms
  - O(n^2) complexity

Related Work

- Anubis (http://anubis.iseclab.org/)
  - Data tainting
  - Tracking of sensitive compare operations
  - Dynamic analysis system for capturing a sample’s behavior


SGNET AND EPM CLUSTERING

SGNET and EPM Clustering

- SGNET focuses on the collection of detailed information on code injection attacks and on the sources responsible for these attacks
- VirusTotal
- Anubis

SGNET and EPM Clustering

- SGNET
  - ScriptGen
    - Learning 0-day behavior
  - Argos
    - Program flow hijack detection
  - Nepenthes
    - Shellcode emulation
    - Malware download

SGNET and EPM Clustering

- Sensor: ScriptGen FSM
- Sample Factory: Argos
- Shellcode handlers: Nepenthes


EPM CLUSTERING

EPM Clustering

- Epsilon-Gamma-Pi-Mu (EGPM) model
  - Exploit (ε)
  - Bogus control data (γ)
  - Payload (π)
  - Malware (μ)
- Assumption: any randomization performed by the attacker has a limited scope
- γ is not considered, due to the lack of host-based information in the SGNET dataset

EPM Clustering

- Phase 1: feature definition

EPM Clustering

- Pi (π)
  - PUSH-based interaction
  - PULL-based interaction
  - Central repository
- Mu (μ)
  - PE header characteristics seem to be more difficult to mutate
  - A change in their value is likely to be associated with a modification or recompilation of the existing codebase
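
The contrast between content hashes and PE header characteristics can be sketched as follows. This is only an illustration under assumptions: `build_fake_pe` and `pe_header_features` are hypothetical helpers, the buffer is a synthetic PE-like byte string rather than a real sample, and the three fields picked here are not necessarily the μ features used in the slides. The field offsets follow the standard PE/COFF layout.

```python
import hashlib
import struct


def build_fake_pe(body: bytes) -> bytes:
    """Build a tiny PE-like buffer (NOT a runnable executable) for illustration."""
    # DOS header: "MZ" magic, padding, then e_lfanew (offset of the PE signature) at 0x3C.
    dos_header = b"MZ" + b"\x00" * 58 + struct.pack("<I", 64)
    signature = b"PE\x00\x00"
    # COFF file header: Machine, NumberOfSections, TimeDateStamp,
    # PointerToSymbolTable, NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    coff = struct.pack("<HHIIIHH", 0x014C, 3, 0, 0, 0, 4, 0x0102)
    # First four bytes of the optional header: Magic, MajorLinkerVersion, MinorLinkerVersion.
    optional = struct.pack("<HBB", 0x010B, 6, 0)
    return dos_header + signature + coff + optional + body


def pe_header_features(data: bytes) -> tuple:
    """Extract a few header fields (hypothetical feature choice)."""
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[:2] != b"MZ" or data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    # The 20-byte COFF header starts right after the 4-byte signature.
    machine, n_sections = struct.unpack_from("<HH", data, pe_offset + 4)
    # The optional header starts after the COFF header.
    _magic, linker_major, linker_minor = struct.unpack_from("<HBB", data, pe_offset + 24)
    return (machine, n_sections, (linker_major, linker_minor))


# Two "instances" whose bodies differ (a stand-in for polymorphic mutation).
original = build_fake_pe(b"\x90" * 32)
mutated = build_fake_pe(bytes(x ^ 0x5A for x in b"\x90" * 32))

print(hashlib.md5(original).hexdigest() == hashlib.md5(mutated).hexdigest())  # False
print(pe_header_features(original) == pe_header_features(mutated))            # True
```

The MD5 digest changes with any byte of the body, while the header fields stay the same unless the mutation engine rewrites them as well.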

EPM Clustering

- Clearly, all of the features taken into account for the classification could be easily randomized by the malware writer
- More complex (costly) polymorphic approaches might appear in the future

EPM Clustering

- Phase 2: invariant discovery
  - An invariant value is a value that is not specific to a certain
    - Attack instance
    - Attacker
    - Destination
  - Threshold-based: a value must be seen in
    - At least 10 different attack instances
    - At least 3 different attackers
    - At least 3 honeypot IPs
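
The threshold rule for invariant discovery can be sketched as below. The thresholds (10 attack instances, 3 attackers, 3 honeypot IPs) come from the slide; the `Observation` record, the function name and the sample data are assumptions made for illustration, not SGNET's actual schema.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    """One value of one feature, as seen in one attack (hypothetical record)."""
    attack_id: str    # attack instance
    attacker_ip: str  # source of the attack
    honeypot_ip: str  # destination (honeypot) address
    value: str        # value taken by the feature in this attack


def invariant_values(observations, min_attacks=10, min_attackers=3, min_honeypots=3):
    """Return the feature values that satisfy all three thresholds."""
    attacks = defaultdict(set)
    attackers = defaultdict(set)
    honeypots = defaultdict(set)
    for obs in observations:
        attacks[obs.value].add(obs.attack_id)
        attackers[obs.value].add(obs.attacker_ip)
        honeypots[obs.value].add(obs.honeypot_ip)
    return {
        value
        for value in attacks
        if len(attacks[value]) >= min_attacks
        and len(attackers[value]) >= min_attackers
        and len(honeypots[value]) >= min_honeypots
    }


# A value shared across many attacks, attackers and honeypots...
observations = [
    Observation(f"a{i}", f"10.0.0.{i % 4}", f"192.0.2.{i % 3}", "linker-6.0")
    for i in range(12)
]
# ...versus values that are each seen in a single attack (e.g. per-instance hashes).
observations += [
    Observation(f"b{i}", "10.0.0.9", "192.0.2.1", f"md5-{i}") for i in range(12)
]
print(invariant_values(observations))  # {'linker-6.0'}
```

Values that change at every attack instance never reach the thresholds, so they are not treated as invariants.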

EPM Clustering

- Phase 3: pattern discovery
  - T = v1, v2, v3, …, vn
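
The slide only gives the tuple T = v1, v2, v3, …, vn. One plausible reading, shown here purely as an assumption, is that each attack instance is a tuple of feature values and that a pattern is obtained by keeping the invariant values and replacing every other value with a "do not care" wildcard. The names `WILDCARD`, `to_pattern` and `discover_patterns` are hypothetical.

```python
WILDCARD = "*"  # "do not care" marker (assumed notation)


def to_pattern(instance, invariants):
    """Keep invariant values, generalize the rest.

    instance:   tuple of feature values (v1, ..., vn)
    invariants: list of sets, one set of invariant values per feature position
    """
    return tuple(
        value if value in allowed else WILDCARD
        for value, allowed in zip(instance, invariants)
    )


def discover_patterns(instances, invariants):
    """Return the set of distinct patterns generated by the instances."""
    return {to_pattern(instance, invariants) for instance in instances}


# Hypothetical features: (protocol, port, filename).
invariants = [{"tcp"}, {"9988", "80"}, set()]
instances = [
    ("tcp", "9988", "f1.exe"),
    ("tcp", "9988", "f2.exe"),
    ("udp", "80", "x"),
]
for pattern in sorted(discover_patterns(instances, invariants)):
    print(pattern)
```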

EPM Clustering

- Phase 4: pattern-based classification
  - Clustering
  - Multiple patterns could match the same instance
  - Each instance is always associated with the most specific pattern matching its feature values
  - All the instances associated with the same pattern are said to belong to the same EPM cluster
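
A minimal sketch of the classification step, assuming patterns are tuples with a `*` wildcard and that "most specific" means "fewest wildcards". Both are assumptions for illustration, and all function names are hypothetical.

```python
from collections import defaultdict

WILDCARD = "*"  # "do not care" marker (assumed notation)


def matches(pattern, instance):
    """A pattern matches when every non-wildcard position equals the instance value."""
    return len(pattern) == len(instance) and all(
        p == WILDCARD or p == v for p, v in zip(pattern, instance)
    )


def specificity(pattern):
    """Number of non-wildcard positions (assumed measure of specificity)."""
    return sum(p != WILDCARD for p in pattern)


def classify(instance, patterns):
    """Return the most specific matching pattern, or None if nothing matches.

    On ties, the first candidate in `patterns` order wins.
    """
    candidates = [p for p in patterns if matches(p, instance)]
    return max(candidates, key=specificity) if candidates else None


def cluster(instances, patterns):
    """Group instances by the pattern they are assigned to."""
    clusters = defaultdict(list)
    for instance in instances:
        clusters[classify(instance, patterns)].append(instance)
    return dict(clusters)


patterns = [("tcp", "*", "*"), ("tcp", "9988", "*"), ("*", "*", "*")]
print(classify(("tcp", "9988", "a.exe"), patterns))  # ('tcp', '9988', '*')
print(classify(("udp", "69", "b.exe"), patterns))    # ('*', '*', '*')
```

Instances that share the same assigned pattern end up in the same group, which mirrors the definition of an EPM cluster given above.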

EPM Clustering

- E-clusters
  - Exploit
- P-clusters
  - Payload
- M-clusters
  - Malware

EPM Clustering

- B-clusters
  - Anubis
  - Compare two samples based on their behavioral profile
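
The slide does not say how two behavioral profiles are compared. As a sketch under assumptions, a profile can be modeled as a set of behavior features and two profiles compared with the Jaccard index. The feature strings, the `same_b_cluster` helper and its 0.7 threshold are hypothetical and are not taken from Anubis.

```python
def jaccard(profile_a: set, profile_b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    if not profile_a and not profile_b:
        return 1.0  # two empty profiles are treated as identical
    return len(profile_a & profile_b) / len(profile_a | profile_b)


def same_b_cluster(profile_a: set, profile_b: set, threshold: float = 0.7) -> bool:
    """Hypothetical decision rule: similar enough profiles share a B-cluster."""
    return jaccard(profile_a, profile_b) >= threshold


# Hypothetical behavior features observed during sandbox execution.
sample_1 = {"file:create:x.exe", "reg:set:Run", "net:tcp:6667"}
sample_2 = {"reg:set:Run", "net:tcp:6667", "net:dns:example.org"}

print(jaccard(sample_1, sample_2))         # 0.5
print(same_b_cluster(sample_1, sample_2))  # False
```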


RESULTS

Results

- Data: Jan 2008 to May 2009, collected by the SGNET deployment
- 6353 malware samples
  - Only 5165 could be correctly executed in Anubis
  - Some malware samples could not be downloaded correctly by Nepenthes

Results

- 39 E-clusters
- 27 P-clusters
- 260 M-clusters
- 972 B-clusters


Results

- #(exploit/payload combinations) is low
  - Most malware variants seem to share a small number of distinct exploitation routines for propagation
- #(B-clusters) is lower than #(M-clusters)
  - Some M-clusters are likely to correspond to variations of the same codebase
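
Cross-referencing the static and behavioral views can be sketched as follows: when several M-clusters fall into the same B-cluster, they are candidates for being variations of one codebase. The input format (one `(sample_id, m_cluster, b_cluster)` triple per sample), the function names and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def m_clusters_per_b_cluster(samples):
    """Map each B-cluster to the set of M-clusters its samples belong to.

    samples: iterable of (sample_id, m_cluster, b_cluster) triples.
    """
    grouped = defaultdict(set)
    for _sample_id, m_cluster, b_cluster in samples:
        grouped[b_cluster].add(m_cluster)
    return dict(grouped)


def shared_codebase_candidates(samples):
    """B-clusters that contain more than one M-cluster."""
    return {
        b_cluster: m_clusters
        for b_cluster, m_clusters in m_clusters_per_b_cluster(samples).items()
        if len(m_clusters) > 1
    }


# Hypothetical assignments: M1 and M2 differ statically but behave alike.
samples = [
    ("s1", "M1", "B1"),
    ("s2", "M2", "B1"),
    ("s3", "M3", "B2"),
]
print(shared_codebase_candidates(samples))
```

Grouping in the opposite direction (B-clusters per M-cluster) would highlight one static cluster that is spread over several behaviors.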

Results

- Clustering anomalies
  - 860 B-clusters are composed of a single malware sample and are associated with a single attack instance in the SGNET dataset
  - A small number of size-1 B-clusters have a 1-1 association with a static M-cluster
  - Mostly…


Results

- P-pattern 45: PUSH-based download
  - TCP port 9988

Results

- M-cluster 13:

Results

- M-cluster 13 is a polymorphic malware associated with several different B-clusters
  - MD5 is not an invariant
  - Allaple mutates its content at each attack instance

Results

- Each behavioral profile corresponds to an execution time of 4 minutes
- Bot?
- Honeypots may help!


Results

- Allaple
  - Worm exploiting MS04-007
  - DoS attacks

Results

- IRC servers


CONCLUSION

Conclusion

- Combine different clustering techniques
- Improve effectiveness in building intelligence on the threat economy