Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence

Liao, Xiaojing, Kan Yuan, XiaoFeng Wang, Zhou Li, Luyi Xing, and Raheem Beyah

Presented by Octavian Suciu

Octavian Suciu:: Acing the IOC Game

What are Indicators of Compromise (IOCs)?
• Forensic artifacts of intrusion
  – virus signatures
  – IPs/domains used by botnets
  – MD5s of malware
• Shared across the community
  – Threat Intelligence platforms
• Used as input to security products
  – IDS, AVs

The OpenIOC Format
• Context = IOC category [write registry key]
• Content = specific artifact [file being modified]

How is security information disseminated?
• Blogs
• Forums
• Social Networks
• Blacklists & Databases
• Papers & Technical reports [FeatureSmith!]
• Underground markets

Problem

Natural Language vs Machine-readable Format

Problem (2)
• Volume & Velocity hinder manual conversion

Problem (3)
• Information Extraction tools are ineffective
  – domain-specific
  – high false positive rate

The Trojan downloads a file ok.zip from the server

It’s available as a Free 30 day trial download.

X

iACE Solution
• Automated IOC extractor from technical blogs
• Key observation:
  – Discourse in technical blogs is consistent and stable
• iACE in a nutshell:
  – discover an IOC token (ok.zip)
  – identify context (downloads)
  – analyze their grammatical relation
  – classify relation based on similarity with others

The Trojan downloads a file ok.zip from the server

Outline
• System Design
• Datasets
• Evaluation
• Security Findings
• Discussion

iACE Architecture
• Continuously download websites
• Filter out non-technical pages (e.g. login pages)
• Get sentences likely containing IOCs
• Check if the extracted relations are IOCs
• Generate the OpenIOC format

Blog Scraper (BS)
• Download complete websites
• Monitor them for new posts

Blog Preprocessor (BP)
• Remove template from webpages
  – Retain only user-generated content
• Convert content to text
  – OCR on images, PDF to text
• Filter pages based on topic
  – topic words
  – article length
  – density of dictionary words

Relevant Content Picker (RCP)
• Split text into sentences
• Determine IOC tokens (ok.zip)
  – regex matching
• Identify context terms (downloads)
  – dictionary of relevant terms

The Trojan downloads a file ok.zip from the server
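The Relevant Content Picker steps (sentence splitting, regex matching for IOC tokens, dictionary lookup for context terms) can be sketched as follows. This is a minimal illustration, not iACE's implementation: the patterns in `IOC_PATTERNS`, the words in `CONTEXT_TERMS`, and the function names are all assumptions made for the example, and the sentence splitter is deliberately naive.

```python
import re

# Hypothetical IOC patterns; a real extractor would need a much larger, vetted set.
IOC_PATTERNS = {
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "filename": re.compile(r"\b[\w-]+\.(?:zip|exe|dll|bat|js)\b"),
}

# Hypothetical dictionary of context terms.
CONTEXT_TERMS = {"downloads", "drops", "connects", "writes", "modifies"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pick_candidates(text):
    """Return (sentence, ioc_tokens, context_terms) for sentences that have both."""
    candidates = []
    for sentence in split_sentences(text):
        tokens = [
            (kind, match.group())
            for kind, pattern in IOC_PATTERNS.items()
            for match in pattern.finditer(sentence)
        ]
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        context = sorted(words & CONTEXT_TERMS)
        if tokens and context:
            candidates.append((sentence, tokens, context))
    return candidates


text = (
    "The Trojan downloads a file ok.zip from the server. "
    "It's available as a Free 30 day trial download."
)
for sentence, tokens, context in pick_candidates(text):
    print(sentence, tokens, context)
```

With these assumed patterns, only the first sentence survives: it contains both an IOC token (ok.zip) and a context term (downloads), while the trial-download sentence has neither.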

Relation Checker (RC)
• Graph Mining
  – Similarity metric for directed graphs
  – Compute the number of identical random walks occurring in both graphs
• Train classifier on relations from ground truth
• Classify new relations based on their similarity to ground truth
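One way to read "number of identical walks occurring in both graphs" is to count pairs of walks, one per graph, whose node labels match step by step. The sketch below does this for walks of bounded length and normalizes the count into a similarity score. It is a simplified stand-in and not necessarily the kernel iACE uses: the graph representation (a label dictionary plus an edge list), the `max_len` bound, the cosine-style normalization, and the function names are all assumptions, and edge labels such as dependency types are ignored.

```python
import math
from collections import defaultdict


def count_common_walks(labels_a, edges_a, labels_b, edges_b, max_len=3):
    """Count pairs of walks (one per graph) with identical label sequences.

    labels_*: dict mapping node id -> label; edges_*: list of directed (u, v) pairs.
    Walks with 0..max_len edges are counted.
    """
    succ_a, succ_b = defaultdict(list), defaultdict(list)
    for u, v in edges_a:
        succ_a[u].append(v)
    for u, v in edges_b:
        succ_b[u].append(v)

    # Node pairs with equal labels are the nodes of the direct product graph.
    matching = {(a, b) for a in labels_a for b in labels_b
                if labels_a[a] == labels_b[b]}

    counts = {pair: 1 for pair in matching}  # zero-edge walks
    total = len(matching)
    for _ in range(max_len):
        step = defaultdict(int)
        for (a, b), n in counts.items():
            for a2 in succ_a[a]:
                for b2 in succ_b[b]:
                    if (a2, b2) in matching:
                        step[(a2, b2)] += n
        counts = step
        total += sum(step.values())
    return total


def walk_similarity(graph_a, graph_b, max_len=3):
    """Normalize the common-walk count to the range [0, 1]."""
    k_ab = count_common_walks(*graph_a, *graph_b, max_len=max_len)
    k_aa = count_common_walks(*graph_a, *graph_a, max_len=max_len)
    k_bb = count_common_walks(*graph_b, *graph_b, max_len=max_len)
    if k_aa == 0 or k_bb == 0:
        return 0.0
    return k_ab / math.sqrt(k_aa * k_bb)


# Toy dependency-like graphs; the IOC token is replaced by the generic label "IOC".
g1 = ({0: "downloads", 1: "Trojan", 2: "IOC"}, [(0, 1), (0, 2)])
g2 = ({0: "downloads", 1: "malware", 2: "IOC"}, [(0, 1), (0, 2)])
g3 = ({0: "available", 1: "download"}, [(0, 1)])
print(walk_similarity(g1, g2), walk_similarity(g1, g3))
```

A classifier can then treat such scores as a precomputed similarity between a new relation and the labeled relations in the ground truth.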

IOC Generator (IG)
• Generate Definition and Header in OpenIOC format
  – map context & IOC terms to XML
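Mapping a context term and an IOC token to XML might look like the sketch below. The element and attribute names (`IndicatorItem`, `Context`, `Content`, `document`, `search`) follow the general OpenIOC 1.0 layout as commonly documented and should be checked against the schema before use. The `CONTEXT_TO_SEARCH` mapping, the `build_ioc` function, and the description string are hypothetical, and only a small part of the header is produced.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical mapping from a context term to an OpenIOC search path.
CONTEXT_TO_SEARCH = {"downloads": "FileDownloadHistoryItem/FileName"}


def build_ioc(description, relations):
    """Build a small OpenIOC-style tree from (context_term, ioc_token) pairs."""
    ioc = ET.Element("ioc", {"id": str(uuid.uuid4())})
    # Header part (abbreviated): only a short description.
    ET.SubElement(ioc, "short_description").text = description
    # Definition part: one OR-indicator holding every extracted item.
    definition = ET.SubElement(ioc, "definition")
    indicator = ET.SubElement(
        definition, "Indicator", {"operator": "OR", "id": str(uuid.uuid4())}
    )
    for context_term, token in relations:
        search = CONTEXT_TO_SEARCH[context_term]
        item = ET.SubElement(
            indicator, "IndicatorItem", {"id": str(uuid.uuid4()), "condition": "is"}
        )
        # Context = IOC category, Content = the specific artifact.
        ET.SubElement(
            item,
            "Context",
            {"document": search.split("/")[0], "search": search, "type": "mir"},
        )
        ET.SubElement(item, "Content", {"type": "string"}).text = token
    return ioc


ioc = build_ioc("Trojan downloader", [("downloads", "ok.zip")])
print(ET.tostring(ioc, encoding="unicode"))
```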

Datasets
• DS-Labeled (used for training)
  – 450 articles
  – 1,500 true IOC sentences
  – 3,000 false IOC sentences
• DS-Unknown (used for testing)
  – 45 blogs
  – 71,000 articles
• Training sample size is small

Performance of iACE
• Precision = fraction of identified IOCs that are truly IOCs
• Recall = fraction of IOCs that are identified

                      Precision   Recall
Topic Classifier      98%         100%
iACE on DS-Labeled    98%         92%
iACE on DS-Unknown    95%         90%
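The two definitions translate directly into set arithmetic. The sketch below uses made-up IOC values purely to show the computation; the function name and the data are illustrative and unrelated to the reported results.

```python
def precision_recall(identified, true_iocs):
    """Precision and recall of a set of identified IOCs against the true set."""
    identified, true_iocs = set(identified), set(true_iocs)
    true_positives = len(identified & true_iocs)
    precision = true_positives / len(identified) if identified else 0.0
    recall = true_positives / len(true_iocs) if true_iocs else 0.0
    return precision, recall


# Made-up example: 3 items identified, 2 of them correct, 4 true IOCs in total.
identified = {"ok.zip", "192.0.2.1", "trial"}
true_iocs = {"ok.zip", "192.0.2.1", "evil.example", "198.51.100.7"}
precision, recall = precision_recall(identified, true_iocs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```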

Performance of Existing Systems
• Precision = fraction of identified IOCs that are truly IOCs
• Recall = fraction of IOCs that are identified

                  Precision   Recall
iACE              98%         93%
AlienVault OTX    72%         56%
Stanford NER      71%         47%

Discovered IOCs
• 45 blogs
• 71,000 articles (DS-Unknown)
• 20,000 identified as containing IOCs
• 900,000 IOCs identified

How are IOCs related to each other?
• Cluster articles on infrastructure-related IOCs
  – IPs, domains, email addresses
  – 527 clusters likely corresponding to campaigns
  – Little cross-reference between articles in same cluster
  – This allowed the discovery of new campaigns
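A simple way to cluster articles on shared infrastructure IOCs is to place two articles in the same cluster whenever they mention a common IP, domain, or email address, and then take connected components. The sketch below does this with union-find. The slides do not state the clustering method, so this is an assumed approach, and the article ids and IOC values are invented.

```python
def cluster_articles(article_iocs):
    """Group articles that share at least one IOC, transitively.

    article_iocs: dict mapping article id -> set of infrastructure IOCs.
    Returns a sorted list of sorted clusters.
    """
    parent = {article: article for article in article_iocs}

    def find(x):
        # Find the cluster representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # IOC -> first article that mentioned it
    for article, iocs in article_iocs.items():
        for ioc in iocs:
            if ioc in first_seen:
                parent[find(article)] = find(first_seen[ioc])
            else:
                first_seen[ioc] = article

    clusters = {}
    for article in article_iocs:
        clusters.setdefault(find(article), set()).add(article)
    return sorted(sorted(cluster) for cluster in clusters.values())


articles = {
    "a1": {"192.0.2.1", "evil.example"},
    "a2": {"evil.example"},
    "a3": {"198.51.100.7"},
    "a4": {"192.0.2.1"},
}
print(cluster_articles(articles))
```

Articles a1, a2, and a4 end up together even though a2 and a4 share no IOC directly, which mirrors the observation that articles in the same cluster rarely reference each other.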

How do IOCs evolve over time?
• Cluster articles on attack vector IOCs
  – malware hashes, CVEs
  – measure decay time = # of consecutive months during which an IOC was mentioned
  – most attack vectors are short lived
  – long lasting attacks pointed to small set of C&C servers that were not taken down
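The decay-time metric can be sketched as below, under one possible reading of the definition: count the run of consecutive calendar months starting at the first mention and stop at the first gap. The function name and the input format, a collection of (year, month) pairs, are assumptions.

```python
def decay_time(mention_months):
    """Number of consecutive months, starting at the first mention, in which
    an IOC was mentioned. mention_months: iterable of (year, month) pairs."""
    # Convert each month to a single running index so year boundaries are handled.
    indices = sorted({year * 12 + (month - 1) for year, month in mention_months})
    if not indices:
        return 0
    run = 1
    for previous, current in zip(indices, indices[1:]):
        if current - previous != 1:
            break  # first gap ends the run
        run += 1
    return run


# Mentioned Nov 2015 through Jan 2016, then again in Mar 2016 after a gap.
print(decay_time([(2015, 11), (2015, 12), (2016, 1), (2016, 3)]))
```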

What is the impact of IOCs on defenses?
• How fast are IOCs adopted by the industry?
  – Measure the time difference between when IOCs are blogged about and when they are detected on VirusTotal
  – 47% of IOCs were detected before being blogged about
  – AVs respond much more slowly to domains & IPs than to hashes

What is the quality of the 45 blogs?
• Timeliness = % of the time a blog is the first to report on IOCs
  – 10 blogs report first on 60% of campaigns
  – a blog with 13% timeliness has 84% exclusive IOCs
• Completeness = % of IOCs reported and how diverse they are (different types)
  – 6 blogs reported 40% of IOCs
  – 9 blogs reported 50% of IOC types
• Robustness = % of robust IOCs that are reported (those that remain unchanged during campaigns)
  – C&C servers, registry email are robust during campaign
  – one blog reports 87% of the robust IOCs

Discussion
• iACE automates IOC generation and has good performance
• Allows an analysis of impact, evolution and relations between IOCs from technical blogs
• Limitations
  – Errors due to natural language ambiguity:
    • e.g. masking http as hxxp in URLs
  – Other intelligence sources are also valuable
    • iACE assumptions might not hold
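The hxxp limitation suggests a normalization pass before regex matching. The sketch below restores masked URL schemes. It is an illustrative preprocessing step, not something the slides describe iACE as doing; the `refang` name is hypothetical, and handling of the `[.]` dot masking is an added assumption beyond the hxxp case mentioned above.

```python
import re


def refang(text):
    """Undo common URL masking so that IOC regexes can match again."""
    # hxxp:// -> http:// and hxxps:// -> https:// (case-insensitive match).
    text = re.sub(r"\bhxxp(s?)://", r"http\1://", text, flags=re.IGNORECASE)
    # Assumed extra convention: dots written as [.]
    return text.replace("[.]", ".")


print(refang("The payload is fetched from hxxp://evil[.]example/ok.zip"))
```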

Thank you!

Octavian Suciu
osuciu@umiacs.umd.edu