Acing the IOC Game: Toward Automatic Discovery and Analysis of Open-Source Cyber Threat Intelligence
Liao, Xiaojing, Kan Yuan, XiaoFeng Wang, Zhou Li, Luyi Xing, and Raheem Beyah
Presented by Octavian Suciu


What are Indicators of Compromise (IOCs)?
• Forensic artifacts of intrusion
  – virus signatures
  – IPs/domains used by botnets
  – MD5s of malware
• Shared across the community
  – Threat Intelligence platforms
• Used as input to security products
  – IDS, AVs


The OpenIOC Format

Context = IOC category [write registry key]

Content = specific artifact [file being modified]
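
The Context/Content pairing above can be sketched in code. The snippet below is a minimal illustration, not taken from the slides: it builds one indicator entry with Python's standard `xml.etree.ElementTree`. The element and attribute layout follows the OpenIOC 1.0 style as commonly published; the helper name `make_indicator_item`, the id value, and the search term `FileItem/FileName` are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def make_indicator_item(document, search, content_type, value, item_id="example-item-1"):
    """Build one OpenIOC-style IndicatorItem (hypothetical helper, illustrative id)."""
    item = ET.Element("IndicatorItem", {"id": item_id, "condition": "is"})
    # Context: the IOC category, i.e. which kind of artifact to inspect.
    ET.SubElement(item, "Context", {"document": document, "search": search, "type": "mir"})
    # Content: the specific artifact value to match.
    content = ET.SubElement(item, "Content", {"type": content_type})
    content.text = value
    return item


# Assumed example values: a file name artifact taken from the running example sentence.
item = make_indicator_item("FileItem", "FileItem/FileName", "string", "ok.zip")
print(ET.tostring(item, encoding="unicode"))
```

A real OpenIOC document wraps such items in a header and a definition section; this sketch covers only the single Context/Content pair.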


How is security information disseminated?
• Blogs
• Forums
• Social Networks
• Blacklists & Databases
• Papers & Technical reports [FeatureSmith!]
• Underground markets


Problem

Natural Language vs Machine-readable Format


Problem (2)
• Volume & velocity make manual conversion difficult


Problem (3)
• Information Extraction tools are ineffective
  – domain-specific
  – high false positive rate

The Trojan downloads a file ok.zip from the server

It’s available as a Free 30 day trial download. [X: not an IOC]


iACE Solution
• Automated IOC extractor from technical blogs
• Key observation:
  – Discourse in technical blogs is consistent and stable
• iACE in a nutshell:
  – discover an IOC token (ok.zip)
  – identify context (downloads)
  – analyze their grammatical relation
  – classify relation based on similarity with others

The Trojan downloads a file ok.zip from the server


Outline
• System Design
• Datasets
• Evaluation
• Security Findings
• Discussion


iACE Architecture

• Continuously download websites
• Filter out non-technical pages (e.g. login pages)
• Get sentences likely containing IOCs
• Check if the extracted relations are IOCs
• Generate the OpenIOC format


Blog Scraper (BS)
• Download complete websites
• Monitor them for new posts


Blog Preprocessor (BP)
• Remove template from webpages
  – Retain only user-generated content
• Convert content to text
  – OCR on images, PDF to text
• Filter pages based on topic
  – topic words
  – article length
  – density of dictionary words
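
The three topic-filter features listed above can be illustrated with a small sketch. This is a hand-thresholded stand-in, not the classifier used by iACE: the word lists, the thresholds, and the assumption that technical articles have a lower dictionary-word density (because of hashes, IPs, and file names) are all hypothetical.

```python
import re

# Hypothetical word lists and thresholds, chosen only for illustration.
TOPIC_WORDS = {"malware", "trojan", "botnet", "exploit", "payload"}
DICTIONARY = {"the", "a", "file", "from", "server", "downloads", "trojan",
              "and", "is", "to", "it", "of", "please", "log", "in", "continue"}


def page_features(text):
    """Compute the three features named on the slide for one page of text."""
    words = re.findall(r"[a-z]+", text.lower())
    n = len(words)
    topic_hits = sum(w in TOPIC_WORDS for w in words)
    dict_density = sum(w in DICTIONARY for w in words) / n if n else 0.0
    return {"length": n, "topic_hits": topic_hits, "dict_density": dict_density}


def looks_technical(text, min_length=8, min_topic_hits=1, max_dict_density=0.9):
    """Keep a page only if it is long enough, on topic, and not pure plain prose."""
    f = page_features(text)
    return (f["length"] >= min_length
            and f["topic_hits"] >= min_topic_hits
            and f["dict_density"] <= max_dict_density)


article = "The Trojan downloads a file ok.zip from the server 10.0.0.1 and drops payload a1b2c3"
print(looks_technical(article), looks_technical("Please log in to continue"))
```

In practice these features would feed a trained classifier rather than fixed cut-offs.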


Relevant Content Picker (RCP)
• Split text into sentences
• Determine IOC tokens (ok.zip)
  – regex matching
• Identify context terms (downloads)
  – dictionary of relevant terms

The Trojan downloads a file ok.zip from the server
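
A minimal sketch of the three RCP steps, assuming a naive sentence splitter, a handful of illustrative regular expressions, and a tiny context dictionary. None of these patterns or terms come from the slides; a real extractor would need far broader coverage.

```python
import re

# Illustrative patterns only; a real extractor would cover many more IOC types.
IOC_PATTERNS = {
    "filename": re.compile(r"\b[\w-]+\.(?:zip|exe|dll|doc|pdf)\b", re.IGNORECASE),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
}
# Hypothetical dictionary of context terms.
CONTEXT_TERMS = {"downloads", "download", "drops", "connects", "creates", "writes"}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pick_candidates(text):
    """Return sentences that contain both an IOC token and a context term."""
    picked = []
    for sentence in split_sentences(text):
        tokens = [(kind, m.group(0))
                  for kind, pattern in IOC_PATTERNS.items()
                  for m in pattern.finditer(sentence)]
        context = [w for w in re.findall(r"[A-Za-z]+", sentence)
                   if w.lower() in CONTEXT_TERMS]
        if tokens and context:
            picked.append({"sentence": sentence, "ioc_tokens": tokens,
                           "context_terms": context})
    return picked


text = ("The Trojan downloads a file ok.zip from the server. "
        "It's available as a Free 30 day trial download.")
candidates = pick_candidates(text)
print(candidates)
```

Here only the first sentence is kept: the second contains a context term ("download") but no IOC token. Sentences that pass this step still need the relation check described next, since co-occurrence alone does not make a token an IOC.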


Relation Checker (RC)


• Graph Mining
  – Similarity metric for directed graphs
  – Compute the number of identical random walks occurring in both graphs
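
One common way to count walks that occur in both of two labeled directed graphs is the direct-product construction: pair up nodes with equal labels, connect two pairs when the step exists in both graphs, and count walks in the resulting product graph. The sketch below assumes this construction, a maximum walk length, and an optional decay weight; whether it matches the exact kernel used in the paper is not stated in the slides.

```python
def common_walk_count(g1, g2, max_len=3, decay=1.0):
    """Count label-identical walks of length 0..max_len shared by two graphs.

    Each graph is (labels, edges): labels maps node -> label,
    edges is a set of (src, dst) pairs. This representation is an assumption.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    # Product-graph nodes: pairs of nodes that carry the same label.
    pairs = [(u, v) for u in labels1 for v in labels2 if labels1[u] == labels2[v]]
    pair_set = set(pairs)
    # Product-graph edges: a step that can be taken in both graphs at once.
    succ = {p: [] for p in pairs}
    for (u, u2) in edges1:
        for (v, v2) in edges2:
            if (u, v) in pair_set and (u2, v2) in pair_set:
                succ[(u, v)].append((u2, v2))
    # walks[p] = number of common walks of the current length that end at p.
    walks = {p: 1 for p in pairs}
    total, weight = 0.0, 1.0
    for _ in range(max_len + 1):
        total += weight * sum(walks.values())
        nxt = {p: 0 for p in pairs}
        for p, count in walks.items():
            for q in succ[p]:
                nxt[q] += count
        walks = nxt
        weight *= decay
    return total


# Toy dependency graphs; the IOC token is replaced by a generic "IOC" label.
g1 = ({1: "downloads", 2: "Trojan", 3: "IOC"}, {(1, 2), (1, 3)})
g2 = ({"a": "downloads", "b": "Trojan", "c": "IOC"}, {("a", "b"), ("a", "c")})
g3 = ({"x": "available", "y": "download"}, {("x", "y")})
print(common_walk_count(g1, g2), common_walk_count(g1, g3))
```

Setting `decay` below 1 down-weights longer walks, which keeps the score bounded when the graphs contain cycles.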


• Train classifier on relations from ground truth
• Classify new relations based on their similarity to ground truth


IOC Generator (IG)
• Generate Definition and Header in OpenIOC format
  – map context & IOC terms to XML


Datasets
• DS-Labeled (used for training)
  – 450 articles
  – 1,500 true IOC sentences
  – 3,000 false IOC sentences
• DS-Unknown (used for testing)
  – 45 blogs
  – 71,000 articles
• Training sample size is small


Performance of iACE
• Precision = fraction of identified IOCs that are truly IOCs
• Recall = fraction of IOCs that are identified

                      Precision   Recall
Topic Classifier      98%         100%
iACE on DS-Labeled    98%         92%
iACE on DS-Unknown    95%         90%
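
The two definitions above reduce to simple ratios over true positives, false positives, and false negatives. The counts in this sketch are hypothetical and are not the numbers behind the table.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    identified = true_positives + false_positives   # everything flagged as an IOC
    actual = true_positives + false_negatives       # everything that truly is an IOC
    precision = true_positives / identified if identified else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
p, r = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(f"precision={p:.2f} recall={r:.2f}")
```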


Performance of Existing Systems

                  Precision   Recall
iACE              98%         93%
AlienVault OTX    72%         56%
Stanford NER      71%         47%


Discovered IOCs
• 45 blogs
• 71,000 articles (DS-Unknown)
• 20,000 identified as containing IOCs
• 900,000 IOCs identified


How are IOCs related to each other?
• Cluster articles on infrastructure-related IOCs
  – IPs, domains, email addresses
  – 527 clusters likely corresponding to campaigns
  – Little cross-reference between articles in same cluster
  – This allowed the discovery of new campaigns
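
The slides do not say how the clustering is performed. One simple possibility, assumed here purely for illustration, is to put two articles in the same cluster whenever they share at least one infrastructure IOC, and to take connected components with a union-find structure. The article ids and IOC values below are made up.

```python
def cluster_articles(article_iocs):
    """Group articles that are linked, directly or transitively, by a shared IOC.

    article_iocs: dict mapping article id -> set of infrastructure IOCs.
    """
    parent = {a: a for a in article_iocs}

    def find(a):
        # Follow parent links to the root, compressing the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    first_seen = {}  # IOC -> first article that mentioned it
    for article, iocs in article_iocs.items():
        for ioc in iocs:
            if ioc in first_seen:
                parent[find(article)] = find(first_seen[ioc])
            else:
                first_seen[ioc] = article

    clusters = {}
    for article in article_iocs:
        clusters.setdefault(find(article), set()).add(article)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical articles and IOCs.
articles = {
    "a1": {"1.2.3.4", "evil.example"},
    "a2": {"evil.example"},
    "a3": {"bad.example"},
    "a4": {"bad.example", "9.9.9.9"},
    "a5": {"x.example"},
}
clusters = cluster_articles(articles)
print(clusters)
```

With this reading, articles end up in one cluster even when they never cite each other, which is consistent with the observation that cross-references inside a cluster are rare.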


How do IOCs evolve over time?
• Cluster articles on attack vector IOCs
  – malware hashes, CVEs
  – measure decay time = number of consecutive months during which an IOC was mentioned
  – most attack vectors are short lived
  – long lasting attacks pointed to a small set of C&C servers that were not taken down
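
The decay-time definition above leaves some room for interpretation. The sketch below takes one reading, the longest run of consecutive calendar months in which an IOC is mentioned, and treats the input format (a list of year/month pairs) as an assumption.

```python
def decay_time(mention_months):
    """Longest run of consecutive months in which an IOC was mentioned.

    mention_months: iterable of (year, month) tuples; duplicates are ignored.
    """
    # Map each month to a single running index so year boundaries are handled.
    indices = sorted({year * 12 + (month - 1) for year, month in mention_months})
    best = run = 0
    prev = None
    for idx in indices:
        run = run + 1 if prev is not None and idx == prev + 1 else 1
        best = max(best, run)
        prev = idx
    return best


# Hypothetical mention history: Nov 2015 to Jan 2016, then again in Mar 2016.
months = [(2015, 11), (2015, 12), (2016, 1), (2016, 3), (2015, 12)]
print(decay_time(months))
```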


What is the impact of IOCs on defenses?
• How fast are IOCs adopted by the industry?
  – Measure the time difference between when IOCs are blogged about and when they are detected on VirusTotal
  – 47% of IOCs were detected before being blogged about
  – AVs respond much slower to domains & IPs than to hashes


What is the quality of the 45 blogs?

• Timeliness = % of the time a blog is the first to report on IOCs
  – 10 blogs report first on 60% of campaigns
  – a blog with 13% timeliness has 84% exclusive IOCs
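
One way to compute timeliness, assuming it is measured per blog as the share of that blog's reported IOCs for which it had the earliest report (ties count as first for every tied blog). The record format and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def timeliness(reports):
    """reports: iterable of (blog, ioc, day). Returns blog -> fraction reported first."""
    reports = list(reports)
    earliest = {}   # ioc -> earliest day any blog reported it
    first_day = {}  # (blog, ioc) -> earliest day this blog reported it
    for blog, ioc, day in reports:
        if ioc not in earliest or day < earliest[ioc]:
            earliest[ioc] = day
        key = (blog, ioc)
        if key not in first_day or day < first_day[key]:
            first_day[key] = day
    reported = defaultdict(int)
    first = defaultdict(int)
    for (blog, ioc), day in first_day.items():
        reported[blog] += 1
        if day == earliest[ioc]:
            first[blog] += 1
    return {blog: first[blog] / reported[blog] for blog in reported}


# Hypothetical reports from two blogs.
sample = [
    ("A", "evil.example", date(2016, 1, 1)),
    ("B", "evil.example", date(2016, 1, 5)),
    ("B", "1.2.3.4", date(2016, 2, 1)),
    ("A", "1.2.3.4", date(2016, 2, 3)),
    ("B", "bad.example", date(2016, 3, 1)),
]
scores = timeliness(sample)
print(scores)
```

In this sample, `bad.example` is reported only by blog B, so it also counts as an exclusive IOC for that blog.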


• Completeness = % of IOCs reported and how diverse they are (different types)
  – 6 blogs reported 40% of IOCs
  – 9 blogs reported 50% of IOC types


• Robustness = % of robust IOCs that are reported (those that remain unchanged during campaigns)
  – C&C servers, registry email are robust during campaign
  – one blog reports 87% of the robust IOCs


Discussion
• iACE automates IOC generation and has good performance
• Allows an analysis of impact, evolution and relations between IOCs from technical blogs
• Limitations
  – Errors due to natural language ambiguity:
    • e.g. masking http as hxxp in URLs
  – Other intelligence sources are also valuable
    • iACE assumptions might not hold


Thank you!

Octavian Suciu
[email protected]