Using Sequence Statistics to Fight Advanced Persistent Threats

Upload: dataworks-summithadoop-summit

Post on 16-Apr-2017

Page 1: Using Sequence Statistics to Fight Advanced Persistent Threats

© 2016 MapR Technologies

Using Sequence Statistics to Fight Advanced Persistent Threats
Ted Dunning

Page 2: Using Sequence Statistics to Fight Advanced Persistent Threats

Contact Information

Ted Dunning
Chief Applications Architect at MapR Technologies

Committer & PMC for Apache’s Drill, ZooKeeper & others
VP of Incubator at Apache Foundation

Email [email protected] [email protected]

Twitter @ted_dunning

Hashtags today: #hs16dublin #mapr

Page 3: Using Sequence Statistics to Fight Advanced Persistent Threats

Agenda
• What’s this persistent threat stuff?
  – What attackers do
  – How they do it
• Examples
• Sequence statistics
  – Really geeking with gas now!
• Detection techniques
• Specifics
• Summary

Page 4: Using Sequence Statistics to Fight Advanced Persistent Threats

Agenda of All Security Talks
• Terror
• Faint hope
• More terror
• Practical suggestions
• Summary

Page 5: Using Sequence Statistics to Fight Advanced Persistent Threats

Operation Ababil – Brobots on Parade
• Dork attack to find unpatched default Joomla sites
  – Especially web servers with high-bandwidth connections
  – Basically just Google searches for default strings
  – Joomla compromised into attack Brobot
• C&C network checks in occasionally
  – Note that the C&C check-in is an incoming request and looks like a normal web request
• Later, on command, multiple Brobots direct 50-75 Gb/s of attack
  – Attacks come from white-listed sites

Page 6: Using Sequence Statistics to Fight Advanced Persistent Threats

Attack Sequence

Page 7: Using Sequence Statistics to Fight Advanced Persistent Threats

Attack Sequence

Page 8: Using Sequence Statistics to Fight Advanced Persistent Threats

Attack Sequence

Page 9: Using Sequence Statistics to Fight Advanced Persistent Threats

Attack Sequence

Page 10: Using Sequence Statistics to Fight Advanced Persistent Threats

Outline of an Advanced Persistent Threat
• Advanced
  – Common use of zero-day for preliminary attacks
  – Often attributed to state-level actors
  – Modern privateers blur the line
• Persistent
  – Result of first attack is heavily muffled, no immediate exploit
  – Remote access toolset installed (RAT)
• Threat
  – On command, data is exfiltrated covertly or en masse
  – Or the compromised host is used for other nefarious purpose

Page 11: Using Sequence Statistics to Fight Advanced Persistent Threats

APT in Summary
• Attack, penetrate, pivot, exfiltrate or exploit
• If you are a high-value target, attack is likely and stealthy
  – High-value = telecom, banks, utilities, retail targets, web100
  – … and all their vendors
  – Conventional multi-factor auth is easily breached
• Penetration and pivot are critical counter-measure opportunities
  – In 2010, RAT would contact command and control (C&C)
  – In 2016, C&C looks like normal traffic
• Once exfiltration or exploit starts, you may no longer have a business

Page 12: Using Sequence Statistics to Fight Advanced Persistent Threats

So are we totally screwed?

Page 13: Using Sequence Statistics to Fight Advanced Persistent Threats

So are we totally screwed?

Not entirely!

Page 14: Using Sequence Statistics to Fight Advanced Persistent Threats

Event Sequences Provide Clues
• Event sequences appear in many places
• Headers
  – Header types, ordering in requests
• IP address accesses
  – Source and destination, sequences of either
• TLS options
  – Which options, which values, which algorithms
• Incoming component request ordering and timing
  – Body first, CSS, scripts and images next
  – But which are cached, what is round-trip time?
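
As a minimal sketch of how one of these characteristics becomes a symbolic sequence, the ordering of header names in a raw HTTP request can be read off as a list of symbols. The function name `header_symbols` and both sample requests are hypothetical, made up purely for illustration.

```python
def header_symbols(raw_request: str) -> list:
    """Return the header names of a raw HTTP request, in order, lowercased."""
    lines = raw_request.replace("\r\n", "\n").split("\n")
    symbols = []
    for line in lines[1:]:          # skip the request line ("GET / HTTP/1.1")
        if not line.strip():        # a blank line ends the header section
            break
        name, sep, _value = line.partition(":")
        if sep:                     # ignore malformed lines without a colon
            symbols.append(name.strip().lower())
    return symbols


# Two made-up requests that carry the same headers in a different order.
browser_like = "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: x\r\nAccept: */*\r\n\r\n"
script_like = "GET / HTTP/1.1\r\nAccept: */*\r\nUser-Agent: x\r\nHost: example.com\r\n\r\n"

print(header_symbols(browser_like))
print(header_symbols(script_like))
```

The two requests contain the same set of headers, so a check on header presence alone would not separate them; only the ordering differs.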

Page 15: Using Sequence Statistics to Fight Advanced Persistent Threats

Sequences and Cooccurrences• All of these characteristics form symbolic sequences

• Current systems use hand-crafted rules about particular state– But hand-crafting depends on human knowledge

• We can do much, much better by considering cooccurrence and ordering of symbols in these sequences

• Log-likelihood ratio test (jargon alert) is a key tool

Page 16: Using Sequence Statistics to Fight Advanced Persistent Threats

A core technique
• Many of these easy problems reduce to finding interesting coincidences

• This can be summarized as a 2 x 2 table

• Actually, many of these tables

        A     Other
B       k11   k12
Other   k21   k22

Page 17: Using Sequence Statistics to Fight Advanced Persistent Threats

How do you do that?
• This is well handled using the G-test
  – See Wikipedia
  – See http://bit.ly/surprise-and-coincidence
• Original application in linguistics now cited > 2000 times
• Available in Elasticsearch, in Solr, in Mahout
• Available in R, C, Java, Python
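
For reference, the G-test statistic for a 2 x 2 table of counts is usually written as below. The notation reuses the k11 ... k22 cell labels from this deck; the row sums, column sums and grand total N are spelled out on the right, and any cell with a zero count is taken to contribute zero to the sum.

```latex
G = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} k_{ij} \, \ln\!\left( \frac{k_{ij}\, N}{k_{i*}\, k_{*j}} \right),
\qquad
k_{i*} = \sum_{j} k_{ij}, \quad
k_{*j} = \sum_{i} k_{ij}, \quad
N = \sum_{i,j} k_{ij}
```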

Page 18: Using Sequence Statistics to Fight Advanced Persistent Threats

Which one is the anomalous co-occurrence?

        A      not A
B       13     1000
not B   1000   100,000

        A      not A
B       1      0
not B   0      10,000

        A      not A
B       10     0
not B   0      100,000

        A      not A
B       1      0
not B   0      2

Page 19: Using Sequence Statistics to Fight Advanced Persistent Threats

Which one is the anomalous co-occurrence?

        A      not A
B       13     1000
not B   1000   100,000
Score (square root of LLR): 0.90

        A      not A
B       1      0
not B   0      10,000
Score (square root of LLR): 4.52

        A      not A
B       10     0
not B   0      100,000
Score (square root of LLR): 14.3

        A      not A
B       1      0
not B   0      2
Score (square root of LLR): 1.95

Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, vol. 19, no. 1 (1993).
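
The four scores can be checked with a few lines of Python. This is a sketch using the entropy-style formulation of the G-test, which is one common way to compute it; the helper names `x_log_x`, `entropy` and `llr` are my own. The values on the slide appear to be the square root of the statistic, so the loop prints that.

```python
import math


def x_log_x(x: float) -> float:
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts: float) -> float:
    """Unnormalized entropy of a set of counts: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11: float, k12: float, k21: float, k22: float) -> float:
    """Log-likelihood ratio (G-test) statistic for a 2 x 2 table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off


# The four tables from the slide, as (k11, k12, k21, k22).
tables = [
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 10_000),
    (10, 0, 0, 100_000),
    (1, 0, 0, 2),
]
for t in tables:
    print(t, round(math.sqrt(llr(*t)), 2))
```

The first table has the largest raw cooccurrence count (13) but the lowest score, because 13 is close to what chance alone would produce given the marginals; the table with 10 cooccurrences and no other appearances of A or B scores highest.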

Page 20: Using Sequence Statistics to Fight Advanced Persistent Threats

How to Count (header-like documents)
For each “document”:
  For each “word” A:
    left[A]++
    For each “word” B after that (within window):
      count[A,B]++
      right[B]++
      total++

Page 21: Using Sequence Statistics to Fight Advanced Persistent Threats

• We wanted this 2 x 2 table for each A,B

• But we only counted k11 directly
• But we did count
  k*1 = k11 + k21 (how many A’s we saw)
  k1* = k11 + k12 (how many B’s we saw)
  k** = k11 + k21 + k12 + k22 (how many pairs in total)

        A     Other
B       k11   k12
Other   k21   k22

Page 22: Using Sequence Statistics to Fight Advanced Persistent Threats

How to Count (continued)
Map<PriorityQueue> queue
for each pair (A,B)
  k11 = count[A,B]
  kx1 = left[A]
  k1x = right[B]
  kxx = total
  k12 = k1x - k11
  k21 = kx1 - k11
  k22 = kxx - k11 - k12 - k21
  queue.add(A, (LLR(k11,k12,k21,k22), B))
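
The counting and scoring steps can be sketched end to end in Python. Several things here are assumptions rather than anything stated in the talk: the function names, the `window` and `per_symbol` parameters, the use of `heapq.nlargest` in place of a per-symbol priority queue, and the choice to increment `left`, `right` and `total` once per pair so that the four cells always sum to `total`. Cell labels follow the 2 x 2 layout shown earlier, where the first column counts A and the first row counts B.

```python
import heapq
import math
from collections import Counter, defaultdict


def llr(k11, k12, k21, k22):
    """G-test statistic for a 2 x 2 table; zero cells contribute nothing."""
    n = k11 + k12 + k21 + k22
    cells = [
        (k11, k11 + k12, k11 + k21),
        (k12, k11 + k12, k12 + k22),
        (k21, k21 + k22, k11 + k21),
        (k22, k21 + k22, k12 + k22),
    ]
    return 2.0 * sum(k * math.log(k * n / (r * c)) for k, r, c in cells if k > 0)


def count_pairs(documents, window=2):
    """Count ordered pairs (A, B) where B follows A within `window` positions."""
    count, left, right, total = Counter(), Counter(), Counter(), 0
    for doc in documents:
        for i, a in enumerate(doc):
            for b in doc[i + 1 : i + 1 + window]:
                count[a, b] += 1
                left[a] += 1    # pairs with A on the left
                right[b] += 1   # pairs with B on the right
                total += 1
    return count, left, right, total


def top_pairs(count, left, right, total, per_symbol=3):
    """For each A, keep the `per_symbol` followers B with the highest LLR."""
    scored = defaultdict(list)
    for (a, b), k11 in count.items():
        k21 = left[a] - k11              # A without B
        k12 = right[b] - k11             # B without A
        k22 = total - k11 - k12 - k21    # neither
        scored[a].append((llr(k11, k12, k21, k22), b))
    return {a: heapq.nlargest(per_symbol, pairs) for a, pairs in scored.items()}


# Made-up header-name sequences; the last one uses an unusual ordering.
documents = [["host", "user-agent", "accept"]] * 20 + [["accept", "host", "user-agent"]]
for a, followers in top_pairs(*count_pairs(documents)).items():
    print(a, followers)
```

Only k11 needs a per-pair counter; the other three cells fall out of the marginals, which is what keeps the counting pass cheap.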

Page 23: Using Sequence Statistics to Fight Advanced Persistent Threats

How to Count (cooccurrence)
for each (C,B)=(“context”, “word”):
  if (!filter(C) && !filter(B)):
    right[B]++
    for each A in history(C):
      count[A,B]++
      left[A]++
    history(C) += B
    total++
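
A direct Python rendering of this loop might look like the following sketch. The `skip` set stands in for the slide's `filter` and is my own simplification, the history is kept unbounded for brevity, and the card and merchant events are invented for illustration.

```python
from collections import Counter, defaultdict


def count_history_pairs(events, skip=frozenset()):
    """Count (A, B) where A appeared earlier than B in the same context."""
    count, left, right = Counter(), Counter(), Counter()
    history = defaultdict(list)   # context -> symbols seen so far, in order
    total = 0
    for context, b in events:
        if context in skip or b in skip:
            continue              # filtered events are ignored entirely
        right[b] += 1
        for a in history[context]:
            count[a, b] += 1
            left[a] += 1
        history[context].append(b)
        total += 1
    return count, left, right, total


# Hypothetical events: (card, merchant-or-outcome), already in time order.
events = [
    ("card1", "merchant0"), ("card1", "merchant7"), ("card1", "FRAUD"),
    ("card2", "merchant3"), ("card2", "merchant7"),
    ("card3", "merchant0"), ("card3", "FRAUD"),
]
count, left, right, total = count_history_pairs(events)
print(count["merchant0", "FRAUD"], count["merchant7", "FRAUD"], total)
```

In a long-running system the per-context history would normally be capped or aged out; that choice is left open here.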

Page 24: Using Sequence Statistics to Fight Advanced Persistent Threats

Seriously...
It really can be that simple

Page 25: Using Sequence Statistics to Fight Advanced Persistent Threats

Basic techniques
• Counting – often the hardest part
• LLR – the basic tool
• Order models
  – Ordered cooccurrences
  – Transition probabilities
  – Recurrent neural networks
• Ploughing a quiet field
  – Reimage servers often
  – Force attackers to pivot repeatedly
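
To make the "transition probabilities" order model concrete, here is a small first-order sketch in Python: count symbol-to-symbol transitions in known-good sequences, then score a new sequence by its log-probability. The add-alpha smoothing, the `^` and `$` boundary markers and the function names are my assumptions, not details from the talk.

```python
import math
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Count first-order transitions, including start (^) and end ($) markers."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = ["^", *seq, "$"]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def log_likelihood(seq, counts, vocab_size, alpha=1.0):
    """Log-probability of a sequence under the add-alpha smoothed model."""
    padded = ["^", *seq, "$"]
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        row = counts.get(prev, Counter())
        total += math.log((row[nxt] + alpha) / (sum(row.values()) + alpha * vocab_size))
    return total


# Made-up training data: fifty requests with one consistent header order.
training = [["host", "user-agent", "accept"]] * 50
model = fit_transitions(training)
vocab_size = 4  # three header names plus the end marker

usual = log_likelihood(["host", "user-agent", "accept"], model, vocab_size)
unusual = log_likelihood(["accept", "user-agent", "host"], model, vocab_size)
print(usual, unusual)
```

A sequence whose transitions were rarely or never seen gets a much lower log-probability, which can be thresholded or ranked to surface candidates for review.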

Page 26: Using Sequence Statistics to Fight Advanced Persistent Threats

Example 1 - Ababil

Defense has to happen here

Page 27: Using Sequence Statistics to Fight Advanced Persistent Threats

Spot the Important Difference?

Attacker request Real request

Page 28: Using Sequence Statistics to Fight Advanced Persistent Threats

Spot the Important Difference?

Attacker request Real request

Page 29: Using Sequence Statistics to Fight Advanced Persistent Threats

This could only be found at scale

Page 30: Using Sequence Statistics to Fight Advanced Persistent Threats

Overall Outline Again

Tradecraft error!

Page 31: Using Sequence Statistics to Fight Advanced Persistent Threats

Large corpus analysis of source IPs wins big

Page 32: Using Sequence Statistics to Fight Advanced Persistent Threats

Page 33: Using Sequence Statistics to Fight Advanced Persistent Threats

Example 2 - Common Point of Compromise
• Scenario:
  – Merchant 0 is compromised, leaks account data during compromise
  – Fraud committed elsewhere during exploit
  – High background level of fraud
  – Limited detection rate for exploits
• Goal:
  – Find merchant 0
• Meta-goal:
  – Screen algorithms for this task without leaking sensitive data

Page 34: Using Sequence Statistics to Fight Advanced Persistent Threats

Example 2 - Common Point of Compromise

Card data is stolen from Merchant 0

That data is used in frauds at other merchants

Page 35: Using Sequence Statistics to Fight Advanced Persistent Threats

Simulation Setup

Page 36: Using Sequence Statistics to Fight Advanced Persistent Threats

Simulation Strategy
• For each consumer
  – Pick consumer parameters such as transaction rate, preferences
  – Generate transactions until end of sim-time
• If merchant 0 during compromise time, possibly mark as compromised
• For all transactions, possibly mark as fraud, probability depends on history
• Merchants are selected using hierarchical Pitman-Yor
• Restate data
  – Flatten transaction streams
  – Sort by time
• Tunables
  – Compromise probability, transaction rates, background fraud, detection probability
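
A much-simplified sketch of such a simulator is shown below. It is not the simulator used for the talk: it draws merchants from a single shared Pitman-Yor process rather than a hierarchical one, it does not model detection probability, and every parameter name and default value is an assumption chosen only to make the example run.

```python
import random


class PitmanYor:
    """Single-level Pitman-Yor process over merchant ids 0, 1, 2, ..."""

    def __init__(self, discount, strength, rng):
        self.discount, self.strength, self.rng = discount, strength, rng
        self.counts = []          # counts[m] = draws of merchant m so far
        self.n = 0

    def sample(self):
        u = self.rng.random() * (self.n + self.strength)
        for m, c in enumerate(self.counts):
            u -= c - self.discount
            if u < 0:
                break
        else:
            m = len(self.counts)  # remaining mass goes to a new merchant
            self.counts.append(0)
        self.counts[m] += 1
        self.n += 1
        return m


def simulate(n_consumers=200, sim_time=100.0, compromise=(20.0, 40.0),
             p_compromise=0.5, p_background_fraud=0.002,
             p_exploit_fraud=0.05, seed=1):
    """Return (time, consumer, merchant, is_fraud) tuples sorted by time."""
    rng = random.Random(seed)
    merchants = PitmanYor(discount=0.3, strength=5.0, rng=rng)
    transactions = []
    for consumer in range(n_consumers):
        rate = rng.uniform(0.1, 1.0)          # transactions per unit time
        t, compromised = rng.expovariate(rate), False
        while t < sim_time:
            merchant = merchants.sample()
            # Fraud probability depends on the consumer's history so far.
            p_fraud = p_exploit_fraud if compromised else p_background_fraud
            fraud = rng.random() < p_fraud
            if merchant == 0 and compromise[0] <= t < compromise[1]:
                compromised = compromised or rng.random() < p_compromise
            transactions.append((t, consumer, merchant, fraud))
            t += rng.expovariate(rate)
    transactions.sort()                       # flatten and sort by time
    return transactions


transactions = simulate()
print(len(transactions), sum(1 for tx in transactions if tx[3]))
```

Because the data is synthetic, candidate detection algorithms can be screened against a known merchant 0 without touching real account data, which is the meta-goal stated for this example.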

Page 37: Using Sequence Statistics to Fight Advanced Persistent Threats

Page 38: Using Sequence Statistics to Fight Advanced Persistent Threats

Really truly bad guys

Page 39: Using Sequence Statistics to Fight Advanced Persistent Threats

Historical cooccurrence gives high S/N

Page 40: Using Sequence Statistics to Fight Advanced Persistent Threats

Summary
• The world can be seen as sequences of symbols

• We can find patterns

• Those patterns can nail opponents

• Many patterns only appear at scale

• You can do this

Page 41: Using Sequence Statistics to Fight Advanced Persistent Threats

Page 42: Using Sequence Statistics to Fight Advanced Persistent Threats

Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 and 2015
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR

http://bit.ly/ebook-real-world-hadoop

http://bit.ly/mapr-tsdb-ebook

http://bit.ly/ebook-anomaly

http://bit.ly/recommendation-ebook

Page 43: Using Sequence Statistics to Fight Advanced Persistent Threats

Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)

Free copies at book signing today
(oops… that was earlier)

http://bit.ly/mapr-ebook-streams

Page 44: Using Sequence Statistics to Fight Advanced Persistent Threats

Thank You!

Page 45: Using Sequence Statistics to Fight Advanced Persistent Threats

Q & A
@mapr maprtech

[email protected]

Engage with us!

MapR

maprtech

mapr-technologies