© 2016 MapR Technologies 1© 2014 MapR Technologies
Using Sequence Statistics to Fight Advanced Persistent Threats
Ted Dunning
Contact Information
Ted Dunning
Chief Applications Architect at MapR Technologies
Committer & PMC for Apache’s Drill, ZooKeeper & others
VP of Incubator at Apache Foundation
Email [email protected] [email protected]
Twitter @ted_dunning
Hashtags today: #hs16dublin #mapr
Agenda
• What’s this persistent threat stuff?
– What attackers do
– How they do it
• Examples
• Sequence statistics
– Really geeking with gas now!
• Detection techniques
• Specifics
• Summary
Agenda of All Security Talks
• Terror
• Faint hope
• More terror
• Practical suggestions
• Summary
Operation Ababil – Brobots on Parade
• Dork attack to find unpatched default Joomla sites
– Especially web servers with high bandwidth connections
– Basically just Google searches for default strings
– Joomla compromised into attack Brobot
• C&C network checks in occasionally
– Note C&C is incoming request and looks like normal web requests
• Later, on command, multiple Brobots direct 50-75 Gb/s of attack
– Attacks come from white-listed sites
Attack Sequence
Outline of an Advanced Persistent Threat
• Advanced
– Common use of zero-day for preliminary attacks
– Often attributed to state-level actors
– Modern privateers blur the line
• Persistent
– Result of first attack is heavily muffled, no immediate exploit
– Remote access toolset installed (RAT)
• Threat
– On command, data is exfiltrated covertly or en masse
– Or the compromised host is used for other nefarious purposes
APT in Summary
• Attack, penetrate, pivot, exfiltrate or exploit
• If you are a high-value target, attack is likely and stealthy
– High-value = telecom, banks, utilities, retail targets, web100
– … and all their vendors
– Conventional multi-factor auth is easily breached
• Penetration and pivot are critical counter-measure opportunities
– In 2010, RAT would contact command and control (C&C)
– In 2016, C&C looks like normal traffic
• Once exfiltration or exploit starts, you may no longer have a business
So are we totally screwed?
Not entirely!
Event Sequences Provide Clues
• Event sequences appear in many places
• Headers
– Header types, ordering in requests
• IP address accesses
– Source and destination, sequences of either
• TLS options
– Which options, which values, which algorithms
• Incoming component request ordering and timing
– Body first, CSS, scripts and images next
– But which are cached, what is round-trip time?
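As a rough illustration of how header ordering becomes a symbolic sequence, the sketch below reduces a raw HTTP request to the ordered list of its header names. The function name `header_symbols` and the sample request are hypothetical, and lower-casing the names is an assumption rather than something the talk specifies.

```python
def header_symbols(raw_request):
    """Return the header names of a raw HTTP request, in the order sent."""
    symbols = []
    for line in raw_request.split("\r\n")[1:]:  # skip the request line
        if not line:  # a blank line ends the header section
            break
        name, _, _ = line.partition(":")
        symbols.append(name.strip().lower())
    return symbols


# Hypothetical request, used only to show the shape of the output.
example = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: demo\r\n"
    "Accept: */*\r\n"
    "\r\n"
)
print(header_symbols(example))
```

Each request then contributes one short "document" of symbols, which is the kind of input the counting steps later in the talk operate on.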
Sequences and Cooccurrences• All of these characteristics form symbolic sequences
• Current systems use hand-crafted rules about particular state
– But hand-crafting depends on human knowledge
• We can do much, much better by considering cooccurrence and ordering of symbols in these sequences
• Log-likelihood ratio test (jargon alert) is a key tool
A core technique
• Many of these easy problems reduce to finding interesting coincidences
• This can be summarized as a 2 x 2 table
• Actually, many of these tables

          A      Other
B         k11    k12
Other     k21    k22
How do you do that?
• This is well handled using the G-test
– See Wikipedia
– See http://bit.ly/surprise-and-coincidence
• Original application in linguistics now cited > 2000 times
• Available in Elasticsearch, in Solr, in Mahout
• Available in R, C, Java, Python
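For reference, the G-test statistic on a 2 x 2 table of counts k11, k12, k21, k22 can be written as below. The row, column and grand totals are named here for convenience, and any term with a zero count is taken to contribute zero.

```latex
G^2 = 2 \sum_{i=1}^{2} \sum_{j=1}^{2} k_{ij} \, \ln \frac{k_{ij} \, k_{**}}{k_{i*} \, k_{*j}},
\qquad k_{i*} = \sum_{j} k_{ij}, \quad k_{*j} = \sum_{i} k_{ij}, \quad k_{**} = \sum_{i,j} k_{ij}
```

Large values mean the observed counts are far from what independent rows and columns would produce.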
Which one is the anomalous co-occurrence?

         A       not A
B        13      1000
not B    1000    100,000

         A       not A
B        1       0
not B    0       10,000

         A       not A
B        10      0
not B    0       100,000

         A       not A
B        1       0
not B    0       2
Which one is the anomalous co-occurrence? Scores for the four tables, each listed as (k11, k12, k21, k22):
• (13, 1000, 1000, 100,000): 0.90
• (1, 0, 0, 10,000): 4.52
• (10, 0, 0, 100,000): 14.3
• (1, 0, 0, 2): 1.95
(Each score equals the square root of the G-test statistic for that table.)
Dunning Ted, Accurate Methods for the Statistics of Surprise and Coincidence, Computational Linguistics vol 19 no. 1 (1993)
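The scores on this slide can be checked with a few lines of code. The sketch below computes the G-test statistic directly from observed and expected counts and then takes its square root; the function name `llr` and the tuple layout (k11, k12, k21, k22) are choices made here, not something taken from the slides.

```python
import math


def llr(k11, k12, k21, k22):
    """G^2 statistic for a 2 x 2 table of counts."""
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g2 = 0.0
    for k, i, j in cells:
        if k > 0:  # a zero count contributes nothing to the sum
            expected = rows[i] * cols[j] / total
            g2 += k * math.log(k / expected)
    return 2.0 * g2


tables = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 10000),
    (10, 0, 0, 100000),
    (1, 0, 0, 2),
]
for table in tables:
    # max() guards against a tiny negative value from floating-point rounding
    print(table, "%.2f" % math.sqrt(max(llr(*table), 0.0)))
```

The four roots come out at roughly 0.90, 4.52, 14.29 and 1.95, so the third table (ten co-occurrences and nothing else in a large sample) is the anomalous one.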
How to Count (header-like documents)
For each “document”:
  For each “word” A:
    left[A]++
    For each “word” B after that (within window):
      count[A,B]++
      right[B]++
      total++
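A minimal Python sketch of this counting pass follows. It increments `left`, `right` and `total` once per pair, which is an assumption made here so that the margins always agree with the pair counts; the function name `count_pairs` and the default `window` size are likewise illustrative.

```python
from collections import Counter


def count_pairs(documents, window=3):
    """Count ordered pairs (A, B) where B follows A within `window` positions."""
    left, right, count = Counter(), Counter(), Counter()
    total = 0
    for words in documents:
        for i, a in enumerate(words):
            for b in words[i + 1 : i + 1 + window]:
                count[a, b] += 1
                left[a] += 1   # pairs in which A is the earlier symbol
                right[b] += 1  # pairs in which B is the later symbol
                total += 1
    return count, left, right, total


# Hypothetical "documents": header-name sequences from three requests.
docs = [
    ["host", "user-agent", "accept"],
    ["host", "user-agent", "accept"],
    ["host", "accept", "user-agent"],
]
count, left, right, total = count_pairs(docs, window=2)
print(count["host", "user-agent"], left["host"], right["accept"], total)
```

Everything here is a single pass with constant work per pair, which is why the counting step scales to very large logs.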
• We wanted this 2 x 2 table for each A,B
• But we only counted k11 directly
• But we did count
k*1 = k11 + k21 (how many A’s we saw)
k1* = k11 + k12 (how many B’s we saw)
k** = k11 + k21 + k12 + k22 (how many pairs in total)

          A      Other
B         k11    k12
Other     k21    k22
How to Count (continued)
Map<PriorityQueue> queue
for each pair (A,B)
  k11 = count[A,B]
  k1x = left[A]
  kx1 = right[B]
  kxx = total
  k12 = k1x - k11
  k21 = kx1 - k11
  k22 = kxx - k11 - k12 - k21
  queue.add(A, (LLR(k11,k12,k21,k22), B))
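The scoring pass might look like the sketch below: fill in the missing cells from the margins, score each pair, and keep the best few partners per A. LLR is computed here in its entropy form; `top_pairs`, the parameter `n` and the toy counts are all hypothetical, and `heapq.nlargest` stands in for the per-key priority queue.

```python
import heapq
import math
from collections import Counter, defaultdict


def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0


def entropy(*counts):
    """Unnormalized entropy: N ln N minus the sum of k ln k."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise at zero


def top_pairs(count, left, right, total, n=2):
    """For each A, return the n highest-scoring (score, B) partners."""
    scored = defaultdict(list)
    for (a, b), k11 in count.items():
        k12 = left[a] - k11
        k21 = right[b] - k11
        k22 = total - k11 - k12 - k21
        scored[a].append((llr(k11, k12, k21, k22), b))
    return {a: heapq.nlargest(n, pairs) for a, pairs in scored.items()}


# Toy pair counts; the margins are derived from them so the table is consistent.
count = Counter({("x", "y"): 10, ("x", "z"): 1, ("x", "v"): 1,
                 ("w", "z"): 8, ("w", "v"): 2, ("u", "v"): 6, ("u", "y"): 1})
left, right = Counter(), Counter()
for (a, b), k in count.items():
    left[a] += k
    right[b] += k
total = sum(count.values())
best = top_pairs(count, left, right, total, n=1)
print({a: pairs[0][1] for a, pairs in best.items()})
```

Note that the score only measures how far a pair is from independence, so a pair that co-occurs much less often than expected can also score highly.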
How to Count (cooccurrence)
for each (C,B)=(“context”, “word”):
  if (!filter(C) && !filter(B)):
    right[B]++
    for each A in history(C):
      count[A,B]++
      left[A]++
    history(C) += B
    total++
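A sketch of this streaming variant in Python is shown below. The `skip` set is a hypothetical stand-in for filter(), and capping each context's history at `max_history` entries is an assumption added here to keep memory bounded.

```python
from collections import Counter, defaultdict, deque


def count_cooccurrence(events, max_history=10, skip=frozenset()):
    """Count (earlier word, current word) pairs within each context's history."""
    left, right, count = Counter(), Counter(), Counter()
    history = defaultdict(lambda: deque(maxlen=max_history))
    total = 0
    for context, word in events:
        if context in skip or word in skip:
            continue  # filtered contexts and words are ignored entirely
        right[word] += 1
        for earlier in history[context]:
            count[earlier, word] += 1
            left[earlier] += 1
        history[context].append(word)
        total += 1
    return count, left, right, total


# Hypothetical events: (source IP, requested resource).
events = [
    ("10.0.0.1", "a"),
    ("10.0.0.1", "b"),
    ("10.0.0.2", "a"),
    ("10.0.0.1", "c"),
]
count, left, right, total = count_cooccurrence(events)
print(dict(count), total)
```

The context can be anything that groups events, such as a source IP, a session or a card number.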
Seriously...
It really can be that simple
Basic techniques
• Counting – often the hardest part
• LLR – the basic tool
• Order models
– Ordered cooccurrences
– Transition probabilities
– Recurrent neural networks
• Ploughing a quiet field
– Reimage servers often
– Force attackers to pivot repeatedly
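As one possible reading of the "transition probabilities" order model, the sketch below fits first-order transition counts on known-good sequences and scores a new sequence by its mean negative log-probability. The additive smoothing constant `alpha`, the extra vocabulary slot for unseen symbols, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_transitions(sequences):
    """Count symbol-to-symbol transitions and collect the vocabulary."""
    counts = defaultdict(Counter)
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def surprise(seq, counts, vocab, alpha=1.0):
    """Mean negative log-probability of the transitions in seq."""
    size = len(vocab) + 1  # one extra slot for symbols never seen in training
    total, steps = 0.0, 0
    for prev, nxt in zip(seq, seq[1:]):
        row = counts.get(prev, Counter())
        p = (row[nxt] + alpha) / (sum(row.values()) + alpha * size)
        total -= math.log(p)
        steps += 1
    return total / steps if steps else 0.0


# Hypothetical training data: the header order a normal client sends.
normal = [["host", "user-agent", "accept"]] * 50
counts, vocab = train_transitions(normal)
print(surprise(["host", "user-agent", "accept"], counts, vocab))
print(surprise(["accept", "host", "user-agent"], counts, vocab))
```

A sequence in the usual order scores close to zero, while an unusual ordering of the same symbols scores noticeably higher.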
Example 1 - Ababil
Defense has to happen here
Spot the Important Difference?
Attacker request Real request
This could only be found at scale
Overall Outline Again
Tradecraft error!
Large corpus analysis of source IPs wins big
Example 2 - Common Point of Compromise
• Scenario:
– Merchant 0 is compromised, leaks account data during compromise
– Fraud committed elsewhere during exploit
– High background level of fraud
– Limited detection rate for exploits
• Goal:
– Find merchant 0
• Meta-goal:
– Screen algorithms for this task without leaking sensitive data
© 2016 MapR Technologies 34
Example 2 - Common Point of Compromise
Card data is stolen from Merchant 0
That data is used in frauds at other merchants
Simulation Setup
Simulation Strategy
• For each consumer
– Pick consumer parameters such as transaction rate, preferences
– Generate transactions until end of sim-time
• If merchant 0 during compromise time, possibly mark as compromised
• For all transactions, possibly mark as fraud, probability depends on history
• Merchants are selected using hierarchical Pitman-Yor
• Restate data
– Flatten transaction streams
– Sort by time
• Tunables
– Compromise probability, transaction rates, background fraud, detection probability
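To give a feel for the merchant-selection step, here is a sketch of a single-level Pitman-Yor process sampled with the usual Chinese-restaurant construction. It is not the hierarchical version the slide mentions, and the `discount`, `concentration` and `seed` values are arbitrary placeholders rather than the simulator's settings.

```python
import random


def pitman_yor_draws(n, discount=0.5, concentration=1.0, seed=0):
    """Draw n merchant ids from a single-level Pitman-Yor process."""
    rng = random.Random(seed)
    counts = []  # counts[m] = transactions so far at merchant m
    draws = []
    for i in range(n):
        k = len(counts)
        new_weight = concentration + discount * k
        u = rng.random() * (i + concentration)  # total weight is i + concentration
        if u < new_weight:
            counts.append(1)  # first transaction at a brand-new merchant
            draws.append(k)
            continue
        u -= new_weight
        chosen = k - 1  # fallback in case rounding leaves u at the very end
        for m, c in enumerate(counts):
            u -= c - discount  # existing merchant m has weight c - discount
            if u < 0:
                chosen = m
                break
        counts[chosen] += 1
        draws.append(chosen)
    return draws


draws = pitman_yor_draws(1000)
print(len(set(draws)), "distinct merchants in", len(draws), "transactions")
```

A process like this produces a few very popular merchants and a long tail of rarely visited ones, which is the kind of skew a transaction simulator needs.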
Really truly bad guys
Historical cooccurrence gives high S/N
Summary
• The world can be seen as sequences of symbols
• We can find patterns
• Those patterns can nail opponents
• Many patterns only appear at scale
• You can do this
Short Books by Ted Dunning & Ellen Friedman
• Published by O’Reilly in 2014 and 2015
• For sale from Amazon or O’Reilly
• Free e-books currently available courtesy of MapR
http://bit.ly/ebook-real-world-hadoop
http://bit.ly/mapr-tsdb-ebook
http://bit.ly/ebook-anomaly
http://bit.ly/recommendation-ebook
Streaming Architecture
by Ted Dunning and Ellen Friedman © 2016 (published by O’Reilly)
Free copies at book signing today
(oops… that was earlier)
http://bit.ly/mapr-ebook-streams
Thank You!
Q & A
@mapr maprtech
Engage with us!
MapR
maprtech
mapr-technologies