Applying Machine Learning to Network Security Monitoring - BayThreat 2013
DESCRIPTION
Video (at YouTube) - http://bit.ly/19TNSTF
Big Data Security Analytics, Data Science and Machine Learning are a few of the new buzzwords that have invaded our industry of late. Most of what we hear are promises of a unicorn-laden, silver-bullet panacea from heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community. This talk will help parse what we as a community need to know about these concepts: their technical details and actual capabilities, where they fail, and how they can be exploited and fooled by an attacker. The talk will also share results of the author's ongoing research (at MLSec Project) on applying machine learning techniques to information security monitoring.
TRANSCRIPT
![Page 1: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/1.jpg)
Applying Machine Learning to Network Security Monitoring
Alexandre Pinto
Chief Data Scientist | MLSec Project
@alexcpsec @MLSecProject
![Page 2: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/2.jpg)
WARNING!
• This is a talk about BUILDING, not breaking
  – NO systems were harmed in the development of this talk.
  – This is NOT about 1337 Android Malware
• The only thing we are likely to break here is the time limit on the talk
• This talk includes more MATH than the daily recommended intake by the FDA.
• All stunts described in this talk were performed by trained professionals.
![Page 3: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/3.jpg)
Who's Alex?
• 13 years in Information Security, done a little bit of everything.
• Past 7 or so years leading security consultancy and monitoring teams in Brazil, London and the US.
  – If there is any way a SIEM can hurt you, it did to me.
• Researching machine learning and data science in general for the past year or so, and presenting about their intersection with Infosec throughout the year.
• Created MLSec Project in July 2013 to give structure to the research being done.
![Page 4: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/4.jpg)
Agenda
• Definitions
• Big Data
• Data Science
• Machine Learning
• Y U DO DIS?
• Network Security Monitoring
• PoC || GTFO
• Feature Intuition
• How to get started?
![Page 5: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/5.jpg)
Big Data + Machine Learning + Data Science
![Page 6: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/6.jpg)
Big Data + Machine Learning + Data Science
![Page 7: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/7.jpg)
Big Data
![Page 8: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/8.jpg)
(Security) Data Scientist
Data Science Venn Diagram by Drew Conway
• “Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”
-- Josh Wills, Cloudera
![Page 9: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/9.jpg)
Enter Machine Learning
• “Machine learning systems automatically learn programs from data” (*)
• You don’t really code the program; it is inferred from data.
• The intuition is to mimic the way the brain learns: that's where terms like artificial intelligence come from.
(*) CACM 55(10) - A Few Useful Things to Know about Machine Learning (Domingos 2012)
![Page 10: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/10.jpg)
Kinds of Machine Learning
• Supervised Learning:
  – Classification (NN, SVM, Naïve Bayes)
  – Regression (linear, logistic)
• Unsupervised Learning:
  – Clustering (k-means)
  – Decomposition (PCA, SVD)
Source – scikit-learn.github.io/scikit-learn-tutorial/general_concepts.html
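The difference between the two families can be illustrated with a minimal, standard-library-only sketch. The toy 2-D points and the "benign"/"malicious" labels are made up for illustration, and the nearest-centroid classifier and the simple k-means loop stand in for the heavier algorithms named on the slide (a real project would more likely use a library such as scikit-learn).

```python
from statistics import mean

def centroid(points):
    """Average position of a list of equal-length tuples."""
    return tuple(mean(axis) for axis in zip(*points))

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_nearest_centroid(samples, labels):
    """Supervised: labels are given, so learn one centroid per label."""
    by_label = {}
    for point, label in zip(samples, labels):
        by_label.setdefault(label, []).append(point)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, point):
    """Classify a point by the label of the closest learned centroid."""
    return min(model, key=lambda label: dist2(model[label], point))

def kmeans(samples, k, iterations=10):
    """Unsupervised: no labels, so discover k group centers from the data."""
    # Naive deterministic initialisation: evenly spaced samples.
    centers = [samples[i * len(samples) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in samples:
            nearest = min(range(k), key=lambda i: dist2(centers[i], point))
            clusters[nearest].append(point)
        # Keep the old center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

# Hypothetical toy data: two well-separated groups of 2-D feature vectors.
samples = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["benign"] * 3 + ["malicious"] * 3

model = fit_nearest_centroid(samples, labels)
print(predict(model, (4.9, 5.2)))   # label of the closest centroid
print(kmeans(samples, 2))           # two centers found without any labels
```

The supervised half needs someone to provide `labels`; the unsupervised half only finds structure, and a human still has to decide what each cluster means.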
![Page 11: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/11.jpg)
Classification Example
VS
![Page 12: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/12.jpg)
Regression Example
![Page 13: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/13.jpg)
Considerations on Data Gathering
• Models will (generally) get better with more data
  – But we always have to consider bias and variance as we select our data points
  – Also adversaries: we may be force-fed “bad data”, find signal in weird noise, or design bad (or exploitable) features
• “I’ve got 99 problems, but data ain’t one”
Domingos, 2012; Abu-Mostafa, Caltech, 2012
![Page 14: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/14.jpg)
Applications of Machine Learning
• Sales
• Trading
• Image and Voice Recognition
![Page 15: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/15.jpg)
Y U DO DIS?
• Common reactions from Security Professionals:
• “Eh, cool…” *blank stare* *walks away*
• “Are you high, bro?”
• “Why aren’t you doing some cool research like Android Malware?”
![Page 16: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/16.jpg)
Math is HARD
![Page 17: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/17.jpg)
Security Applications of ML
• SPAM filters
• Fraud detection systems:
  – Is what he just did consistent with past behavior?
• Network anomaly detection (?):
  – More like bad statistical analysis
  – Did not advance a lot, IMO
• Predicting likelihood of attack actors
  – Create different predictive models and chain them to gain more confidence in each step.
![Page 18: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/18.jpg)
Considerations on Data Gathering
• Adversaries - Exploiting the learning process
• Understand the model, understand the machine, and you can circumvent it
• Something the InfoSec community knows very well
• Any predictive model in InfoSec will be pushed to the limit
• Again, think back on the way SPAM engines evolved.
![Page 19: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/19.jpg)
Network Security Monitoring
![Page 20: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/20.jpg)
Correlation Rules: A Primer
• Rules in a SIEM solution invariably are:
  – “Something” has happened “x” times;
  – “Something” has happened and another “something2” has happened, with some relationship (time, same fields, etc.) between them.
• Configuring SIEM = iterate on combinations until:
  – Customer or management is foole.. I mean satisfied;
  – Consulting money runs out
• Behavioral rules (anomaly detection) help a bit with the “x”s, but are still very laborious and time-consuming.
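The first rule shape on this slide ("something" has happened "x" times) can be sketched in a few lines. This is an illustrative sketch, not any SIEM product's rule language: the event tuple layout `(timestamp, source, event_type)`, the function name `threshold_rule` and the sample events are all assumptions made up for the example.

```python
from collections import defaultdict, deque

def threshold_rule(events, event_type, count, window_seconds):
    """Return (source, timestamp) alerts whenever `event_type` is seen
    `count` times from the same source within `window_seconds`."""
    recent = defaultdict(deque)   # per-source timestamps still inside the window
    alerts = []
    for ts, source, etype in sorted(events):
        if etype != event_type:
            continue
        window = recent[source]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= count:
            alerts.append((source, ts))
            window.clear()        # start counting again after alerting
    return alerts

# Hypothetical events: (unix-style timestamp, source IP, event type).
events = [
    (0, "10.0.0.5", "login_failed"),
    (10, "10.0.0.5", "login_failed"),
    (20, "10.0.0.5", "login_failed"),
    (500, "10.0.0.9", "login_failed"),
]
print(threshold_rule(events, "login_failed", count=3, window_seconds=60))
```

Every value of `count` and `window_seconds` has to be hand-tuned per environment, which is the laborious part the slide complains about.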
![Page 21: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/21.jpg)
Kinds of Network Security Monitoring
• Alert-based:
  – “Traditional” log management
  – SIEM
  – Using “Threat Intelligence” (i.e. blacklists) for about a year or so
  – Lack of context
  – Low effectiveness
  – You get the results handed over to you
• Exploration-based:
  – Network Forensics tools (2/3 years ago)
  – Elastic Search based LM systems
  – High effectiveness
  – Lots of people necessary
  – Lots of HIGHLY trained people
• Big Data Security Analytics (BDSA):
  – Run exploration-based monitoring on Hadoop
  – More like Big Data Security Monitoring (BDSM)
![Page 22: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/22.jpg)
Alert-based + Exploration-based
![Page 23: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/23.jpg)
A wild army of robots appears
![Page 24: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/24.jpg)
Using robots to catch bad guys
![Page 25: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/25.jpg)
PoC || GTFO
• We developed a set of algorithms to detect malicious behavior from log entries of firewall blocks
• Over 6 months of data from SANS DShield (thanks, guys!)
• After a lot of statistics-based math (true positive ratio, true negative ratio, odds likelihood), it could pinpoint actors that would be 13x-18x more likely to attack you.
• Today more like 30x on the SANS data, and finding around 80% of “badness” in participant deployments.
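One common way to turn a true positive ratio and a true negative ratio into an "N times more likely" figure is the positive likelihood ratio, LR+ = TPR / (1 - TNR). The sketch below assumes that is roughly what the slide means by "odds likelihood"; the talk does not spell out the exact formula, and the confusion-matrix counts are invented purely for illustration.

```python
def rates(tp, fp, tn, fn):
    """True positive ratio and true negative ratio from confusion-matrix counts."""
    tpr = tp / (tp + fn)   # share of real attackers that were flagged
    tnr = tn / (tn + fp)   # share of benign actors that were left alone
    return tpr, tnr

def positive_likelihood_ratio(tpr, tnr):
    """How much more likely a flagged actor is to be an attacker
    than a non-attacker: TPR divided by the false positive ratio."""
    return tpr / (1.0 - tnr)

# Made-up counts, not results from the talk.
tpr, tnr = rates(tp=80, fp=50, tn=950, fn=20)
print(tpr, tnr, positive_likelihood_ratio(tpr, tnr))
```

With these invented counts the ratio lands in the same order of magnitude as the figures quoted on the slide, which is only meant to show how such a number could be derived.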
![Page 26: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/26.jpg)
Feature Intuition: IP Proximity
• Assumptions to aggregate the data
• Correlation / proximity / similarity BY BEHAVIOR
• “Bad Neighborhoods” concept:
  – Spamhaus x CyberBunker
  – Google Report (June 2013)
  – Moura 2013
• Group by Geolocation
• Group by Netblock (/16, /24)
• Group by ASN
  – (thanks, Team Cymru)
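Grouping by netblock can be sketched with Python's standard `ipaddress` module. The helper names `netblock` and `count_by_netblock` and the sample addresses (taken from documentation-reserved ranges) are assumptions for illustration; geolocation and ASN grouping would need external data sources and are not shown.

```python
import ipaddress
from collections import Counter

def netblock(ip, prefix_len):
    """Return the enclosing network (e.g. the /24 or /16) of an IPv4 address."""
    # strict=False lets us pass a host address and have the host bits masked off.
    return str(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False))

def count_by_netblock(ips, prefix_len):
    """Count how many observed addresses fall into each netblock."""
    return Counter(netblock(ip, prefix_len) for ip in ips)

# Hypothetical source IPs from blocked-connection logs.
blocked = ["198.51.100.7", "198.51.100.200", "198.51.101.3", "203.0.113.9"]
print(count_by_netblock(blocked, 24))
print(count_by_netblock(blocked, 16))
```

The per-netblock counts can then be used as features for an individual address: an IP that has never been seen may still inherit some suspicion from a busy neighborhood.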
![Page 27: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/27.jpg)
Map of the Internet
• (Hilbert Curve)
• Block port 22
• 2013-07-20
[Figure: Hilbert-curve map of the IPv4 address space. Labels: 0, 10, 127, MULTICAST AND FRIENDS, CN, RU, "CN, BR, TH", You are here]
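A Hilbert curve keeps numerically adjacent address blocks next to each other on the picture, which is why it is a popular layout for maps of the IPv4 space. The sketch below uses the standard index-to-coordinate conversion to place each /8 on a 16x16 grid; the orientation of the curve and the helper name `slash8_cell` are assumptions, and the actual figure in the talk may be rotated or drawn at a finer granularity.

```python
def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve to (x, y) on a 2**order square grid."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def slash8_cell(ip):
    """Grid cell of the /8 that contains a dotted-quad IPv4 address."""
    first_octet = int(ip.split(".")[0])
    return hilbert_d2xy(4, first_octet)   # 256 /8 blocks on a 16x16 grid

# Consecutive /8 blocks always land on neighbouring cells.
print([hilbert_d2xy(4, d) for d in range(4)])
print(slash8_cell("10.1.2.3"))
```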
![Page 28: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/28.jpg)
Feature Intuition: Temporal Decay
• Even bad neighborhoods renovate:
  – Attackers may change ISPs/proxies
  – Botnets may be shut down / relocate
  – A little paranoia is OK, but not EVERYONE is out to get you (at least not all at once)
• As days pass, let's forget, bit by bit, who attacked
• Last time I saw this actor, and how often did I see them
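One simple way to "forget bit by bit" is exponential decay: every sighting of an actor contributes a weight that halves after a fixed number of days. The slide does not say which decay function the project uses, so the exponential form, the 7-day half-life and the function name below are assumptions chosen for illustration.

```python
def decayed_score(event_days, today, half_life_days=7.0):
    """Sum of sightings, each weighted by 0.5 ** (age / half_life).

    event_days: day numbers on which the actor was seen attacking.
    today: current day number; sightings dated in the future are ignored.
    """
    return sum(
        0.5 ** ((today - day) / half_life_days)
        for day in event_days
        if day <= today
    )

# An actor seen often and recently scores higher than one seen once, long ago.
print(decayed_score([28, 29, 30], today=30))
print(decayed_score([2], today=30))
```

The single score combines both ideas on the slide: how recently the actor was last seen and how often it was seen.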
![Page 29: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/29.jpg)
MLSec Project
• Behavior: block on port 22
• Trial inference on 100k IP addresses per Class A subnet
• Logarithmic scale: brightest tiles are 10 to 1000 times more likely to attack.
![Page 30: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/30.jpg)
Feature Intuition: DNS features
• Who resolves to this IP address?
• Number of domains that resolve to the IP address
• Distribution of their lifetime
• Entropy, size, ccTLDs
• Registrar information
• Reverse DNS information…
• History of DNS registration…
• (Thanks, DNSDB!)
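Two of the features listed here, the number of domains resolving to an IP and the entropy of their names, are easy to sketch. The record format (a list of `(domain, ip)` pairs), the function names and the sample domains are hypothetical; a real deployment would pull this data from a passive DNS source, whose API is not shown.

```python
import math
from collections import Counter, defaultdict

def shannon_entropy(text):
    """Shannon entropy in bits per character; random-looking names score higher."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def dns_features(records):
    """Per-IP features from (domain, ip) pairs: domain count and mean name entropy."""
    domains_by_ip = defaultdict(set)
    for domain, ip in records:
        domains_by_ip[ip].add(domain)
    return {
        ip: {
            "domain_count": len(domains),
            "mean_entropy": sum(shannon_entropy(d) for d in domains) / len(domains),
        }
        for ip, domains in domains_by_ip.items()
    }

# Hypothetical passive-DNS style observations.
records = [
    ("example.com", "192.0.2.10"),
    ("xk2q9vz7rt1.example.net", "192.0.2.66"),
    ("p0o9i8u7y6.example.org", "192.0.2.66"),
]
print(dns_features(records))
```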
![Page 31: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/31.jpg)
Training the Model
• YAY! We have a bunch of numbers per IP address/domain!
• How do you define what is malicious or not?
• “Advanced expertise in both information security and data science will be a necessary ingredient in enabling accurate discrimination between malicious and benign activity.”
- Anton Chuvakin, Gartner
• Kinda easy for security tools (if you trust them)
• Web application logs need deeper statistical analysis
• Not normal / standard deviation thing
![Page 32: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/32.jpg)
How do I get started on this?
• Programming is a must (Python / R)
• Statistical knowledge keeps you from making dumb mistakes
• Specific machine learning courses and books:
  – Coursera (ML / Data Analysis / Data Science)
• Practice, Practice, Practice:
  – Explore your data!
  – (Security Onion)
  – Kaggle
  – KDD, VAST, VizSec
![Page 33: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/33.jpg)
MLSec Project
• Sign up, send logs, receive reports generated by machine learning models
• Working with several companies on trying out these models in their environment with their data
• We are hiring (KINDA)
• Visit https://www.mlsecproject.org, message @MLSecProject or just e-mail me.
![Page 34: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/34.jpg)
MLSec Project - Current Research
• Inbound attacks on exposed services (DEFCON/BH 2013):
  – Information from inbound connections on firewalls, IPS, WAFs
  – Feature extraction and supervised learning
• Malware Distribution and Botnets:
  – Information from outbound connections on firewalls, DNS and Web Proxy
  – Initial labeling provided by intelligence feeds and AV/anti-malware
  – Semi-supervised learning involved
• Kill-chain Ensemble Models:
  – Increased precision by composing different behaviors
  – Web server path -> go through Firewall, then IPS, then WAF
  – Early confirmation of attack failure or success
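Composing several detectors along a path (firewall, then IPS, then WAF) can be sketched by multiplying likelihood ratios onto the prior odds. This is a naive-Bayes style illustration that assumes the stages are independent; it is not necessarily how the project's ensemble models work, and the prior and per-stage ratios below are invented.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with a chain of likelihood ratios.

    Works in odds space: odds = p / (1 - p), and each independent piece of
    evidence multiplies the odds by its likelihood ratio.
    """
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# Invented numbers: 1% of actors are hostile a priori; each stage that flags
# the actor makes hostility that many times more likely.
stage_ratios = {"firewall": 16.0, "ips": 5.0, "waf": 8.0}
print(posterior_probability(0.01, [stage_ratios["firewall"]]))
print(posterior_probability(0.01, stage_ratios.values()))
```

Each additional stage that agrees raises the confidence, which is the intuition behind the increased precision claimed for composed behaviors.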
![Page 35: Applying Machine Learning to Network Security Monitoring - BayThreat 2013](https://reader033.vdocument.in/reader033/viewer/2022052900/555bd66fd8b42adf478b5224/html5/thumbnails/35.jpg)
Thanks!
• Q&A?
• Feedback?
Alexandre Pinto
@alexcpsec @MLSecProject
https://www.mlsecproject.org/
"Essentially, all models are wrong, but some are useful." - George E. P. Box