![Page 1: Ranveer Chandra and Dina Katabi Learning Communication Rules Srikanth Kandula](https://reader034.vdocument.in/reader034/viewer/2022051819/55147530550346414e8b62cd/html5/thumbnails/1.jpg)
Learning Communication Rules
Srikanth Kandula, Ranveer Chandra and Dina Katabi
Network Admins. are Groping in the Dark
Focus on Traffic Volume
• TCP = 80%, HTTP = 30%
• Adapt report categories (e.g., AutoFocus)
  – Much traffic from ports 500-600
But, What’s Going On?
• Does traffic follow the plan?
• Misconfigurations
• Suspicious traffic
(Active) user browsing the web, reading/sending mail
(Automatic) SMS scan on a network, Outlook refresh
Besides focusing on volume, learn the rules underlying the traffic
• Infer the actual behavior of applications
  – AFS root servers direct traffic to volume servers evenly
  – Mail to the incoming MX is forwarded on to group MXes
• Notice misconfigurations and badness
  – These clients should not be talking on known command-and-control ports
  – This server should not be responding to DHCP requests
  – This mail server should not attempt connections to non-existent MXes
Rule: flowY ⇒ flowX, i.e., whenever flowY happens, flowX is likely to occur
[Figure: timeline t with occurrences of X and Y; example rule (http, DNS)]
If you could learn such rules directly from a trace,
Report all significant rules with no specific knowledge about a trace
Mining for Rules is Hard
• How to define significance?
  – When is a group of flows interesting enough to report?
• Avoid observer bias, but we cannot evaluate everything
  – Focus on one server, and you miss what you are not looking for
• Be practical: deal with noise, search quickly
eXpose
1. A scoring function for significance
2. Heuristics that bias search toward a high hit-rate
3. Empirical validation on enterprise traces
Overview
• Packet trace to activity matrix
  o Rows are 1 s windows; columns are flows
  o Is the flow active in [time_{i-1}, time_i)? (i.e., at least one packet)
• Association rule mining (X, Y are random variables for columns)
• Need not worry about interleaving
• Dependencies are at these time-scales (an RTT, a server response)
[Figure: Packet Trace → Activity Matrix (rows time1 … timeR, columns flow1 … flowK) → Rules]
All windows in the [.25 s, 2 s] range yield similar rules
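The packet-trace-to-activity-matrix step can be sketched in a few lines of Python. This is a minimal sketch, assuming the trace has already been reduced to (timestamp, flow id) pairs; the names `activity_matrix` and `Packet`, and the choice to anchor windows at the first packet, are illustrative assumptions rather than eXpose's code. It builds one row per window and one column per flow, with a 1 wherever the flow sent at least one packet in that window.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Each packet is reduced to (timestamp in seconds, flow id); the flow id can be
# any hashable key, e.g. a (src_ip, src_port, dst_ip, dst_port) tuple (assumed).
Packet = Tuple[float, Hashable]


def activity_matrix(packets: Iterable[Packet], window: float = 1.0
                    ) -> Tuple[List[Hashable], List[List[int]]]:
    """Return (flow ids, rows): rows[i][j] == 1 iff flow j sends at least one
    packet in window i, i.e. in [t0 + i*window, t0 + (i+1)*window)."""
    packets = sorted(packets, key=lambda p: p[0])
    if not packets:
        return [], []
    t0 = packets[0][0]                  # windows are anchored at the first packet
    columns: Dict[Hashable, int] = {}   # flow id -> column index (first-seen order)
    active: Dict[int, Set[int]] = defaultdict(set)  # window index -> active columns
    for ts, flow in packets:
        col = columns.setdefault(flow, len(columns))
        active[int((ts - t0) // window)].add(col)
    n_rows = max(active) + 1
    rows = [[1 if j in active.get(i, ()) else 0 for j in range(len(columns))]
            for i in range(n_rows)]
    return list(columns), rows


flows, rows = activity_matrix([(0.0, "A"), (0.3, "B"), (1.2, "A"), (2.5, "B"), (2.7, "C")])
print(flows, rows)  # ['A', 'B', 'C'] [[1, 1, 0], [1, 0, 0], [0, 1, 1]]
```

For traces with 10^5 to 10^6 flows a dense list of lists would be very large; keeping each row as its set of active columns (what `active` holds before densification) is a more plausible representation at that scale.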
Which Rules are Significant? (X ⇒ Y)
• High joint probability?
  o X, Y may occur very often individually (e.g., breeze, sun shining)
• High conditional probability?
  o Say Y occurs only when X does, but both are rare (lottery, buy a jet)
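To make the two pitfalls concrete, the sketch below computes P(X,Y) and P(Y|X) from binary activity columns. The helper names `joint` and `conditional` and the sample columns are hypothetical; they simply reproduce the breeze/sun and lottery/jet examples.

```python
from typing import Sequence


def joint(x: Sequence[int], y: Sequence[int]) -> float:
    """P(X, Y): fraction of windows in which both columns are active."""
    return sum(a & b for a, b in zip(x, y)) / len(x)


def conditional(y: Sequence[int], x: Sequence[int]) -> float:
    """P(Y | X): among windows where X is active, fraction where Y is too."""
    n_x = sum(x)
    n_xy = sum(a & b for a, b in zip(x, y))
    return n_xy / n_x if n_x else 0.0


# Hypothetical columns over 10 windows: the sun is always shining and a breeze
# blows in 9 of them, so the joint is high although P(Y|X) equals P(Y).
sun = [1] * 10
breeze = [1] * 9 + [0]

# Hypothetical columns over 100 windows: a lottery win and a jet purchase
# happen once, together, so P(jet | lottery) = 1 but the joint is tiny.
lottery = [1] + [0] * 99
jet = [1] + [0] * 99

print(joint(sun, breeze), conditional(breeze, sun))   # 0.9 0.9
print(joint(lottery, jet), conditional(jet, lottery))  # 0.01 1.0
```

Neither number alone separates an informative rule from a coincidence, which is why a score that combines both is needed.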
* Measures fraction of change in Y due to X
• High Joint Probability?
• High Conditional Probability?
• We use mutual information (combines the two)
$$\mathrm{Score}(X \rightarrow Y) = P(X, Y)\,\log\frac{P(Y \mid X)}{P(Y)} + P(X, \neg Y)\,\log\frac{P(\neg Y \mid X)}{P(\neg Y)}$$
* Trades off dependency & frequency
Score=0, if Y is independent of X
Score=Max, if Y is fully dependent on X
* Encodes directionality (figure example: Kerberos, Reservation)
Which Rules are Significant? (X ⇒ Y)
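A direct transcription of the score, as a sketch: it assumes binary activity columns like those of the activity matrix, uses log base 2 (the slides do not fix a base), and treats 0 log 0 as 0. The function name `score` is hypothetical. The usage lines check the two properties on the slide (zero under independence, positive under full dependence) and show that swapping X and Y changes the score.

```python
import math
from typing import Sequence


def score(x: Sequence[int], y: Sequence[int]) -> float:
    """Score(X -> Y) = P(X,Y) log[P(Y|X)/P(Y)] + P(X,notY) log[P(notY|X)/P(notY)]
    over two binary activity columns, using log base 2 (bits).
    Zero-probability terms contribute 0 (the usual 0 log 0 = 0 convention)."""
    n = len(x)
    n_x = sum(x)
    n_y = sum(y)
    n_xy = sum(a & b for a, b in zip(x, y))
    if n_x == 0:
        return 0.0  # X never occurs: nothing to say about X -> Y
    total = 0.0
    # First pass: the (X, Y) term; second pass: the (X, not Y) term.
    for n_joint, n_marginal in ((n_xy, n_y), (n_x - n_xy, n - n_y)):
        if n_joint:  # n_joint > 0 implies n_marginal > 0, so no division by zero
            p_joint = n_joint / n         # P(X,Y) or P(X,notY)
            p_cond = n_joint / n_x        # P(Y|X) or P(notY|X)
            p_marginal = n_marginal / n   # P(Y) or P(notY)
            total += p_joint * math.log2(p_cond / p_marginal)
    return total


independent = score([1, 1, 0, 0], [1, 0, 1, 0])   # Y independent of X
dependent = score([1, 1, 0, 0], [1, 1, 0, 0])     # Y fully determined by X
forward = score([1, 0, 0, 0], [1, 1, 0, 0])       # X -> Y: Y whenever X
backward = score([1, 1, 0, 0], [1, 0, 0, 0])      # Y -> X: X only half the time
print(independent, dependent, forward > backward)  # 0.0 0.5 True
```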
• Negative Correlation
  – Flows with little overlap
[Figure: timelines in which Y and X rarely occur together]
P(Y|X) ≪ 1 leads to a high score
Modifying Scores for Networking
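The negative-correlation problem can be reproduced with the same score: when X and Y never overlap, P(Y|X) = 0 and P(¬Y|X) = 1, so the second term alone yields a large score. The `positive_only` function below is a hypothetical fix (keep a rule only if P(Y|X) > P(Y)); it is one plausible modification, not necessarily the one behind eXpose's modified JMeasure. The helper names are likewise assumptions.

```python
import math
from typing import Sequence, Tuple


def _counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (n, n_x, n_y, n_xy) for two binary activity columns."""
    return len(x), sum(x), sum(y), sum(a & b for a, b in zip(x, y))


def jmeasure(x: Sequence[int], y: Sequence[int]) -> float:
    """P(X,Y) log2[P(Y|X)/P(Y)] + P(X,notY) log2[P(notY|X)/P(notY)], 0 log 0 = 0."""
    n, n_x, n_y, n_xy = _counts(x, y)
    total = 0.0
    for n_joint, n_marginal in ((n_xy, n_y), (n_x - n_xy, n - n_y)):
        if n_joint:
            total += (n_joint / n) * math.log2((n_joint / n_x) / (n_marginal / n))
    return total


def positive_only(x: Sequence[int], y: Sequence[int]) -> float:
    """Hypothetical tweak: score 0 unless X raises the chance of Y,
    i.e. unless P(Y|X) > P(Y) (compared exactly with integer counts)."""
    n, n_x, n_y, n_xy = _counts(x, y)
    if n_x == 0 or n_xy * n <= n_y * n_x:
        return 0.0
    return jmeasure(x, y)


# Hypothetical flows with no overlap: Y is active exactly when X is not.
x = [1, 1, 0, 0, 0, 0, 0, 0]
y = [0, 0, 1, 1, 1, 1, 1, 1]
print(jmeasure(x, y), positive_only(x, y))  # 0.5 0.0
```

Comparing `n_xy * n` with `n_y * n_x` avoids floating-point ties when P(Y|X) equals P(Y).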
• Long-Running Flows
  – Large downloads, ssh/remote desktop
  – Trivial overlaps with a long flow
  – Distinguish new vs. present
  – Rules on present flows are reported only if the mismatch in frequency is small
• Too Many Possibilities
  – Bias: focus on pairs with at least one common IP
  – May miss rules, but hit-rate goes up 1000x and costs go down 10x
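The common-IP bias can be sketched as an index from IP address to flows, so that only flows meeting at some host are paired. The flow-key layout `(src_ip, src_port, dst_ip, dst_port)`, the function name `candidate_pairs`, and the addresses are assumptions for illustration. Instead of enumerating all O(f^2) pairs, it only enumerates pairs within each host's group of flows.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Assumed flow key: (src_ip, src_port, dst_ip, dst_port).
Flow = Tuple[str, int, str, int]


def candidate_pairs(flows: List[Flow]) -> Set[Tuple[Flow, Flow]]:
    """Unordered flow pairs that share at least one IP address.
    Each pair would later be scored in both directions (X => Y and Y => X)."""
    by_ip: Dict[str, Set[Flow]] = defaultdict(set)
    for flow in flows:
        by_ip[flow[0]].add(flow)  # index by source IP
        by_ip[flow[2]].add(flow)  # ... and by destination IP
    pairs: Set[Tuple[Flow, Flow]] = set()
    for group in by_ip.values():
        # sorted() gives each unordered pair one canonical (a, b) order
        pairs.update(combinations(sorted(group), 2))
    return pairs


flows = [
    ("10.0.0.1", 5000, "10.0.0.9", 80),   # client -> web proxy (hypothetical)
    ("10.0.0.1", 5001, "10.0.0.53", 53),  # same client -> DNS server
    ("10.0.0.2", 6000, "10.0.0.7", 22),   # unrelated ssh flow
]
print(len(candidate_pairs(flows)), len(list(combinations(flows, 2))))  # 1 3
```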
Generics
– Miss rules if no single client accesses the server often
+ Rules that abstract away parts of a flow
[Figure: Client : Server – Server : Database becomes * : Server – Server : Database (any client); Client : Rsrv. – Client : Kerberos becomes * : Rsrv. – * : Kerberos (any client, but the same on both sides)]
To do this automatically,
• What to abstract? (IP addresses at a non-server port)
• Which pairs to consider for a rule?
  – Flows match on IP; generics match on the abstracted IP
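A minimal sketch of the abstraction step, assuming the set of server endpoints is already known (for instance, endpoints contacted by many distinct peers; that heuristic is an assumption, not something the slides specify). The function `generic` and all addresses are hypothetical. It abstracts only the IP on the non-server side, so flows from many clients to one server collapse into a single generic flow, in the spirit of the `C.7001 – AFS1.7000` rules. The "same client on both sides" generic would additionally need pair-level bookkeeping and is not shown.

```python
from typing import Set, Tuple

Endpoint = Tuple[str, int]          # (ip, port)
Flow = Tuple[Endpoint, Endpoint]    # (source endpoint, destination endpoint)


def generic(flow: Flow, servers: Set[Endpoint]) -> Flow:
    """Replace the IP on the non-server side of a flow with '*'.
    Flows where both or neither side is a known server are left unchanged."""
    src, dst = flow
    if dst in servers and src not in servers:
        return (("*", src[1]), dst)
    if src in servers and dst not in servers:
        return (src, ("*", dst[1]))
    return flow


servers = {("AFS1", 7000)}  # hypothetical set of known server endpoints
flows = [(("10.0.0.1", 7001), ("AFS1", 7000)),
         (("10.0.0.2", 7001), ("AFS1", 7000)),
         (("10.0.0.3", 7001), ("AFS1", 7000))]
print({generic(f, servers) for f in flows})  # {(('*', 7001), ('AFS1', 7000))}
```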
Techniques extend to arbitrarily sized rules
Instead,
1. Focus on pair-wise rules (simpler is likelier)
2. Group similar rules
  – Eliminate weak rules between strongly connected groups
  – Transitive closure to read off clusters
Mining for Rules
[Figure: Rule Mining → Rule Score → Recursive Spectral Partitioning (VKV’00)]
Cost over f flows: $X \Rightarrow Y$ is $O(f^2)$; $X_1 \wedge X_2 \wedge \dots \wedge X_n \Rightarrow Y$ is $O(f^{n+1})$
Digests 10^5–10^6 flows into 10^2–10^3 rule clusters
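The slides group rules with recursive spectral partitioning (VKV’00) and read clusters off with a transitive closure. The sketch below swaps in a much simpler technique: drop rules under a score threshold, then take connected components with union-find. It only illustrates the "eliminate weak rules, then transitive closure" idea; spectral partitioning is not implemented, and `rule_clusters` and `min_score` are hypothetical names.

```python
from typing import Dict, Hashable, List, Set, Tuple

Rule = Tuple[Hashable, Hashable, float]  # (flow X, flow Y, score of X => Y)


def rule_clusters(rules: List[Rule], min_score: float) -> List[Set[Hashable]]:
    """Drop rules scoring below min_score, then return the connected
    components (transitive closure) of the remaining rule graph."""
    parent: Dict[Hashable, Hashable] = {}

    def find(a: Hashable) -> Hashable:
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    for x, y, s in rules:
        if s >= min_score:             # weak rules are eliminated here
            parent[find(x)] = find(y)  # union the two flows' components
    clusters: Dict[Hashable, Set[Hashable]] = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return list(clusters.values())


rules = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.01), ("e", "f", 0.7)]
print(sorted(sorted(c) for c in rule_clusters(rules, min_score=0.1)))
# [['a', 'b', 'c'], ['e', 'f']]
```

A plain threshold cannot separate two dense groups joined by one moderately strong rule, which is the kind of case a partitioning method is better suited to.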
Recap: eXpose Mines for Rules
[Figure: Packet Trace → Activity Matrix (rows time1 … timeR; columns flow1 … flowK, each split into present | new) → Rules (e.g., flowi.new ⇒ flowj.present) → Rule Clusters]
Contributions
Learn all significant rules without prior knowledge
  o Scoring function for rule significance
  o Avoids observer bias, yet stays feasible by focusing on high hit-rate
  o Algorithms to mine and prune
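The present | new split in the activity matrix can be sketched as turning each flow's column into two columns. The semantics here are an assumption: "present" means the flow is active in the window, and "new" means it is active now but was not active in the previous window. The function name `split_new_present` is hypothetical. With this split, a long download contributes a single `new` 1 at its start instead of overlapping trivially with everything while it runs.

```python
from typing import Dict, List


def split_new_present(column: List[int]) -> Dict[str, List[int]]:
    """Split one flow's activity column into two columns (assumed semantics):
    'present' = the flow is active in the window;
    'new'     = the flow is active now but was not in the previous window."""
    new = [1 if active and (i == 0 or not column[i - 1]) else 0
           for i, active in enumerate(column)]
    return {"present": list(column), "new": new}


# A long-running flow (e.g. an ssh session) that pauses once.
print(split_new_present([0, 1, 1, 1, 0, 1]))
# {'present': [0, 1, 1, 1, 0, 1], 'new': [0, 1, 0, 0, 0, 1]}
```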
Related Work
• Semi-Automated Discovery of App. Session Structure (KJPK’06)
• Sherlock (Diagnosing Performance Problems, BCGKMZ’07)
• AutoFocus (ESV’03)
• BLINC (KPF’05)
• Stepping Stones (ZP’00)
Learn all significant rules without prior knowledge
  o Avoids observer bias, yet stays feasible by focusing on high hit-rate
  o Scoring function for rule significance
  o Algorithms to mine and prune
Results
Evaluation Setup
• Traces at access and internal server-facing links
  – Packet headers, connection records (Bro), some anonymized
• Operational network with 10^3 clients, diverse traffic mix
• Corroborated on test-bed traffic & vetted by admins
• Ran eXpose on a 2.4 GHz x86 with 8 GB RAM
[Figure: trace locations: inside Microsoft; before CSAIL’s servers; access link of conference LANs; CSAIL’s access link]
Rules Discovered by eXpose
• Dependencies for Major Applications
[Figure: email @ microsoft]
Client.* – Mail.135
Client.* – DC.88
Client.* – Mail.X
Client.* – PFS1.X
Client.* – PFS2.X
Client.* – Proxy.80
Rules Discovered by eXpose
• Dependencies for Major Applications
[Figure: afs @ csail]
C.7001 – Root.7003
C.7001 – *.*
C.7001 – AFS1.7000
C.7001 – AFS2.7000
AFS1.7000 – Root.7002
Rules Discovered by eXpose
• Dependencies for Major Applications
  – web, e-mail, file-servers, IM, print, video broadcast
[Figure: web @ microsoft]
Proxy1.80 – *.*
Proxy2.80 – *.*
Proxy3.80 – *.*
Proxy4.80 – *.*
Rules Discovered by eXpose
• Dependencies for Major Applications
  – web, e-mail, file-servers, IM, print, video broadcast
• Configuration Errors & Other Badness
[Figure: smtp + IDENT @ csail]
Client.* – MailServer.25
Client.113 – MailServer.*
Rules Discovered by eXpose
• Dependencies for Major Applications
  – web, e-mail, file-servers, IM, print, video broadcast
• Configuration Errors & Other Badness
  – IDENT, legacy emails, ssh scans, wingate
[Figure: Legacy email ids @ csail]
UnivMail.* – Old2.25
UnivMail.* – Old1.25
UnivMail.* – Old3.25
Rules Discovered by eXpose
• Dependencies for Major Applications
  – web, e-mail, file-servers, IM, print, video broadcast
• Configuration Errors & Other Badness
  – IDENT, legacy emails, ssh scans, wingate
• Rules for stuff we didn’t know before
[Figure: Nagios monitors @ csail]
Nagios.7001 – AFS1.7000
Nagios.7001 – AFS2.7000
Nagios.* – Mail2.25
Nagios.* – Mail1.25
Rules Discovered by eXpose
• Dependencies for Major Applications
  – web, e-mail, file-servers, IM, print, video broadcast
• Configuration Errors & Other Badness
  – IDENT, legacy emails, ssh scans, wingate
• Rules for stuff we didn’t know before
  – Nagios, LLMNR, iTunes
[Figure: Link-Local Multicast Name Resolution (LLMNR) @ hotspots]
H.* – DNS.53
H.137 – Wins.137
H.* – Multicast.5355
Black box: little prior knowledge about servers, applications, or users; can evolve
Correctness & Completeness
• False Positives
  – We could not explain 13% of the rule clusters in the CSAIL trace
• False Negatives
  – Main CSAIL web server (too many different activities)
  – Dependencies on personal web pages (too little traffic)
  – PlanetLab traffic (punted)
• Other Limitations
  – IPSec, anonymized traffic, cover traffic
• Extensions
  – Rules repeat over time and across traces
  – Application whitelisting, customized generics
Time to Mine for Rules
At CSAIL’s access link, high fan-out with many distinct flows
Stream Mining Appears Feasible!
[Figure: time to mine for rules per trace; # flows (×10^6): .6, .2, .6, .9, 2.8]
[Figure: Packet Trace → eXpose → Rules for frequently recurring flow sets]
Learn all significant rules with no specific knowledge
  o Avoids observer bias, but feasible by focusing on high hit-rate
  o Scoring function for rule significance
  o Algorithms to mine and prune
Empirical validation on enterprise traces
• Found configurations & protocols that we didn’t know existed
• Learnt rules for the actual behavior of applications
• Found config. errors, bot scans, infected machines
http://research.microsoft.com/~srikanth
Backup
Expanding Search Space (# of flows)…
[Figure: # of discovered rules vs. rule score (modified JMeasure)]
… exposes few significant rules!
Expanding Search Space (# of flows)…
[Figure: time to mine rules (s) and memory footprint (million rules) vs. # top active flows]
… exposes few rules & costs a lot in time and memory
Varying Size of Time Windows
[Figure: # of discovered rules vs. rule score (modified JMeasure)]
All window sizes in [.25 s, 2 s] produce similar rules!
For all rules X ⇒ Y
[Figure: joint probability vs. Prob.(X) and Prob.(Y)]