Blacklisting and Blocking Sources of Malicious Traffic
Athina Markopoulou, University of California, Irvine
Joint work with Fabio Soldo, Anh Le @ UC Irvine and Katerina Argyraki @ EPFL
Outline
Motivation
– Malicious Internet Traffic: Attack and Defense
Two Defense Mechanisms
– Proactive: Predictive Blacklisting
– Reactive: Source-Based Filtering
Conclusion
Malicious Traffic on the Internet
Compromising systems
– scanning, worms, website attacks
– phishing, social engineering attacks, ...
Launching attacks
– spam
– click fraud
– Denial-of-Service attacks
– ...
Botnets
– large groups of compromised hosts, remotely controlled
The solution requires many components
Monitoring and detection of malicious activity
– in the network and/or at hosts
– signature-based, behavioral analysis
Mitigation
– at the hosts: remove malicious code
– in the network: block, rate-limit, scrub malicious traffic
Internet architecture
Defense at the edge of the network
[Figure: Networks 1 to 4, each with a logging router, IDS, and firewall at its edge]
Our focus is on (1) blacklisting and (2) blocking malicious traffic.
Dshield Dataset
6 months of IDS + firewall logs from Dshield.org (May-Oct 2008):
– ~600 contributing networks, 60M+ source IPs, 400M+ logs
– log fields: Time, Victim ID (contributor), Src IP, Dst IP, Src Port, Dst Port, Protocol, Flags
Pros: huge amount of data, diverse sample, used by many researchers
Cons: no detailed information on alerts, may include errors
Outline
Background
– Malicious Internet Traffic: Attack and Defense
Two Defense Mechanisms
– Proactive: Predictive Blacklisting
– Reactive: Source-Based Filtering
Conclusion
Predictive Blacklisting
Problem definition:
– Given past logs of malicious activity collected at various locations,
– predict sources likely to send malicious traffic to each victim network in the future.
Blacklist:
– a list of the "worst" (e.g., top-100) attack sources
Prediction vs. Detection
Data analysis: superposition of several behaviors
[Figure: number of alerts per day, plotted against source ("attacker") IP]
A multi-level prediction model
Different predictors capture different patterns in the dataset:
– Model temporal dynamics
– Model spatial correlation between victims/attackers
Combine different predictors
Formulate as a Recommendation Systems problem
– in particular, collaborative filtering
Recommender systems: example
Netflix: you rate movies and you get suggestions
Formulating Predictive Blacklisting as a Recommendation System (CF)
– Users ↔ Victims, Items ↔ Attackers, User rating ↔ Attack volume
[Figure: rating matrix R, with victims as rows and attackers as columns; entries are attack volumes, "?" marks unknown entries]
Goal: predict the rating matrix r_{a,v}(t)
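As a sketch of this formulation, the "rating" matrix can be built directly from the logs: each victim (contributor) plays the role of a user, each attack source the role of an item, and the attack volume the role of a rating. The helper below and its two-field log format are illustrative, not the paper's implementation:

```python
def rating_matrix(logs, victims, attackers):
    """R[v][a] = attack volume ("rating"): number of log entries in which
    victim v reported attacker a during the current time window."""
    idx_v = {v: i for i, v in enumerate(victims)}
    idx_a = {a: j for j, a in enumerate(attackers)}
    R = [[0] * len(attackers) for _ in victims]
    for victim, attacker in logs:
        R[idx_v[victim]][idx_a[attacker]] += 1
    return R
```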
Predictor I: (attacker, victim) pair, temporal dynamics, r_{a,v}^{TS}(t)
Data analysis: attacks from the same source within a short time
Predictor I: (a, v) time series, r_{a,v}^{TS}(t)
Data analysis: repeated attacks within short time periods
Prediction:
– Use an EWMA model to capture this temporal trend: the predicted activity is an exponentially weighted average of the past activity at times t' ≤ t
– Accounts for the short memory of attack sources
– Computationally efficient
– Includes t = 1 as a special case
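The EWMA predictor above can be sketched as follows; the smoothing factor `alpha` is an illustrative choice, not the paper's tuned parameter:

```python
def ewma_predict(history, alpha=0.9):
    """Exponentially weighted moving average of past attack volumes
    r_{a,v}(t') for t' <= t; the result is the predicted activity for
    the next time slot. Recent observations dominate, matching the
    short memory of attack sources noted in the data analysis."""
    pred = 0.0
    for r in history:
        pred = alpha * r + (1 - alpha) * pred
    return pred
```

A source that just became active immediately pulls the prediction up, while old activity decays geometrically.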
Predictor II: similar victims
Data analysis: victims share common attackers (spatial correlation)
– [Katti et al., IMC 2005], [Zhang et al., USENIX Security 2008]
Our approach:
[Figure: victims linked by their common attackers]
Predictor II: similar victims, defining similarity
• Similarity of victims u, v captures:
– the number of common attackers
– and when they are attacked
Our approach:
[Figure: binary victims x attackers matrix; e.g., victims v1-v4 vs. attackers a1-a4, with rows 1100, 1100, 1110, 0011]
Predictor II: similar victims, k-nearest neighbors (kNN), r_{a,v}^{kNN}(t)
Traditional kNN: "trust" your peers
– Identify the k most similar victims ("neighbors") and predict your rating based on theirs
New challenges due to time-varying ratings
Our approach:
– predicted activity = sum, over the neighborhood of v, of (similarity between time-varying vectors) x (time-series forecast given past logs)
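A simplified sketch of the kNN step, assuming a static cosine similarity over each victim's attacker vector (the paper's predictor instead uses a similarity between time-varying vectors combined with a time-series forecast):

```python
from math import sqrt

def cosine_sim(u, v):
    """Cosine similarity between two victims' attacker vectors
    (binary or volume-weighted)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_predict(R, target, attacker, k=2):
    """Predict victim `target`'s 'rating' for `attacker` as the
    similarity-weighted average over its k most similar victims.
    R: victims x attackers matrix of recent attack counts."""
    sims = sorted(((cosine_sim(R[target], R[v]), v)
                   for v in range(len(R)) if v != target), reverse=True)
    top = sims[:k]
    den = sum(s for s, _ in top)
    num = sum(s * R[v][attacker] for s, v in top)
    return num / den if den else 0.0
```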
Predictor III: Attackers-Victims co-clustering
Data analysis:
– groups of attackers consistently target the same group of victims
– this behavior often persists over time
We used the Cross-Association (CA) method to automatically identify dense clusters of victims-attackers.
Predictor III: Attackers-Victims, Prediction, r_{a,v}^{EWMA-CA}(t)
Intuition:
– pairs (a, v) in dense clusters are more likely to occur
– use the density of the cluster as the predictor
EWMA-CA: further weight the density by its persistence over time
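The density-based prediction can be sketched as below. The cluster labels are assumed to be given as inputs (the paper obtains them with Cross-Association); only the block-density computation is shown:

```python
def block_density(R, victim_cluster, attacker_cluster, a, v):
    """Density of the co-cluster containing pair (a, v): the fraction
    of non-zero entries in the block formed by the victims in v's
    cluster and the attackers in a's cluster. Denser blocks make a
    future (a, v) attack more likely."""
    rows = [i for i, c in enumerate(victim_cluster)
            if c == victim_cluster[v]]
    cols = [j for j, c in enumerate(attacker_cluster)
            if c == attacker_cluster[a]]
    hits = sum(1 for i in rows for j in cols if R[i][j] > 0)
    return hits / (len(rows) * len(cols))
```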
A multi-level prediction model: Summary
Different predictors capture different patterns:
– Temporal trends
  • EWMA time series of (attacker, victim) pairs
– Neighborhood models
  • kNN: similarity of victims
  • EWMA-CA: interaction of attackers-victims
Combine different predictors
Combining different predictors
Weighted average
– with weights proportional to the accuracy of each predictor on a pair (a, v)
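A minimal sketch of this combination step; how the per-predictor accuracies are measured is abstracted away here (the `accuracies` argument stands in for whatever score is tracked per pair):

```python
def combined_predict(predictions, accuracies):
    """Combine the predictors' outputs for a pair (a, v) with weights
    proportional to each predictor's past accuracy on that pair.
    Falls back to a plain average when no accuracy is available."""
    total = sum(accuracies)
    if total == 0:
        return sum(predictions) / len(predictions)
    return sum(acc * p for acc, p in zip(accuracies, predictions)) / total
```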
Performance Analysis: Baseline Blacklisting Techniques
• Local Worst Offender List (LWOL)
– Most prolific local attackers
– Reactive but not proactive
• Global Worst Offender List (GWOL)
– Most prolific global attackers
– Might contain irrelevant attackers
– Non-prolific attackers are elusive to GWOL
• Collaborative Blacklisting (HPB)
– [J. Zhang, P. Porras, J. Ullrich, "Highly Predictive Blacklisting", USENIX Security 2008]
– Also implemented and offered as a service (HPB) by Dshield.org
– Methodology: use link analysis on the victims' similarity graph to predict future attacks
Performance Analysis: total hit count
60 days of Dshield logs, 5 days training, 1 day testing, BL length = 1000.
The combined method:
– significantly improves the hit count (up to 70%, 57% on average)
– exhibits less variation over time
[Figure: hit count over time for the combined method, HPB, and GWOL]
Predicting Attacks: what is the best we can do?
[Figure: logs matrix split into a training window (day t1) and a test window (day t2)]
Local Upper Bound, LocalUB(vi): # IPs in both the training and test windows of a particular contributor vi
Global Upper Bound: # IPs in the training window of any contributor
Predicting Attacks: room for improvement
[Figure: our method (|BL| = 1000) compared against the local and global upper bounds; collaboration helps, and the large gap from prior methods leaves room for improvement]
Performance Analysis: robustness to random errors
Robustness is achieved by the diversity of methods:
e.g., an attacker may send traffic to a single victim (detected by the temporal predictor) or to several victims (detected by the spatial predictor); or he can limit his attack activity.
Summary: Predictive Blacklisting as a Recommendation System
Contributions
– Combined predictors that capture different patterns in the data
– Significant improvement with simple techniques
  • still room for further improvement
– New formulation as a recommender-system (collaborative filtering) problem
  • paves the way to powerful techniques, e.g., capturing global structure (latent factors) and joint spatio-temporal models
References
– F. Soldo, A. Le, A. Markopoulou, "Predictive Blacklisting as an Implicit Recommendation System", IEEE INFOCOM 2010 and on arXiv.org
– In the news: MIT Technology Review, Slashdot, ACM TechNews
How to use a list of malicious sources?
• A policy decision:
– e.g., scrub, give lower priority, block, monitor, do nothing, ...
• One option is to block (filter) malicious sources
– when: during flooding attacks by million-node botnets
– where: at firewalls or at the routers
Outline
Background
– Malicious Internet Traffic: Attack and Defense
Two Defense Mechanisms
– Proactive: Predictive Blacklisting
– Reactive: Optimal Source-Based Filtering
Conclusion
Filtering at the routers
• Access Control Lists (ACLs)
– Match a packet header against rules, e.g., source and destination IP addresses
– Source-based filter: an ACL that denies access to a source IP/prefix
• Filters are implemented in TCAM
– Can keep up with high speeds
– Limited resource
• There are fewer filters than attack sources
Filter Selection at a Single Router: tradeoff between the number of filters and collateral damage
[Figure: attackers and legitimate users send traffic through an ISP edge router toward victim V; filtering a single attack source A.B.C.D uses one filter per source, while filtering the prefix A.B.C.* saves filters at the cost of collateral damage to legitimate users in that prefix]
Optimal Source-Based Filtering
Design a family of filter selection algorithms that:
• take as input:
– a blacklist of malicious (bad) sources
– a whitelist of legitimate (good) sources
– a constraint on the number of filters Fmax
– a constraint on the access bandwidth C
– the operator's policy
• optimally select which source IP prefixes to filter
– so as to optimize the operator's objective
– subject to the constraints
[Figure: the IP address space from 0 to 2^32-1, showing a single address A.B.C.D and a prefix A.B.C.*]
So far, this has been done heuristically (through ACLs or rate limiters).
Optimal Source-Based Filtering: A General Framework
– [l, r]: a range in the IP space
– p/l: a prefix p of length l
– Fmax: the number of filters (<< N)
– x_{[l,r]}: whether we block range [l, r] or not
– w_i: the weight assigned to source IP address i
– c_{[l,r]}: the cost of blocking a range [l, r]
Optimal Source-Based Filtering: Expressing the Operator's Policy
• The assignment of weights w_i is the operator's knob:
– indicates the volume of traffic sent, or the importance assigned by the operator
– w_i > 0 (good source i), w_i < 0 (bad source i), w_i = 0 (indifferent)
• Objective function: the cost of blocking a range splits into the cost of the good sources and the (negative) cost of the bad sources in it:
  c_{[l,r]} = Σ_{i in [l,r], w_i > 0} w_i + Σ_{i in [l,r], w_i < 0} w_i
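A minimal sketch of this cost function, assuming sources are represented as a map from integer IPs to operator-assigned weights (an illustrative representation, not the paper's data structure):

```python
def range_cost(weights, l, r):
    """Cost of blocking the range [l, r]: the sum of the weights of all
    sources inside it. Good sources (w > 0) add collateral damage,
    bad sources (w < 0) reduce the cost, and indifferent sources
    (w = 0) contribute nothing."""
    return sum(w for ip, w in weights.items() if l <= ip <= r)
```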
Filter Selection Algorithms: Problem Overview
• RANGE-based: filter an IP or a range [l, r]
  [Soldo, El Defrawy, Markopoulou, Van der Merwe, Krishnamurthy: ITA'09]
– FILTER-ALL-RANGE
– FILTER-SOME-RANGE
– FILTER-ALL-DYNAMIC-RANGE
• PREFIX-based: filter an IP source or a prefix
  [Soldo, Markopoulou, Argyraki: INFOCOM'09, arXiv.org]
– FILTER-ALL: block all malicious sources
– FILTER-SOME: block some malicious sources
– FILTER-ALL-DYNAMIC: the BL varies over time
– FLOODING: bandwidth constraint at the access router
– DISTRIBUTED-FLOODING: filters at multiple routers
Filter Selection Algorithms: Algorithms Overview
• RANGE-based: filter an IP or a range [l, r]
  [Soldo, El Defrawy, Markopoulou, Van der Merwe, Krishnamurthy: ITA'09]
– FILTER-ALL-RANGE
– FILTER-SOME-RANGE
– FILTER-ALL-DYNAMIC-RANGE
• PREFIX-based: filter an IP source or a prefix
  [Soldo, Markopoulou, Argyraki: INFOCOM'09, arXiv.org]
– FILTER-ALL: O(N)
– FILTER-SOME: O(N)
– FILTER-ALL-DYNAMIC: O(N)
– FLOODING: NP-hard; pseudo-polynomial algorithm in O(C^2 N), plus a heuristic
– DISTRIBUTED-FLOODING: distributed solution
All follow a dynamic programming approach.
Longest Common Prefix Tree of a BL
• LCP-Tree(BL): a binary tree whose leaves are the addresses in BL and whose intermediate nodes are their longest common prefixes
• It can be derived from the full binary tree of IP prefixes
• E.g., for BL = {10.0.0.2, 10.0.0.3, 10.0.0.7}, the LCP-Tree(BL) is:
[Figure: root 10.0.0.0/29 (3 bad, 5 good addresses), with child 10.0.0.2/31 (2 bad, 0 good addresses) and leaves 10.0.0.2/32, 10.0.0.3/32, 10.0.0.7/32]
• Finding a set of filters:
– no need to look for all possible sets of prefixes
– sufficient to look only at prunings of the LCP-tree
– lends itself to a dynamic programming approach
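One way to enumerate the nodes of the LCP-tree is to sort the blacklist: the intermediate nodes are exactly the longest common prefixes of adjacent leaves in sorted order. A sketch (the paper's construction may differ):

```python
import ipaddress

def lcp_len(a, b):
    """Length of the longest common prefix of two IPv4 addresses
    given as 32-bit integers."""
    x, n = a ^ b, 32
    while x:
        x >>= 1
        n -= 1
    return n

def lcp_tree_nodes(blacklist):
    """Nodes of LCP-Tree(BL): the /32 leaves plus, for each pair of
    adjacent leaves in sorted order, their longest common prefix
    (these are exactly the intermediate nodes of the tree)."""
    ips = sorted(int(ipaddress.IPv4Address(s)) for s in blacklist)
    nodes = {(ip, 32) for ip in ips}
    for a, b in zip(ips, ips[1:]):
        l = lcp_len(a, b)
        mask = ((1 << l) - 1) << (32 - l) if l else 0
        nodes.add((a & mask, l))
    return {f"{ipaddress.IPv4Address(p)}/{l}" for p, l in nodes}
```

Running it on the slide's example BL = {10.0.0.2, 10.0.0.3, 10.0.0.7} yields the two intermediate nodes 10.0.0.2/31 and 10.0.0.0/29 plus the three /32 leaves.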
Filter-All-Prefix: Problem Statement
• Given: a blacklist BL, weights w_i (for each good IP i), and Fmax filters
• Choose: prefixes p/l (variables x_{p/l})
• So as to: filter all bad addresses and minimize the collateral damage
Filter-All-Prefix: Dynamic Programming Algorithm
OPT(p, F): the cost of the optimal allocation of F filters within a prefix p
– p has subtrees rooted at sL and sR: allocate F − n ≥ 1 filters within the left subtree and n ≥ 1 filters within the right subtree
– n = 1, ..., F − 1: all malicious sources (leaves) must be blocked
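The recursion over the LCP-tree can be sketched as below. This is the naive top-down form, exponential without memoization; the paper's bottom-up DP over the tree is far more efficient (FILTER-ALL runs in O(N)). Each node is assumed to know the total weight of good addresses under its prefix:

```python
def opt_cost(node, F):
    """Minimum collateral damage to block all bad addresses under
    `node` with at most F filters. A node is either
    ('leaf',), a bad /32 address blocked at zero cost, or
    ('internal', good_weight, left, right), where good_weight is the
    total weight of good addresses covered by the node's prefix."""
    if F <= 0:
        return float('inf')            # no filter left: infeasible
    if node[0] == 'leaf':
        return 0.0                     # one /32 filter, no collateral
    _, good_weight, left, right = node
    best = good_weight                 # option 1: block the whole prefix
    for n in range(1, F):              # option 2: split the filter budget
        best = min(best, opt_cost(left, n) + opt_cost(right, F - n))
    return best
```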
Filter-All-Prefix, DP Algorithm: Example (Fmax = 4, N = 10)
[Figure: example LCP-tree; the selected filters are the prefixes 0/1, 32/5, 57/6, 58/6]
Filter-Some-Prefix: Example (Fmax = 4, N = 10)
[Figure: same example LCP-tree; the selected filters are the prefixes 3/6, 32/5, 57/6, 58/6]
Filter-All-Prefix-Dynamic: the time-varying case (Fmax = 4, N = 10)
– when the blacklist changes, only part of the tree needs to be (re)computed: O(Fmax log(N))
[Figure: example LCP-tree with the recomputed subtree highlighted]
FLOODING: Problem Statement
• Given: a blacklist BL, a whitelist WL, weight of an address = the traffic volume it generates, a constraint on the link capacity C, and Fmax filters
• Choose: source IP prefixes x_{p/l}
• So as to: minimize the collateral damage and fit the total traffic within the link capacity C
FLOODING: DP Algorithm
• FLOODING is NP-hard
– reduction from knapsack with a cardinality constraint (1.5K)
• An optimal pseudo-polynomial dynamic programming algorithm solves the problem in O((C Fmax)^2 N)
– similar to the previous DP, but solves a 2-dimensional knapsack problem
– the LCP-tree includes both good and bad addresses
– the DP is extended to take the capacity constraint into account
• A heuristic, obtained by adjusting the granularity (ΔC > 1) of C
Distributed Flooding: filters at several routers
[Figure: attackers send traffic through several routers toward the victim; each router can host filters]
• Deploy filters at several routers
– increases the total filter budget
• Each router (u) has its own:
– view of good/bad traffic
– capacity on its incoming link
– filter budget
• Filtering at several routers decides:
– not only which prefix to block
– but also at which router
• Solution:
– can be computed in a distributed way
– outperforms independent decisions
Evaluation using Dshield data: FLOODING vs. rate limiting
• Attack sources: from the point of view of a single victim in Dshield
• Good sources: generated as in [Kohler et al., TON'06], [Barford et al., PAM'06]
• Before the attack: good traffic was C/10 < C
• During the attack: bad traffic is 10C
[Figure: normalized collateral damage CD/N for FLOODING vs. rate limiting]
Optimal filter selection preserves the good traffic and drops the bad.
Intuition why optimization helps, compared to non-optimized filtering
• Malicious sources are clustered in the IP address space
• Malicious sources are not co-located with legitimate sources
• Filtering can therefore block IP prefixes containing malicious sources without penalizing (many) legitimate sources.
Evaluation using Dshield data (2): FILTER-ALL-PREFIX vs. generic clustering algorithms
• Malicious addresses:
– those attacking 2 specific victim networks (the most and the least clustered) in the Dshield dataset
• Good addresses:
– generated using a multifractal model [Kohler et al., TON'06], [Barford et al., PAM'06]
Optimal filter selection outperforms generic clustering.
Evaluation using Dshield data (3): DISTRIBUTED-FLOODING, the value of coordination
[Figure: normalized collateral damage CD/N with and without coordination]
Coordination among routers helps.
Optimal Source-Based Filtering: Summary
• Framework for optimal filter selection
– defined various filtering problems
– designed efficient algorithms to solve them
• Leads to significant improvements on real datasets
– compared to non-optimized filter selection, to generic clustering, and to uncoordinated routers
– because of the clustering of malicious sources
Outline
Background
– Malicious Internet Traffic: Attack and Defenses
Two Defense Mechanisms
– Proactive: Blacklisting as a Recommendation System
– Reactive: Filtering as an Optimization Problem
Conclusion
– Part of a larger system that collects and analyzes data from multiple sensors and takes appropriate action