Defending Sybil Attack in Peer2Peer Networks
Md. Tanvir Al Amin (04 09 05 2064), Shah Md. Rifat Ahsan (10 09 05 2060)
Adviser: Dr. Reaz Ahmed
Distributed Search Techniques
Sybil Attack
A fundamental problem in distributed systems.
A single user assumes many fake (Sybil) identities.
Already observed in real-world P2P systems.
Sybil identities can become a large fraction of all identities and "out-vote" honest users in collaborative tasks.
[Figure: a malicious user launches a Sybil attack against honest users]
Sybil attack
Present at both the application level and the P2P networking level.
The attacker creates many fake (Sybil) identities.
Many cases of real-world attacks: Digg, YouTube.
Several research works have shown how easy it is to subvert DHTs such as Chord or Kademlia using a Sybil attack.
Automated Sybil attack on YouTube for $147!
Defending against Sybil attacks
Traditional solutions rely on central trusted authorities, which runs counter to the open membership policies of OSNs.
Recent proposals leverage social networks:
Lots of research activity recently.
Each is optimized under assumptions about the graph structure.
Each is evaluated on different datasets.
SybilGuard [SIGCOMM’06], SybilLimit [Oakland’08], Ostra [NSDI’08], SumUp [NSDI’09], SybilInfer [NDSS’09], Whanau [NSDI’10], MobID [INFOCOM’10]
All schemes analyze the graph structure to isolate Sybils.
Defending against Sybil attacks (cont.)
Recent proposals leverage social networks.
Key insight: social links are hard to acquire in abundance.
Look for small cuts in the graph.
Conversely, look for communities around known trusted nodes.
Dunbar’s number; power-law node degrees.
Links are difficult to create.
WHAT DO SOCIAL NETWORKS LOOK LIKE?
SybilGuard: Defending Against Sybil Attacks via Social Networks
SybilGuard is a system for detecting Sybil nodes in social graphs.
Features of SybilGuard
SybilGuard enables an honest node to identify other nodes.
A verifier node V can verify whether a suspect node S is malicious.
Guaranteed bound on the number of Sybil groups.
Guaranteed bound on the size of Sybil groups.
Completely decentralized.
Key insight:
1. Use a social network to limit Sybils.
2. Social links are hard to acquire in abundance.
3. Look for small cuts in the graph.
DBLP Network
Dunbar’s number
Limits the number of stable social relationships a user can have to fewer than a couple of hundred.
Linked to the size of the neocortex region of the brain.
Observed throughout history, since hunter-gatherer societies.
Roughly reported to be 150.
Also observed repeatedly in studies of OSN user activity: users might have a large number of contacts, but they regularly interact with fewer than a couple of hundred of them.
Power-law node degrees
[Figure: U.S. highways vs. U.S. airlines]
Path lengths and diameter
All major networks have short average path lengths, from 4.25 to 5.88 ("six degrees of separation").
Facebook, 4.2 million (October 2007): 6.12, from http://blog.paulwalk.net/2007/10/08/no-degrees-of-separation/
Implications of Path lengths and diameter
The small diameter and path lengths of social networks are likely to impact the design of techniques for finding paths in such networks
Link degree correlations
Do high-degree nodes tend to connect to other high-degree nodes, or do high-degree nodes tend to connect to low-degree nodes?
In real social networks, the former holds, as shown by two metrics: the scale-free metric and assortativity.
This suggests that there exists a tightly connected "core" of high-degree nodes that connect to each other, with the lower-degree nodes on the fringes of the network.
The next question: how big is the core?
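The assortativity metric named above can be computed as the Pearson correlation between the degrees found at the two ends of every edge, with each undirected edge counted in both directions. The sketch below is a minimal plain-Python version, not code from the cited studies; the scale-free metric is not shown, and the two example graphs are hypothetical.

```python
import math


def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over all edges, each edge counted both ways."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    xs = [deg[a] for a, b in edges] + [deg[b] for a, b in edges]
    ys = [deg[b] for a, b in edges] + [deg[a] for a, b in edges]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hub linked only to leaves: high degree always meets low degree.
star = [(0, leaf) for leaf in range(1, 6)]
# Two disjoint cliques (K3 and K4): both ends of every edge have equal degree.
cliques = [(0, 1), (0, 2), (1, 2)] + [(a, b) for a in range(3, 7) for b in range(a + 1, 7)]
print(round(degree_assortativity(star), 6))     # -1.0
print(round(degree_assortativity(cliques), 6))  # 1.0
```

A positive value means high-degree nodes attach to high-degree nodes (assortative), which is the pattern the slide attributes to real social networks.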
Implications of link degree correlations: spread of information
“A Measurement-driven Analysis of Information Propagation in the Flickr Social Network” [WWW’09]
Densely connected core
The graphs have a densely connected core comprising between 1% and 10% of the highest-degree nodes, such that removing this core completely disconnects the graph.
Sublogarithmic growth
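The core-removal claim can be tested on any graph by deleting the k highest-degree nodes and measuring the largest remaining connected component. The sketch below does this on a hypothetical toy "core plus fringe clusters" graph in which 5 of 65 nodes (about 8%) form the core; it illustrates the measurement and does not reproduce the measured networks.

```python
def largest_component_fraction(adj, removed):
    """Largest connected component among nodes not in `removed`, as a fraction of them."""
    alive = [v for v in adj if v not in removed]
    seen, best = set(), 0
    for s in alive:
        if s in seen:
            continue
        stack, size = [s], 0
        seen.add(s)
        while stack:                       # iterative DFS over surviving nodes
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u not in removed and u not in seen:
                    seen.add(u)
                    stack.append(u)
        best = max(best, size)
    return best / len(alive)


def top_degree_nodes(adj, k):
    """The k highest-degree nodes (the candidate 'core')."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])


def core_fringe_graph(core_size=5, clusters_per_core=4, cluster_size=3):
    """Hypothetical toy graph: a core clique, each core node holding small fringe clusters."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    core = [("core", i) for i in range(core_size)]
    for i, a in enumerate(core):
        for b in core[i + 1:]:
            link(a, b)                      # the core is densely connected
        for c in range(clusters_per_core):
            members = [("fringe", i, c, j) for j in range(cluster_size)]
            for j, m in enumerate(members):
                link(a, m)                  # each fringe node hangs off its core node
                for other in members[j + 1:]:
                    link(m, other)          # fringe clusters are tightly knit
    return adj


g = core_fringe_graph()
core = top_degree_nodes(g, 5)
print(largest_component_fraction(g, set()))  # 1.0: the graph is connected
print(largest_component_fraction(g, core))   # 0.05: only isolated 3-node clusters remain
```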
Implications of densely connected core
The network contains a dense core of users.
The core is necessary for the connectivity of 90% of users.
Most short paths pass through the core.
The core could be used for quickly disseminating information.
So 10% of nodes are at the core; what about the remaining nodes (90% at the fringe)?
What does the structure look like?
The networks contain a densely connected core of high-degree nodes, and this core links small groups of strongly clustered, low-degree nodes at the fringes of the network.
[Figure: octopus-shaped network structure]
Mixing time
Random walk: choose each hop randomly.
Mixing time: the number of hops until the walk's position distribution is close to its stationary distribution (uniform for regular graphs).
Fast-mixing network: mixing time = O(log n).
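A minimal sketch of how the mixing time can be measured on a small graph. Assumptions: the walk is lazy (it stays put with probability 1/2, a standard device so that periodic graphs still converge), closeness is measured by total variation distance to the degree-proportional stationary distribution, and the threshold eps = 0.25 is a conventional but arbitrary choice. Helper names are illustrative.

```python
from collections import defaultdict


def lazy_step(adj, dist):
    """One hop of a lazy random walk: stay with prob. 1/2, else move to a uniform neighbour."""
    nxt = defaultdict(float)
    for node, p in dist.items():
        nxt[node] += p / 2
        for v in adj[node]:
            nxt[v] += p / (2 * len(adj[node]))
    return nxt


def stationary(adj):
    """Stationary distribution of the walk: proportional to node degree."""
    total = sum(len(nbrs) for nbrs in adj.values())
    return {v: len(nbrs) / total for v, nbrs in adj.items()}


def tv_distance(p, q):
    """Total variation distance between two distributions over nodes."""
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in set(p) | set(q))


def mixing_time(adj, start, eps=0.25, max_hops=100_000):
    """Hops until the walk from `start` is within eps of stationary (None if never)."""
    pi, dist = stationary(adj), {start: 1.0}
    for hops in range(max_hops + 1):
        if tv_distance(dist, pi) <= eps:
            return hops
        dist = lazy_step(adj, dist)
    return None


def complete_graph(n):
    return {v: [u for u in range(n) if u != v] for v in range(n)}


def cycle_graph(n):
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


print(mixing_time(complete_graph(16), 0))  # 2: well connected, mixes almost at once
print(mixing_time(cycle_graph(16), 0))     # a long cycle needs many more hops
```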
Sampling by random walks
A random walk from the honest region has an o(1) chance of escaping into the Sybil region.*
*True when the number of attack edges g is bounded by o(n/log n).
Of r walks, (1 − o(1))r = Ω(r) end nodes are good!
But we can’t distinguish good from bad nodes in the set.
[Figure: honest region and Sybil region, with escaping and non-escaping paths]
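A small simulation of the escape argument, under assumptions: the honest region is a ring with random chords (a stand-in for a fast-mixing social graph), the Sybil region is a clique, and walks have w = 10 hops. It estimates how often a walk from an honest node crosses an attack edge as the number of attack edges g grows. All names and parameters are hypothetical.

```python
import random


def build_graph(n_honest, n_sybil, g, rng):
    """Nodes 0..n_honest-1 are honest, the rest are Sybil; g attack edges join the regions."""
    adj = {v: set() for v in range(n_honest + n_sybil)}

    def link(a, b):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    for v in range(n_honest):                      # honest region: ring plus random chords
        link(v, (v + 1) % n_honest)
        for _ in range(2):
            link(v, rng.randrange(n_honest))
    for a in range(n_honest, n_honest + n_sybil):  # Sybil region: a clique
        for b in range(a + 1, n_honest + n_sybil):
            link(a, b)
    added = 0
    while added < g:                               # attack edges (honest, Sybil)
        h = rng.randrange(n_honest)
        s = rng.randrange(n_honest, n_honest + n_sybil)
        if s not in adj[h]:
            link(h, s)
            added += 1
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def escape_fraction(adj, n_honest, w, trials, rng):
    """Fraction of w-hop walks from random honest nodes that ever enter the Sybil region."""
    escaped = 0
    for _ in range(trials):
        v = rng.randrange(n_honest)
        for _ in range(w):
            v = rng.choice(adj[v])
            if v >= n_honest:
                escaped += 1
                break
    return escaped / trials


rng = random.Random(7)
for g in (1, 10, 100):
    graph = build_graph(200, 50, g, rng)
    print(g, escape_fraction(graph, 200, w=10, trials=2000, rng=rng))
```

With few attack edges almost no walk escapes; as g grows, the escaping fraction rises, which is why the guarantee requires g to stay small.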
Creating Social Links Is Hard
Social links are maintained over the Internet.
[Figure: a social network split into an honest region and a Sybil region, joined by attack edges]
A malicious user fools an honest user and creates an attack edge.
Sybil resilience & group attachment theory
Sybil defense schemes find bond groups around a trusted node.
But these are only a fraction of all honest nodes.
Bond groups are hard for Sybils to infiltrate.
This is not the case with identity groups.
SYBILGUARD
Yu, Kaminsky, Gibbons, Flaxman, SIGCOMM 2006
Problem Formulation and Objective
Social network:
» n honest human users
» 1+ malicious users: multiple Sybil identities
» SybilGuard enables an honest node to identify other nodes
» Verifier node V can verify whether suspect node S is malicious
SybilGuard
Guaranteed bound on the number of Sybil groups
» Divides n nodes into m equivalence classes
» A group is Sybil if it contains 1+ Sybil nodes
Guaranteed bound on the size of Sybil groups
» In a group, at most w Sybil nodes
Completely decentralized
» An honest node accepts honest nodes with high probability
» Rejects malicious nodes with high probability
» Accepts a bounded number of Sybil nodes
Random Routes
Foundation of SybilGuard: different from a random walk.
A random route begins at a random edge of a node.
At every node:
» For an incoming edge i, there is a unique outgoing edge j.
» Thus, inputs are mapped to outputs one-to-one.
A node A with d neighbors chooses, uniformly at random, a permutation “x1, x2, . . . , xd” among all permutations of 1, 2, . . . , d.
If a random route comes from the ith edge, A uses edge xi as the next hop.
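A minimal sketch of random routes as described above: each node draws a uniform permutation of its edge indices once, and a route entering on edge i leaves on edge x_i. The graph (a ring with random chords), the route length 8, and the function names are illustrative assumptions.

```python
import random


def make_permutations(adj, rng):
    """Each node with d neighbours draws a uniform permutation x_1..x_d of its edge indices."""
    perms = {}
    for v, nbrs in adj.items():
        perm = list(range(len(nbrs)))
        rng.shuffle(perm)
        perms[v] = perm
    return perms


def random_route(adj, perms, start, first_edge, length):
    """Follow a random route: a route entering node v on edge i leaves on edge x_i.
    Returns the route as a list of directed edges (u, v)."""
    u, v = start, adj[start][first_edge]
    route = [(u, v)]
    while len(route) < length:
        i = adj[v].index(u)              # index of the incoming edge at v
        u, v = v, adj[v][perms[v][i]]    # the unique outgoing edge x_i
        route.append((u, v))
    return route


rng = random.Random(3)
n = 30
adj = {v: set() for v in range(n)}
for v in range(n):                       # ring plus one random chord per node
    for u in ((v + 1) % n, rng.randrange(n)):
        if u != v:
            adj[v].add(u)
            adj[u].add(v)
adj = {v: sorted(nbrs) for v, nbrs in adj.items()}
perms = make_permutations(adj, rng)
routes = [random_route(adj, perms, v, e, 8) for v in adj for e in range(len(adj[v]))]
print(routes[0])
```

Because the next directed edge depends only on the current one, two routes that ever share a directed edge stay merged from then on (convergence); because each x is a permutation, every directed edge has a unique predecessor (back-traceability).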
SybilGuard Algorithm
Attack model:
n honest users: one identity (node) each.
Malicious users: multiple identities each (Sybil nodes).
Node A verifies node B:
A computes d random routes (length w).
B computes d random routes (length w).
If at least d/2 of the random routes intersect, accept B; else reject B.
If there are few attack edges, a Sybil node’s random route is unlikely to reach the honest region.
And vice versa.
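A minimal sketch of the acceptance rule, assuming the routes have already been computed and are given as lists of node IDs. Reading the threshold as "at least d/2 of the verifier's d routes intersect some route of the suspect" is our interpretation of the slide; the function names and example routes are hypothetical.

```python
def routes_intersect(r1, r2):
    """Two routes intersect if they share at least one node (SybilGuard intersects nodes)."""
    return not set(r1).isdisjoint(r2)


def sybilguard_accepts(verifier_routes, suspect_routes):
    """Accept if at least d/2 of the verifier's d routes intersect some route of the suspect."""
    d = len(verifier_routes)
    hits = sum(
        any(routes_intersect(rv, rs) for rs in suspect_routes)
        for rv in verifier_routes
    )
    return hits >= d / 2


a_routes = [["A", "p", "q"], ["A", "r", "s"], ["A", "t", "u"]]
honest_b = [["B", "q", "x"], ["B", "s", "y"], ["B", "z", "w"]]
sybil_b = [["S", "s1", "s2"], ["S", "s3", "u"], ["S", "s4", "s5"]]
print(sybilguard_accepts(a_routes, honest_b))  # True: 2 of 3 routes intersect
print(sybilguard_accepts(a_routes, sybil_b))   # False: only 1 of 3 intersects
```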
Main Assumptions of SybilGuard
[Figure: honest nodes and Sybil nodes connected by a few attack edges]
Properties of Random Routes
Convergence
» Once two routes merge, they remain merged.
Routes are back-traceable
» There can be only one route of length w that traverses edge e in the given direction at its ith hop.
» If two random routes ever share an edge in the same direction, then one of them must start in the middle of the other.
Cycles can exist, but with low probability
» Prob.(diameter-k cycle) = 1/d^(k-2)
SybilGuard Algorithm
Step 1: Bootstrap the network.
All users exchange signed keys.
Key exchange implies that both parties are human and trustworthy.
Step 2:
Choose a verifier (A) and a suspect (B).
A and B send out random walks of a certain length (here, 2).
Look for intersections.
A knows B is not a Sybil because multiple paths intersect and they do so at different nodes.
[Figure: random walks from verifier A and suspect B intersecting]
SybilGuard Algorithm, cont.
SybilGuard Caveats
Bootstrapping requires human interaction.
Assumes short random walks lie mostly in the honest region.
Results in a poor threshold against colluding attackers:
In a million-node network, each attack edge accepts nearly 2000 Sybil nodes.
In a million-node network, SybilGuard cannot bound the number of Sybils at all if there are more than 15,000 attack edges.
SybilLimit: A Near-Optimal Social Network Defense Against Sybil Attacks
Motivation: to mitigate the problems of SybilGuard.
Basic insight: social network (same as SybilGuard).
SybilLimit novelty:
1. Use many random routes, but shorter ones.
2. Intersect edges, not nodes.
3. Limit how often each edge is used.
Identity Registration
Each node (honest or Sybil) has a locally generated public/private key pair.
“Identity”: V accepts S means V accepts S’s public key KS.
No assumption of (or need for) a PKI.
Every suspect S “registers” KS on some other nodes.
Registration Goals
Ensure that Sybil nodes (collectively) register only on a limited number of honest nodes.
Still provide enough “registration opportunities” for honest nodes.
[Figure: Sybil region and honest region; K marks registered keys of Sybil nodes and registered keys of honest nodes]
Acceptance Criteria
Accept S only if KS is registered on sufficiently many honest nodes.
[Figure: Sybil region and honest region; K marks registered keys of Sybil nodes and registered keys of honest nodes]
Key Idea
Take random “walks” of w = O(log n) hops.
Honest nodes: likely to remain in the honest region.*
Sybil nodes: must cross an attack edge to reach the honest region.
Register the key at the last hop of the “walk”.
[Figure: Sybil region and honest region with registered keys K]
Verification Procedure
1. V requests S’s set of tails.
2. S: “I have three tails: AB; CD; EF.”
3. V finds a common tail: EF.
4. V asks the node at the common tail: “Is KS registered?”
5. “Yes.” (4 messages involved)
V accepts S: tails intersect + key registered.
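A minimal sketch of the verification steps, assuming tails are represented as (node, node) pairs and that a hypothetical `max_load` cap stands in for SybilLimit's rule limiting how often each tail edge is used (the real balance condition is more involved). Class and method names are illustrative only.

```python
from collections import defaultdict


class TailRegistry:
    """Keys registered on tails; a tail is the last directed edge of a random route."""

    def __init__(self, max_load):
        self.keys = defaultdict(set)  # tail -> public keys registered on it
        self.load = defaultdict(int)  # tail -> suspects already accepted through it
        self.max_load = max_load      # hypothetical cap on how often a tail is used

    def register(self, key, tails):
        for tail in tails:
            self.keys[tail].add(key)

    def verify(self, verifier_tails, suspect_key, suspect_tails):
        """V accepts S if the tails intersect and KS is registered on a common tail."""
        for tail in suspect_tails:                  # 1-2: S sends its set of tails
            if tail in verifier_tails:              # 3: common tail found
                if suspect_key in self.keys[tail]:  # 4-5: "Is KS registered?" "Yes."
                    if self.load[tail] < self.max_load:
                        self.load[tail] += 1
                        return True
        return False


reg = TailRegistry(max_load=1)
s_tails = [("A", "B"), ("C", "D"), ("E", "F")]
reg.register("KS", s_tails)
v_tails = {("E", "F"), ("X", "Y")}
print(reg.verify(v_tails, "KS", s_tails))             # True: common tail EF, key registered
print(reg.verify(v_tails, "K_sybil", [("E", "F")]))   # False: key not registered
```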
Sybil nodes accepted

| Attack edges | SybilGuard | SybilLimit |
|---|---|---|
| g = O(√n / log n) | O(g √n log n) | O(g log n) |
| g between √n / log n and n / log n | unbounded | O(g log n) |
| g above n / log n | unbounded | unbounded |
SybilInfer: How to Win the Zombie Wars!
Prateek Mittal (MSRC Intern), George Danezis (MSR Cambridge)
SybilInfer
Work from UIUC and Microsoft Research.
A centralized algorithm.
Uses the fast-mixing property of social networks to design a Bayesian classifier that classifies nodes.
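Formal Model
Assign probabilities of cuts being honest.
Using Bayes’ theorem, we have:
$$P(X = \mathrm{Honest} \mid T) = \frac{P(T \mid X = \mathrm{Honest}) \cdot P(X = \mathrm{Honest})}{Z}$$
$$Z = \sum_{X \subseteq V} P(T \mid X = \mathrm{Honest}) \cdot P(X = \mathrm{Honest})$$
Next challenge: model $P(T \mid X = \mathrm{Honest})$.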
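Formal Model
$$P(T \mid X = \mathrm{Honest}) = \mathrm{prob}_{X \to X}^{N_{X \to X}} \cdot \mathrm{prob}_{X \to \bar{X}}^{N_{X \to \bar{X}}} \cdot \mathrm{prob}_{\bar{X} \to X}^{N_{\bar{X} \to X}} \cdot \mathrm{prob}_{\bar{X} \to \bar{X}}^{N_{\bar{X} \to \bar{X}}}$$
where $N_{A \to B}$ is the number of traces in T that start in A and end in B, and
$$\mathrm{prob}_{X \to X} = \mathrm{prob}_{X \to \bar{X}} = \frac{1}{|V|}$$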
SYBIL PROOF DHT
Distributed Hash Table
Interface: PUT(key, value), GET(key) → value
Route to the peer responsible for the key.
Example: PUT(sip://alice@foo, 18.26.4.9); GET(sip://alice@foo)
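A toy, single-process sketch of the PUT/GET interface, assuming consistent hashing with SHA-1 and a Chord-style successor rule to decide which peer is responsible for a key. Every peer is visible to the caller, so multi-hop routing is not modelled; `ToyDHT` and `node_id` are hypothetical names.

```python
import bisect
import hashlib


def node_id(name, bits=32):
    """Map a peer name or key onto a 2**bits identifier ring with SHA-1."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class ToyDHT:
    """In-process toy: the whole ring is known, so 'routing' is a direct lookup."""

    def __init__(self, peers):
        self.ring = sorted((node_id(p), p) for p in peers)
        self.ids = [i for i, _ in self.ring]
        self.store = {p: {} for p in peers}

    def responsible(self, key):
        """Successor rule: the first peer ID >= hash(key), wrapping around the ring."""
        idx = bisect.bisect_left(self.ids, node_id(key)) % len(self.ring)
        return self.ring[idx][1]

    def put(self, key, value):
        self.store[self.responsible(key)][key] = value

    def get(self, key):
        return self.store[self.responsible(key)].get(key)


dht = ToyDHT([f"peer{i}" for i in range(8)])
dht.put("sip://alice@foo", "18.26.4.9")
print(dht.responsible("sip://alice@foo"), dht.get("sip://alice@foo"))
```

In an open DHT, whoever controls the responsible peer controls the answer, which is exactly what an attacker with many pseudonyms tries to exploit.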
DHTs are subject to the Sybil attack
Attacker creates many pseudonyms
Disrupts routing or stabilization
The Sybil attack on open DHTs
Brute-force attack
Clustering attack
Sybil Proof DHT
How to build a Sybil-resilient DHT?
Works from the MIT PDOS (Parallel and Distributed Operating Systems) group in their quest to build a Sybil-proof DHT:
Sybil-resistant DHT routing (2005)
A Sybil-proof one-hop DHT (SocialNets 2008)
Whanau (NSDI 2010)
A Sybil-proof one-hop DHT
Motivation: SybilGuard/SybilLimit is not a DHT, but a “general” Sybil defense.
An honest node accepts at most O(g log n) Sybils.
Features:
DHTs are subject to the Sybil attack.
Social networks provide useful information.
Created a Sybil-resistant one-hop DHT.
Resistant to g = o(n/log n) attack edges.
Table sizes and routing bandwidth: O(√n log n).
Uses O(1) messages to route.
Basic one-hop DHT design
Construct the finger table by r random walks.
Route to t by asking all fingers about t.
If r = Ω(√n log n), some finger knows t WHP.
The adversary cannot interfere with routing.
[Figure: s asks each of its r fingers “know t?”; a finger that knows t returns t’s IP address or forwards the message from s]
Properties of this solution
Finger table size: r = O(√n log n)
Bandwidth to construct: O(r log n) bits
Bandwidth to query: O(r) messages
Probability of failure: 1/poly(n)
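The claim that r = Ω(√n log n) fingers make some finger know t with high probability can be checked with a small simulation. This is a sketch under stated assumptions: fingers are drawn uniformly at random as a stand-in for random-walk end points in a fast-mixing honest region, there are no Sybils, and "knows t" means t is the finger itself or one of that finger's own fingers. Function names are hypothetical.

```python
import math
import random


def build_fingers(n, r, rng):
    """Finger table of r nodes per node; uniform sampling stands in for the
    end points of r random walks in a fast-mixing honest region."""
    return [set(rng.sample(range(n), r)) for _ in range(n)]


def one_hop_lookup(fingers, s, t):
    """s asks every finger 'know t?'; return a finger that is t or has t in its table."""
    for f in fingers[s]:
        if f == t or t in fingers[f]:
            return f
    return None


def success_rate(n, r, queries, rng):
    fingers = build_fingers(n, r, rng)
    hits = sum(
        one_hop_lookup(fingers, rng.randrange(n), rng.randrange(n)) is not None
        for _ in range(queries)
    )
    return hits / queries


rng = random.Random(42)
n = 400
r = math.ceil(math.sqrt(n) * math.log(n))   # sqrt(n) log n, constants ignored
print(r, success_rate(n, r, 200, rng))      # expected to be (very close to) 1.0
print(2, success_rate(n, 2, 200, rng))      # tiny tables: most lookups fail
```

The design choice visible here is the trade-off the slides criticise later: one-hop success comes from tables of size about √n log n rather than log n.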
WHĀNAU: A SYBIL-PROOF DISTRIBUTED HASH TABLE
Chris Lesniewski-Laas, M. Frans Kaashoek, NSDI 2010
Contribution
Whānau: an efficient Sybil-proof DHT protocol
GET cost: O(1) messages, one RTT latency
Cost to build routing tables: O(√N log N) storage/bandwidth per node (for N keys)
Oblivious to the number of Sybils!
Proof of correctness
PlanetLab implementation
Large-scale simulations vs. a powerful attack
[Figure: social network with an honest region and a Sybil region joined by attack edges]
Random walks (cf. SybilLimit [Yu et al. 2008])
Building tables using random walks (cf. SybilLimit [Yu et al. 2008])
What have we accomplished?
• A small fraction (e.g., < 50%) of bad nodes in routing tables
• The bad fraction is independent of the number of Sybil nodes
[Figure: SETUP builds routing tables from the social network and the PUT(key, value) queue; LOOKUP maps a key to its value]
Routing table structure
O(√n) fingers and O(√n) keys stored per node.
Fingers have random IDs and cover all keys WHP.
Lookup: query the closest finger to the target key.
Finger tables: (ID, address)
Key tables: (key, value)
[Figure: example keys Aardvark, Keynes, Kelvin, Zyzzyva in the key space]
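A minimal sketch of "query the closest finger to the target key", assuming keys are strings ordered lexicographically on a ring, "closest" means the finger whose ID is closest at or before the key, and each queried node answers from its own key table. These rules and all names are illustrative assumptions, not Whānau's exact definitions.

```python
def closest_finger(fingers, key):
    """fingers: list of (finger_id, node). Pick the finger whose ID is closest
    at or before `key`; wrap around to the largest ID if none precede it."""
    preceding = [f for f in fingers if f[0] <= key]
    return max(preceding or fingers)[1]


def lookup(fingers, key_tables, key):
    """Query the closest finger; it answers from its key table (key -> value)."""
    node = closest_finger(fingers, key)
    return key_tables[node].get(key)


fingers = [("Aardvark", "n1"), ("Kelvin", "n2"), ("Zyzzyva", "n3")]
key_tables = {
    "n1": {"Aardvark": "v1", "Fermat": "v2"},
    "n2": {"Kelvin": "v3", "Keynes": "v4"},
    "n3": {"Zyzzyva": "v5"},
}
print(lookup(fingers, key_tables, "Keynes"))  # v4, answered by n2
```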
From social network to routing tables
Finger table: randomly sample O(√n) nodes.
Most samples are honest.
Honest nodes pick IDs uniformly
Plenty of fingers near key
Sybil ID clustering attack
[Hypothetical scenario: 50% Sybil IDs, 50% honest IDs]
Many bad fingers near key
Honest layered IDs mimic Sybil IDs.
Every range is balanced in some layer.
Two layers are not quite enough:
Ratio = 1 honest : 10 Sybils
Ratio = 10 honest : 100 Sybils
log n parallel layers are enough.
log n layered IDs for each node.
Lookup steps:
1. Pick a random layer.
2. Pick a finger to query.
3. GOTO 1 until success or timeout.
[Figure: Layer 0, Layer 1, Layer 2, …, Layer L]
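The three lookup steps can be sketched as a retry loop. Assumptions: each layer is a list of (ID, node) fingers, the finger picked in step 2 is the one closest at or before the key, a Sybil finger simply answers nothing, and `max_tries` is a hypothetical timeout. The two-layer data below is a made-up example in which Sybil IDs cluster near the key in layer 0 while honest IDs cover the same range in layer 1.

```python
import random


def closest_finger(fingers, key):
    """Finger with the largest ID at or before `key` (wrapping to the largest ID)."""
    preceding = [f for f in fingers if f[0] <= key]
    return max(preceding or fingers)[1]


def layered_lookup(layers, query, key, rng, max_tries=32):
    """layers[l] is the finger list for layer l; query(node, key) returns a value or None."""
    for _ in range(max_tries):
        fingers = rng.choice(layers)         # 1. pick a random layer
        node = closest_finger(fingers, key)  # 2. pick a finger to query
        value = query(node, key)
        if value is not None:
            return value                     # success
    return None                              # 3. gave up: timeout


layers = [
    [(100, "sybil-1"), (480, "sybil-2")],    # layer 0: Sybil IDs clustered near the key
    [(90, "honest-1"), (470, "honest-2")],   # layer 1: honest IDs balance this range
]
store = {"honest-2": {500: "value-for-500"}}


def query(node, key):
    return store.get(node, {}).get(key)      # Sybil nodes return nothing


print(layered_lookup(layers, query, 500, random.Random(0)))
```

Each retry picks a fresh random layer, so a lookup only fails if every try lands on a layer the Sybils dominate near the key.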
From Social Relations to Routing Tables
[Figure: SETUP builds routing tables from the social network and the PUT(key, value) queue; LOOKUP maps a key to its value]
Problems
Whanau’s goal is to create a Sybil-proof DHT that ensures delivery.
Whanau uses the idea of random walks in fast-mixing graphs.
Whanau has changed the basic structure of a DHT:
Tables contain O(√n log n) entries!
The DHT has become a one-hop DHT.
But even O(√n) entries are impractical: think of a DHT with 100,000,000 users.
How to handle churn?
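A quick back-of-the-envelope check of the table-size concern for n = 100,000,000, ignoring constant factors and taking logarithms base 2 (both assumptions): √n log n comes to roughly 266,000 entries per node, versus about 27 for an O(log n) table.

```python
import math

n = 100_000_000
one_hop_entries = math.sqrt(n) * math.log2(n)  # O(sqrt(n) log n), constants ignored
classic_entries = math.log2(n)                 # O(log n), e.g. a Chord-style finger table
print(f"{one_hop_entries:,.0f} vs {classic_entries:.1f}")
```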
OUR IDEA OF A SYBIL PROOF DHT
Our Idea
We are given a social graph.
Each node knows about its friends in the social graph.
Same assumptions as SybilGuard or Whanau:
Fast-mixing graphs.
A small cut around attack edges.
At most o(n/log n) attack edges.
Our Motivation for DHT
Isn’t it possible to keep the basic routing features of a DHT while making it Sybil-resilient?
O(log n) table size.
Lookup should take O(log n) hops.
We should use social information to build the DHT.
Bootstrapping the DHT
Here comes the fundamental question: how do we convert a given social graph into a DHT so that
socially connected nodes are near each other in the DHT, and
socially distant nodes are far apart in the DHT?
Sybil nodes require a significant amount of social engineering to become strongly connected members of a social group.
A new type of DHT
We want to build a DHT where the distance between two nodes in the DHT space is related to their social distance,
i.e., two friends in the social graph are expected to be one hop apart in the DHT space.
Most queries will travel through friends; hence, the probability of reaching a Sybil node is lower.
We use the idea of Plexus, a novel DHT routing scheme based on linear block codes:
“Plexus: A Scalable Peer-to-Peer Protocol Enabling Efficient Subset Search”, Reaz Ahmed and Raouf Boutaba, ACM/IEEE TON, Feb 2009.
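Advertisement P, query Q
$$\mathrm{advSet}(P) \subseteq C, \qquad \mathrm{qSet}(Q) \subseteq C$$
$$\mathrm{advSet}(P) \cap \mathrm{qSet}(Q) \neq \emptyset \quad \text{whenever } P \supseteq Q$$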
Plexus: Index Clustering
C = set of cluster heads
Linear code C = <n, k, d>
Cluster head = codeword
Generator matrix based routing
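$$G = \begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_k \end{bmatrix} = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{k1} & g_{k2} & \cdots & g_{kn} \end{bmatrix}$$
Example: the <7, 4, 3> Hamming code
$$G = \begin{bmatrix} 0 & 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}$$
[Figure: patterns grouped into clusters; each cluster head is a codeword]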
Linear Binary Code
C = <n, k, d> linear binary code
n: number of bits in a codeword
k: dimension (2^k codewords in the code)
d: minimum distance between any pair of codewords
e.g., G24 = <24, 12, 8>
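Generator matrix G:
$$G = \begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_k \end{bmatrix} = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1n} \\ g_{21} & g_{22} & \cdots & g_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ g_{k1} & g_{k2} & \cdots & g_{kn} \end{bmatrix}$$
All 2^k codewords can be formed by applying XOR to any combination of these k codewords (the rows g_1, …, g_k).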
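As a concrete check of the <n, k, d> definitions, the sketch below builds all 2^k codewords of the <7, 4, 3> Hamming code by XOR-ing subsets of the generator rows g1..g4 (the example matrix from the earlier slide, encoded as 7-bit integers) and computes the minimum distance. The helper names are illustrative.

```python
from itertools import combinations

# Rows g1..g4 of the <7, 4, 3> Hamming generator matrix, as 7-bit integers.
G = [0b0111000, 0b1010100, 0b1100010, 0b1110001]


def codewords(rows):
    """All 2**k codewords: XOR of every subset of the k generator rows."""
    words = set()
    for mask in range(1 << len(rows)):
        w = 0
        for i, row in enumerate(rows):
            if mask >> i & 1:
                w ^= row
        words.add(w)
    return sorted(words)


def min_distance(words):
    """Minimum Hamming distance between any pair of distinct codewords."""
    return min(bin(a ^ b).count("1") for a, b in combinations(words, 2))


C = codewords(G)
print(len(C), min_distance(C))  # 16 3
```

Because the code is linear, the XOR of any two codewords is again a codeword; Plexus routing relies on this closure property.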
Plexus: Routing Table
In a complete network, each peer is responsible for a codeword.
A peer with codeword X maintains links to k + 1 peers with IDs computed as:
$$X_i = X \oplus g_i, \quad 1 \le i \le k$$
$$X_{k+1} = X \oplus g_1 \oplus g_2 \oplus \cdots \oplus g_k$$
X_{k+1} is used for replication and for reducing routing cost.
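A minimal sketch of the link-ID computation, with codewords encoded as integers and XOR standing for ⊕; the <7, 4, 3> Hamming generator rows serve as the example code, and `plexus_links` is a hypothetical name.

```python
from functools import reduce
from operator import xor

G = [0b0111000, 0b1010100, 0b1100010, 0b1110001]  # <7, 4, 3> Hamming rows g1..g4


def plexus_links(x, rows):
    """Link IDs for the peer holding codeword x:
    X_i = X ^ g_i for 1 <= i <= k, plus X_{k+1} = X ^ g_1 ^ ... ^ g_k."""
    links = [x ^ g for g in rows]
    links.append(reduce(xor, rows, x))
    return links


print([format(v, "07b") for v in plexus_links(0b0000000, G)])
# ['0111000', '1010100', '1100010', '1110001', '1111111']
```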
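Plexus: Routing
Observation: C is closed under the ⊕ operation:
$$X, Y \in C \implies X \oplus Y = g_{i_1} \oplus g_{i_2} \oplus \cdots \oplus g_{i_t}$$
Example: route from X to Y, where
$$X \oplus Y = g_2 \oplus g_3 \oplus g_5$$
One route is X → X_2 = X ⊕ g_2 → X_{23} = X_2 ⊕ g_3 → X_{235} = X_{23} ⊕ g_5 = Y; the generators can be applied in any order (e.g., via X_3, X_5, X_{25}, X_{35}).
Routing tables along the way:
X: X_1 = X ⊕ g_1, X_2 = X ⊕ g_2, …, X_k = X ⊕ g_k, and X_{k+1} = X ⊕ g_1 ⊕ g_2 ⊕ … ⊕ g_k
X_2: X_{21} = X_2 ⊕ g_1, …, X_{23} = X_2 ⊕ g_3, …, X_{2k} = X_2 ⊕ g_k
X_{23}: X_{231} = X_{23} ⊕ g_1, …, X_{235} = X_{23} ⊕ g_5, …, X_{23k} = X_{23} ⊕ g_k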
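The routing procedure can be sketched as follows, under assumptions: codewords are integers, the generator combination for X ⊕ Y is found by brute force over all 2^k subsets (fine for small k; Gaussian elimination over GF(2) would scale better), and the X_{k+1} link is taken whenever it shortens the path, which is one reading of "reducing routing cost". The <7, 4, 3> Hamming rows are the example code; function names are hypothetical.

```python
G = [0b0111000, 0b1010100, 0b1100010, 0b1110001]  # <7, 4, 3> Hamming rows g1..g4


def generator_subset(diff, rows):
    """Indices i_1..i_t with diff == g_i1 ^ ... ^ g_it (brute force over 2**k subsets)."""
    for mask in range(1 << len(rows)):
        acc = 0
        for i, row in enumerate(rows):
            if mask >> i & 1:
                acc ^= row
        if acc == diff:
            return [i for i in range(len(rows)) if mask >> i & 1]
    raise ValueError("X ^ Y is not a codeword")


def plexus_route(x, y, rows):
    """Hop-by-hop path from codeword x to codeword y, flipping one generator per hop."""
    k = len(rows)
    used = generator_subset(x ^ y, rows)
    path, cur = [x], x
    if len(used) > 1 + (k - len(used)):      # cheaper to go via the X_{k+1} link first
        for g in rows:
            cur ^= g                         # X_{k+1} = X ^ g_1 ^ ... ^ g_k
        path.append(cur)
        used = [i for i in range(k) if i not in used]
    for i in used:
        cur ^= rows[i]
        path.append(cur)
    return path


y = G[1] ^ G[2] ^ G[3]
print([format(v, "07b") for v in plexus_route(0, y, G)])  # 2 hops via X_{k+1}
```

With the X_{k+1} shortcut, a destination needing t of the k generators costs min(t, 1 + k − t) hops, which keeps routes at roughly half of k.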
Strengths of Plexus Routing
Hamming-distance-based clustering and indexing.
Maximum routing hops (within a subnet):
½k in normal conditions.
½k + 2 in the presence of failures.
Disjoint routing paths: for source X and destination Y, the path X → Y is disjoint from the path X → Y_{k+1}.
Alternate routing paths: suitable for multicasting.
Improved fault resilience.
Improved load balancing.
[Figure: disjoint routes from X to Y and to Y_{k+1}; replicas Y′ and Y′_{k+1}]
Social Network to Plexus
Now the problem reduces to assigning appropriate linear block codewords to the nodes. How do we do that?
Naïve Idea
All nodes u know their friends F1(u). All nodes u send F1(u) to all of their friends.
At this point, every node u, in addition to F1(u), can calculate its "mutual friend list" for each of its friends. For any two friends u, v, their mutual friend set is
$$M(u, v) = F_1(u) \cap F_1(v)$$
Every node u can also calculate F2(u), its exact two-hop-distant friend list:
$$F_2(u) = F_1(F_1(u)) \setminus \big(F_1(u) \cup \{u\}\big)$$
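A minimal sketch of the two quantities each node can compute after the friend-list exchange, on a hypothetical toy graph; `friend_lists`, `mutual`, and `two_hop` are illustrative names.

```python
from collections import defaultdict


def friend_lists(edges):
    """F1(u): direct friends of every node, from an undirected edge list."""
    f1 = defaultdict(set)
    for a, b in edges:
        f1[a].add(b)
        f1[b].add(a)
    return f1


def mutual(f1, u, v):
    """M(u, v) = F1(u) ∩ F1(v)."""
    return f1[u] & f1[v]


def two_hop(f1, u):
    """F2(u) = F1(F1(u)) minus F1(u) minus {u}: nodes exactly two hops away."""
    reach = set().union(*(f1[v] for v in f1[u]))
    return reach - f1[u] - {u}


edges = [("u", "a"), ("u", "b"), ("u", "v"), ("v", "a"), ("v", "b"), ("v", "c"), ("c", "d")]
f1 = friend_lists(edges)
print(sorted(mutual(f1, "u", "v")))  # ['a', 'b']
print(sorted(two_hop(f1, "u")))      # ['c']
```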
Naïve Idea
Each node u sorts its friends according to an "influence metric". For each friend v of a node u, Influence(u, v) = influence of v on u = I(u, v):
$$I(u, v) = \frac{|M(u, v)|}{|F_1(u)|}$$
It is highly probable that a Sybil node will have very low influence on an honest node via an attack edge, due to the very small number of mutual friends. However, not only Sybils but also a common friend of two groups will have low influence on both groups (this case is not handled by any of the algorithms).
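The influence metric can be sketched directly from its definition. The toy graph is hypothetical: an honest clique h1..h4 plus a Sybil node s attached to h1 through one attack edge, illustrating why an attack edge carries almost no influence.

```python
from collections import defaultdict


def friend_lists(edges):
    f1 = defaultdict(set)
    for a, b in edges:
        f1[a].add(b)
        f1[b].add(a)
    return f1


def influence(f1, u, v):
    """I(u, v): influence of friend v on u = |M(u, v)| / |F1(u)|."""
    return len(f1[u] & f1[v]) / len(f1[u])


honest = ["h1", "h2", "h3", "h4"]
edges = [(a, b) for i, a in enumerate(honest) for b in honest[i + 1:]]  # honest clique
edges += [("h1", "s"), ("s", "x1"), ("s", "x2"), ("x1", "x2")]         # attack edge h1-s
f1 = friend_lists(edges)
print(influence(f1, "h1", "h2"))  # 0.5: h2 shares h3 and h4 with h1
print(influence(f1, "h1", "s"))   # 0.0: no mutual friends across the attack edge
```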
Naïve Idea
Each node u calculates I(u, v) and I(v, u) for all its friends; there are 2 · deg(u) such quantities.
C(u) = those friends v on which u has more influence than v has on u.
P(u) = those friends v that have more influence on u than u has on v.
R(u) = those friends v for which u and v have the same influence on each other.
$$C(u) = \{v : v \in F_1(u),\ I(v, u) > I(u, v)\}$$
$$P(u) = \{v : v \in F_1(u),\ I(u, v) > I(v, u)\}$$
$$R(u) = \{v : v \in F_1(u),\ I(u, v) = I(v, u)\}$$
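A minimal sketch that splits a node's friends into C(u), P(u), and R(u) as defined above, on a hypothetical toy graph. Membership in R(u) uses exact float equality, which is adequate for a toy example but would need a tolerance in practice.

```python
from collections import defaultdict


def friend_lists(edges):
    f1 = defaultdict(set)
    for a, b in edges:
        f1[a].add(b)
        f1[b].add(a)
    return f1


def influence(f1, u, v):
    """I(u, v) = |F1(u) ∩ F1(v)| / |F1(u)|: influence of friend v on u."""
    return len(f1[u] & f1[v]) / len(f1[u])


def classify_friends(f1, u):
    """Split F1(u) into C(u), P(u), R(u) by comparing I(v, u) with I(u, v)."""
    C, P, R = set(), set(), set()
    for v in f1[u]:
        on_v, on_u = influence(f1, v, u), influence(f1, u, v)
        if on_v > on_u:
            C.add(v)  # u influences v more than v influences u
        elif on_u > on_v:
            P.add(v)  # v influences u more than u influences v
        else:
            R.add(v)  # equal influence both ways
    return C, P, R


edges = [("u", "a"), ("u", "b"), ("u", "c"), ("a", "b"), ("b", "c"), ("b", "d"), ("b", "e")]
f1 = friend_lists(edges)
C, P, R = classify_friends(f1, "u")
print(sorted(C), sorted(P), sorted(R))
```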
Naïve Idea
max{C(u)} = x = the friend on which u is most influential. This does not mean x has no friend more influential on it than u; it means u has no friend on which it has more influence than it has on x.
max{P(u)} = y = the friend that has the highest influence on u. This also does not mean y has no friends on which it has more influence than it has on u.
max{R(u)} = z = the friend that has the same influence on u as u has on it.
Naïve Idea
I_x = I(x, u); I_y = I(u, y); I_z = I(u, z) = I(z, u)
MI = {I_x, I_y, I_z}. If max{MI} is
I_x: u is an “influential” node;
I_y: u is an “influenced” node;
I_z: u is a “neutral” node.
Naïve Idea
Action D: if u is “influenceD”, it decides not to generate any ID and to take commands from y. It sends a message to y saying that it has come under y’s control.
Action L: if u is an “influentiaL” node, it decides to generate IDs for itself and for some of F1(u) and F2(u).
Action N: if u is “Neutral”, it chooses between Action L and Action D by a uniform Bernoulli trial.
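The role decision and the three actions can be sketched as follows. Assumptions not stated in the slides: an empty C(u), P(u), or R(u) contributes no candidate (score −1), ties among I_x, I_y, I_z are broken in that order, and the Bernoulli trial uses p = 1/2. ID generation, Gang(u), and collision handling are not modelled; all names are hypothetical.

```python
import random
from collections import defaultdict


def friend_lists(edges):
    f1 = defaultdict(set)
    for a, b in edges:
        f1[a].add(b)
        f1[b].add(a)
    return f1


def influence(f1, u, v):
    """I(u, v) = |F1(u) ∩ F1(v)| / |F1(u)|: influence of friend v on u."""
    return len(f1[u] & f1[v]) / len(f1[u])


def node_role(f1, u):
    """Return (role, y): role is 'influential', 'influenced' or 'neutral';
    y is the friend with the highest influence on u (u defers to y under Action D)."""
    C, P, R = [], [], []
    for v in f1[u]:
        on_v, on_u = influence(f1, v, u), influence(f1, u, v)
        if on_v > on_u:
            C.append(v)
        elif on_u > on_v:
            P.append(v)
        else:
            R.append(v)
    ix, _ = max(((influence(f1, v, u), v) for v in C), default=(-1.0, None))
    iy, y = max(((influence(f1, u, v), v) for v in P), default=(-1.0, None))
    iz, _ = max(((influence(f1, u, v), v) for v in R), default=(-1.0, None))
    best = max(ix, iy, iz)
    if best == ix:                     # ties resolved in the order I_x, I_y, I_z
        return "influential", y
    if best == iy:
        return "influenced", y
    return "neutral", y


def choose_action(role, rng):
    """Action L (generate IDs) or D (defer to y); neutral nodes flip a fair coin."""
    if role == "influential":
        return "L"
    if role == "influenced":
        return "D"
    return "L" if rng.random() < 0.5 else "D"


edges = [("u", "a"), ("u", "b"), ("u", "c"), ("a", "b"), ("b", "c"), ("b", "d"), ("b", "e")]
f1 = friend_lists(edges)
rng = random.Random(0)
for node in ("u", "b", "d"):
    role, y = node_role(f1, node)
    print(node, role, y, choose_action(role, rng))
```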
Now u generates IDs for itself and for the members of Gang(u). It tries to keep its friends’ IDs as close as possible; members of Gang(u) who are friends with each other also get IDs as close as possible.
u informs all of Gang(u) of all the IDs it generated. Members of Gang(u) take care of ID generation for their neighbors.
But how do we handle collisions? Some gossip protocol is needed!
Naïve Idea
Thus, IDs will be assigned in the code space according to the nodes’ “social groups”.