Understanding KaZaA
Jian Liang, Rakesh Kumar, Keith Ross
Polytechnic University, Brooklyn, N.Y.
KaZaA/FastTrack Operation
• Top file sharing system
– 3 million active nodes
– four clients: KaZaA, KaZaA-lite, Grokster and iMesh
• Good availability and scalability
• Proprietary protocol; signaling traffic encrypted
– in contrast with Gnutella and eMule
Purpose of Measurement Study
• Try to understand a highly successful file-sharing system
– Overlay topology and dynamics
– Peer selection
– Index management
• Use KaZaA as a test bed for further research
– Content pollution research (another paper)
Existing Tools and Projects
• FastTrack encryption algorithm
– available from a Web site: http://gift-fasttrack.berlios.de/
• KaZaA Media Desktop (KMD) software architecture
– http://kazaasearch.narod.ru/
Big Picture of Overlay
• Two-layer hierarchy
– Ordinary Node (ON)
– Super Node (SN)
Measurement Apparatus
• KaZaA Sniffing Platform
• KaZaA Probing Tool
KaZaA Sniffing Platform
• Poly (Ethernet)
• Home (cable modem)
KaZaA Probing Tool
• Campus & home based probing
– Node list
– Workload
[Figure: probes at Poly and at Home connected to super nodes (SN) and ordinary nodes (ON) in the KaZaA network; SN labels include 128., 24. and 213.]
Signaling Protocol
[Figure: ON-SN session initial and SN-SN session initial; SN–SN node list fragments 1, 2, ..., n are marked [Enc]]
TCP Connections Evolution
[Figure: TCP connections over time, with on-sn and sn-sn series. Panels: Poly campus, 4-6 hour measurement; cable modem, 7-11 hour measurement]
SN Workload
[Figure: SN workload plots. Connection panels show on-sn, sn-sn and total series; workload panels show the workload value. Captions: 7-11 hours TCP connections evolution; 7-11 hours workload values evolution]
Signaling Sessions Lifetime
Peer Selection: Node List IP Prefix Match
Peer Selection: Workload & RTT
Index Management: Sharing Content
Port Dynamic and NAT
• 19,637 unique SN addresses collected
• Only 707 SNs (3.6%) use the default port number, 1214
• 18,887 SNs (96.3%) use non-default port numbers
• Of 64,834 unique peers in total (SN + ON), 21,269 peers (ON) use private IP addresses
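The counts above can be reproduced from a list of observed peer addresses. The following is a minimal sketch, assuming the measurement yields (IP, port) pairs; the function names and the sample data are hypothetical and are not taken from the study's tools.

```python
import ipaddress

DEFAULT_PORT = 1214  # KaZaA default port, as stated on the slide


def summarize_peers(peers):
    """Count unique (ip, port) pairs by port usage and address type.

    `peers` is an iterable of (ip_string, port) tuples (assumed input format).
    """
    unique = set(peers)
    total = len(unique)
    default_port = sum(1 for _, port in unique if port == DEFAULT_PORT)
    # Note: is_private is broader than RFC 1918; it also covers
    # loopback and link-local ranges.
    private = sum(1 for ip, _ in unique if ipaddress.ip_address(ip).is_private)
    return {
        "total": total,
        "default_port": default_port,
        "non_default_port": total - default_port,
        "private_ip": private,
    }


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Made-up sample, for illustration only (one duplicate entry).
sample = [
    ("10.0.0.5", 1214),
    ("128.238.1.2", 3531),
    ("192.168.1.7", 2200),
    ("24.1.2.3", 1214),
    ("10.0.0.5", 1214),
]
summary = summarize_peers(sample)
```

With the slide's figures, `percent(707, 19637)` gives the reported share of SNs on the default port.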
Summary of Results
• 20,000 to 40,000 active super nodes
• Each SN connects to approx. 0.1% of other SNs
• Highly dynamic connections: over 35% of SN-SN connections last less than 30 sec.
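As a rough worked example of these two figures, the sketch below computes the neighbor count implied by the 0.1% figure and the share of short-lived connections in a list of durations. The function names and the sample durations are hypothetical; only the 0.1% figure and the 30-second threshold come from the slide.

```python
def implied_neighbors(num_supernodes, per_mille=1):
    """SN-SN neighbor count if each SN connects to per_mille/1000 of all SNs."""
    return num_supernodes * per_mille / 1000


def short_lived_fraction(durations_sec, threshold_sec=30):
    """Fraction of connection durations strictly below the threshold."""
    if not durations_sec:
        return 0.0
    short = sum(1 for d in durations_sec if d < threshold_sec)
    return short / len(durations_sec)


# Population bounds from the slide: 20,000 to 40,000 super nodes.
low = implied_neighbors(20_000)
high = implied_neighbors(40_000)

# Made-up durations in seconds, for illustration only.
durations = [5, 12, 45, 600, 29, 31, 1800, 10]
fraction = short_lived_fraction(durations)
```

Under these assumptions, an SN would hold roughly 20 to 40 SN-SN connections at a time.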
Summary of Results
• Peer selection uses IP prefix match, workload, RTT and freshness
• No index exchange between SNs, only query forwarding
• Skewed content distribution: 20% of peers provide 70% of the metadata for sharing
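The peer-selection criteria named above could be combined in many ways; the slides do not give the actual rule. The sketch below is one hypothetical ordering (longest shared IP prefix first, then lower workload, lower RTT, and fresher entry). The `Candidate` type, field names and tie-breaking order are assumptions for illustration, not KaZaA's algorithm.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical node-list entry for a super node."""
    ip: str
    workload: int    # reported workload value
    rtt_ms: float    # measured round-trip time
    age_sec: float   # time since the entry was last refreshed


def common_prefix_bits(a, b):
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


def rank_candidates(own_ip, candidates):
    """Sort candidates: longest prefix match, then workload, RTT, age."""
    return sorted(
        candidates,
        key=lambda c: (
            -common_prefix_bits(own_ip, c.ip),
            c.workload,
            c.rtt_ms,
            c.age_sec,
        ),
    )


# Made-up entries, for illustration only.
entries = [
    Candidate("24.5.6.7", workload=10, rtt_ms=40.0, age_sec=30.0),
    Candidate("128.238.200.9", workload=80, rtt_ms=5.0, age_sec=10.0),
    Candidate("128.238.40.2", workload=20, rtt_ms=8.0, age_sec=60.0),
]
ranked = [c.ip for c in rank_candidates("128.238.1.1", entries)]
```

A strict lexicographic ordering is only one option; a weighted score over the same four fields would be an equally plausible reading of the slide.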
Design Principles for Unstructured P2P Overlays
• Distributed design
– No infrastructure
– Avoiding legal attacks
• Exploit heterogeneity
– Hierarchy
– Self organization
• Load balancing
– workload balancing
• Explicit locality awareness
• Shuffle connections in core overlay
Design Principles for Unstructured P2P Overlays
• Properly designed gossip mechanisms
– peers have a fresh list of SNs
• Firewall circumvention
– dynamic port numbers
– improves availability
• NAT circumvention
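To illustrate the gossip principle, the sketch below merges a received node list into a local cache and keeps only the freshest entries. The data layout (address mapped to a last-seen timestamp), the function name and the cache size limit are assumptions for illustration; the slides do not describe the actual mechanism.

```python
def merge_node_list(cache, received, max_entries=200):
    """Merge a received SN list into the local cache, keeping fresh entries.

    Both arguments map an address string to a last-seen timestamp.
    `max_entries` is an assumed cache limit, not a value from the slides.
    """
    merged = dict(cache)
    for addr, last_seen in received.items():
        # Keep the most recent timestamp known for each address.
        if addr not in merged or last_seen > merged[addr]:
            merged[addr] = last_seen
    # Retain only the most recently seen entries.
    freshest = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return dict(freshest[:max_entries])


# Made-up addresses and timestamps, for illustration only.
local = {"a": 100, "b": 50}
incoming = {"b": 120, "c": 10}
updated = merge_node_list(local, incoming, max_entries=2)
```

Here the stale entry for "c" is dropped and "b" is refreshed with the newer timestamp.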