
Post on 01-Apr-2015


Page 1: Dynamics of Peer-to-Peer Networks or Who is Going to be The Next Pop Star? Yuval Shavitt School of Electrical Engineering shavitt@eng.tau.ac.il shavitt

Dynamics of Peer-to-Peer Networks or

Who is Going to be The Next Pop Star?

Yuval Shavitt
School of Electrical Engineering
shavitt@eng.tau.ac.il
http://www.eng.tau.ac.il/~shavitt

Page 2

Credits

Talk is based on the papers:
• Static and dynamic characterization of the Gnutella network [Shaked-Gish, S, Tankel, IPTPS 2007]
• How to predict the next pop star? [Koenigstein, S, Tankel, KDD 2008]

Page 3

What are Peer-to-Peer Networks?

• The common computing paradigm is client-server
  – Server waits for requests (on a known port)
  – Client sends a request
  – Server serves the client
  – Examples: WWW, FTP, SMTP (e-mail), …
• Peer-to-peer networks:
  – Each end-point is both client and server

[Figure: client-server diagram, several clients around one server]

Page 4

The Gnutella Network

• Gnutella: the most popular file sharing network on the Internet
• 40% market share in Q4 2007, according to the Digital Music News Research Group
• LimeWire: the most popular file sharing client in the world; it dominates the Gnutella network

Page 5

The Gnutella Protocol

• Originally: a flat peer-to-peer distributed protocol
  – Churn caused instability
• Today: a 2-level tiered system
  – Stable nodes are promoted to become ultrapeers
  – Queries carry an OOB (out-of-band) address: the originator’s address or, in most cases when the client is firewalled, the ultrapeer’s address

Page 6

Locating the Origin IP address

IP resolution process:
• Detect the ultrapeer (UP) IP
• Discard queries with more than 2 hops
• Discard queries with 2 hops and the same IP
• Intercept queries with 2 hops and different IPs

[Figure: peers attached to a chain of ultrapeers (UP), with the listener connected to an ultrapeer]

• Cancels the bias for rare queries
• Introduces bias against firewalled clients
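
The filtering rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Query` record and its field names are hypothetical, "same IP" is read as "the OOB address equals the IP of the ultrapeer that forwarded the query", and queries with fewer than 2 hops are dropped as well, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical record of one query as seen by the listener."""
    hops: int       # hop count carried by the query
    oob_ip: str     # out-of-band reply address carried by the query
    sender_ip: str  # IP of the ultrapeer that forwarded the query
    text: str       # the query string itself


def is_origin_resolvable(q: Query) -> bool:
    """Return True only for queries whose OOB address can be taken as the origin.

    Follows the slide: keep queries with exactly 2 hops whose OOB address
    differs from the forwarding ultrapeer's IP. Dropping queries with fewer
    than 2 hops is an assumption of this sketch.
    """
    if q.hops != 2:
        return False
    # Same IP: the OOB address is the ultrapeer's own (firewalled client).
    return q.oob_ip != q.sender_ip


def intercept(queries):
    """Keep only the queries that pass the IP resolution filter."""
    return [q for q in queries if is_origin_resolvable(q)]
```

Because a firewalled client's query carries its ultrapeer's address, the second rule drops it, which is the bias against firewalled clients noted on the slide.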

Page 7

Data Sets

• First study:
  – Jul 2006 – Nov 2006
  – 665,000,000 world-wide geo-identified queries
• Second study:
  – Oct 2006 – Jul 2007, Sundays only
  – 310,000,000 USA geo-identified queries
• A network crawl of 24 hours
  – 1.2M users
  – 533,000 different songs

Largest studies ever performed in length and depth

Page 8

Query Classification in Gnutella

[Figure: pie chart of query categories]
• Music (68.11%)
• Adult (22.01%)
• Movie (4.1%)
• TV (1.7%)
• Unknown (1.67%)
• Japanese Anime/Comic (1.37%)
• Software (0.54%)
• File Suffix (0.26%)
• Spam (0.23%)

Page 9

Top Countries

Page 10

Queries Per Day

Page 11

Queries Per Hour Per User

Page 12

Top Queries (constant)

Page 13

Top Volatile Queries

Page 14

Temporal Ranking Drift

Page 15

How to Predict Artist’s Success?

Noam Koenigstein, Y. Shavitt, and Tomer Tankel. Spotting Out Emerging Artists Using Geo-Aware Analysis of P2P Query Strings. The 2008 ACM SIGKDD Conference, August 2008, Las Vegas, NV, USA.

Page 16

The Word of Mouth Effect

• A successful innovation: formation of adopter clusters around early adopters
• An unsuccessful product: a uniform spatial distribution
• The divergence can be used to predict a new product's success probability [Garber et al., Marketing Science 2004]

Page 17

The divergence

• When measured against the uniform distribution, the maximum is achieved when P is a delta function (all mass on a single location).
  – True for both Kullback-Leibler and Jensen-Shannon
  – This is the case when emerging artists are considered
• Non-uniform distribution of potential adopters:
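
As a small illustration of the claim about the maximum, the sketch below computes the Kullback-Leibler and Jensen-Shannon divergences of a spatial distribution P against a reference distribution using their textbook definitions. The four-location example distributions are made up for illustration; the reference is passed as an argument, so a non-uniform distribution of potential adopters could be used in place of the uniform one.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q), natural log, with 0*log(0) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of P and Q against their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def uniform(n):
    """Uniform distribution over n locations."""
    return [1.0 / n] * n


# Illustrative distributions over 4 locations (not data from the study).
delta = [1.0, 0.0, 0.0, 0.0]   # all queries come from one location
spread = [0.4, 0.3, 0.2, 0.1]  # queries spread over several locations
reference = uniform(4)

kl_delta = kl_divergence(delta, reference)    # equals log(4), the maximum
kl_spread = kl_divergence(spread, reference)  # strictly smaller
```

Against the uniform distribution over n locations, D(P || U) = log n − H(P), so it peaks at log n when the entropy H(P) is zero, that is, when P is concentrated on one location.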

Page 18

Party Like a Rockstar in 2007

• Week 6: The string “party like a rockstar” is detected by the algorithm
• Week 8: Atlanta’s popularity chart in (Feb 18th)
• Week 15: Atlanta-based Shop Boyz sign a contract with Universal Recordings
• Week 18: The song first enters the Billboard Hot 100 (80th position)
• Week 23: Reached 2nd position on the Billboard Hot 100

Ranked only 10,156 on the global chart

Page 19

Party Like a Rockstar

[Figure: Shop Boyz Popularity and Divergence in 2007. X-axis: Week Numbers (2007), weeks 1 to 23; y-axes: Divergence and Popularity; series: KL Divergence, Popularity]

[Figure: Shop Boyz related queries in February 2007]

Page 20

Soulja Boy

[Figure: Soulja Boy divergence and popularity. X-axis: Week Numbers (2007), weeks 1 to 29; series: KL Divergence, Popularity]

• Detected by our algorithm already in 2006
• The string “soulja boy” entered the “Atlanta queries top 100” already in October 2006
• Entered the Bubbling Under R&B/Hip-Hop Singles chart on the 23rd of June 2007
• Later ranked first in the following Billboard charts: Hot 100, Hot Rap Tracks, Hot Videoclip, Hot RingMasters and Hot Ringtones

Page 21

Yung Berg

• Active in LA

• Week 2: Entered LA top 100

• Week 15: First appeared on the Billboard charts

• Week 32: Reached 18 on the Billboard Top 100

[Figure: Yung Berg divergence and popularity. X-axis: Week Numbers (2007), weeks 1 to 29; y-axes: Divergence and Popularity; series: KL Divergence, Popularity]

Page 22

Madonna

Page 23

The Detection Algorithm

• Input: A list of geo-identified P2P query strings
• Output: A list of locally popular query strings with a high probability of becoming globally popular
• Build local and global popularity charts
• Local popularity is detected using local and global popularity thresholds
• Look for local popularity growth trends from week to week
• Filtering: non-music related content and already familiar artists are characterized by a uniform distribution
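
The chart-building and threshold steps above can be sketched roughly as follows. This is not the authors' implementation: the record layout `(week, city, query)`, the function names, and the rank-based thresholds `local_top` and `global_top` are assumptions made for illustration, and the filtering step is left out.

```python
from collections import Counter, defaultdict


def build_charts(records):
    """Count query strings per (week, city) and per week.

    `records` is an iterable of hypothetical (week, city, query) tuples.
    """
    local = defaultdict(Counter)    # (week, city) -> query counts
    global_ = defaultdict(Counter)  # week -> query counts
    for week, city, query in records:
        local[(week, city)][query] += 1
        global_[week][query] += 1
    return local, global_


def top_n(counter, n):
    """The n most frequent strings of a chart, as a set."""
    return {query for query, _ in counter.most_common(n)}


def detect(records, week, city, local_top=100, global_top=100):
    """Strings high on the local chart, absent from the global top, and growing.

    The rank thresholds are assumed parameters of this sketch.
    """
    local, global_ = build_charts(records)
    now, prev = local[(week, city)], local[(week - 1, city)]
    candidates = top_n(now, local_top) - top_n(global_[week], global_top)
    # Keep only strings whose local count grew since the previous week.
    return sorted(q for q in candidates if now[q] > prev[q])
```

For example, with `local_top=2` and `global_top=1`, a string that rises to the top of one city's chart while a long-established string leads the global chart would be the only string returned for that city.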

Page 24

Local Popularity

• Not all queries are “products”, thus divergence is not effective (e.g., rare typos)

• Detection is based on local popularity:

Page 25

ATPL - All Times Popular List

• Initialization: All the strings that reached global popularity in 2006
• Weekly aggregation
• Filters non-volatile strings:
  – adult related, e.g., “porn”
  – well established artists, e.g., “madonna”, “avril lavigne”
  – movies, software, etc.
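
One way to picture the ATPL is as a growing set of strings that is consulted before a candidate is reported. The class below is a minimal sketch under that reading; the class and method names are hypothetical, and treating "weekly aggregation" as "add each week's globally popular strings to the set" is an assumption, not something the slide spells out.

```python
class AllTimesPopularList:
    """Sketch of an ATPL: strings that have already been globally popular."""

    def __init__(self, initial_strings):
        # Initialization, e.g. with strings that were globally popular in 2006.
        self._strings = {s.lower() for s in initial_strings}

    def aggregate_week(self, globally_popular):
        """Assumed weekly step: absorb this week's globally popular strings."""
        self._strings.update(s.lower() for s in globally_popular)

    def filter(self, candidates):
        """Drop candidates that are already on the list (non-volatile strings)."""
        return [c for c in candidates if c.lower() not in self._strings]


atpl = AllTimesPopularList(["madonna", "avril lavigne", "porn"])
remaining = atpl.filter(["Madonna", "party like a rockstar"])
```

Here `remaining` keeps only the string that has never been globally popular, so well established artists and adult-related strings are never reported as emerging.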

Page 26

Algorithm's Flow

Page 27

Detection Time

Page 28

Local Threshold

Page 29

Local Threshold

Page 30

Manual inspection of the Atlanta data

Page 31

Correlation Between Billboard and downloads

Page 32

Correlation Measurements

• Modified time series correlation

• P2P correlation with the Billboard:

Page 33

Finding The Optimal Time Shift
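
The idea of searching for a time shift can be illustrated with a plain lagged Pearson correlation. The talk's modified time series correlation is not given in this transcript, so the sketch below substitutes the standard Pearson coefficient; the function names and the two weekly series are hypothetical, and both series are assumed to be popularity scores where higher means more popular.

```python
import math


def pearson(x, y):
    """Standard Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no trend to correlate
    return cov / (sx * sy)


def best_shift(p2p, billboard, max_shift):
    """Find the shift (in weeks) that best aligns p2p[t] with billboard[t + shift]."""
    best = (0, -2.0)  # (shift, correlation); -2 is below any real correlation
    for shift in range(max_shift + 1):
        b = billboard[shift:]
        n = min(len(p2p), len(b))
        if n < 3:
            break  # too few overlapping weeks to be meaningful
        r = pearson(p2p[:n], b[:n])
        if r > best[1]:
            best = (shift, r)
    return best


# Made-up weekly series: the chart series repeats the P2P series two weeks later.
p2p_series = [1, 3, 2, 5, 4, 6, 8, 7]
chart_series = [0, 0, 1, 3, 2, 5, 4, 6]
shift, correlation = best_shift(p2p_series, chart_series, max_shift=3)
```

With these made-up series the search returns a shift of 2 weeks, since the chart series is the P2P series delayed by two weeks.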

Page 34

Prediction Results

• Example: When a song enters the Billboard, will it reach the “top 20”?
• Precision: 89%, Recall: 80%
  – On average, songs pass the threshold 2.83 weeks before reaching top Billboard rank
• More details: Koenigstein, Shavitt, and Zilberman, AdMIRe 2009
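
For readers less familiar with the two metrics quoted above, the sketch below shows how precision and recall would be computed for a "will reach the top 20" prediction. The song identifiers are placeholders, not data from the study, and the function name is hypothetical.

```python
def precision_recall(predicted, actual):
    """Precision and recall of a predicted set against the true set.

    precision = correct predictions / all predictions
    recall    = correct predictions / all true items
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Placeholder identifiers for songs predicted to reach, and actually reaching, the top 20.
predicted_top20 = {"song_a", "song_b", "song_c"}
actual_top20 = {"song_b", "song_c", "song_d", "song_e"}
precision, recall = precision_recall(predicted_top20, actual_top20)
```

In this toy case two of the three predictions are correct and two of the four true top-20 songs are found.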

Page 35

Summary

• Following activity on the Internet can help us detect trends before they are visible
  – P2P networks
  – Social networks
  – Blogs
  – Talk-backs
  – Searches

• More at http://www.eng.tau.ac.il/~shavitt