![Page 1: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/1.jpg)
Ranking Web Sites with Real User Traffic
Mark Meiss
Filippo Menczer
Santo Fortunato
Alessandro Flammini
Alessandro Vespignani
Web Search and Data Mining
Stanford, California
February 11, 2008
![Page 2: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/2.jpg)
Outline
•Data collection
•Structural properties
•Behavioral patterns
•PageRank validation
•Temporal patterns
![Page 3: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/3.jpg)
Sources for Ranking Data: The Link Graph
![Page 4: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/4.jpg)
Sources for Ranking Data: Dynamic Sources
• Network flow data
• Web server logs
• Toolbars and plugins
![Page 5: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/5.jpg)
Sources for Ranking Data: Packet Inspection
• ISP
• ~100K users
![Page 6: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/6.jpg)
Data Collection
• Captured traffic: HTTP (port 80) GET requests, from IU only; 30% @ peak
• Recorded fields: Host, Path, Referer, User-Agent, Timestamp (h/p/r/a/t)
• Processing: anonymizer
• Resulting data sets: FULL (h/p/r/a/t) and HUMAN (h/p/r/a/t)
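One way to turn the captured fields into a weighted host graph is sketched below. It assumes each request has already been reduced to a (referer URL, host) pair. The function name `build_host_graph`, the choice to count empty referers as jumps, the choice to keep same-host clicks as self-loops, and the sample hosts are all illustrative assumptions rather than the authors' pipeline.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_host_graph(requests):
    """Aggregate (referer_url, host) pairs into a weighted host graph.

    Returns (weights, s_in, s_out, jumps):
      weights[(i, j)] -- number of clicks from host i to host j (w_ij)
      s_in[j]         -- traffic arriving at j by following links
      s_out[i]        -- traffic leaving i by following links
      jumps[j]        -- requests for j with no usable referer
    """
    weights = defaultdict(int)
    s_in = defaultdict(int)
    s_out = defaultdict(int)
    jumps = defaultdict(int)
    for referer, host in requests:
        # An empty or unparseable referer is treated as a jump (assumption).
        source = urlsplit(referer).hostname if referer else None
        if source is None:
            jumps[host] += 1
            continue
        # Same-host clicks are kept as self-loops here (assumption).
        weights[(source, host)] += 1
        s_out[source] += 1
        s_in[host] += 1
    return weights, s_in, s_out, jumps


# Hypothetical sample records
sample = [
    ("http://a.example/x", "b.example"),
    ("http://a.example/y", "b.example"),
    ("", "a.example"),
    ("http://b.example/", "c.example"),
]
weights, s_in, s_out, jumps = build_host_graph(sample)
```

In this sketch the in-strength and out-strength of a site count only link-following traffic, and requests with an empty referer are tallied separately.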
![Page 7: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/7.jpg)
![Page 8: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/8.jpg)
![Page 9: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/9.jpg)
Structural properties: Degree
![Page 10: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/10.jpg)
Caveat: Sampling Bias
![Page 11: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/11.jpg)
Structural properties: Strength (Site Traffic)
![Page 12: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/12.jpg)
Structural properties: Weights (Link Traffic)
![Page 13: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/13.jpg)
![Page 14: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/14.jpg)
Behavioral patterns (HUMAN)
(Proportion of total out-strength)
• Empty referrer: 54%
• Search: 5%
• Other: 40%
• Webmail: 1%
![Page 15: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/15.jpg)
Ratios are stable
[Figure: requests (x 10^6) and percentage scale (0% to 100%), by month from Sep 2006 to May 2007]
![Page 16: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/16.jpg)
Ratios are stable
[Figure: requests (x 10^6) and percentage scale (0% to 100%), x-axis from 0 to 23]
![Page 17: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/17.jpg)
![Page 18: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/18.jpg)
Validation of PageRank
• PR is a stationary distribution of visit frequency by a modified random walk (with jumps) on the Web graph
• Compare with actual site traffic (in-strength)
• From an application perspective, we care about the resulting ranking of sites rather than the actual values
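Comparing two rankings rather than raw values can be done with a rank correlation such as Kendall's tau. The sketch below is a minimal quadratic-time version of tau-a, in which tied pairs count as neither concordant nor discordant. The function name and the five sample scores are hypothetical and are not data from the talk.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        # A pair is concordant when both lists order items i and j the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical scores for five sites
pagerank = [0.30, 0.25, 0.20, 0.15, 0.10]
traffic = [900, 1200, 300, 400, 100]
tau = kendall_tau(pagerank, traffic)  # 8 concordant, 2 discordant pairs -> 0.6
```

A value of 1 means the two rankings agree on every pair, -1 means they disagree on every pair, and values near 0 mean the rankings are close to unrelated.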
![Page 19: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/19.jpg)
Kendall’s Rank Correlation
![Page 20: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/20.jpg)
PageRank Assumptions
1. Equal probability of teleporting to each of the nodes
2. Equal probability of teleporting from each of the nodes
3. Equal probability of following each link from any given node
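Weighted PageRank:
$$PR_W(j) = \frac{\alpha}{N} + (1 - \alpha) \sum_{i:\, w_{ij} \neq 0} \frac{w_{ij}}{s_{out}(i)} \, PR_W(i)$$
where $\alpha$ is the teleportation probability, $N$ the number of nodes, $w_{ij}$ the traffic on link $i \to j$, and $s_{out}(i) = \sum_j w_{ij}$.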
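A weighted PageRank of this kind, where a link is followed with probability proportional to its traffic rather than uniformly, can be sketched with plain power iteration. This is a minimal illustration under stated assumptions: `alpha` is taken to be the teleportation probability, teleportation targets are uniform, and the score of nodes with no outgoing traffic is spread uniformly over all nodes. The function name, the default `alpha=0.15`, and the sample graph are hypothetical.

```python
def weighted_pagerank(weights, alpha=0.15, iterations=100):
    """Power iteration for PageRank on a weighted directed graph.

    weights: dict mapping (i, j) -> non-negative traffic w_ij
    alpha:   teleportation probability (assumed convention)
    """
    nodes = sorted({node for edge in weights for node in edge})
    n = len(nodes)
    s_out = {v: 0.0 for v in nodes}
    for (i, _), w in weights.items():
        if w > 0:
            s_out[i] += w
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Score held by nodes with no out-traffic is redistributed uniformly.
        dangling = sum(pr[v] for v in nodes if s_out[v] == 0)
        new = {v: alpha / n + (1 - alpha) * dangling / n for v in nodes}
        for (i, j), w in weights.items():
            if w > 0:
                # Follow link i -> j with probability w_ij / s_out(i).
                new[j] += (1 - alpha) * pr[i] * w / s_out[i]
        pr = new
    return pr


# Hypothetical three-site graph
graph = {("a", "b"): 3, ("a", "c"): 1, ("b", "c"): 2, ("c", "a"): 4}
scores = weighted_pagerank(graph)
```

Setting every weight to 1 recovers the usual assumption that each link from a node is equally likely to be followed.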
![Page 21: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/21.jpg)
Kendall’s Rank Correlation
![Page 22: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/22.jpg)
Local Link Heterogeneity
HH index of concentration or disparity:
$$Y_i = \sum_j \left( \frac{w_{ij}}{s_{out}(i)} \right)^2$$
[Figure annotations: "perfect concentration" and "perfect homogeneity"]
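The disparity index can be computed per site from its outgoing link weights, as in the short sketch below. It assumes the index is the sum over outgoing links of the squared share of out-strength, $Y_i = \sum_j (w_{ij} / s_{out}(i))^2$. The function name and the sample weights are hypothetical.

```python
def disparity(out_weights):
    """Sum of squared traffic shares over one site's outgoing links."""
    total = sum(out_weights)
    if total <= 0:
        raise ValueError("site has no outgoing traffic")
    return sum((w / total) ** 2 for w in out_weights)


# Four links carrying equal traffic: the homogeneous case, Y = 1/k with k = 4
even = disparity([5, 5, 5, 5])
# All traffic on a single link: the fully concentrated case, Y = 1
concentrated = disparity([10, 0, 0])
```

For a site with k outgoing links the value ranges from 1/k, when traffic is spread evenly, up to 1, when one link carries everything.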
![Page 23: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/23.jpg)
Teleportation Target Heterogeneity
![Page 24: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/24.jpg)
Teleportation Source Heterogeneity (“hubness”)
• $s_{out} < s_{in}$: teleport sources, browsing sinks
• $s_{out} > s_{in}$: popular hubs
[Figure annotation: -2]
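The two regimes named on this slide amount to comparing a site's out-strength with its in-strength. The sketch below is only an illustration of that comparison. The function name and the "balanced" label for the equal case are hypothetical additions.

```python
def classify_site(s_in, s_out):
    """Label a site by comparing link traffic out (s_out) with link traffic in (s_in)."""
    if s_out < s_in:
        # More visitors arrive by links than leave by links.
        return "teleport source / browsing sink"
    if s_out > s_in:
        # More visitors leave by links than arrive by links.
        return "popular hub"
    return "balanced"  # hypothetical label for s_out == s_in


# Hypothetical strengths
sink_label = classify_site(s_in=100, s_out=20)
hub_label = classify_site(s_in=20, s_out=100)
```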
![Page 25: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/25.jpg)
Navigation vs. Jumps: Sources of Popularity
![Page 26: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/26.jpg)
![Page 27: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/27.jpg)
Temporal patterns
How predictable are traffic patterns?
-- Cache refreshing (e.g. proxies)
-- Capacity allocation (e.g. peering and provisioning for spikes)
-- Site design (e.g. expose content based on time of day)
![Page 28: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/28.jpg)
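Temporal patterns
• Predict future host graph (clicks) from current one, as a function of delay
• Generalized temporal precision and recall:
$$P(\delta) = \left\langle \frac{\sum_{ij} \min[w_{ij}(t),\, w_{ij}(t+\delta)]}{\sum_{ij} w_{ij}(t)} \right\rangle_{t \in T}$$
$$R(\delta) = \left\langle \frac{\sum_{ij} \min[w_{ij}(t),\, w_{ij}(t+\delta)]}{\sum_{ij} w_{ij}(t+\delta)} \right\rangle_{t \in T}$$
where $w_{ij}(t)$ is the current graph used as the prediction, $w_{ij}(t+\delta)$ is the graph observed after delay $\delta$, and the average runs over the times $t$ in $T$.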
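Generalized precision and recall for weighted graphs can be sketched as follows. The sketch assumes the current graph serves as the prediction, the graph observed after the delay is the ground truth, the overlap of two graphs is the sum over links of the smaller of the two weights, and the result is averaged over all available start times. The function names and the two sample snapshots are hypothetical, and every snapshot is assumed to contain some traffic.

```python
from statistics import mean


def overlap(current, future):
    """Sum over links of min(w_ij(t), w_ij(t + delay))."""
    return sum(min(w, future.get(edge, 0)) for edge, w in current.items())


def temporal_precision_recall(snapshots, delay):
    """Average precision and recall of predicting snapshot t + delay from snapshot t.

    snapshots: list of dicts mapping (i, j) -> clicks, one dict per time step
    """
    pairs = [(snapshots[t], snapshots[t + delay])
             for t in range(len(snapshots) - delay)]
    if not pairs:
        raise ValueError("delay is too long for the number of snapshots")
    # Precision: share of predicted clicks that are observed later.
    precision = mean(overlap(c, f) / sum(c.values()) for c, f in pairs)
    # Recall: share of later clicks that were predicted.
    recall = mean(overlap(c, f) / sum(f.values()) for c, f in pairs)
    return precision, recall


# Two hypothetical hourly snapshots
snapshots = [
    {("a", "b"): 4, ("a", "c"): 2},
    {("a", "b"): 2, ("b", "c"): 2},
]
p, r = temporal_precision_recall(snapshots, delay=1)  # overlap 2: P = 2/6, R = 2/4
```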
![Page 29: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/29.jpg)
HUMAN host graph (FULL is about 10% more predictable)
![Page 30: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/30.jpg)
Summary
• Heterogeneity: incoming and outgoing site traffic, link traffic
• Less than half of traffic is from following links
• Only 5% of traffic is directly from search engines
• High temporal regularity
• PageRank is a poor predictor of traffic: random walk and random teleportation assumptions violated
![Page 31: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/31.jpg)
Next
•Sampling bias and search bias
•From host graph to page graph
•Modeling traffic: Beyond random walk?
![Page 32: Ranking Web Sites with Real User Traffic Mark Meiss Filippo Menczer Santo Fortunato Alessandro Flammini Alessandro Vespignani Web Search and Data Mining](https://reader030.vdocument.in/reader030/viewer/2022032702/56649ced5503460f949baf92/html5/thumbnails/32.jpg)
THANKS!
Mark Meiss
Filippo Menczer
Santo Fortunato
Alessandro Vespignani
Alessandro Flammini CNLL
??