link analysis (rby)

Link Analysis on the Web: The big picture, the small picture and the medium-sized picture
Ricardo Baeza-Yates (3,4)
Joint work with: L. Becchetti (1), P. Boldi (2), C. Castillo (1,3), D. Donato (1,3), S. Leonardi (1), B. Poblete (5)
1. Università di Roma "La Sapienza", Rome, Italy
2. Università degli Studi di Milano, Milan, Italy
3. Yahoo! Research Barcelona, Catalunya, Spain
4. Yahoo! Research Latin America, Santiago, Chile
5. Universitat Pompeu Fabra, Catalunya, Spain

Upload: carlos-castillo

Post on 11-May-2015

TRANSCRIPT

1. Link Analysis on the Web: The big picture, the small picture and the medium-sized picture
   Ricardo Baeza-Yates
   Joint work with: L. Becchetti, P. Boldi, C. Castillo, D. Donato, S. Leonardi, B. Poblete

2-3. Outline
   1 Levels of Link Analysis
   2 Generalizing PageRank
   3 Other Functional Rankings
   4 Web Spam
   5 Web Spam Detection
   6 Topological Web Spam
   7 Direct Counting of Supporters
   8 Spam Detection Results

5-7. How to find meaningful patterns?
   Several levels of analysis:
   - Macroscopic view: overall structure
   - Microscopic view: nodes
   - Mesoscopic view: regions

8. Macroscopic view, e.g. Bow-tie
   [Figure] [Broder et al., 2000]

9. Macroscopic view, e.g. Bow-tie, migration
   [Figure] [Baeza-Yates and Poblete, 2006]

10-11. Macroscopic view, e.g. Jellyfish
   [Figure] [Tauro et al., 2001] - Internet Autonomous Systems (AS) Topology

12. Microscopic view, e.g. Degree
   [Figure] [Barabási, 2002] and others

13. Microscopic view, e.g. Degree
   [Figure: degree plots; panels: Greece, Chile, Spain, Korea]
   [Baeza-Yates et al., 2006b] compares this distribution in 8 countries . . . guess what is the result?

14-16. Mesoscopic view, e.g. Hop-plot
   [Figure: frequency (0.0 to 0.3) versus distance (5 to 30); panels: .it (40M pages), .uk (18M pages), .eu.int (800K pages), synthetic graph (100K pages)]
   [Baeza-Yates et al., 2006a]
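The hop-plot panels above show a frequency for each distance. A minimal sketch of how such a plot could be computed is given here, assuming the hop-plot is the distribution of shortest-path distances over reachable ordered pairs of pages (the slides do not spell out the definition). The function name `hop_plot` and the toy graph are hypothetical, and a real crawl would need sampling rather than one breadth-first search per page.

```python
from collections import Counter, deque


def hop_plot(adjacency):
    """Return {distance: fraction of reachable ordered pairs at that distance}."""
    counts = Counter()
    for source in adjacency:
        # Breadth-first search gives shortest-path distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        # Count every reached page except the source itself.
        counts.update(d for d in dist.values() if d > 0)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: c / total for d, c in sorted(counts.items())}


# Hypothetical toy graph: a directed cycle of four pages.
toy_graph = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
plot = hop_plot(toy_graph)
```

On the toy cycle every page reaches the other three at distances 1, 2 and 3, so the three distances share the frequency mass equally.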
19-20. Notation
   Let $P_{N \times N}$ be the normalized link matrix of a graph:
   - Row-normalized
   - No sinks
   Definition (PageRank): stationary state of
   $$\alpha P + \frac{(1 - \alpha)}{N} \mathbf{1}_{N \times N}$$
   - Follow links with probability $\alpha$
   - Random jump with probability $1 - \alpha$

21-22. Explicit Formulas
   Formulas for PageRank [Newman et al., 2001, Boldi et al., 2005]:
   $$r(\alpha) = \frac{(1 - \alpha)}{N} \sum_{t=0}^{\infty} (\alpha P)^t$$
   $$r_i(\alpha) = \sum_{p \in \mathrm{Path}(-, i)} \frac{(1 - \alpha)\,\alpha^{|p|}}{N}\, \mathrm{branching}(p)$$
   $\mathrm{Path}(-, i)$ are incoming paths in node $i$.

23. Branching contribution
   Definition (Branching contribution of a path): given a path $p = \langle x_1, x_2, \ldots, x_t \rangle$ of length $t = |p|$,
   $$\mathrm{branching}(p) = \frac{1}{d_1 d_2 \cdots d_{t-1}}$$
   where $d_i$ are the out-degrees of the members of the path.
   For every node $i$ and every length $t$:
   $$\sum_{p \in \mathrm{Path}(i, -),\, |p| = t} \mathrm{branching}(p) = 1.$$

24. Functional ranking
   General functional ranking [Baeza-Yates et al., 2006a]:
   $$r_i(\alpha) = \sum_{p \in \mathrm{Path}(-, i)} \frac{\mathrm{damping}(|p|)}{N}\, \mathrm{branching}(p)$$
   PageRank is a particular case of path-based ranking.
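The agreement between the stationary-state definition and the explicit series formula can be checked numerically. The sketch below is an illustration in pure Python, not the authors' code. It assumes the series is applied to a row vector of ones (so that r(α) is a vector), truncates the infinite sum at a fixed number of terms, and uses a hypothetical three-page matrix `P_toy` that is row-normalized and has no sinks.

```python
def row_times_matrix(vec, matrix):
    """Multiply a row vector by a square matrix."""
    n = len(matrix)
    return [sum(vec[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def pagerank_series(P, alpha, terms=200):
    """(1 - alpha)/N * ones * sum_{t < terms} (alpha P)^t."""
    n = len(P)
    term = [(1 - alpha) / n] * n          # t = 0 term
    rank = term[:]
    for _ in range(1, terms):
        # Next term of the series: previous term times alpha * P.
        term = [alpha * x for x in row_times_matrix(term, P)]
        rank = [r + x for r, x in zip(rank, term)]
    return rank


def pagerank_stationary(P, alpha, iterations=200):
    """Power iteration on alpha * P + (1 - alpha)/N * 1_{NxN}."""
    n = len(P)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        followed = row_times_matrix(rank, P)
        # rank sums to 1, so the random-jump part adds (1 - alpha)/N everywhere.
        rank = [alpha * f + (1 - alpha) / n for f in followed]
    return rank


# Hypothetical toy graph: page 0 links to 1 and 2, page 1 to 2, page 2 to 0.
P_toy = [[0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
series_rank = pagerank_series(P_toy, alpha=0.85)
stationary_rank = pagerank_stationary(P_toy, alpha=0.85)
```

Both vectors should agree up to the truncation error of the series, which shrinks like α raised to the number of terms.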
26. Exponential damping = PageRank
   [Figure: weight (0.00 to 0.30) versus length of the path t (1 to 10); curves: damping(t) with α = 0.8 and with α = 0.7]
   Exponential damping = PageRank:
   $$\mathrm{damping}(t) = (1 - \alpha)\,\alpha^t$$
   Most of the contribution is on the first few levels.

27. Linear damping
   [Figure: weight (0.00 to 0.30) versus length of the path t (1 to 10); curves: damping(t) with L = 15 and with L = 10]
   Linear damping: damping(t) = 2(L - t) ... [the rest of this slide and slides 28 to 96 are missing from the transcript]

97. Truncated PageRank
   Proposed in [Becchetti et al., 2006b]. Idea: reduce the direct contribution of the first levels of links:
   $$\mathrm{damping}(t) = \begin{cases} 0 & t \le T \\ C\,\alpha^t & t > T \end{cases}$$
   ✓ No extra reading of the graph after PageRank

98. Truncated PageRank(T=2) / PageRank
   [Figure: histograms for normal and spam hosts; x-axis: TruncatedPageRank(T=2) / PageRank, 0 to 1.6; panel annotation "= 0.30"]

99. Max. change of Truncated PageRank
   [Figure: histograms for normal and spam hosts; x-axis: max(TrPR_{i+1} / TrPR_i), 0.85 to 1.1; panel annotation "= 0.29"]

101-102. High and low-ranked pages are different
   [Figure: number of nodes (×10^4) versus distance (1 to 20); curves: Top 0%-10%, Top 40%-50%, Top 60%-70%]
   Areas below the curves are equal if we are in the same strongly-connected component.
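The exponential and Truncated PageRank damping functions from the slides above can be plugged into one path-based ranking routine. The following is a rough sketch under stated assumptions: the sum over paths is computed through powers of the row-normalized matrix, the series is cut at `max_len` terms, and the constant C is chosen so that the truncated damping sums to one (the slide does not give C). All function names and the toy matrix are hypothetical.

```python
def functional_rank(P, damping, max_len=200):
    """sum_{t < max_len} damping(t)/N * ones * P^t for a row-normalized P."""
    n = len(P)
    walk = [1.0 / n] * n                  # (1/N) * ones * P^0
    rank = [damping(0) * w for w in walk]
    for t in range(1, max_len):
        # Advance every walk by one link: walk <- walk * P.
        walk = [sum(walk[i] * P[i][j] for i in range(n)) for j in range(n)]
        rank = [r + damping(t) * w for r, w in zip(rank, walk)]
    return rank


def exponential_damping(alpha):
    """damping(t) = (1 - alpha) * alpha^t, which gives PageRank."""
    return lambda t: (1 - alpha) * alpha ** t


def truncated_damping(alpha, T):
    """damping(t) = 0 for t <= T and C * alpha^t for t > T."""
    # Assumption: C normalizes the damping so that its values sum to 1.
    C = (1 - alpha) / alpha ** (T + 1)
    return lambda t: 0.0 if t <= T else C * alpha ** t


# Hypothetical toy graph: page 0 links to 1 and 2, page 1 to 2, page 2 to 0.
P_toy = [[0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
pr = functional_rank(P_toy, exponential_damping(0.85))
trpr = functional_rank(P_toy, truncated_damping(0.85, T=2))
# Per-page ratio, in the spirit of the TruncatedPageRank(T=2) / PageRank plot.
ratios = [t / p for t, p in zip(trpr, pr)]
```

A page whose rank comes mostly from very short paths would get a ratio well below one, which is the intuition behind using the ratio as a spam feature.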
103-104. Probabilistic counting
   [Figure: bit vectors propagated along links toward a target page; labels: "Propagation of bits using the OR operation", "Target page", "Count bits set to estimate supporters"]
   [Becchetti et al., 2006b] shows an improvement of the ANF algorithm [Palmer et al., 2002] based on probabilistic counting [Flajolet and Martin, 1985].

105-107. General algorithm
   Require: N: number of nodes, d: distance, k: bits
   for node : 1 . . . N, bit : 1 . . . k do
       INIT(node, bit)
   end for
   for distance : 1 . . . d do {Iteration step}
       Aux ← 0^k
       for src : 1 . . . N do {Follow links in the graph}
           for all links from src to dest do
               Aux[dest] ← Aux[dest] OR V[src, ·]
           end for
       end for
       V ← Aux
   end for
   for node : 1 . . . N do {Estimate supporters}
       Supporters[node] ← ESTIMATE(V[node, ·])
   end for
   return Supporters

108-110. Our estimator
   Initialize all bits to one with probability $\epsilon$.
   Estimator:
   $$\mathrm{neighbors}(\mathrm{node}) = \log_{(1 - \epsilon)}\left(1 - \frac{\mathrm{ones}(\mathrm{node})}{k}\right)$$
   Adaptive estimation: repeat the above process for $\epsilon = 1/2, 1/4, 1/8, \ldots$, and look for the transitions from more than $(1 - 1/e)k$ ones to less than $(1 - 1/e)k$ ones.

111. Convergence
   [Figure: fraction of nodes with estimates (0% to 100%) versus iteration (5 to 20); curves: d = 1 to d = 8]
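The general algorithm and the estimator above can be sketched in a few lines of Python, with each node's k bits packed into one integer. This is an illustration under stated assumptions, not the implementation from [Becchetti et al., 2006b]: the names `init_bits`, `propagate` and `estimate` are hypothetical, the `cumulative` flag is an added option that also keeps a node's previous bits (so the count covers everything within distance d, the node included), and `cumulative=False` follows the slide's `V ← Aux` step literally. The adaptive search over ε is left out.

```python
import math
import random


def init_bits(n, k, eps, rng):
    """INIT: set each of the k bits of every node to one with probability eps."""
    return [sum(1 << b for b in range(k) if rng.random() < eps) for _ in range(n)]


def propagate(links, bits, d, cumulative=True):
    """Run d iteration steps, OR-ing each source vector into its destinations."""
    V = list(bits)
    for _ in range(d):
        aux = [0] * len(V)
        for src, dest in links:
            aux[dest] |= V[src]
        # Slide: V <- Aux. Assumption: cumulative mode also keeps the bits
        # a node already had, covering supporters within distance d.
        V = [a | v for a, v in zip(aux, V)] if cumulative else aux
    return V


def estimate(vector, k, eps):
    """ESTIMATE: log base (1 - eps) of (1 - ones/k)."""
    ones = bin(vector).count("1")
    if ones == 0:
        return 0.0
    if ones == k:
        return math.inf  # saturated vector: eps was too large for this node
    return math.log(1 - ones / k) / math.log(1 - eps)


rng = random.Random(0)  # fixed seed so the run is repeatable
toy_links = [(0, 1), (1, 2), (3, 2)]  # hypothetical four-node graph
k, eps = 64, 0.5
vectors = propagate(toy_links, init_bits(4, k, eps, rng), d=2)
supporters = [estimate(v, k, eps) for v in vectors]
```

A saturated vector returns infinity here, which is the situation the adaptive estimation on the slide addresses by retrying with smaller values of ε.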
112. Error rate
   [Figure: average relative error (0 to 0.5) versus distance (1 to 8); series: Ours 64 bits, epsilon-only estimator; Ours 64 bits, combined estimator; ANF 24 bits × 24 iterations (576 b×i); ANF 24 bits × 48 iterations (1152 b×i); point labels range from 512 b×i to 1408 b×i]

113. Hosts at distance 4
   [Figure: histograms for normal and spam hosts of "Hosts at Distance Exactly 4"; x-axis: S4 - S3 (1, 100, 1000); panel annotation "= 0.39"]

114. Minimum change of supporters
   [Figure: histograms for normal and spam hosts; x-axis: min(S2/S1, S3/S2, S4/S3), 1 to 10; panel annotation "= 0.39"]
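The two supporter-based quantities plotted above, S4 - S3 and min(S2/S1, S3/S2, S4/S3), are simple to derive once the supporter counts are available. A small sketch follows, assuming `S[d]` holds a host's estimated number of supporters within distance d and that every count is positive. The function name and the sample numbers are hypothetical and are not taken from the UK collections.

```python
def supporter_features(S):
    """Compute two link-based features from supporter counts S[1]..S[4]."""
    # Hosts at distance exactly 4: supporters within 4 hops minus within 3 hops.
    exactly_4 = S[4] - S[3]
    # Smallest growth factor between consecutive distances (assumes S[d] > 0).
    min_change = min(S[d + 1] / S[d] for d in (1, 2, 3))
    return {
        "hosts_at_distance_exactly_4": exactly_4,
        "min_change_of_supporters": min_change,
    }


# Hypothetical supporter counts for a single host.
sample = {1: 10, 2: 50, 3: 100, 4: 400}
features = supporter_features(sample)
```

Features like these would then be fed, together with others, to a classifier; the slides do not say which one, so that step is not sketched here.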
116-121. Detection rates
   60% (UK-2006) to 80% (UK-2002) detection rate, with 4% to 2% error rate, by combining different attributes [Becchetti et al., 2006a].
   ✗ No magic bullet in link analysis
   ✗ Precision still low compared to e-mail spam filters
   ✓ Measure both home page and max. PageRank page
   ✓ Host-based counts of neighbors are important
   Next step: combine link analysis and content analysis

122-125. Upcoming Web Spam Challenge on UK-2006
   - We asked 20+ volunteers to classify entire hosts
   - We provided several examples
   - Asked to classify normal / borderline / spam
   - Do they agree? Mostly . . .
126. Agreement between humans
   [Figure]

127-134. Result: first public Web Spam collection
   Public spam collection:
   - Web graph with 80 million pages
   - 11,000 hosts
   - Labels for 4,000 hosts by at least 2 humans each
   Upcoming Web Spam challenge:
   - Machine learning
   - Information retrieval
   [email protected]

135-136. Thank you!

137-141. References
   Baeza-Yates, R., Boldi, P., and Castillo, C. (2006a). Generalizing PageRank: Damping functions for link-based ranking algorithms. In Proceedings of ACM SIGIR, pages 308-315, Seattle, Washington, USA. ACM Press.
   Baeza-Yates, R., Castillo, C., and Efthimiadis, E. (2006b). Characterization of national web domains. To appear in ACM TOIT.
   Baeza-Yates, R. and Poblete, B. (2006). Dynamics of the Chilean web structure. Comput. Networks, 50(10):1464-1473.
   Barabási, A.-L. (2002). Linked: The New Science of Networks. Perseus Books Group.
   Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and Baeza-Yates, R. (2006a). Link-based characterization and detection of Web Spam. In Second International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), Seattle, USA.
   Becchetti, L., Castillo, C., Donato, D., Leonardi, S., and Baeza-Yates, R. (2006b). Using rank propagation and probabilistic counting for link-based spam detection. In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD), Pennsylvania, USA. ACM Press.
   Benczúr, A. A., Csalogány, K., Sarlós, T., and Uher, M. (2005). SpamRank: fully automatic link spam detection. In Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web, Chiba, Japan.
   Boldi, P., Santini, M., and Vigna, S. (2005). PageRank as a function of the damping factor. In Proceedings of the 14th international conference on World Wide Web, pages 557-566, Chiba, Japan. ACM Press.
   Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., and Wiener, J. (2000). Graph structure in the web: Experiments and models. In Proceedings of the Ninth Conference on World Wide Web, pages 309-320, Amsterdam, Netherlands. ACM Press.
   Fetterly, D., Manasse, M., and Najork, M. (2004). Spam, damn spam, and statistics: Using statistical analysis to locate spam web pages. In Proceedings of the seventh workshop on the Web and databases (WebDB), pages 1-6, Paris, France.
   Flajolet, P. and Martin, N. G. (1985). Probabilistic counting algorithms for data base applications. Journal of Computer and System Sciences, 31(2):182-209.
   Gibson, D., Kumar, R., and Tomkins, A. (2005). Discovering large dense subgraphs in massive graphs. In VLDB '05: Proceedings of the 31st international conference on Very large data bases, pages 721-732. VLDB Endowment.
   Gyöngyi, Z., Molina, H. G., and Pedersen, J. (2004). Combating web spam with TrustRank. In Proceedings of the Thirtieth International Conference on Very Large Data Bases (VLDB), pages 576-587, Toronto, Canada. Morgan Kaufmann.
   Newman, M. E., Strogatz, S. H., and Watts, D. J. (2001). Random graphs with arbitrary degree distributions and their applications. Phys Rev E Stat Nonlin Soft Matter Phys, 64(2 Pt 2).
   Palmer, C. R., Gibbons, P. B., and Faloutsos, C. (2002). ANF: a fast and scalable tool for data mining in massive graphs. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 81-90, New York, NY, USA. ACM Press.
   Tauro, L., Palmer, C., Siganos, G., and Faloutsos, M. (2001). A simple conceptual model for the internet topology. In Global Internet, San Antonio, Texas, USA. IEEE CS Press.