leonid e. zhukov · leonid e. zhukov (hse) lecture 6 17.02.2015 7 / 21. web as a graph hyperlinks -...

21
Link Analysis Leonid E. Zhukov School of Data Analysis and Artificial Intelligence Department of Computer Science National Research University Higher School of Economics Structural Analysis and Visualization of Networks Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 1 / 21

Upload: others

Post on 31-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Link Analysis

Leonid E. Zhukov

School of Data Analysis and Artificial IntelligenceDepartment of Computer Science

National Research University Higher School of Economics

Structural Analysis and Visualization of Networks

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 1 / 21

Page 2: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Lecture outline

1 Graph-theoretic definitions

2 Web search engines

3 Web page ranking algorithms– Pagerank– HITS

4 The Web as a graph

5 PageRank beyond the web

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 2 / 21

Page 3: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Graph theory

Graph G (E ,V ), |V | = n, |E | = mAdjacency matrix An×n, Aij , edge i → j

Graph is directed, matrix is non-symmetric: AT 6= A, Aij 6= Aji

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 3 / 21

Page 4: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Graph theory

sinks: zero out degree nodes, kout(i) = 0, absorbing nodes

sources: zero in degree nodes, kin(i) = 0

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 4 / 21

Page 5: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Graph theory

Graph is strongly connected if every vertex is reachable form everyother vertex.

Strongly connected components are partitions of the graph intosubgraphs that are strongly connected

In strongly connected graphs there is a path is each direction betweenany two pairs of vertices

image from Wikipedia

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 5 / 21

Page 6: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Graph theory

A directed graph is aperiodic if the greatest common divisor of thelengths of its cycles is one (there is no integer k >1 that divides thelength of every cycle of the graph)

image from Wikipedia

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 6 / 21

Page 7: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Web search engine

”The Anatomy of a Large-Scale Hypertextual Web Search Engine”

Sergey Brin and Lawrence Page, 1998

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21

Page 8: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Web as a graph

Hyperlinks - implicit endorsements

Web graph - graph of endorsements (sometimes reciprocal)

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 8 / 21

Page 9: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Ranking on directed graph

iteratively update

ri ←∑

j∈N(i)

rj =∑j

Aji rj

r t+1i =

∑j

Aji rtj , with r t=0

j = r0j

rt+1 = AT rt , rt=0 = r0

norm ||rt+1|| ≥ ||rt ||

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 9 / 21

Page 10: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Ranking on directed graph

Absorbing nodes

Source nodes

Cycles

rt+1 = AT rt , rt=0 = r0

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 10 / 21

Page 11: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

PageRank

”PageRank can be thought of as a model of user behavior. We assume there is a”random surfer” who is given a web page at random and keeps clicking on links,never hitting ”back” but eventually gets bored and starts on another randompage. The probability that the random surfer visits a page is its PageRank.”

PR(A) = (1− d) + d(PR(T1)/C (T1) + ...+ PR(Tn)/C (Tn))

Sergey Brin and Larry Page, 1998

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 11 / 21

Page 12: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Random walk

Random walk on a directed graph

pt+1i =

∑j∈N(i)

ptjdoutj

=∑j

Aji

doutj

pj

Dii = diag{douti }

pt+1 = (D−1A)Tpt

pt+1 = PTpt

Markov chain with transition probability matrix P = D−1A

limt→∞

pt = π

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 12 / 21

Page 13: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Perron-Frobenius Theorem

Perron-Frobenius theorem (Fundamental Theorem of Markov Chains)If matrix is

stochastic (non-negative and rows sum up to one, describes Markovchain)

irreducible (strongly connected graph)

aperiodic

then∃ limt→∞

p̄t = π̄

and can be found as a left eigenvector

π̄P = π̄, where ||π̄||1 = 1

π̄ - stationary distribution of Markov chain, raw vectorOscar Perron, 1907, Georg Frobenius,1912.

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 13 / 21

Page 14: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

PageRank

Transition matrix:

P = D−1A

Stochastic matrix:

P′ = P +seT

nPageRank matrix:

P′′ = αP′ + (1− α)eeT

n

Eigenvalue problem (choose solution with λ = 1):

P′′Tp = λp

Notations:e - unit column vector, s - absorbing nodes indicator vector (column)

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 14 / 21

Page 15: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

PageRank computations

Eigenvalue problem (λ = 1, ||p||1 = pTe = 1):(αP′ + (1− α)

eeT

n

)T

p = λp

p = αP′Tp + (1− α)e

n

Power iterations:p← αP′Tp + (1− α)

e

n

Sparse linear system:

(I− αP′T )p = (1− α)e

n

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 15 / 21

Page 16: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Graph structure of the web

Bow tie structure of the web

Andrei Broder et al, 1999

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 16 / 21

Page 17: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

Hubs and Authorities (HITS)

Citation networks. Reviews vs original research (authoritative) papers

authorities, contain useful information, aihubs, contains links to authorities, hi

Mutual recursion

Good authorities reffered bygood hubs

ai ←∑j

Ajihj

Good hubs point to goodauthorities

hi ←∑j

Aijaj

Jon Kleinberg, 1999

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 17 / 21

Page 18: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

HITS

System of linear equations

a = αATh

h = βAa

Symmetric eigenvalue problem

(ATA)a = λa

(AAT )h = λh

where eigenvalue λ = (αβ)−1

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 18 / 21

Page 19: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

HITS

Focused subgraph of WWW

Jon Kleinberg, 1999

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 19 / 21

Page 20: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

PageRank beyond the Web

from David Gleich

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 20 / 21

Page 21: Leonid E. Zhukov · Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 7 / 21. Web as a graph Hyperlinks - implicit endorsements Web graph - graph of endorsements (sometimes reciprocal)

References

The Anatomy of a Large-Scale Hypertextual Web Search Engine,Sergey Brin and Lawrence Page, 1998

Authoritative Sources in a Hyperlinked Environment. Jon M.Kleinberg, Proc. 9th ACM-SIAM Symposium on Discrete Algorithms,1999

Graph structure in the Web, Andrei Broder et all. Procs of the 9thinternational World Wide Web conference, 2000

A Survey of Eigenvector Methods of Web Information Retrieval. AmyN. Langville and Carl D. Meyer, 2004

PageRank beyond the Web. David F. Gleich, arXiv:1407.5107, 2014

Leonid E. Zhukov (HSE) Lecture 6 17.02.2015 21 / 21