pagerank and markov chain

Post on 08-May-2015


DESCRIPTION

A brief introduction to the methodology used by PageRank to rank web pages.

TRANSCRIPT

Markov Chains as the methodology used by PageRank to rank Web pages on the Internet.

Sergio S. Guirreri - www.guirreri.host22.com

Google Technology User Group (GTUG) of Palermo.

5th March 2010

Sergio S. Guirreri - www.guirreri.host22.com (Google Technology User Group (GTUG) of Palermo.)Markov Chains as methodology used by PageRank to rank the Web Pages on Internet.5th March 2010 1 / 14

Overview

1 Concepts on Markov-Chains.

2 The idea of the PageRank algorithm.

3 The PageRank algorithm.

4 Solving the PageRank algorithm.

5 Conclusions.

6 Bibliography.

7 Internet web sites.


Concepts on Markov-Chains.

Stochastic Process and Markov-Chains.

Consider the stochastic process

$$\{X_n;\ n = 0, 1, 2, \dots\}$$

with values in a set E, called the state space, whose elements are called the states of the process. Assume that E is finite or countable.

Definition. A Markov chain is a stochastic process $X_n$ with the following property:

$$\mathrm{Prob}\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0\} = \mathrm{Prob}\{X_{n+1} = j \mid X_n = i\} = p_{ij}(n)$$

where E is the state space and $j, i, i_{n-1}, \dots, i_0 \in E$, $n \in \mathbb{N}$. When the transition probabilities do not depend on n (a time-homogeneous chain), $p_{ij}(n) = p_{ij}$, and the transition probability matrix P of the process $X_n$ is the matrix with entries $p_{ij}$, $\forall i, j \in E$.
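As a minimal sketch of the definition (the 3-state matrix `P` below is hypothetical, not taken from the slides), a time-homogeneous chain can be simulated by drawing $X_{n+1}$ from row i of P whenever $X_n = i$. Only the current state is used, which is exactly the Markov property; each row must therefore be a probability distribution.

```python
import random


def is_row_stochastic(P, tol=1e-12):
    """Each row P[i] must be a probability distribution over the next state j."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )


def step(P, i, rng):
    """Draw X_{n+1} given X_n = i. Only row i is used: this is the Markov property."""
    return rng.choices(range(len(P[i])), weights=P[i])[0]


def simulate(P, x0, n_steps, seed=0):
    """Return the path X_0, X_1, ..., X_{n_steps} of a time-homogeneous chain."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(step(P, path[-1], rng))
    return path


# Hypothetical 3-state chain: P[i][j] = Prob{X_{n+1} = j | X_n = i}.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.0, 0.4, 0.6],
]

print(simulate(P, x0=0, n_steps=20))
```

Transitions with probability zero (for example from state 0 to state 2) never appear in a simulated path.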


The idea of the PageRank algorithm.

PageRank’s idea. The idea behind the PageRank algorithm is similar to the idea of the impact factor used to rank journals [Page et al.(1999)] [Brin and Page(1998)] [Langville et al.(2008)].

PageRank: the impact factor of the Internet. The impact factor of a journal is defined as the average number of citations per recently published paper in that journal. By regarding each web page as a journal, PageRank extends this idea to measure the importance of a web page.


The idea of the PageRank algorithm.

Elements of the PageRank.

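To illustrate the PageRank algorithm, I define the following variables [Ching and Ng(2006)]:

- let N be the total number of web pages in the web;
- let k be the number of outgoing links of web page j;
- let Q be the so-called hyperlink matrix, with elements

$$Q_{ij} = \begin{cases} \dfrac{1}{k} & \text{if web page } i \text{ is an outgoing link of web page } j, \\ 0 & \text{otherwise,} \end{cases} \qquad Q_{ii} > 0 \ \ \forall i. \qquad (1)$$

The hyperlink matrix Q can be regarded as the transition probability matrix of a Markov chain: a surfer on the net is a random walker, and the web pages are the states of the chain. Each column of Q sums to one, so $Q_{ij}$ is the probability of moving from page j to page i; compared with the entries $p_{ij}$ of the Markov chain definition, the roles of the row and column indices are swapped.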
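A minimal Python sketch of definition (1), assuming the web is given as a hypothetical dictionary `links` that maps each page j to the list of pages it links to (each page lists itself, to respect $Q_{ii} > 0$). The helper names `hyperlink_matrix` and `column_sums` are illustrative; exact fractions are used so that the column sums can be checked exactly.

```python
from fractions import Fraction


def hyperlink_matrix(links, n):
    """Build Q with Q[i][j] = 1/k if page j links to page i (k = out-links of j), else 0."""
    Q = [[Fraction(0) for _ in range(n)] for _ in range(n)]
    for j, targets in links.items():
        k = len(targets)  # number of outgoing links of page j
        for i in targets:
            Q[i][j] = Fraction(1, k)
    return Q


def column_sums(Q):
    """Sum each column j of Q over all rows i."""
    n = len(Q)
    return [sum(Q[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 4-page web; every page lists itself so that Q[i][i] > 0.
links = {
    0: [0, 1, 2],
    1: [1, 2],
    2: [0, 2],
    3: [0, 2, 3],
}

Q = hyperlink_matrix(links, 4)
print(column_sums(Q))  # each column of Q is a probability distribution
```

Column j sums to one because page j spreads a weight of 1/k over its k outgoing links. In this example no page other than page 3 itself links to page 3, so a surfer who leaves page 3 never returns: Q is reducible.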


The PageRank algorithm.

PageRank with an irreducible Markov chain.

Assuming that the Markov chain is irreducible (a) and aperiodic (b), the steady-state probability distribution $(p_1, p_2, \dots, p_N)^T$ of the states (web pages) exists and is unique.

(a) A Markov chain is irreducible if all states communicate with each other.
(b) A chain is periodic if there exists k > 1 such that the interval between two visits to some state s is always a multiple of k. A chain is aperiodic if no such k > 1 exists.

The PageRank

Each $p_i$ is the long-run proportion of time that the surfer spends visiting web page i. The higher the value of $p_i$, the more important web page i is. The PageRank of web page i is then defined as $p_i$.
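To illustrate "proportion of time", the sketch below lets a random surfer walk on a hypothetical 3-page web. Every page links to itself (so the chain is aperiodic) and the links 0 → 1 → 2 → 0 make it irreducible. Solving $p = Qp$ with $\sum_i p_i = 1$ by hand for this web gives $p = (1/3, 2/9, 4/9)$; the simulated visit frequencies should approach these values. The names `random_surfer` and `is_stationary` are illustrative.

```python
import random
from collections import Counter
from fractions import Fraction

# Hypothetical web: page j -> pages it links to (self-links keep the chain aperiodic).
links = {0: [0, 1, 2], 1: [1, 2], 2: [0, 2]}

# Stationary distribution obtained by solving p = Q p with sum(p) = 1 by hand.
exact = [Fraction(1, 3), Fraction(2, 9), Fraction(4, 9)]


def random_surfer(links, start, n_steps, seed=0):
    """Follow a uniformly chosen out-link at each step; return visit fractions."""
    rng = random.Random(seed)
    page = start
    visits = Counter()
    for _ in range(n_steps):
        page = rng.choice(links[page])  # move j -> i with probability Q[i][j] = 1/k
        visits[page] += 1
    return [visits[i] / n_steps for i in range(len(links))]


def is_stationary(links, p):
    """Check p = Q p exactly, using Q[i][j] = 1/len(links[j]) for each link j -> i."""
    n = len(links)
    Qp = [Fraction(0)] * n
    for j, targets in links.items():
        for i in targets:
            Qp[i] += p[j] / len(targets)
    return Qp == list(p)


freq = random_surfer(links, start=0, n_steps=200_000)
print([round(f, 3) for f in freq], [float(x) for x in exact])
```

Page 2 receives links from every page and ends up with the largest share of the surfer's time, hence the highest PageRank in this toy web.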


The PageRank algorithm.

PageRank with a reducible Markov chain.

Since the matrix Q can be reducible, to ensure that the steady-state probability distribution exists and is unique, the following matrix P is considered instead:

$$P = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1N} \\ Q_{21} & Q_{22} & \cdots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \cdots & Q_{NN} \end{pmatrix} + \frac{1-\alpha}{N} \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix} \qquad (2)$$

where $0 < \alpha < 1$; the most popular values of α are 0.85 and $1 - 1/N$. Every entry of P is at least $(1-\alpha)/N > 0$, so the chain defined by P is irreducible and aperiodic.

Interpretation of PageRank. The idea of the PageRank (2) is that, for a network of N web pages, each web page has an inherent importance of $(1-\alpha)/N$. If a page $P_i$ has an importance of $p_i$, then it contributes an importance of $\alpha p_i$, which is shared among the web pages it points to.
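A minimal sketch of equation (2), assuming Q is stored as a dense list of rows whose columns sum to one. The function name `google_matrix` and the 4-page matrix are hypothetical; in the example, pages 0 and 1 link only to each other (and themselves), so Q alone is reducible.

```python
def google_matrix(Q, alpha=0.85):
    """Return P = alpha * Q + (1 - alpha) / N * (all-ones matrix), as in equation (2)."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    n = len(Q)
    teleport = (1 - alpha) / n  # inherent importance given to every page
    return [[alpha * Q[i][j] + teleport for j in range(n)] for i in range(n)]


# Hypothetical reducible hyperlink matrix (column j = out-links of page j):
# pages 0 and 1 only link to each other and themselves, so a surfer who
# reaches them never comes back to pages 2 and 3 under Q alone.
Q = [
    [0.5, 0.5, 1/3, 0.0],
    [0.5, 0.5, 1/3, 0.0],
    [0.0, 0.0, 1/3, 0.5],
    [0.0, 0.0, 0.0, 0.5],
]

P = google_matrix(Q, alpha=0.85)
n = len(P)
col_sums = [sum(P[i][j] for i in range(n)) for j in range(n)]
print(col_sums)
```

Each column of P still sums to one, since $\alpha \cdot 1 + N \cdot (1-\alpha)/N = 1$, and every entry is at least $(1-\alpha)/N$, so under P the surfer can jump from pages 0 and 1 back to pages 2 and 3.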


The PageRank algorithm.

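PageRank with a reducible Markov chain.

Solving the following linear system of equations, subject to the normalization constraint, gives the importance $p_i$ of each web page $P_i$:

$$\begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} = \alpha \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1N} \\ Q_{21} & Q_{22} & \cdots & Q_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ Q_{N1} & Q_{N2} & \cdots & Q_{NN} \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ \vdots \\ p_N \end{pmatrix} + \frac{1-\alpha}{N} \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad (3)$$

Since

$$\sum_{i=1}^{N} p_i = 1,$$

the constant vector in (3) can be written as $\frac{1-\alpha}{N}\,\mathbf{1}\mathbf{1}^T (p_1, \dots, p_N)^T$, where $\mathbf{1}$ is the all-ones vector, and (3) can be rewritten as

$$(p_1, p_2, \dots, p_N)^T = P\,(p_1, p_2, \dots, p_N)^T$$

with P as in (2).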
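One way to illustrate (3) is to solve the linear system $(I - \alpha Q)p = \frac{1-\alpha}{N}\mathbf{1}$ directly. The sketch below uses plain Gaussian elimination with partial pivoting, a generic solver chosen for illustration rather than the method discussed in these slides, on a hypothetical 3-page hyperlink matrix, and then checks that $p = Pp$. Summing both sides of (3) and using the fact that every column of Q sums to one gives $(1-\alpha)\sum_i p_i = 1-\alpha$, so the solution automatically satisfies the normalization constraint.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (dense, small N)."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def pagerank_linear(Q, alpha=0.85):
    """Solve p = alpha * Q p + (1 - alpha)/N * 1, i.e. (I - alpha Q) p = (1 - alpha)/N * 1."""
    n = len(Q)
    A = [[(1.0 if i == j else 0.0) - alpha * Q[i][j] for j in range(n)] for i in range(n)]
    b = [(1 - alpha) / n] * n
    return solve(A, b)


# Hypothetical hyperlink matrix (column j = out-links of page j, each column sums to 1).
Q = [
    [1/3, 0.0, 0.5],
    [1/3, 0.5, 0.0],
    [1/3, 0.5, 0.5],
]
alpha = 0.85
p = pagerank_linear(Q, alpha)

# Check p = P p with P = alpha * Q + (1 - alpha)/N * (all-ones matrix), as in (2).
n = len(Q)
Pp = [sum((alpha * Q[i][j] + (1 - alpha) / n) * p[j] for j in range(n)) for i in range(n)]
print(p, sum(p))
```

A dense direct solver like this one is only practical for small N; the power method discussed next avoids factorizing the matrix.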


Solving the PageRank algorithm.

The power method.

The power method is an iterative method for computing the dominant eigenvalue of a matrix and its corresponding eigenvector.

Given an n × n matrix A, the hypotheses of the power method are:

- there is a single dominant eigenvalue, i.e. the eigenvalues can be sorted as
$$|\lambda_1| > |\lambda_2| \ge |\lambda_3| \ge \dots \ge |\lambda_n|;$$
- there is a linearly independent set of n eigenvectors
$$\{u^{(1)}, u^{(2)}, \dots, u^{(n)}\}$$
such that $A u^{(i)} = \lambda_i u^{(i)}$, $i = 1, \dots, n$.


Solving the PageRank algorithm.

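The power method. The initial vector $x^{(0)}$ can be written as

$$x^{(0)} = a_1 u^{(1)} + a_2 u^{(2)} + \dots + a_n u^{(n)}.$$

Applying the matrix A to the initial vector k times gives

$$A^k x^{(0)} = a_1 A^k u^{(1)} + a_2 A^k u^{(2)} + \dots + a_n A^k u^{(n)} = a_1 \lambda_1^k u^{(1)} + a_2 \lambda_2^k u^{(2)} + \dots + a_n \lambda_n^k u^{(n)}.$$

Dividing by $\lambda_1^k$:

$$\frac{A^k x^{(0)}}{\lambda_1^k} = a_1 u^{(1)} + a_2 \left(\frac{\lambda_2}{\lambda_1}\right)^k u^{(2)} + \dots + a_n \left(\frac{\lambda_n}{\lambda_1}\right)^k u^{(n)}.$$

Since $|\lambda_i| / |\lambda_1| < 1$ for $i \ge 2$,

$$\lim_{k\to\infty} \frac{|\lambda_i|^k}{|\lambda_1|^k} = 0 \quad \Longrightarrow \quad A^k x^{(0)} \approx a_1 \lambda_1^k u^{(1)},$$

so, provided $a_1 \neq 0$, the direction of $A^k x^{(0)}$ converges to that of the dominant eigenvector $u^{(1)}$.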
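A minimal sketch of the power method derived above, on a hypothetical symmetric 2 × 2 matrix with eigenvalues 3 and 1 and dominant eigenvector proportional to (1, 1). Two standard additions that are not on the slide are assumed: the iterate is rescaled at every step (so the factor $\lambda_1^k$ does not overflow), and $\lambda_1$ is estimated with the Rayleigh quotient.

```python
import math


def power_method(A, x0, iterations=100):
    """Repeatedly apply A to x0, normalizing each time; return (eigenvalue, eigenvector)."""
    x = list(x0)
    n = len(A)
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # y = A x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]  # rescale so that lambda_1^k does not overflow
    Ax = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(x, Ax))  # Rayleigh quotient x^T A x, |x| = 1
    return eigenvalue, x


A = [[2.0, 1.0], [1.0, 2.0]]  # eigenvalues 3 and 1
lam, u = power_method(A, x0=[1.0, 0.0])  # x0 has a nonzero component a_1 along (1, 1)
print(lam, u)
```

Starting from $x^{(0)} = (1, 0)$ gives $a_1 = 1/\sqrt{2} \neq 0$, and the unwanted component shrinks by a factor $|\lambda_2/\lambda_1| = 1/3$ per iteration.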


Conclusions.

The power method and PageRank.

Results. The matrix P of the PageRank algorithm is a stochastic matrix, therefore its eigenvalue of largest modulus is $\lambda_1 = 1$.

The convergence rate of the power method depends on the ratio $|\lambda_2| / |\lambda_1|$. [Haveliwala and Kamvar(2003)] showed that the second largest eigenvalue of P satisfies

$$|\lambda_2| \le \alpha, \qquad 0 \le \alpha \le 1.$$

Since $\lambda_1 = 1$, the convergence rate depends on α. The most popular value for α is 0.85; with this value, the power method has been reported to converge in about 50 iterations on a web data set of over 80 million pages.
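The sketch below applies the power method to the PageRank matrix without forming P explicitly: for any x with $\sum_i x_i = 1$, $Px = \alpha Qx + \frac{1-\alpha}{N}\mathbf{1}$. The hypothetical 4-page matrix (pages 0 and 1 form a closed group, so Q alone is reducible), the L1 stopping tolerance `tol` and the cap `max_iter` are assumptions for illustration. Because P maps probability vectors to probability vectors and shrinks their L1 distance by at least a factor α, the error after k steps is bounded by a multiple of $\alpha^k$; for α = 0.85, $0.85^{50} \approx 3 \times 10^{-4}$.

```python
def pagerank_power(Q, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method for p = P p, with P x = alpha * Q x + (1 - alpha)/N * 1 for sum(x) = 1."""
    n = len(Q)
    x = [1.0 / n] * n  # start from the uniform distribution
    for k in range(1, max_iter + 1):
        y = [alpha * sum(Q[i][j] * x[j] for j in range(n)) + (1 - alpha) / n
             for i in range(n)]
        diff = sum(abs(a - b) for a, b in zip(y, x))  # L1 distance between iterates
        x = y
        if diff < tol:
            return x, k
    return x, max_iter


# Hypothetical 4-page web (column j = out-links of page j), with pages 0 and 1
# forming a closed group, so Q by itself is reducible.
Q = [
    [0.5, 0.5, 1/3, 0.0],
    [0.5, 0.5, 1/3, 0.0],
    [0.0, 0.0, 1/3, 0.5],
    [0.0, 0.0, 0.0, 0.5],
]
p, iterations = pagerank_power(Q, alpha=0.85)
print([round(v, 4) for v in p], iterations)
```

In this form each iteration needs only one product with Q, so the dense matrix P never has to be stored.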


Conclusions.

Many thanks to GTUG Palermo, and see you at the next meeting!


Bibliography.


Brin, S. and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1–7), 107–117.

Ching, W. and Ng, M. (2006). Markov Chains: Models, Algorithms and Applications. Springer Science + Business Media, Inc.

Haveliwala, T. and Kamvar, S. (2003). The second eigenvalue of the Google matrix. Technical report, Stanford University.

Langville, A., Meyer, C., and Fernández, P. (2008). Google’s PageRank and beyond: the science of search engine rankings. The Mathematical Intelligencer, 30(1), 68–69.

Page, L., Brin, S., Motwani, R., and Winograd, T. (1999). The PageRank Citation Ranking: Bringing Order to the Web.


Internet web sites.


Jon Atle Gulla (2007) - From Google Search to Semantic Exploration - Norwegian University of Science and Technology - www.slideshare.net/sveino/semantics-and-search?type=presentation

Steven Levy (2010) - Exclusive: How Google’s Algorithm Rules the Web - Wired Magazine - www.wired.com/magazine/2010/02/ff_google_algorithm/

Ann Smarty (2009) - Let’s Try to Find All 200 Parameters in Google Algorithm - Search Engine Journal - www.searchenginejournal.com/200-parameters-in-google-algorithm/15457/
