Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Upload: stanley-wilkins

Post on 24-Dec-2015


TRANSCRIPT

Page 1: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Stochastic Approach for Link Structure Analysis

(SALSA)

Presented by Adam Simkins

Page 2: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

SALSA

• Created by Lempel and Moran in 2000

• Combination of HITS and PageRank

Page 3: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

SALSA’s similarities to HITS and PageRank

• SALSA uses authority and hub scores

• SALSA creates a neighborhood graph using authority and hub pages and links

Page 4: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

SALSA’s differences from HITS and PageRank

• The SALSA method creates a bipartite graph of the authority and hub pages in the neighborhood graph.

• One set contains hub pages

• One set contains authority pages

• Each page may be located in both sets
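
The split into a hub set and an authority set can be sketched as follows. This is a minimal illustration, not the presenter's code: the edge list `EDGES` and the helper `bipartite_sets` are hypothetical, with the edges chosen only so that the resulting sets agree with the page numbers quoted later in these slides.

```python
# Hypothetical edge list (source page, target page); an assumption for
# illustration, not a reproduction of the figure on the slides.
EDGES = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]

def bipartite_sets(edges):
    """Return (hubs, authorities) for a list of directed links.

    A page is a hub if it has at least one out-link and an authority
    if it has at least one in-link, so a page can appear in both sets.
    """
    hubs = sorted({src for src, _ in edges})
    authorities = sorted({dst for _, dst in edges})
    return hubs, authorities

hubs, authorities = bipartite_sets(EDGES)
in_both = sorted(set(hubs) & set(authorities))
print("hub set:", hubs)
print("authority set:", authorities)
print("pages in both sets:", in_both)
```

Each original link becomes one edge from the hub copy of its source page to the authority copy of its target page.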

Page 5: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Neighborhood Graph N

Page 6: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Bipartite Graph G of Neighborhood Graph N

Page 7: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Markov Chains

• Two matrices formed from bipartite graph G

• A hub Markov chain with matrix H

• An authority Markov chain with matrix A

Page 8: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Where does SALSA fit in?

• Matrices H and A can be derived from the adjacency matrix L used in the HITS and PageRank methods

• HITS uses the unweighted matrix L

• PageRank uses a row-weighted version of matrix L

• SALSA uses both row and column weighting

Page 9: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

How are H and A computed?

• Let Lr be L with each nonzero row divided by its row sum

• Let Lc be L with each nonzero column divided by its column sum
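
The two normalizations can be sketched in plain Python. The page list, the edge list, and the helper names (`adjacency`, `row_normalize`, `col_normalize`) are assumptions made for this example and do not come from the slides.

```python
# Hypothetical six-page graph, used only to illustrate the normalizations.
PAGES = [1, 2, 3, 5, 6, 10]
EDGES = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]

def adjacency(pages, edges):
    """Build the 0/1 adjacency matrix L with L[i][j] = 1 if page i links to page j."""
    index = {page: i for i, page in enumerate(pages)}
    L = [[0.0] * len(pages) for _ in pages]
    for src, dst in edges:
        L[index[src]][index[dst]] = 1.0
    return L

def transpose(M):
    return [list(col) for col in zip(*M)]

def row_normalize(M):
    """Divide each nonzero row by its row sum; all-zero rows are left unchanged."""
    out = []
    for row in M:
        total = sum(row)
        out.append([x / total for x in row] if total else list(row))
    return out

def col_normalize(M):
    """Divide each nonzero column by its column sum (row-normalize the transpose)."""
    return transpose(row_normalize(transpose(M)))

L = adjacency(PAGES, EDGES)
Lr = row_normalize(L)
Lc = col_normalize(L)
```

In this sketch page 1 has two out-links, so its row of `Lr` holds two entries of 1/2, while page 6 has three in-links, so its column of `Lc` holds three entries of 1/3.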

Page 10: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins
Page 11: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

• H, SALSA’s hub matrix, consists of the nonzero rows and columns of $L_r L_c^T$

• A, SALSA’s authority matrix, consists of the nonzero rows and columns of $L_c^T L_r$
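
As a rough sketch of these two products, the following plain-Python example builds H and A for a small graph. The graph and the helpers `matmul` and `drop_zero` are hypothetical, introduced only for illustration.

```python
# Hypothetical six-page graph; pages and edges are illustrative assumptions.
PAGES = [1, 2, 3, 5, 6, 10]
EDGES = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]

n = len(PAGES)
idx = {page: i for i, page in enumerate(PAGES)}
L = [[0.0] * n for _ in range(n)]
for src, dst in EDGES:
    L[idx[src]][idx[dst]] = 1.0

# Row- and column-normalized versions of L (zero rows/columns stay zero).
row_sums = [sum(row) for row in L]
col_sums = [sum(L[i][j] for i in range(n)) for j in range(n)]
Lr = [[L[i][j] / row_sums[i] if row_sums[i] else 0.0 for j in range(n)]
      for i in range(n)]
Lc = [[L[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
      for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Plain matrix product X @ Y for lists of lists."""
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def drop_zero(M, pages):
    """Keep the indices whose row is nonzero.

    For these two products the nonzero columns fall on the same indices,
    so one index list serves for both rows and columns.
    """
    keep = [i for i, row in enumerate(M) if any(row)]
    sub = [[M[i][j] for j in keep] for i in keep]
    return sub, [pages[i] for i in keep]

H, hub_pages = drop_zero(matmul(Lr, transpose(Lc)), PAGES)   # Lr Lc^T
A, auth_pages = drop_zero(matmul(transpose(Lc), Lr), PAGES)  # Lc^T Lr
```

Both results are row-stochastic in this example, which is what lets them serve as Markov chain transition matrices.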

Page 12: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins
Page 13: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins
Page 14: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Eigenvectors

• $Av = \lambda v$

• $v^T A = \lambda v^T$

• Numerically: Power Method

Page 15: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

The Power Method

• $x_{k+1} = A x_k$

• $x_{k+1}^T = x_k^T A$

• Converges to the dominant eigenvector ($\lambda = 1$).
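
A minimal sketch of the left-hand iteration $x_{k+1}^T = x_k^T A$ for a row-stochastic matrix is given below. The 4x4 matrix `P`, the tolerance, and the function name `power_method` are assumptions chosen for this example; `P` is simply a small irreducible, aperiodic chain, not a matrix copied from the slides.

```python
def power_method(P, tol=1e-12, max_iter=10_000):
    """Iterate x^T <- x^T P from the uniform vector until the change is below tol.

    Assumes P is row-stochastic, irreducible and aperiodic, so that the
    iteration converges to the unique left eigenvector for eigenvalue 1.
    """
    n = len(P)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]  # renormalize to guard against drift
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x

# Hypothetical irreducible row-stochastic matrix (each row sums to 1).
P = [
    [5 / 12, 1 / 6, 1 / 4, 1 / 6],
    [1 / 3, 1 / 3, 0.0, 1 / 3],
    [1 / 4, 0.0, 3 / 4, 0.0],
    [1 / 3, 1 / 3, 0.0, 1 / 3],
]

pi = power_method(P)
print([round(v, 4) for v in pi])
```

For this particular `P` the limit can be checked by hand: the vector (1/3, 1/6, 1/3, 1/6) satisfies $\pi^T P = \pi^T$.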

Page 16: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

The Power Method

• Matrices H and A must be irreducible for the power method to converge to a unique eigenvector from any starting value

• If our bipartite graph G is connected, then both H and A are irreducible

• If G is not connected, then the power method applied to H and A will not converge to a unique dominant eigenvector

Page 17: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Our Graph is not connected!

• In our example the graph is clearly not connected: page 2 in the hub set is connected only to page 1 in the authority set, and vice versa.

• H and A are reducible and therefore contain multiple irreducible connected components

Page 18: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins
Page 19: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Connected Components

• H contains two connected components, C = {2} and D = {1, 3, 6, 10}

• A contains two connected components, E = {1} and F = {3, 5, 6}
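
One way to find such components is a graph search over the bipartite graph. The sketch below assumes a hypothetical edge list chosen to be consistent with the components named on this slide; the function `bipartite_components` is likewise illustrative.

```python
from collections import defaultdict

# Hypothetical edge list, chosen to match the components quoted on the slide.
EDGES = [(1, 3), (1, 6), (2, 1), (3, 6), (6, 3), (6, 5), (10, 6)]

def bipartite_components(edges):
    """Return a list of (hub pages, authority pages), one pair per component.

    Hub and authority copies of a page are distinct nodes, tagged "hub"
    and "auth", and each link joins a hub node to an authority node.
    """
    adj = defaultdict(set)
    for src, dst in edges:
        adj[("hub", src)].add(("auth", dst))
        adj[("auth", dst)].add(("hub", src))

    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first search from an unvisited node.
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        hubs = sorted(p for side, p in comp if side == "hub")
        auths = sorted(p for side, p in comp if side == "auth")
        components.append((hubs, auths))
    return components

components = bipartite_components(EDGES)
for hubs, auths in components:
    print("hubs:", hubs, "authorities:", auths)
```

The hub half of each pair is a component of H and the authority half is a component of A.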

Page 20: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Cutting and Pasting. Part I

• We can now perform the power method on each component for H and A

Page 21: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Cutting and Pasting. Part II

• We can now paste the two components together for each matrix

• We must multiply each entry in a component's vector by that component's weight
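
The pasting step might look like the sketch below. The slides do not state the weight, so this example assumes that it is the component's share of the pages in its set (component size divided by the size of the hub or authority set). The component vectors are illustrative values, and `paste` is a hypothetical helper.

```python
def paste(components):
    """Combine per-component stationary vectors into one score per page.

    `components` is a list of (pages, vector) pairs. Assumption: each
    component is weighted by len(pages) / total number of pages in the set.
    """
    total = sum(len(pages) for pages, _ in components)
    scores = {}
    for pages, vector in components:
        weight = len(pages) / total
        for page, value in zip(pages, vector):
            scores[page] = weight * value
    return scores

# Illustrative hub components: (pages, stationary vector of that component).
hub_components = [
    ([2], [1.0]),
    ([1, 3, 6, 10], [1 / 3, 1 / 6, 1 / 3, 1 / 6]),
]

hub_scores = paste(hub_components)
ranking = sorted(hub_scores, key=hub_scores.get, reverse=True)
print(hub_scores)
print("hub ranking:", ranking)
```

Because each component vector sums to 1 and the weights sum to 1, the pasted scores also sum to 1.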

Page 22: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

H:

A:

Page 23: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Strengths and Weaknesses

• Not affected by topic drift as much as HITS

• Gives both authority and hub scores

• Handles spamming better than HITS, but not nearly as well as PageRank

• Query dependence

Page 24: Stochastic Approach for Link Structure Analysis (SALSA) Presented by Adam Simkins

Thank You For Your Time!