9 Algorithms: PageRank
![Page 1: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/1.jpg)
9 Algorithms: PageRank
![Page 2: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/2.jpg)
Ranking
• After matching, have to rank:
![Page 3: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/3.jpg)
Index Based Ranking
• Strategies we could (do) use:
– Frequency
– Position
– Metadata
![Page 4: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/4.jpg)
Missing Ingredient
• Index lacks inter-page information
![Page 5: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/5.jpg)
Link Quality
• Not all links are equal
• Who do you trust?
– CS Prof
– World Famous Chef
![Page 6: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/6.jpg)
Identifying Authority
• Links into a page give it authority
• Page value = sum of authorities of pages linking to it
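The "page value" rule above can be sketched in a few lines of Python. This is only an illustration: the link graph, the page names, and the hand-assigned authority scores are all hypothetical, not taken from the slides.

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "cs_prof": ["recipe_a"],
    "famous_chef": ["recipe_b"],
    "blog": ["recipe_a", "recipe_b"],
}

# Hypothetical, hand-assigned authority scores for the linking pages.
authority = {"cs_prof": 1.0, "famous_chef": 5.0, "blog": 2.0}


def page_value(page, links, authority):
    """Sum the authority of every page that links to `page`."""
    return sum(
        authority[source]
        for source, targets in links.items()
        if page in targets
    )


value_a = page_value("recipe_a", links, authority)  # cs_prof + blog
value_b = page_value("recipe_b", links, authority)  # famous_chef + blog
```

With these made-up scores, the recipe endorsed by the chef ends up with the higher value, which is the "who do you trust?" idea from the previous slide.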
![Page 7: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/7.jpg)
Link Quality
• Counting more links is easy to abuse
[Figure: Spam Link Pages]
![Page 8: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/8.jpg)
Issues
• Spam Links
– Discourage with negative weight
[Figure: Spam Link Pages, each spam link weighted -1]
![Page 9: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/9.jpg)
Issues
• Cycles:
![Page 10: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/10.jpg)
Issues
• Cycles:
![Page 11: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/11.jpg)
Issues
• Cycles:
…
![Page 12: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/12.jpg)
Random Surfer
• Simulating a web surfing session
– Start at random page
– At each page have a chance to
  • Pick a random link to go to
  • Jump to a completely random page
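The surfing session above can be sketched as a short simulation. The three-page graph, the `jump_chance` of 0.15, the step count, and the function name `random_surfer` are all assumptions made for illustration; the slides do not give specific values.

```python
import random


def random_surfer(links, steps=100_000, jump_chance=0.15, seed=0):
    """Simulate one long surfing session and return each page's visit share."""
    rng = random.Random(seed)      # fixed seed so runs are repeatable
    pages = list(links)
    visits = {page: 0 for page in pages}
    page = rng.choice(pages)       # start at a random page
    for _ in range(steps):
        visits[page] += 1
        outgoing = links[page]
        if not outgoing or rng.random() < jump_chance:
            page = rng.choice(pages)      # jump to a completely random page
        else:
            page = rng.choice(outgoing)   # follow a random link on this page
    return {p: visits[p] / steps for p in pages}


# Hypothetical graph: A and B link to each other (a cycle), C links to A.
links = {"A": ["B"], "B": ["A"], "C": ["A"]}
shares = random_surfer(links)
```

In this sketch nothing links to page C, so the surfer only reaches it through random jumps and it collects the smallest share of visits. The cycle between A and B does not trap the surfer, because the random jump always offers a way out.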
![Page 13: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/13.jpg)
Results
• Results of many random sessions:
![Page 14: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/14.jpg)
Results
• Expressed as percentages, results stabilize
– Law of large numbers
![Page 15: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/15.jpg)
Cycle Buster
• Random surfer not fazed by cycles:
![Page 16: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/16.jpg)
Random Surfer In Use
• The recipe pages visited by random surfers:
![Page 17: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/17.jpg)
Simulator
• PageRank Simulator: http://caccio.blogdns.net/software/pagerank-simulator
![Page 18: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/18.jpg)
The Real Math
• Markov Chains
– Set of states
– Each state has probability of leading to other states
– Represent as matrix
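As a rough sketch of the matrix view, the snippet below builds a transition matrix for a hypothetical three-page web and repeatedly multiplies a probability distribution by it. The graph, the `jump_chance` of 0.15, and the helper names `transition_matrix` and `step` are assumptions for illustration only, not the construction used in the slides.

```python
def transition_matrix(links, pages, jump_chance=0.15):
    """Row i holds the probabilities of moving from page i to every page."""
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    matrix = []
    for page in pages:
        outgoing = links[page]
        if outgoing:
            # Random jump spread over all pages, the rest over the page's links.
            row = [jump_chance / n] * n
            for target in outgoing:
                row[index[target]] += (1 - jump_chance) / len(outgoing)
        else:
            # A page with no links: always jump to a random page.
            row = [1 / n] * n
        matrix.append(row)
    return matrix


def step(dist, matrix):
    """Advance the distribution by one move of the chain (dist times matrix)."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical graph: A and B link to each other, C links to A.
pages = ["A", "B", "C"]
links = {"A": ["B"], "B": ["A"], "C": ["A"]}
P = transition_matrix(links, pages)

dist = [1 / 3] * 3          # start equally likely to be on any page
for _ in range(100):        # repeat until the distribution settles
    dist = step(dist, P)
```

Each row of `P` sums to 1, and after enough steps `dist` stops changing. That stable distribution plays the same role as the long-run visit percentages from the random surfer sessions.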
![Page 19: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/19.jpg)
Excel Simulation
• Three pages:
![Page 20: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/20.jpg)
Limitations
• Still have issues/room for growth
– Link Spam
– Context of link
  • Where link is on page
  • "Bob's recipe is terrible" vs "Bob's recipe is great"
– Lack of semantic knowledge
  • Page's authority should not be the same for all domains
![Page 21: 9 Algorithms: PageRank](https://reader035.vdocument.in/reader035/viewer/2022062218/568165a8550346895dd88fc4/html5/thumbnails/21.jpg)
Power
• Controlling search is power: http://www.bitsbook.com/
"If you're not paying for the product, you are the product."