![Page 1: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/1.jpg)
Numerical computing & Google’s PageRank
DAVID F. GLEICH, CS 197 PRESENTATION
![Page 2: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/2.jpg)
Hey Katie, do you have a date for Valentine’s Day?
It was 1234567890 in 2009.
![Page 3: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/3.jpg)
Thanks Internet!
http://school.discoveryeducation.com/clipart/clip/stk-fgr6.html
http://listsoplenty.com/pix/tag/cartoon
https://www.facebook.com/ProgrammersJokes
http://www.feld.com/wp/archives/2009/02/unix-time-1234567890-on-valentines-day.html
![Page 4: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/4.jpg)
Thanks Google
![Page 5: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/5.jpg)
How did Google get started?
![Page 6: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/6.jpg)
How did Google get started? … with an idea … … on the shoulders of giants!
![Page 7: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/7.jpg)
LEO KATZ
David F. Gleich (Purdue) Emory Math/CS Seminar 6 of 47
![Page 8: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/8.jpg)
Vannevar Bush
“Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.”
-- “As We May Think,” The Atlantic, July 1945
![Page 9: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/9.jpg)
Sir Tim Berners-Lee
“We should work towards a universal linked information system … to allow a place for any information or reference one felt was important and a way of finding it afterwards.”
-- Founding proposal for “the mesh”, 1989
![Page 10: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/10.jpg)
… the mesh became the web … the web became a mess ... “finding it afterwards”? Hah!
![Page 11: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/11.jpg)
Larry Page
Sergey Brin
• Grad students at Stanford
• Worked with Terry Winograd (artificial intelligence)
• Created a web-search algorithm called “backrub”
• Spun off a company “Googol”
• Worth about $20 billion each
![Page 12: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/12.jpg)
A cartoon websearch primer
1. Crawl webpages
2. Analyze webpage text (information retrieval)
3. Analyze webpage links
4. Fit measures to human evaluations
5. Produce rankings
6. Continuously update
![Page 13: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/13.jpg)
SportsIllustrated.com BobsPortsIllustrated.com
![Page 14: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/14.jpg)
[Figure: example graph with nodes labeled 1, 2, 3]
Gleich (Stanford) PageRank intro Ph.D. Defense 6 / 41
![Page 15: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/15.jpg)
What pages are important?
Those that people visit a lot!
How do we check?
Create a model of how people visit the web.
![Page 16: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/16.jpg)
What pages are important?
The Google random surfer
• Follows a random link with probability alpha: “random clicks”
• Goes anywhere with probability (1-alpha): “random jumps”
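Written as a formula, the surfer's rule might look like the sketch below. The notation is an assumption introduced here for illustration: n is the number of pages, deg(i) is the number of links leaving page i (assumed to be at least one), and [i → j] is 1 if page i links to page j and 0 otherwise.

```latex
\Pr(\text{next page} = j \mid \text{current page} = i)
  = \alpha \, \frac{[\, i \to j \,]}{\deg(i)} + \frac{1 - \alpha}{n}
```

Summing over all pages j gives alpha + (1 - alpha) = 1, so each row is a valid probability distribution.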
![Page 17: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/17.jpg)
This is a Markov chain!
![Page 18: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/18.jpg)
Andrei Markov
• Studied sequences of random variables.
• The probability that the random variable takes a particular next value depends only on its current value.
• The “page id” is the “random variable” in the Markov chain!
![Page 19: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/19.jpg)
Oskar Perron
Georg Frobenius
• Simultaneously discovered when a Markov chain has an “average”
• The “average” of the web? It’s the probability of finding the random surfer at a page.
• In 1907
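One way to make the “average” concrete is as the stationary distribution of the chain. The symbols below are assumptions introduced for illustration: x_j is the long-run probability of finding the surfer at page j, alpha is the probability of following a link, n is the number of pages, and deg(i) is the number of links leaving page i (assumed to be at least one).

```latex
x_j = \alpha \sum_{i \,:\, i \to j} \frac{x_i}{\deg(i)} + \frac{1 - \alpha}{n},
\qquad \sum_{j=1}^{n} x_j = 1, \qquad x_j \ge 0
```

Because every page can be reached from every other page in one step whenever alpha < 1, all transition probabilities are positive, which is the setting in which a unique solution is guaranteed.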
![Page 20: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/20.jpg)
What pages are important?
Perron and Frobenius proved the following algorithm always converges to a solution…
    set prob[i] = 0 for all pages
    set p to a random page
    for t = 1 to ...
        increment prob[p]
        if rand() < alpha, set p to a random neighbor of p
        else, set p to a random page
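The simulation above can be sketched in Python roughly as follows. This is a minimal illustration, not Google's implementation: the function name, the toy three-page web, the default `alpha=0.85`, the step count, and the handling of pages without out-links are all assumptions made for the example. The visit counts are divided by the number of steps so the result reads as probabilities.

```python
import random


def simulate_random_surfer(links, alpha=0.85, steps=200_000, seed=0):
    """Estimate how often a single random surfer visits each page.

    links maps each page to the list of pages it links to.
    alpha, steps, and seed are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    page = rng.choice(pages)  # start at a random page
    for _ in range(steps):
        visits[page] += 1
        out_links = links[page]
        # Assumption: a page with no out-links always makes a random jump.
        if out_links and rng.random() < alpha:
            page = rng.choice(out_links)  # "random click"
        else:
            page = rng.choice(pages)  # "random jump"
    # Turn raw counts into visit frequencies.
    return {page: count / steps for page, count in visits.items()}


# Hypothetical three-page web, used only for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
estimate = simulate_random_surfer(toy_web)
```

On this toy web, page "b" receives only half of the links out of "a", so it should come out as the least visited page.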
![Page 21: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/21.jpg)
Richard von Mises
• Created “the power method”
• An efficient algorithm to “average” a Markov chain
• It updated the probabilities of all pages at once.
“Praktische Verfahren der Gleichungsauflösung”
R. von Mises and H. Pollaczek-Geiringer, 1929
![Page 22: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/22.jpg)
What pages are important?
Using the von Mises method …
    set prob[i] = 1/n for all pages
    for t = 1 to about 80
        set newprob[i] = 0 for all pages
        for all links from page i to page j
            set newprob[j] += prob[i]/deg[i]
        for all pages i
            set prob[i] = alpha*newprob[i] + (1-alpha)/n
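A direct Python transcription of this pseudocode might look like the sketch below. The function name, the toy three-page web, and the default `alpha=0.85` are assumptions for illustration. The sketch also assumes every page has at least one out-link, since the pseudocode divides by deg[i]; pages without out-links would need extra handling that the slide does not cover.

```python
def pagerank_power_method(links, alpha=0.85, iterations=80):
    """Approximate PageRank by updating all page probabilities at once.

    links maps each page to the list of pages it links to.
    alpha is an illustrative default (an assumption, not from the slides).
    """
    pages = sorted(links)
    n = len(pages)
    prob = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        newprob = {page: 0.0 for page in pages}
        for page, out_links in links.items():
            if not out_links:
                # Assumption of this sketch: no dangling pages.
                raise ValueError(f"page {page!r} has no out-links")
            share = prob[page] / len(out_links)  # prob[i] / deg[i]
            for target in out_links:
                newprob[target] += share
        prob = {
            page: alpha * newprob[page] + (1.0 - alpha) / n
            for page in pages
        }
    return prob


# Hypothetical three-page web, used only for illustration.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank_power_method(toy_web)
```

Each pass touches every link once, so the cost per iteration grows with the number of links rather than with the number of simulated clicks.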
![Page 23: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/23.jpg)
The algorithm underlying Google’s analysis of the web is from 1929!
![Page 24: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/24.jpg)
Leo Katz
![Page 25: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/25.jpg)
Leo Katz
That’s not quite right, Wikipedia!
![Page 26: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/26.jpg)
A new status index (1953)
Leo Katz
A paper about how information spreads in groups …
“For example, the information that the new high-school principal is unmarried and handsome might occasion a violent reaction in a ladies' garden club and hardly a ripple of interest in a luncheon group of the local chamber of commerce. On the other hand, the luncheon group might be anything but apathetic in its response to information concerning a fractional change in credit buying restrictions announced by the federal government.”
![Page 27: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/27.jpg)
… there were many other shoulders too …
![Page 28: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/28.jpg)
Gene Golub
Popularized numerical computing with matrices via the informal “Golub thesis”: “anything worth computing can be stated as a matrix problem”
William Kahan
Formalized IEEE-754 floating point arithmetic.
Made it possible to compute with probabilities as “real numbers” instead of discrete counts.
![Page 29: A history of PageRank from the numerical computing perspective](https://reader034.vdocument.in/reader034/viewer/2022052522/54b721784a795903798b47a8/html5/thumbnails/29.jpg)
Credits
Most pictures taken from Google image search.
Original idea from Massimo Franceschet, “PageRank: Standing on the shoulders of giants”