Transcript
Page 1: A history of PageRank from the numerical computing perspective

Numerical computing & Google’s PageRank

DAVID F. GLEICH, CS 197 PRESENTATION

Page 2: A history of PageRank from the numerical computing perspective

Hey Katie, do you have a date for Valentine’s Day?

It was 1234567890 in 2009.

Page 3: A history of PageRank from the numerical computing perspective

Thanks Internet!

http://school.discoveryeducation.com/clipart/clip/stk-fgr6.html http://listsoplenty.com/pix/tag/cartoon

https://www.facebook.com/ProgrammersJokes http://www.feld.com/wp/archives/2009/02/unix-time-1234567890-on-valentines-day.html

Page 4: A history of PageRank from the numerical computing perspective

Thanks Google

Page 5: A history of PageRank from the numerical computing perspective

How did Google get started?

Page 6: A history of PageRank from the numerical computing perspective

How did Google get started?
… with an idea …
… on the shoulders of giants!

Page 7: A history of PageRank from the numerical computing perspective

LEO KATZ

David F. Gleich (Purdue) Emory Math/CS Seminar 6 of 47

Page 8: A history of PageRank from the numerical computing perspective

Vannevar Bush “wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified”

-- “As We May Think”, The Atlantic, July 1945

Page 9: A history of PageRank from the numerical computing perspective

Sir Tim Berners-Lee “We should work towards a universal linked information system … to allow a place for any information or reference one felt was important and a way of finding it afterwards.”

-- Founding proposal for “the mesh”, 1989

Page 10: A history of PageRank from the numerical computing perspective

… the mesh became the web … the web became a mess ... “finding it afterwards”? Hah!

Page 11: A history of PageRank from the numerical computing perspective

Larry Page and Sergey Brin
•  Grad students at Stanford
•  Worked with Terry Winograd (artificial intelligence)
•  Created a web-search algorithm called “backrub”
•  Spun off a company “Googol”
•  Worth about $20 billion each

Page 12: A history of PageRank from the numerical computing perspective

A cartoon websearch primer
1. Crawl webpages
2. Analyze webpage text (information retrieval)
3. Analyze webpage links
4. Fit measures to human evaluations
5. Produce rankings
6. Continuously update

Page 13: A history of PageRank from the numerical computing perspective

SportsIllustrated.com BobsPortsIllustrated.com

Page 14: A history of PageRank from the numerical computing perspective

[Figure: diagram with labels 1, 2, 3]

Gleich (Stanford) PageRank intro Ph.D. Defense 6 / 41

Page 15: A history of PageRank from the numerical computing perspective

What pages are important? Those that people visit a lot! How do we check? Create a model of how people visit the web.

Page 16: A history of PageRank from the numerical computing perspective

What pages are important? The Google random surfer
•  Follows a random link with probability alpha: “random clicks”
•  Goes anywhere with probability (1-alpha): “random jumps”
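The surfer's two rules can be combined into one transition probability. This is only a sketch in notation that is not on the slides: n is the number of pages, d_i is the number of links leaving page i (assumed to be at least 1), and [i → j] is 1 when page i links to page j and 0 otherwise. The "goes anywhere" jump is assumed to be uniform over all n pages.

```latex
\Pr(\text{next page} = j \mid \text{current page} = i)
  = \alpha \, \frac{[\, i \to j \,]}{d_i} + \frac{1 - \alpha}{n}
```

Summing the right-hand side over all pages j gives alpha + (1 - alpha) = 1, so each row is a valid probability distribution.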

Page 17: A history of PageRank from the numerical computing perspective

This is a Markov chain!

Page 18: A history of PageRank from the numerical computing perspective

Andrei Markov
•  Studied sequences of random variables.
•  The probability that the next random variable takes a particular value depends only on the current value.
•  The “page id” is the “random variable” in the Markov chain!

Page 19: A history of PageRank from the numerical computing perspective

Oskar Perron and Georg Frobenius
•  Simultaneously discovered when a Markov chain has an “average”
•  The “average” of the web? It’s the probability of finding the random surfer at a page.
•  In 1907

Page 20: A history of PageRank from the numerical computing perspective

What pages are important? Perron and Frobenius proved the following algorithm always converges to a solution…

set prob[i] = 0 for all pages
set p to a random page
for t = 1 to ...
    increment prob[p]
    if rand() < alpha, set p to a random neighbor of p
    else, set p to a random page
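The random-surfer simulation above could be sketched in Python roughly as follows. Several things here are assumptions, not taken from the slides: the function name `simulate_random_surfer`, the three-page `toy_links` graph, the value alpha = 0.85, the fixed step count, and the choice to treat a page with no out-links as a forced random jump. Visit counts are divided by the number of steps so the result reads as probabilities.

```python
import random


def simulate_random_surfer(links, alpha=0.85, steps=100_000, seed=0):
    """Estimate how often a random surfer visits each page.

    links maps each page to the list of pages it links to.
    alpha = 0.85 and steps = 100_000 are illustrative assumptions.
    """
    rng = random.Random(seed)          # seeded so runs are repeatable
    pages = list(links)
    visits = {page: 0 for page in pages}
    p = rng.choice(pages)              # start at a random page
    for _ in range(steps):
        visits[p] += 1
        if rng.random() < alpha and links[p]:
            p = rng.choice(links[p])   # "random click": follow a link
        else:
            p = rng.choice(pages)      # "random jump": go anywhere
    # Normalize counts into visit frequencies.
    return {page: count / steps for page, count in visits.items()}


# Hypothetical three-page web, used only for illustration.
toy_links = {1: [2, 3], 2: [3], 3: [1]}
estimate = simulate_random_surfer(toy_links)
```

On this toy graph, page 2 receives only half of page 1's clicks, so its estimated frequency should come out clearly lower than that of pages 1 and 3.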

Page 21: A history of PageRank from the numerical computing perspective

Richard von Mises
•  Created “the power method”
•  An efficient algorithm to “average” a Markov chain
•  It updated the probabilities of all pages at once.

“Praktische Verfahren der Gleichungsauflösung”
R. von Mises and H. Pollaczek-Geiringer, 1929

Page 22: A history of PageRank from the numerical computing perspective

What pages are important? Using the von Mises method …

set prob[i] = 1/n for all pages
for t = 1 to about 80
    set newprob[i] = 0 for all pages
    for all links from page i to page j
        set newprob[j] += prob[i]/deg[i]
    for all pages i
        set prob[i] = alpha*newprob[i] + (1-alpha)/n
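A minimal Python sketch of the power-method loop above follows. The function name `pagerank_power_method`, the `toy_links` graph, and alpha = 0.85 are assumptions made for illustration. The code also assumes every page has at least one out-link and every link target is itself a key of `links`; the slides do not say how pages without links are handled.

```python
def pagerank_power_method(links, alpha=0.85, iterations=80):
    """Update the probabilities of all pages at once, as in the loop above.

    links maps each page to the list of pages it links to.
    Assumes every page has at least one out-link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    prob = {page: 1.0 / n for page in pages}      # start uniform
    for _ in range(iterations):
        newprob = {page: 0.0 for page in pages}
        for i, targets in links.items():
            for j in targets:
                # Page i splits its probability evenly among its links.
                newprob[j] += prob[i] / len(targets)
        # Mix in the uniform "random jump" term.
        prob = {page: alpha * newprob[page] + (1 - alpha) / n
                for page in pages}
    return prob


# Hypothetical three-page web, used only for illustration.
toy_links = {1: [2, 3], 2: [3], 3: [1]}
ranks = pagerank_power_method(toy_links)
```

Each pass costs one sweep over the links, and the error shrinks by roughly a factor of alpha per pass, which is consistent with stopping after about 80 iterations.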

Page 23: A history of PageRank from the numerical computing perspective

The algorithm underlying Google’s analysis of the web is from 1929!

Page 24: A history of PageRank from the numerical computing perspective

Leo Katz

Page 25: A history of PageRank from the numerical computing perspective

Leo Katz

That’s not quite right, Wikipedia!

Page 26: A history of PageRank from the numerical computing perspective

A new status index (1953)
Leo Katz

A paper about how information spreads in groups …

“For example, the information that the new high-school principal is unmarried and handsome might occasion a violent reaction in a ladies' garden club and hardly a ripple of interest in a luncheon group of the local chamber of commerce. On the other hand, the luncheon group might be anything but apathetic in its response to information concerning a fractional change in credit buying restrictions announced by the federal government.”

Page 27: A history of PageRank from the numerical computing perspective

… there were many other shoulders too …

Page 28: A history of PageRank from the numerical computing perspective

Gene Golub

Popularized numerical computing with matrices via the informal “Golub thesis”: “anything worth computing can be stated as a matrix problem”

William Kahan

Formalized IEEE-754 floating point arithmetic.

Made it possible to compute with probabilities as “real numbers” instead of discrete counts.

Page 29: A history of PageRank from the numerical computing perspective

Credits

Most pictures taken from Google image search. Original idea from Massimo Franceschet. “PageRank: Standing on the shoulders of giants”

