building mini google in ruby
DESCRIPTION
A look at the math and implementation behind PageRank and how to apply it within the context of a Ruby / Rails application (for fun and profit!)
TRANSCRIPT
Building Mini-Google in Ruby @igrigorik #railsconf http://bit.ly/railsconf-pagerank
Building Mini-Google in Ruby
Ilya Grigorik @igrigorik
postrank.com/topic/ruby
The slides… Twitter My blog
Ruby + Math, Optimization, PageRank, Indexing, Examples, Misc Fun
PageRank, PageRank + Ruby, Indexing, Examples, Tools, + Optimization
Consume with care… everything that follows is based on released / public domain info
Search-engine graveyard: Google did pretty well…
Search pipeline: 50,000-foot view
Query: Ruby
Results
1. Crawl 2. Index 3. Rank
Bah, Fun, Interesting
circa 1997-1998
CPU speed: 333 MHz
RAM: 32-64 MB
Index: 27,000,000 documents
Index refresh: once a month~ish
PageRank computation: several days

Laptop CPU: 2.1 GHz
VM RAM: 1 GB
1-million-page web: ~10 minutes

Creating & Maintaining an Inverted Index: DIY and the gotchas within
Building an Inverted Index

    require 'set'

    pages = {
      "1" => "it is what it is",
      "2" => "what is it",
      "3" => "it is a banana"
    }

    index = {}

    pages.each do |page, content|
      content.split(/\s/).each do |word|
        if index[word]
          index[word] << page
        else
          # Set.new expects an enumerable, so wrap the page id in an array
          index[word] = Set.new([page])
        end
      end
    end

    # resulting index:
    {
      "it"=>#<Set: {"1", "2", "3"}>,
      "a"=>#<Set: {"3"}>,
      "banana"=>#<Set: {"3"}>,
      "what"=>#<Set: {"1", "2"}>,
      "is"=>#<Set: {"1", "2", "3"}>
    }
Word => [Document]
Querying the index

    # query: "what is banana"
    p index["what"] & index["is"] & index["banana"]
    # > #<Set: {}>

    # query: "a banana"
    p index["a"] & index["banana"]
    # > #<Set: {"3"}>

    # query: "what is"
    p index["what"] & index["is"]
    # > #<Set: {"1", "2"}>

[Figure: pages 1, 2 and 3]
What order?
[1, 2] or [2,1]
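The set intersections used for querying can be wrapped in a small helper. This is a minimal sketch, not part of the talk: the `search` method name is hypothetical, and it returns an unordered set, which is exactly why the ordering question above needs a ranking step.

```ruby
require 'set'

# Same toy index as on the slides: word => set of page ids.
index = {
  "it"     => Set["1", "2", "3"],
  "a"      => Set["3"],
  "banana" => Set["3"],
  "what"   => Set["1", "2"],
  "is"     => Set["1", "2", "3"]
}

# Hypothetical helper: pages that contain every word of the query.
def search(index, query)
  words = query.split(/\s+/)
  return Set.new if words.empty?
  # Unknown words map to an empty set, so the intersection becomes empty.
  words.map { |word| index.fetch(word, Set.new) }.reduce(:&)
end

p search(index, "what is").to_a.sort         # ["1", "2"]
p search(index, "what is banana").to_a.sort  # []
```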
Building an Inverted Index: Hmmm?
PDF, HTML, RSS?
Lowercase / Upcase?
Compact Index?
Stop words?
Persistence?
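Two of these gotchas, case folding and stop words, can be handled with a little preprocessing before indexing. The sketch below is an illustration only: the `tokenize` and `build_index` helpers and the three-word stop list are assumptions, and parsing PDF/HTML/RSS, index compaction and persistence are not covered.

```ruby
require 'set'

# Assumed, illustrative stop-word list; real lists are much longer.
STOP_WORDS = Set["a", "is", "it"]

# Hypothetical helper: lowercase, split on non-word characters, drop stop words.
def tokenize(text)
  text.downcase.scan(/\w+/).reject { |word| STOP_WORDS.include?(word) }
end

# Hypothetical helper: word => set of page ids, built from normalized tokens.
def build_index(pages)
  index = Hash.new { |hash, word| hash[word] = Set.new }
  pages.each do |page, content|
    tokenize(content).each { |word| index[word] << page }
  end
  index
end

pages = {
  "1" => "It is what it is",
  "2" => "What is it?",
  "3" => "It is a BANANA"
}

index = build_index(pages)
p index.keys.sort      # ["banana", "what"]
p index["what"].to_a   # ["1", "2"]
```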
Ferret is a high-performance, full-featured text search engine library written for Ruby
    require 'ferret'
    include Ferret

    index = Index::Index.new()

    index << {:title => "1", :content => "it is what it is"}
    index << {:title => "2", :content => "what is it"}
    index << {:title => "3", :content => "it is a banana"}

    index.search_each('content:"banana"') do |id, score|
      puts "Score: #{score}, #{index[id][:title]} "
    end

    > Score: 1.0, 3

Hmmm?
class Ferret::Analysis::Analyzer
class Ferret::Analysis::AsciiLetterAnalyzer
class Ferret::Analysis::AsciiLetterTokenizer
class Ferret::Analysis::AsciiLowerCaseFilter
class Ferret::Analysis::AsciiStandardAnalyzer
class Ferret::Analysis::AsciiStandardTokenizer
class Ferret::Analysis::AsciiWhiteSpaceAnalyzer
class Ferret::Analysis::AsciiWhiteSpaceTokenizer
class Ferret::Analysis::HyphenFilter
class Ferret::Analysis::LetterAnalyzer
class Ferret::Analysis::LetterTokenizer
class Ferret::Analysis::LowerCaseFilter
class Ferret::Analysis::MappingFilter
class Ferret::Analysis::PerFieldAnalyzer
class Ferret::Analysis::RegExpAnalyzer
class Ferret::Analysis::RegExpTokenizer
class Ferret::Analysis::StandardAnalyzer
class Ferret::Analysis::StandardTokenizer
class Ferret::Analysis::StemFilter
class Ferret::Analysis::StopFilter
class Ferret::Analysis::Token
class Ferret::Analysis::TokenStream
class Ferret::Analysis::WhiteSpaceAnalyzer
class Ferret::Analysis::WhiteSpaceTokenizer

class Ferret::Search::BooleanQuery
class Ferret::Search::ConstantScoreQuery
class Ferret::Search::Explanation
class Ferret::Search::Filter
class Ferret::Search::FilteredQuery
class Ferret::Search::FuzzyQuery
class Ferret::Search::Hit
class Ferret::Search::MatchAllQuery
class Ferret::Search::MultiSearcher
class Ferret::Search::MultiTermQuery
class Ferret::Search::PhraseQuery
class Ferret::Search::PrefixQuery
class Ferret::Search::Query
class Ferret::Search::QueryFilter
class Ferret::Search::RangeFilter
class Ferret::Search::RangeQuery
class Ferret::Search::Searcher
class Ferret::Search::Sort
class Ferret::Search::SortField
class Ferret::Search::TermQuery
class Ferret::Search::TopDocs
class Ferret::Search::TypedRangeFilter
class Ferret::Search::TypedRangeQuery
class Ferret::Search::WildcardQuery
ferret.davebalmain.com/trac
Ranking Results: 0-60 with PageRank…
Naïve: Term Frequency

    index.search_each('content:"the brown cow"') do |id, score|
      puts "Score: #{score}, #{index[id][:title]} "
    end

    > Score: 0.827, 3
    > Score: 0.523, 5
    > Score: 0.125, 4

Relevance?

         Doc 3  Doc 5  Doc 4
the        4      3      5
brown      1      3      1
cow        1      4      1
Score      6     10      7
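The "Score" row of the table is just the sum of raw occurrence counts per document. As a minimal sketch (the `naive_scores` helper is hypothetical, and the counts are copied from the table), it can be reproduced like this:

```ruby
# Term counts from the table: word => { doc id => occurrences }.
counts = {
  "the"   => { 3 => 4, 5 => 3, 4 => 5 },
  "brown" => { 3 => 1, 5 => 3, 4 => 1 },
  "cow"   => { 3 => 1, 5 => 4, 4 => 1 }
}

# Naive score: add up the raw occurrences of every query word.
def naive_scores(counts, query)
  scores = Hash.new(0)
  query.split.each do |word|
    counts.fetch(word, {}).each { |doc, n| scores[doc] += n }
  end
  scores
end

scores = naive_scores(counts, "the brown cow")
scores.each { |doc, score| puts "Doc #{doc}: #{score}" }
# Doc 3: 6
# Doc 5: 10
# Doc 4: 7

# Documents ordered by descending naive score.
p scores.sort_by { |_, score| -score }.map(&:first)   # [5, 4, 3]
```

Note how much of every total comes from "the", a word that says little about relevance.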
Skew
TF-IDF: Term Frequency * Inverse Document Frequency

Occurrences per document:

         Doc 3  Doc 5  Doc 4
the        4      3      5
brown      1      3      1
cow        1      4      1

Total # of documents: 10

# of docs containing each word:

the      6
brown    3
cow      4

Score = TF * IDF
TF = # occurrences / # words in document
IDF = ln(# docs / # docs with W)

Worked example for Doc # 3 (# words in document: 10):

Doc # 3 score for ‘the’: 4/10 * ln(10/6) = 0.204
Doc # 3 score for ‘brown’: 1/10 * ln(10/3) = 0.120
Doc # 3 score for ‘cow’: 1/10 * ln(10/4) = 0.092

TF-IDF Score = 0.204 + 0.120 + 0.092 = 0.416
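The Doc # 3 arithmetic can be checked with a few lines of Ruby. This is a sketch under the slide's own numbers (10 documents, 10 words in Doc 3); the `tf_idf` helper name is hypothetical, and the natural logarithm is used because that is what the worked figures imply.

```ruby
# Numbers taken from the slide: 10 documents in total, Doc 3 has 10 words.
TOTAL_DOCS = 10
docs_with_word = { "the" => 6, "brown" => 3, "cow" => 4 }
doc3_counts    = { "the" => 4, "brown" => 1, "cow" => 1 }
doc3_length    = 10

# TF = occurrences / words in document, IDF = ln(total docs / docs with word)
def tf_idf(count, doc_length, total_docs, docs_with_word)
  tf  = count.to_f / doc_length
  idf = Math.log(total_docs.to_f / docs_with_word)
  tf * idf
end

parts = doc3_counts.map do |word, count|
  [word, tf_idf(count, doc3_length, TOTAL_DOCS, docs_with_word[word])]
end

parts.each { |word, score| puts format("%-5s %.3f", word, score) }
puts format("total %.3f", parts.sum { |_, score| score })
# the   0.204
# brown 0.120
# cow   0.092
# total 0.416
```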
Frequency Matrix

         W1   W2   …   WK
Doc 1    15   23   …
Doc 2    24   12   …
…        …    …    …
Doc N

Size = N * K * size of Ruby object. Ouch.

Pages = N = 10,000
Words = K = 2,000
Ruby Object = 20+ bytes
Footprint ≈ 384 MB
NArray: http://narray.rubyforge.org/
NArray is a Numerical N-dimensional Array class (implemented in C)

    NArray.new(typecode, size, ...)   # create new NArray. initialize with 0.
    NArray.byte(size,...)             # 1 byte unsigned integer
    NArray.sint(size,...)             # 2 byte signed integer
    NArray.int(size,...)              # 4 byte signed integer
    NArray.sfloat(size,...)           # single precision float
    NArray.float(size,...)            # double precision float
    NArray.scomplex(size,...)         # single precision complex
    NArray.complex(size,...)          # double precision complex
    NArray.object(size,...)           # Ruby object
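To see why a typed array helps, here is a back-of-the-envelope comparison in plain Ruby (no NArray required). It assumes exactly 20 bytes per Ruby object, the slide's lower bound, and uses the element sizes listed for the NArray constructors; the `megabytes` helper is hypothetical.

```ruby
PAGES = 10_000
WORDS = 2_000
CELLS = PAGES * WORDS   # one counter per (page, word) pair

# Bytes per cell. 20 is the assumed size of a Ruby object;
# the others are the NArray element sizes for int, sint and byte.
BYTES_PER_CELL = {
  "Ruby object" => 20,
  "NArray.int"  => 4,
  "NArray.sint" => 2,
  "NArray.byte" => 1
}

def megabytes(bytes)
  bytes / (1024.0 * 1024.0)
end

BYTES_PER_CELL.each do |kind, size|
  puts format("%-12s %6.1f MB", kind, megabytes(CELLS * size))
end
```

At 20 bytes per cell this comes out at roughly 381 MB, in line with the slide's figure, versus roughly 76 MB for 4-byte integers and roughly 19 MB for single bytes. A byte cell caps each count at 255, which may or may not be acceptable for term frequencies.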
PageRank: the Google juice
Links as votes
Problem: link gaming
Random Surfer: powerful abstraction
Follow link from page he/she is currently on: P = 0.85
Teleport to a random location on the web: P = 0.15
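The random surfer can be simulated directly. The sketch below is an illustration, not code from the talk: the four-page link structure and the `surf` helper are made up, and pages without out-links are assumed to always teleport.

```ruby
# Hypothetical four-page web: page => pages it links to.
LINKS = {
  "N" => ["M", "K"],
  "M" => ["K"],
  "K" => ["N"],
  "X" => ["K"]
}

FOLLOW_PROBABILITY = 0.85   # otherwise teleport (P = 0.15)

# Walk the graph for `steps` steps and return the share of visits per page.
def surf(links, steps, rng)
  pages  = links.keys
  visits = Hash.new(0)
  page   = pages.sample(random: rng)

  steps.times do
    visits[page] += 1
    out = links[page]
    if out.empty? || rng.rand >= FOLLOW_PROBABILITY
      page = pages.sample(random: rng)   # teleport to a random page
    else
      page = out.sample(random: rng)     # follow a random out-link
    end
  end

  visits.transform_values { |count| count.to_f / steps }
end

shares = surf(LINKS, 100_000, Random.new(42))
shares.sort_by { |_, share| -share }.each do |page, share|
  puts format("%s %.3f", page, share)
end
```

The visit shares approximate the probabilities that PageRank computes exactly; page X, which nobody links to, is only ever reached by teleportation.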
Surfin’: rinse & repeat, ad nauseam
Follow link from page he/she is currently on.
Teleport to a random location on the web.
[Figure: Page K, Page N, Page M]

On Page P, clicks on link to K (P = 0.85)
On Page K, clicks on link to M (P = 0.85)
On Page M, teleports to X (P = 0.15)
…
Analyzing the Web Graph: extracting PageRank
[Figure: web graph of pages N, M, K, X labelled with probabilities P = 0.6, P = 0.15, P = 0.20, P = 0.05]
What is PageRank? It’s a scalar!
What is PageRank? It’s a probability!
[Figure: web graph of pages N, M, K, X labelled with probabilities P = 0.6, P = 0.15, P = 0.20, P = 0.05]
Higher PR, higher importance?
Teleportation? Sci-fi fans, … ?
Reasons for teleportation: enumerating edge cases
[Figure: web graph of pages N, M, K, X]
1. No in-links!
2. No out-links!
3. Isolated Web
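The first two edge cases are easy to spot in a link dictionary. The sketch below uses a made-up four-page web (the page names follow the figure, but the links are assumptions); finding isolated sub-webs would need a connectivity check, which is not shown.

```ruby
require 'set'

# Hypothetical link dictionary: page => pages it links to.
links = {
  "N" => ["M", "K"],
  "M" => [],           # no out-links
  "K" => ["M", "N"],
  "X" => ["K"]         # nobody links to X
}

pages        = links.keys
link_targets = links.values.flatten.to_set

# Pages that never appear as the target of a link.
no_in_links  = pages.reject { |page| link_targets.include?(page) }
# Pages whose own link list is empty.
no_out_links = pages.select { |page| links[page].empty? }

p no_in_links    # ["X"]
p no_out_links   # ["M"]
```

Without teleportation a surfer could never reach X and could never leave M.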
Exploring Graphs: gratr.rubyforge.com

• Breadth First Search
• Depth First Search
• A* Search
• Lexicographic Search
• Dijkstra’s Algorithm
• Floyd-Warshall
• Triangulation and Comparability detection

    require 'gratr/import'

    dg = Digraph[1,2, 2,3, 2,4, 4,5, 6,4, 1,6]

    dg.directed?    # true
    dg.vertex?(4)   # true
    dg.edge?(2,4)   # true
    dg.vertices     # [5, 6, 1, 2, 3, 4]

    Graph[1,2,1,3,1,4,2,5].bfs   # [1, 2, 3, 4, 5]
    Graph[1,2,1,3,1,4,2,5].dfs   # [1, 2, 5, 3, 4]
Teleportation: probabilities
[Figure: five pages, each labelled P(T) = 0.03]
P(T) = 0.15 / # of pages = 0.15 / 5 = 0.03
PageRank: Simplified Mathematical Def’n
’cause that’s how we roll

Assume the web is N pages big.
Assume that the probability of teleportation (t) is 0.15, and of following a link (s) is 0.85.
Assume that the teleportation probability (E) is uniform.
Assume that you start on any random page (uniform distribution L).

$$L = \begin{pmatrix} 1/N & \cdots & 1/N \end{pmatrix}, \qquad T = \begin{pmatrix} 0.15/N \\ \vdots \\ 0.15/N \end{pmatrix}$$

Then after one step, the probability you’re on page X is given by:

$$L \cdot (sG + tE) = L \cdot (0.85 \cdot G + 0.15 \cdot E)$$
G = The Link Graph: ginormous and sparse

Link Graph:

      1   2   …   …   N
1     1   0   …   …   0
2     0   1   …   …   1
…     …   …   …   …   …
…     …   …   …   …   …
N     0   1   …   …   1

No link from 1 to N (row 1, column N is 0).
Huge!
G as a dictionary: more compact…

    {
      "1" => [25, 26],
      "2" => [1],
      "5" => [123,2],
      "6" => [67, 1]
    }

Page => Links to…
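A dictionary like this can be expanded into the matrix form when needed. The following is a sketch, assuming a small three-page web in which every link target is also a key, and assuming a column convention (entry [i][j] is the probability of going from page j to page i, so each column with out-links sums to 1). The `link_matrix` helper is hypothetical.

```ruby
# Hypothetical three-page web: page => pages it links to.
links = { 1 => [2, 3], 2 => [3], 3 => [3] }

# Build the N x N matrix as nested arrays. Column j describes page j:
# entry [i][j] is the probability of following a link from page j to page i.
def link_matrix(links)
  pages  = links.keys.sort
  row_of = pages.each_with_index.to_h   # page id => row/column position
  matrix = Array.new(pages.size) { Array.new(pages.size, 0.0) }

  links.each do |from, targets|
    targets.each do |to|
      matrix[row_of[to]][row_of[from]] += 1.0 / targets.size
    end
  end
  matrix
end

link_matrix(links).each { |row| p row }
# [0.0, 0.0, 0.0]
# [0.5, 0.0, 0.0]
# [0.5, 1.0, 1.0]
```

For a real web graph the dense form is exactly the "Huge!" problem, so the dictionary (or a sparse matrix) is the form to keep around.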
Computing PageRank: the tedious way
Follow link from page he/she is currently on.
Teleport to a random location on the web.
[Figure: Page K]

Computing PageRank: in one swoop

$$q = t\,(I - sG)^{-1} E = \begin{pmatrix} P_1 \\ \vdots \\ P_n \end{pmatrix}$$

where I is the identity matrix.
Don’t trust me! Verify it yourself!
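One way to verify the closed form, sketched under two assumptions: q is the stationary distribution (one more surfer step leaves it unchanged), and E is treated here as the uniform column vector with entries 1/N.

```latex
\begin{aligned}
q &= sGq + tE            && \text{one more step changes nothing} \\
q - sGq &= tE \\
(I - sG)\,q &= tE \\
q &= t\,(I - sG)^{-1}E
\end{aligned}
```

The inverse exists because s = 0.85 < 1: if the columns of G sum to at most 1, the largest eigenvalue of sG is at most 0.85 in magnitude, so I − sG cannot be singular.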
Enough hand-waving, dammit! Show me the code
Birth of EM-Proxy: flash of the obvious
Hot, Fast, Awesome
http://rb-gsl.rubyforge.org/
Click there! … Give yourself a weekend.
http://ruby-gsl.sourceforge.net/
PageRank in Ruby: 6 lines, or less

    require "gsl"
    include GSL

    # INPUT: link structure matrix (NxN)
    # OUTPUT: pagerank scores
    def pagerank(g)
      raise if g.size1 != g.size2                  # verify NxN

      # constants…
      i = Matrix.I(g.size1)                        # identity matrix
      p = (1.0/g.size1) * Matrix.ones(g.size1,1)   # teleportation vector

      s = 0.85                                     # probability of following a link
      t = 1-s                                      # probability of teleportation

      t*((i-s*g).invert)*p                         # PageRank!
    end
Ex: Circular Web
testing intuition…

    pagerank(Matrix[[0,0,1], [1,0,0], [0,1,0]])
    > [0.33, 0.33, 0.33]

[Figure: pages N, K, X linked in a circle, each with P = 0.33]
Ex: All roads lead to K
testing intuition…

    pagerank(Matrix[[0,0,0], [0.5,0,0], [0.5,1,1]])
    > [0.05, 0.07, 0.87]

[Figure: pages N, X, K with P = 0.05 (N), P = 0.07 (X), P = 0.87 (K)]
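The same numbers can be reached without a matrix inverse by iterating q = s·G·q + t·p until it settles. This is a swapped-in technique (power iteration), not the talk's GSL code; the `pagerank_iterative` name, the tolerance and the nested-array matrix are assumptions for this sketch.

```ruby
# Column-stochastic link matrix from the "All roads lead to K" example.
G = [
  [0.0, 0.0, 0.0],
  [0.5, 0.0, 0.0],
  [0.5, 1.0, 1.0]
]

# Repeat q = s*G*q + t*p until q stops changing.
def pagerank_iterative(g, s = 0.85, tolerance = 1e-10, max_iterations = 1_000)
  n = g.size
  t = 1.0 - s
  p_vec = Array.new(n, 1.0 / n)   # uniform teleportation vector
  q = p_vec.dup                   # start from the uniform distribution

  max_iterations.times do
    next_q = (0...n).map do |i|
      s * (0...n).sum { |j| g[i][j] * q[j] } + t * p_vec[i]
    end
    change = (0...n).sum { |i| (next_q[i] - q[i]).abs }
    q = next_q
    break if change < tolerance
  end
  q
end

p pagerank_iterative(G).map { |x| x.round(2) }
# [0.05, 0.07, 0.88]
```

The last value is 0.87875, which rounds to 0.88; the slide shows it truncated to 0.87. Each pass only multiplies by G, so the approach also works when G is kept as a dictionary instead of a dense matrix.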
PageRank + Ferret: awesome search, ftw!

    require 'ferret'
    include Ferret

    index = Index::Index.new()

    # store PageRank alongside each document
    index << {:title => "1", :content => "it is what it is", :pr => 0.05 }
    index << {:title => "2", :content => "what is it", :pr => 0.07 }
    index << {:title => "3", :content => "it is a banana", :pr => 0.87 }

[Figure: pages 1, 2, 3 with P = 0.05, P = 0.07, P = 0.87]
    # TF-IDF search
    index.search_each('content:"world"') do |id, score|
      puts "Score: #{score}, #{index[id][:title]} (PR: #{index[id][:pr]})"
    end

    puts "*" * 50

    # PageRank FTW! Sort by the stored :pr field, highest first
    sf_pr = Search::SortField.new(:pr, :type => :float, :reverse => true)

    index.search_each('content:"world"', :sort => sf_pr) do |id, score|
      puts "Score: #{score}, #{index[id][:title]}, (PR: #{index[id][:pr]})"
    end

    # Score: 0.267119228839874, 3 (PR: 0.87)
    # Score: 0.17807948589325, 1 (PR: 0.05)
    # Score: 0.17807948589325, 2 (PR: 0.07)
    # ***********************************
    # Score: 0.267119228839874, 3, (PR: 0.87)
    # Score: 0.17807948589325, 2, (PR: 0.07)
    # Score: 0.17807948589325, 1, (PR: 0.05)

Others
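The effect of the PageRank sort can be seen without Ferret by reordering the hits in plain Ruby. A minimal sketch: the hashes below simply copy the scores and PR values from the output, and using PageRank as a tie-breaker is one possible policy, not something the slides prescribe.

```ruby
# Hits copied from the output: TF-IDF score plus stored PageRank.
hits = [
  { title: "3", score: 0.267119228839874, pr: 0.87 },
  { title: "1", score: 0.17807948589325,  pr: 0.05 },
  { title: "2", score: 0.17807948589325,  pr: 0.07 }
]

# Order purely by PageRank, highest first (what the :pr sort field does).
by_pagerank = hits.sort_by { |hit| -hit[:pr] }
p by_pagerank.map { |hit| hit[:title] }     # ["3", "2", "1"]

# Alternative: keep the text score first and use PageRank to break ties.
score_then_pr = hits.sort_by { |hit| [-hit[:score], -hit[:pr]] }
p score_then_pr.map { |hit| hit[:title] }   # ["3", "2", "1"]
```

Pages 1 and 2 tie on text score, so PageRank decides their order in both variants.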
Search*: Graphs are ubiquitous! PageRank is a general purpose hammer
PageRank + Social Graph: GitHub

Username        GitCred
==============================
37signals       10.00
imbriaco         9.76
why              8.74
rails            8.56
defunkt          8.17
technoweenie     7.83
jeresig          7.60
mojombo          7.51
yui              7.34
drnic            7.34
pjhyett          6.91
wycats           6.85
dhh              6.84
http://bit.ly/3YQPU
PageRank + Social Graph: Twitter
Hmm…
Analyze the social graph:
- Filter messages by ‘TwitterRank’
- Suggest users by ‘TwitterRank’
- …
PageRank + Product Graph: E-commerce
Link items purchased in same cart… Run PR on it.
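Building that product graph could look like the sketch below. The carts and item names are made up, and linking every pair of co-purchased items in both directions is an assumption about what "link" means here.

```ruby
# Hypothetical carts: each is the list of item ids bought together.
carts = [
  ["book", "lamp"],
  ["book", "lamp", "desk"],
  ["desk", "chair"]
]

# Link every pair of items that shared a cart, in both directions.
links = Hash.new { |hash, item| hash[item] = [] }
carts.each do |cart|
  cart.uniq.permutation(2) do |from, to|
    links[from] << to unless links[from].include?(to)
  end
end

links.each { |item, neighbours| puts "#{item} => #{neighbours.inspect}" }
# book => ["lamp", "desk"]
# lamp => ["book", "desk"]
# desk => ["book", "lamp", "chair"]
# chair => ["desk"]
```

The result has the same page => links shape as the dictionary form of G, so it can be fed to a PageRank computation.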
PageRank = Powerful Hammer: use it!
Personalization: how would you do it?
PageRank + Personalization: customize the teleportation vector

$$T = \begin{pmatrix} 0.15/N \\ \vdots \\ 0.15/N \end{pmatrix}$$

Teleportation distribution doesn’t have to be uniform!
yahoo.com is my homepage!
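A sketch of what a customized teleportation vector does, using a made-up three-page cycle where page 1 plays the role of the homepage. The `personalized_pagerank` helper is hypothetical; it takes the teleport vector as a probability distribution and applies the 0.15 weight itself, and it iterates instead of inverting a matrix.

```ruby
# Column-stochastic matrix for a hypothetical 3-page cycle: 1 -> 2 -> 3 -> 1.
CYCLE = [
  [0.0, 0.0, 1.0],
  [1.0, 0.0, 0.0],
  [0.0, 1.0, 0.0]
]

# Iterate q = s*G*q + (1 - s)*teleport; teleport may be any probability vector.
def personalized_pagerank(g, teleport, s = 0.85, iterations = 200)
  n = g.size
  q = Array.new(n, 1.0 / n)
  iterations.times do
    q = (0...n).map do |i|
      s * (0...n).sum { |j| g[i][j] * q[j] } + (1.0 - s) * teleport[i]
    end
  end
  q
end

uniform  = personalized_pagerank(CYCLE, [1.0 / 3, 1.0 / 3, 1.0 / 3])
homepage = personalized_pagerank(CYCLE, [1.0, 0.0, 0.0])  # always teleport to page 1

p uniform.map  { |x| x.round(3) }   # [0.333, 0.333, 0.333]
p homepage.map { |x| x.round(3) }   # [0.389, 0.33, 0.281]
```

With uniform teleportation the cycle is perfectly symmetric; biasing teleportation toward page 1 lifts its rank and the ranks of the pages just downstream of it.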
Gaming PageRank: for fun and profit (I don’t endorse it)
Make pages with links!
http://bit.ly/pagerank-spam
Questions?
The slides… Twitter My blog
Slides: http://bit.ly/railsconf-pagerank
Ferret: http://bit.ly/ferret
RB-GSL: http://bit.ly/rb-gsl
PageRank on Wikipedia: http://bit.ly/wp-pagerank
Gaming PageRank: http://bit.ly/pagerank-spam
Michael Nielsen’s lectures on PageRank: http://michaelnielsen.org/blog