Post on 26-Jun-2020

1

Applications of Data Structures and Algorithms

Danfeng Yao
CS 16

3/13/2006

2

Overview

Data structures in web interface
Google
– Indexing
– PageRanking
– Crawling

3

Forward/backward buttons of Browser

4

Forward/backward buttons and Stack

www.brown.edu

Back Forward

5

Forward/backward buttons and Stack

www.dam.brown.edu

www.cs.brown.edu

www.brown.edu

www.engin.brown.edu

Back Forward

Old pages

6

Then click the Back button once

www.cs.brown.edu

www.engin.brown.edu

www.brown.edu

www.dam.brown.edu

Back Forward

7

Then click the Forward button once

www.dam.brown.edu

www.cs.brown.edu

www.engin.brown.edu

www.brown.edu

Back Forward
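The Back/Forward behavior on the preceding slides can be sketched with two stacks. This is a minimal illustration, not the browser's actual implementation: the class name `BrowserHistory`, its method names, and the order in which the pages are visited are all assumptions made for the example, as is the rule that following a new link clears the forward stack.

```python
class BrowserHistory:
    """Back/Forward navigation modeled with two stacks (Python lists)."""

    def __init__(self, start_url):
        self.current = start_url
        self.back_stack = []     # pages reachable with Back, most recent on top
        self.forward_stack = []  # pages reachable with Forward, most recent on top

    def visit(self, url):
        # Assumption: following a link pushes the current page onto the
        # back stack and discards any forward history.
        self.back_stack.append(self.current)
        self.current = url
        self.forward_stack.clear()

    def back(self):
        # Move the current page to the forward stack, pop the previous page.
        if self.back_stack:
            self.forward_stack.append(self.current)
            self.current = self.back_stack.pop()
        return self.current

    def forward(self):
        # Move the current page to the back stack, pop the next page.
        if self.forward_stack:
            self.back_stack.append(self.current)
            self.current = self.forward_stack.pop()
        return self.current


# Hypothetical visiting order, chosen only for illustration.
history = BrowserHistory("www.brown.edu")
for url in ["www.cs.brown.edu", "www.engin.brown.edu", "www.dam.brown.edu"]:
    history.visit(url)
```

Calling `history.back()` once and then `history.forward()` once returns to the page that was current before the two clicks, which matches the sequence of clicks on the slides.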

8

Planar point location: where is the mouse click?

9

Simple point location: Binary decision tree

Build a binary tree
– internal nodes corresponding to line segments
– external nodes corresponding to regions

[Figure: a planar subdivision cut by line segments a, b, c, and d, with its binary decision tree; tree branches are labeled left/right or above/below.]
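A binary decision tree for point location might be sketched as follows. Internal nodes hold a line segment and external nodes hold a region name, as on the slide; everything else (the names `SegmentNode`, `Region`, `side`, `locate`, and the toy subdivision of the unit square) is hypothetical. Each segment is treated as a directed line, so "left" of a rightward horizontal segment means "above".

```python
from dataclasses import dataclass
from typing import Tuple, Union

Point = Tuple[float, float]


@dataclass
class Region:
    """External node: a region of the subdivision."""
    name: str


@dataclass
class SegmentNode:
    """Internal node: a directed segment that splits the plane in two."""
    start: Point
    end: Point
    left: "Tree"   # subtree for points to the left of start -> end
    right: "Tree"  # subtree for points to the right of (or on) start -> end


Tree = Union[Region, SegmentNode]


def side(start: Point, end: Point, p: Point) -> float:
    """Cross product: positive if p lies left of the directed line start -> end."""
    return (end[0] - start[0]) * (p[1] - start[1]) - (end[1] - start[1]) * (p[0] - start[0])


def locate(tree: Tree, p: Point) -> str:
    """Walk from the root to an external node, one segment test per level."""
    node = tree
    while isinstance(node, SegmentNode):
        node = node.left if side(node.start, node.end, p) > 0 else node.right
    return node.name


# Toy subdivision (an assumption): a vertical cut at x = 0.5, then a
# horizontal cut at y = 0.5 on the left half only.
tree = SegmentNode(
    start=(0.5, 0.0), end=(0.5, 1.0),
    left=SegmentNode(
        start=(0.0, 0.5), end=(0.5, 0.5),
        left=Region("upper-left"),
        right=Region("lower-left"),
    ),
    right=Region("right"),
)
```

A mouse click at `(0.2, 0.8)` is answered with `locate(tree, (0.2, 0.8))`; the number of segment tests equals the depth of the external node reached.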

10

The need for search engines to scale up

What a search engine faces
– Storage for index files (and maybe documents themselves too)
– Index system processes hundreds of gigabytes
– Queries at a rate of thousands per second
Advances in hardware technology
– Faster CPU
– Cheaper memory and disk space
But still, slow disk seek (~10 ms) and operating system instability
What will make today’s search engine scale?

11

Google facts

Will be on the stock market soon
– Estimated annual profit $150 million to $350 million
– Estimated annual revenue $500 million to $1 billion
– Estimated market value $12 billion to $20 billion

The heart of Google software is PageRank™

Google has integrity
– No one can buy a higher PageRank

Sergey Brin

Larry Page

12

Google’s approach

Data structures in Google
– Compact data structures
– Avoid disk seeks whenever possible
Data structures for indexing
– Link structures
– Inverted index
Page ranking
Crawling

13

Web as a directed graph

(Brown, Brown CS)

(Brown CS, CS16)

(Brown CS, rt)

A hyperlink is an edge

A web page is a vertex
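The three edges listed on this slide can be stored as adjacency lists, one common way to represent a directed graph. This is only a sketch; the names `out_links` and `in_links` are chosen for the example.

```python
from collections import defaultdict

# Each hyperlink is a directed edge (source page, target page).
edges = [("Brown", "Brown CS"), ("Brown CS", "CS16"), ("Brown CS", "rt")]

out_links = defaultdict(list)  # page -> pages it links to
in_links = defaultdict(list)   # page -> pages that link to it

for source, target in edges:
    out_links[source].append(target)
    in_links[target].append(source)
```

Keeping both directions costs twice the space but lets a crawler follow outgoing links and a ranking step look up incoming links without scanning every edge.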

14

Storing the link structures

Want to maintain the link relationship of two pages
– Used for crawling, ranking, …
Main problem: how to store the set of pairs efficiently
URL is too long and has variable length
– Storing (URLi, URLj) has too much overhead and is slow
Use a more compact docID and support fast docID/URL conversion

docID URL

http://www.cs.brown.edu/people/rt/
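The docID idea can be sketched as a two-way table: a dictionary for URL to docID and a list for docID to URL, with links stored as pairs of small integers. This is an in-memory illustration under assumed names (`DocTable`, `add_link`, `links`), not Google's on-disk format, and the source URL in the usage line is picked only for the example.

```python
class DocTable:
    """Two-way mapping between URLs and compact integer docIDs."""

    def __init__(self):
        self._url_to_id = {}  # URL -> docID
        self._id_to_url = []  # docID -> URL (the list index is the docID)

    def doc_id(self, url):
        # Assign the next free integer the first time a URL is seen.
        if url not in self._url_to_id:
            self._url_to_id[url] = len(self._id_to_url)
            self._id_to_url.append(url)
        return self._url_to_id[url]

    def url(self, doc_id):
        return self._id_to_url[doc_id]


docs = DocTable()
links = set()  # set of (source docID, target docID) pairs


def add_link(source_url, target_url):
    """Store a hyperlink as a pair of docIDs instead of a pair of URLs."""
    links.add((docs.doc_id(source_url), docs.doc_id(target_url)))


add_link("http://www.cs.brown.edu/", "http://www.cs.brown.edu/people/rt/")
```

Each stored pair now has a fixed size regardless of URL length, and conversion in either direction is a single dictionary or list lookup.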

15

Forward index and inverted index

[Figure: a forward index listing the words of each document (vitae, security, information, design, algorithm; vitae, graph, drawing, design, algorithm) and an inverted index mapping the word "algorithm" to Hit 1, Hit 2, and Hit 3.]
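Turning a forward index into an inverted index can be sketched as below. The two word lists are read off the slide's figure, with the split of the fused words being an assumption; the docIDs 0 and 1, the function name `invert`, and the choice of recording a hit as a `(docID, position)` pair are also assumptions made for the example.

```python
from collections import defaultdict

# Forward index: docID -> words in the document (docIDs are hypothetical).
forward_index = {
    0: ["vitae", "security", "information", "design", "algorithm"],
    1: ["vitae", "graph", "drawing", "design", "algorithm"],
}


def invert(forward):
    """Build word -> list of hits, where a hit is (docID, position in document)."""
    inverted = defaultdict(list)
    for doc_id, words in forward.items():
        for position, word in enumerate(words):
            inverted[word].append((doc_id, position))
    return dict(inverted)


inverted_index = invert(forward_index)
```

A query for a word then reads one entry of `inverted_index` instead of scanning every document's word list.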

16

PageRanking: bringing order to the web

17

GoogleBot: where to crawl?

Through addURL forms or links
Deep crawling
– BFS or DFS?
– Rumored to crawl in PageRank order
Fresh crawling
– Recrawling to keep index updated
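The BFS option for deep crawling can be sketched over an in-memory link table, with no network access. The function `crawl_bfs`, its `get_links` callback, the `max_pages` limit, and the `toy_web` dictionary are all hypothetical; this is not a description of how GoogleBot actually orders its crawl.

```python
from collections import deque


def crawl_bfs(seeds, get_links, max_pages=100):
    """Visit pages in breadth-first order, starting from the seed pages."""
    visited = []              # pages in the order they were crawled
    seen = set(seeds)         # pages already queued, to avoid re-queueing
    frontier = deque(seeds)   # FIFO queue gives breadth-first order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Toy link structure standing in for real pages (an assumption).
toy_web = {
    "Brown": ["Brown CS"],
    "Brown CS": ["CS16", "rt"],
    "CS16": ["Brown CS"],
    "rt": [],
}
order = crawl_bfs(["Brown"], lambda url: toy_web.get(url, []))
```

Replacing `frontier.popleft()` with `frontier.pop()` turns the queue into a stack and gives a depth-first order instead; a priority queue keyed on an importance score would give a ranked order.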

18

More Google facts

Free lunch every day!
Bring pets to work
On-site massage

19

Bibliography

The Anatomy of a Large-Scale Hypertextual Web Search Engine. Sergey Brin and Lawrence Page
The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page
