Introduction to Graphs


Introduction to Graphs

15-111

Advanced Programming

7/31/2009 1

Advanced Programming Concepts/Data Structures

Ananda Gunawardena

An Airline Route Map


Introduction

• Many real-world problems can be modeled using graphs

– Airline Route Map

• What is the fastest way to get from Pittsburgh to St Louis?

• What is the cheapest way to get from Pittsburgh to St Louis?


– Electric Circuits

• Circuit elements - transistors, resistors, capacitors

• Is everything connected together?

– Depends on interconnections (wires)

• If this circuit is built will it work?

– Depends on wires and objects they connect.

Graphs

• More applications

– Job Scheduling

• Interconnections indicate which jobs must be performed before others

• When should each task be performed?

• All these questions can be answered using a mathematical structure named a “graph”. We will answer the questions

– what are graphs?

– what are their basic properties?

Graph Definitions

• Graph

– A set of vertices (nodes) V = {v1, v2, …, vn}

– A set of edges (arcs) that connect the vertices, E = {e1, e2, …, em}

– Each edge ei is a pair (v, w) where v, w in V

– |V| = number of vertices (cardinality)

– |E| = number of edges

• Graphs can be

– Directed (order of (v,w) matters)

– Undirected (order of (v,w) doesn’t matter)

• Edges can be

– Weighted (cost associated with the edge)

– e.g. neural network, airline route map (Vanguard Airlines)
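The definitions above can be written down directly as sets. The following is only a minimal sketch: the airport codes and fares are made-up illustration values, not taken from the slides.

```python
# Vertices V of a small illustrative graph (airport codes are hypothetical).
V = {"PIT", "STL", "ORD"}

# Directed edges: the order of the pair (v, w) matters.
directed_E = {("PIT", "ORD"), ("ORD", "STL")}

# Undirected edges: store each pair as a frozenset so the order is irrelevant.
undirected_E = {frozenset(e) for e in directed_E}

# Weighted edges: map each edge to its cost (hypothetical fares).
weights = {("PIT", "ORD"): 120, ("ORD", "STL"): 95}

print(len(V), len(directed_E))                    # |V| and |E|
print(("ORD", "PIT") in directed_E)               # direction matters here
print(frozenset(("ORD", "PIT")) in undirected_E)  # direction ignored here
```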

Graph Representation

• How do we represent a graph internally?

• Two ways

– Adjacency matrix

– Adjacency list

• Adjacency Matrix

– Use matrix entries to represent edges in the graph

• Adjacency List

– Use an array of lists to represent edges in the graph (we will discuss this later)

Adjacency Matrix

• Adjacency Matrix

– For each edge (v,w) in E, set A[v][w] = edge_cost

– Mark nonexistent edges with a logical infinity

• Cost of implementation

– O(|V|²) time for initialization

– O(|V|²) space

• OK for dense graphs

• Unacceptable for sparse graphs (most entries hold infinity)
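A minimal sketch of the adjacency matrix described above, assuming vertices are numbered 0 to |V|-1 and edges arrive as (v, w, cost) triples. The function name `build_matrix` and the sample edges are illustrative only.

```python
import math

INF = math.inf  # the "logical infinity" that marks a nonexistent edge


def build_matrix(num_vertices, weighted_edges, directed=True):
    """Return a |V| x |V| matrix with A[v][w] = cost of edge (v, w)."""
    # Initialization alone touches every entry: O(|V|^2) time and space.
    A = [[INF] * num_vertices for _ in range(num_vertices)]
    for v, w, cost in weighted_edges:
        A[v][w] = cost
        if not directed:
            A[w][v] = cost  # an undirected edge is stored in both directions
    return A


edges = [(0, 1, 5), (0, 2, 3), (1, 2, 1)]  # hypothetical sample data
A = build_matrix(3, edges)
print(A[0][1])         # cost of edge (0, 1)
print(A[2][0] == INF)  # no edge from 2 to 0 in the directed graph
```

Looking up an edge is a single index operation, which is the main reason to accept the quadratic space for dense graphs.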

Adjacency List

• Adjacency List

– Ideal solution for sparse graphs

– For each vertex keep a list of all adjacent vertices

– Adjacent vertices are the vertices that are connected to the vertex directly by an edge.

– Example

[Figure: a three-vertex graph (vertices 0, 1, 2) with its adjacency lists, List 0, List 1 and List 2]

Adjacency List

• The number of list nodes equals the number of edges (twice that in an undirected graph, where each edge appears in two lists)

– O(|E|) space

• Space is also required to store the lists

– O(|V|) for |V| lists

• Note that the number of edges is at least round(|V|/2)

– assuming each vertex is in some edge

– Therefore disregard any O(|V|) term when O(|E|) is present

• An adjacency list can be constructed in time linear in the number of edges
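The linear-time construction can be sketched as follows, assuming vertices are numbered 0 to |V|-1. The function name and the sample edge list are illustrative, not from the slides.

```python
def build_adjacency_list(num_vertices, edges, directed=False):
    """Return a list of lists: adj[v] holds the vertices adjacent to v."""
    adj = [[] for _ in range(num_vertices)]  # O(|V|) for the |V| empty lists
    for v, w in edges:                       # one pass over the edges: O(|E|)
        adj[v].append(w)
        if not directed:
            adj[w].append(v)                 # undirected: record both endpoints
    return adj


edges = [(0, 1), (0, 2), (1, 2)]  # hypothetical sample data
adj = build_adjacency_list(3, edges)
for v, neighbors in enumerate(adj):
    print("List", v, neighbors)
```

For this undirected example the lists hold six entries in total, two per edge.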

Breadth First Traversal

• Algorithm

– Start from any node in the graph

– Traverse its neighbors (nodes that are directly connected to it) using some heuristic

– Next, traverse the neighbors of the neighbors, and so on, until some limit is reached or all the nodes in the graph are visited

– Use a queue to perform the breadth-first traversal
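A minimal sketch of the queue-based traversal, assuming the graph is given as a dictionary of adjacency lists. The optional `limit` parameter is one hypothetical reading of "until some limit is reached": it caps the number of nodes visited.

```python
from collections import deque


def bfs(adj, start, limit=None):
    """Return the nodes in breadth-first order, starting from `start`."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()          # take the oldest node waiting in the queue
        order.append(v)
        if limit is not None and len(order) >= limit:
            break
        for w in adj[v]:             # neighbors are taken in list order
            if w not in visited:
                visited.add(w)
                queue.append(w)
    return order


# Hypothetical sample graph as a dictionary of adjacency lists.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs(adj, 0))  # [0, 1, 2, 3, 4]
```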

Depth First Traversal

• Algorithm

– Start from any node in the graph

– Traverse deeper and deeper until a dead end is reached

– Backtrack and traverse other nodes that are not visited

– Use a stack to perform the depth-first traversal
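A minimal sketch of the stack-based traversal, using the same dictionary-of-lists representation as an assumption. Popping from the stack is what produces the backtracking behavior described above.

```python
def dfs(adj, start):
    """Return the nodes in depth-first order, starting from `start`."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        v = stack.pop()              # most recently discovered node first
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # Push in reverse so the first-listed neighbor is explored first.
        for w in reversed(adj[v]):
            if w not in visited:
                stack.append(w)
    return order


# Hypothetical sample graph as a dictionary of adjacency lists.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(dfs(adj, 0))  # [0, 1, 3, 2, 4]
```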

Web as a Graph

[Figure: a graph of web pages with nodes labeled URL 1 through URL 7]

Web Algorithms

• Search

– Google, MSN, Altavista

• Image search

– games

• Routing

• Distributed Computing

• Shortest Path Algorithms

– Google Maps, MapQuest

• Semantic Web

– XML metadata

• Etc.

Web Search Engines

A Cool Application of Graphs


Building a Search Engine

• Crawl the web

• Build a web index

• Then, when we build or search the index, we may have to sort it

– Google sorts more than 100 billion index items

• Novel algorithms, novel data structures, distributed computing

A Basic Search Engine Architecture

Google Architecture

Google’s server farm

Web Crawlers

• Start with an initial page P0. Find URLs on P0 and add them to a queue

• When done with P0, pass it to an indexing program, get a page P1 from the queue and repeat

• Can be specialized (e.g. only look for email addresses)

• Issues

– Which page to look at next? (Special subjects, recency)

– How deep within a site do you go (depth search)?

– How frequently to visit pages?
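The crawl loop above is essentially a breadth-first traversal of the web graph. The sketch below assumes a tiny in-memory "web" (the `PAGES` dictionary) in place of real HTTP fetching. `fetch_links`, `index_page` and `max_pages` are hypothetical names chosen for illustration.

```python
from collections import deque

# Hypothetical in-memory "web": page name -> URLs found on that page.
PAGES = {
    "P0": ["P1", "P2"],
    "P1": ["P2", "P3"],
    "P2": ["P0"],
    "P3": [],
}


def crawl(start, fetch_links, index_page, max_pages=100):
    """Visit pages in queue order, handing each finished page to the indexer."""
    queue = deque([start])
    seen = {start}                   # avoid queueing the same URL twice
    while queue and max_pages > 0:
        page = queue.popleft()
        for url in fetch_links(page):
            if url not in seen:
                seen.add(url)
                queue.append(url)
        index_page(page)             # done with this page: pass it to the indexer
        max_pages -= 1


indexed = []
crawl("P0", lambda p: PAGES.get(p, []), indexed.append)
print(indexed)  # ['P0', 'P1', 'P2', 'P3']
```

Swapping the queue for a priority queue would be one way to address the "which page to look at next" issue. That is a design option, not something the slides prescribe.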

So, why Spider the Web?

• Refresh collection by deleting dead links

– OK if index is slightly smaller

– Done every 1-2 weeks in best engines

• Finding new sites

– Respider the entire web

– Done every 2-4 weeks in best engines

Cost of Spidering

• Spider can (and does) run in parallel on hundreds of servers

• Very high network connectivity (e.g. T3 line)

• Servers can migrate from spidering to query processing depending on time-of-day load

• Running a full web spider takes days even with hundreds of dedicated servers

Indexing

• Arrangement of data (data structure) to permit fast searching

• Which list is easier to search?

sow fox pig eel yak hen ant cat dog hog

ant cat dog eel fox hen hog pig sow yak

• Sorting helps. Why?

– Permits binary search. About log₂ n probes into list

– log₂(1 billion) ≈ 30

– Permits interpolation search. About log₂(log₂ n) probes (for evenly distributed keys)

– log₂ log₂(1 billion) ≈ 5
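To make the probe counts concrete, here is a minimal binary search sketch that also reports how many probes it made, followed by the two logarithms quoted above. The function name and the returned probe counter are illustrative additions.

```python
import math


def binary_search(sorted_items, target):
    """Return (index, probes); index is -1 if target is absent."""
    lo, hi = 0, len(sorted_items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1                  # one probe = one comparison against the list
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


animals = ["ant", "cat", "dog", "eel", "fox", "hen", "hog", "pig", "sow", "yak"]
print(binary_search(animals, "pig"))  # (7, 2)

print(math.log2(1e9))             # just under 30
print(math.log2(math.log2(1e9)))  # just under 5
```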

Inverted Files

POS  FILE
1    A file is a list of words by position
10   - First entry is the word in position 1 (first word)
20   - Entry 4562 is the word in position 4562 (4562nd word)
30   - Last entry is the last word
36   An inverted file is a list of positions by word!

INVERTED FILE

a (1, 4, 40)
entry (11, 20, 31)
file (2, 38)
list (5, 41)
position (9, 16, 26)
positions (43)
word (14, 19, 24, 29, 35, 45)
words (7)
4562 (21, 27)
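The inversion can be reproduced with a few lines of code. This sketch assumes a simple tokenizer (lowercase runs of letters and digits, so punctuation and the leading dashes are ignored) and 1-based positions. The slide does not state its tokenization rules, so treat these as assumptions.

```python
import re
from collections import defaultdict

TEXT = """A file is a list of words by position
- First entry is the word in position 1 (first word)
- Entry 4562 is the word in position 4562 (4562nd word)
- Last entry is the last word
An inverted file is a list of positions by word!"""


def invert(text):
    """Map each word to the list of positions (1-based) where it occurs."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    inverted = defaultdict(list)
    for pos, word in enumerate(words, start=1):
        inverted[word].append(pos)
    return dict(inverted)


inv = invert(TEXT)
print(inv["word"])  # [14, 19, 24, 29, 35, 45]
print(inv["file"])  # [2, 38]
```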

Inverted Files for Multiple Documents

LEXICON

WORD        NDOCS  PTR (pointer into the word index)
jezebel     20
jezer       3
jezerit     1
jeziah      1
jeziel      1
jezliah     1
jezoar      1
jezrahliah  1
jezreel     39

WORD INDEX

DOCID  OCCUR  POS 1  POS 2  . . .
34     6      1 118 2087 3922 3981 5002
44     3      215 2291 3010
56     4      5 22 134 992
107    4      322 354 381 405
232    6      15 195 248 1897 1951 2192
677    1      481
713    3      42 312 802
566    3      203 245 287
67     1      132
. . .

“jezebel” occurs 6 times in document 34, 3 times in document 44, 4 times in document 56 . . .
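A minimal sketch of the same two-level structure: a lexicon giving the number of documents per word (NDOCS), and per-word postings of document IDs with positions (OCCUR is the length of each position list). The sample documents are invented, and a dictionary lookup stands in for the PTR column.

```python
import re
from collections import defaultdict


def build_index(docs):
    """docs: {doc_id: text}. Return (lexicon, postings)."""
    postings = defaultdict(dict)     # word -> {doc_id: [positions]}
    for doc_id, text in docs.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        for pos, word in enumerate(words, start=1):
            postings[word].setdefault(doc_id, []).append(pos)
    # NDOCS: in how many documents does each word appear?
    lexicon = {word: len(by_doc) for word, by_doc in postings.items()}
    return lexicon, dict(postings)


# Hypothetical sample documents keyed by document ID.
docs = {34: "jezebel spoke and jezebel left", 44: "then jezebel came"}
lexicon, postings = build_index(docs)

print(lexicon["jezebel"])  # NDOCS for "jezebel"
for doc_id, positions in postings["jezebel"].items():
    print(doc_id, len(positions), positions)  # DOCID, OCCUR, POS 1, POS 2, ...
```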

Ranking (Scoring) Hits

• Hits must be presented in some order

• What order?

– Relevance, recency, popularity, reliability, alphabetic?

• Some ranking methods

– Presence of keywords in title of document

– Closeness of keywords to start of document

– Frequency of keyword in document

– Link popularity (how many pages point to this one)
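One way to combine the four ranking signals listed above is a weighted sum. The sketch below is purely illustrative: the weights, the document format and the function name are assumptions, and real engines use far more elaborate scoring.

```python
def score(doc, keyword, inbound_links):
    """Combine the four signals into one number (weights are arbitrary)."""
    words = doc["body"].lower().split()
    kw = keyword.lower()
    if not words:
        return 0.0
    in_title = 1.0 if kw in doc["title"].lower().split() else 0.0
    frequency = words.count(kw) / len(words)
    first = words.index(kw) if kw in words else len(words)
    closeness = 1.0 - first / len(words)  # 1.0 when the keyword is the first word
    # Hypothetical weights, chosen only for illustration.
    return 3.0 * in_title + 2.0 * closeness + 5.0 * frequency + 0.1 * inbound_links


# Hypothetical sample documents and inbound-link counts.
docs = [
    {"title": "Cooking", "body": "the recipe mentions a graph once at the end graph"},
    {"title": "Graph basics", "body": "a graph has vertices and edges"},
]
links = {"Cooking": 1, "Graph basics": 4}

ranked = sorted(docs, key=lambda d: score(d, "graph", links[d["title"]]), reverse=True)
print([d["title"] for d in ranked])  # ['Graph basics', 'Cooking']
```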