Chapter 19 Web Crawler

Upload: leminhvuong

Post on 15-Nov-2014


Page 1: Ch19

Chapter 19

Web Crawler

Page 2: Ch19

Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 19-2

Chapter Objectives

• Provide a case study example from problem statement through implementation

• Demonstrate how hash tables and graphs can be used to solve a problem

Page 3: Ch19

Web Crawler

• A web crawler is a system that searches the web, beginning with a user-designated web page, looking for a designated target string

• A web crawler follows all of the links on each page that it encounters until there are no more pages or until it reaches a designated limit
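
Following the links on a page first requires finding them. The sketch below is one simplified way to do that, assuming links appear as quoted `href` attributes inside `<a>` tags. The class name `LinkExtractor` and the regular expression are illustrative assumptions, not part of the case study, and a regular expression is no substitute for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls link targets out of a page's HTML text.
class LinkExtractor {
    // Simplified pattern (an assumption): a quoted href attribute inside an <a> tag.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Returns the href values in the order they appear in the page.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"http://example.com/a\">A</a> and "
                    + "<a class='x' href='b.html'>B</a></p>";
        System.out.println(extractLinks(html));
    }
}
```

Relative links such as `b.html` would still need to be resolved against the page's own address before they could be fetched.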

Page 4: Ch19

Web Crawler

• For this case study, we will create a graphical web crawler with the following requirements:

– Enter a designated starting web page

– Enter a target string for which to search

– Limit the search to 50 pages

– Display the results when done

Page 5: Ch19

Web Crawler - Design

• Our web crawler system consists of three high-level components:

– The driver

– The graphical user interface

– The web crawler implementation

• Makes use of graphs and hash tables
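
To make the pairing of graphs and hash tables concrete, here is a minimal sketch of a directed graph of pages stored in a `HashMap`. The class `LinkGraph` and its method names are assumptions made for illustration. The UML design in the chapter may structure its graph classes differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical directed graph of pages: each URL maps to the set of URLs it links to.
class LinkGraph {
    private final Map<String, Set<String>> adjacency = new HashMap<>();

    // Adds a page with no outgoing links, if it is not already present.
    void addVertex(String url) {
        adjacency.putIfAbsent(url, new HashSet<>());
    }

    // Records a link from one page to another, adding either page if needed.
    void addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        adjacency.get(from).add(to);
    }

    // Hash lookup: expected constant-time check for a known page.
    boolean contains(String url) {
        return adjacency.containsKey(url);
    }

    // Read-only view of a page's outgoing links; empty for unknown pages.
    Set<String> linksFrom(String url) {
        return Collections.unmodifiableSet(
            adjacency.getOrDefault(url, Collections.emptySet()));
    }

    int size() {
        return adjacency.size();
    }
}
```

Hashing on the URL keeps the "have we seen this page?" check cheap, which matters because the crawler asks that question for every link it finds.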

Page 6: Ch19

Web Crawler - Design

• The algorithm for the web crawler is as follows:

– Add the starting page to a HashSet of pages to be searched and to our graph

– Remove a page from the set of pages to be searched

– Search the page for the target string

• If the string exists, add the page to the list of results

– Search the page for links

• If links have not already been searched, add them to the set of pages to be searched and to our graph

– Repeat the three previous steps until our limit is reached or the set is empty
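
The steps above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the chapter's implementation: the class `CrawlSketch`, the `fetch` function that stands in for downloading a page, the extra `seen` set, and the simplified link pattern are all hypothetical. Passing the page source in through a function keeps the sketch runnable without network access.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the crawl loop described in the steps above.
class CrawlSketch {
    // Simplified link pattern (an assumption): a quoted href inside an <a> tag.
    private static final Pattern HREF = Pattern.compile(
        "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    final Map<String, Set<String>> graph = new HashMap<>(); // page -> pages it links to
    final List<String> results = new ArrayList<>();         // pages containing the target

    // fetch is an assumed stand-in for downloading a page; it may return null.
    void crawl(String start, String target, int limit, Function<String, String> fetch) {
        Set<String> toSearch = new HashSet<>(); // pages still to be searched
        Set<String> seen = new HashSet<>();     // every page ever queued

        // Step 1: add the starting page to the set and to the graph.
        toSearch.add(start);
        seen.add(start);
        graph.put(start, new HashSet<>());

        int searched = 0;
        while (!toSearch.isEmpty() && searched < limit) {
            // Step 2: remove a page from the set (a HashSet imposes no order).
            Iterator<String> it = toSearch.iterator();
            String page = it.next();
            it.remove();
            searched++;

            String html = fetch.apply(page);
            if (html == null) {
                continue; // page could not be read
            }

            // Step 3: search the page for the target string.
            if (html.contains(target)) {
                results.add(page);
            }

            // Step 4: search the page for links; queue only unseen ones.
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                String link = m.group(1);
                graph.get(page).add(link);
                if (seen.add(link)) {
                    toSearch.add(link);
                    graph.put(link, new HashSet<>());
                }
            }
        }
    }

    public static void main(String[] args) {
        // A tiny in-memory "web" used in place of real pages.
        Map<String, String> web = new HashMap<>();
        web.put("a", "needle <a href=\"b\">b</a> <a href=\"c\">c</a>");
        web.put("b", "nothing <a href=\"a\">back</a>");
        web.put("c", "needle here");

        CrawlSketch crawler = new CrawlSketch();
        crawler.crawl("a", "needle", 50, web::get);
        System.out.println(crawler.results);
    }
}
```

Because a `HashSet` has no defined iteration order, the order in which pages are visited, and therefore the order of the results, is not fixed. A queue would give breadth-first order if that were wanted.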

Page 7: Ch19

FIGURE 19.1 User interface design

Page 8: Ch19

FIGURE 19.2 UML description