STAVIES BY K.RAJASEKHAR REDDY (08Q61A0528)

Upload: akhil-kumar

Post on 29-Nov-2014

Page 1: stavies

STAVIES

BY K. RAJASEKHAR REDDY

(08Q61A0528)

Page 2: stavies

Contents:

• Introduction
• Wrappers
• Clustering
• System Description
• Working
• Types
• Advantages and Disadvantages
• Conclusion

Page 3: stavies

Introduction:

STAVIES is a system for information extraction through automatic web wrapper generation using clustering techniques.

Page 4: stavies

STAVIES is used in:

• Automatic information discovery.
• Extraction of structured web data.

Page 5: stavies

WRAPPERS

A wrapper is a piece of software that extracts useful information from web data sources.

The extracted data are referred to as structural tokens.

Page 6: stavies

Categories of Wrappers:

• Site-specific wrappers: Extract information from a single web page or a family of related web pages.
• Generic wrappers: Can be applied to almost any page, regardless of its structure.
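
To make the distinction concrete, here is a minimal sketch of a site-specific wrapper in Python. It assumes a hypothetical page layout in which every record is an `<li class="item">` element; the class name, the sample page, and the names `ItemWrapper` and `extract_items` are illustrative assumptions and not part of STAVIES. A generic wrapper cannot rely on a hard-coded rule of this kind.

```python
from html.parser import HTMLParser


class ItemWrapper(HTMLParser):
    """Site-specific wrapper: collects the text of every <li class="item">."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the rule below is the
        # assumed, site-specific knowledge about where the records live.
        if tag == "li" and ("class", "item") in attrs:
            self._in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def extract_items(html):
    """Run the wrapper over one page and return the extracted records."""
    wrapper = ItemWrapper()
    wrapper.feed(html)
    wrapper.close()
    return [text.strip() for text in wrapper.items]


# Hypothetical page: a navigation list (ignored) and a record list (extracted).
SAMPLE_PAGE = """
<html><body>
  <ul id="menu"><li>Home</li><li>About</li></ul>
  <ul>
    <li class="item">Laptop, 700 USD</li>
    <li class="item">Phone, 300 USD</li>
  </ul>
</body></html>
"""

if __name__ == "__main__":
    print(extract_items(SAMPLE_PAGE))
```

The rule breaks as soon as the site changes its markup, which is the maintenance cost that automatic wrapper generation tries to avoid.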

Page 7: stavies

CLUSTERING

The process of organizing an input data set so that data points in the same cluster are more similar to each other than to data points in different clusters.

Page 8: stavies

Quality Evaluation Measures:

• Cluster Compactness: Evaluates how well the subsets of the input are redistributed by the clustering system, compared with the whole input set.
• Cluster Separation: Indicates the overall dissimilarity among the output clusters.
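
The slides do not give formulas for these two measures, so the following Python sketch uses common textbook choices as assumptions: compactness is taken as the average within-cluster variance divided by the variance of the whole input set (lower is better), and separation as the mean Euclidean distance between cluster centroids (higher is better). The function names and the sample points are illustrative only.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def variance(points):
    """Mean squared Euclidean distance of the points from their centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) ** 2 for p in points) / len(points)


def compactness(clusters):
    """Average cluster variance relative to the variance of all points.

    Assumed definition; requires that the input points are not all identical.
    """
    all_points = [p for cluster in clusters for p in cluster]
    total = variance(all_points)
    return sum(variance(c) for c in clusters) / (len(clusters) * total)


def separation(clusters):
    """Mean distance between cluster centroids.

    Assumed definition; requires at least two clusters.
    """
    cents = [centroid(c) for c in clusters]
    pairs = [(a, b) for i, a in enumerate(cents) for b in cents[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Two ways of splitting the same four points into two clusters.
GOOD = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
BAD = [[(0, 0), (10, 0)], [(0, 1), (10, 1)]]

if __name__ == "__main__":
    for name, clusters in (("good", GOOD), ("bad", BAD)):
        print(name, compactness(clusters), separation(clusters))
```

Under these assumed definitions the first split scores a much lower compactness value and a much higher separation value than the second, which matches the intuition that it is the better clustering.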

Page 9: stavies

System Description

Two modules:

1. Transformation module

2. Extraction module

Page 10: stavies

Phases:

• Preparation Phase:
1. Validation, correction, and XHTML generation.
2. Tree transformation and terminal node selection.
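
The second step of the preparation phase can be sketched in Python as follows. The sketch assumes that the page has already been corrected into well-formed XHTML (step 1) and that a "terminal node" is a leaf element carrying text; both the interpretation and the names `terminal_nodes` and `SAMPLE_XHTML` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def terminal_nodes(xhtml):
    """Return (path, text) for every leaf element that contains text.

    Text attached to non-leaf elements (mixed content) is ignored in this
    simplified sketch.
    """
    root = ET.fromstring(xhtml)
    found = []

    def walk(element, path):
        path = path + [element.tag]
        children = list(element)
        if not children:
            text = (element.text or "").strip()
            if text:
                found.append(("/".join(path), text))
        for child in children:
            walk(child, path)

    walk(root, [])
    return found


# Hypothetical, already well-formed page with a small record table.
SAMPLE_XHTML = """<html><body>
<h1>Catalog</h1>
<table>
<tr><td>Laptop</td><td>700</td></tr>
<tr><td>Phone</td><td>300</td></tr>
</table>
</body></html>"""

if __name__ == "__main__":
    for path, text in terminal_nodes(SAMPLE_XHTML):
        print(path, "->", text)
```

Nodes that share the same path, such as the four table cells here, are natural candidates for ending up in the same cluster in the next phase.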

Page 11: stavies

• Segmentation Phase:
1. Node comparison.
2. Hierarchical clustering.
3. Cluster evaluation and target area discovery.
4. Boundary selection.
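
A rough sketch of how these steps could fit together is given below in Python. The slides do not describe the comparison measure or the evaluation rule, so the sketch makes several assumptions: each terminal node is represented by a single number (for example, its position in the document), nodes are compared by absolute difference, clustering is bottom-up with single linkage, the largest cluster is taken as the target area, and its smallest and largest members are taken as the boundaries. The names `single_linkage` and `target_area` are hypothetical.

```python
def single_linkage(values, max_gap):
    """Bottom-up (agglomerative) clustering of numbers with single linkage.

    Starts with one cluster per value and repeatedly merges the two closest
    clusters until the closest pair is more than max_gap apart.
    """
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > 1:
        # On sorted 1-D data the closest pair of clusters is always adjacent,
        # and their single-linkage distance is the gap between them.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > max_gap:
            break
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def target_area(clusters):
    """Pick the largest cluster (assumed rule) and return its boundaries."""
    best = max(clusters, key=len)
    return best[0], best[-1]


# Hypothetical positions of terminal nodes in a page.
POSITIONS = [3, 4, 5, 20, 21, 22, 23, 40]

if __name__ == "__main__":
    clusters = single_linkage(POSITIONS, max_gap=2)
    print(clusters)
    print(target_area(clusters))
```

A full hierarchical clustering would keep the whole merge tree and choose the cut with quality measures such as compactness and separation; the fixed `max_gap` threshold is a simplification.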

Page 12: stavies

• Information Retrieval Phase:

1. Information Extraction component.

Page 13: stavies

Working:

Page 14: stavies

Experimental Results:

Page 15: stavies

Types:

OMINI

MDR

Page 16: stavies

Advantages:

Executes in less than 0.4 seconds.

No human assistance is required.

High performance.

Page 17: stavies

Disadvantage:

Hard to apply to free text and non-template pages.

Page 18: stavies

Conclusion

STAVIES saves precious time and effort.

Tested successfully on more than 63,000 HTML pages from 50 different web data sources.

Page 19: stavies

THANK YOU.

Page 20: stavies

Queries?