stavies

Post on 29-Nov-2014


STAVIES

BY K. RAJASEKHAR REDDY

(08Q61A0528)

Contents:

Introduction

Wrappers

Clustering

System Description

Working

Types

Advantages and Disadvantages

Conclusion

Introduction:

STAVIES is a system for information extraction that generates web wrappers automatically using clustering techniques.

STAVIES is used in:

Automatic information discovery.

Extraction of structured web data.

WRAPPERS

A wrapper is a piece of software that extracts the useful information from web data sources.

The extracted data are referred to as structural tokens.

Categories of Wrappers:

Site-specific wrappers: Extract information from a particular web page or family of web pages.

Generic wrappers: Can be applied to almost any page, regardless of its structure.
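
To make the site-specific category concrete, here is a minimal sketch of a hand-written wrapper. It is not part of STAVIES; the page template (prices inside a span with class "price") and the names PriceWrapper and extract_prices are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class PriceWrapper(HTMLParser):
    """Hypothetical site-specific wrapper.

    Assumes every price on the page sits inside <span class="price">.
    It breaks as soon as the site changes its template.
    """

    def __init__(self):
        super().__init__()
        self.in_price = False
        self.tokens = []  # extracted structural tokens

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_price and text:
            self.tokens.append(text)


def extract_prices(html):
    """Run the wrapper over one page and return the extracted tokens."""
    wrapper = PriceWrapper()
    wrapper.feed(html)
    return wrapper.tokens
```

Because the template is hard-coded, a wrapper like this has to be rewritten for every new site, which is the manual effort that automatic wrapper generation aims to remove.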

CLUSTERING

Process of organizing an input data set into groups so that data points in the same cluster are more similar to one another than to data points in different clusters.

Quality Evaluation Measures:

Cluster Compactness: Evaluates how the subsets of the input are redistributed by the clustering system, compared with the whole input set.

Cluster Separation: Indicates the overall dissimilarity among the output clusters.
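
The two measures can be illustrated with a small sketch. The slides do not give formulas, so the definitions below are assumptions: compactness is taken as the mean within-cluster variance divided by the variance of the whole input, and separation as the mean distance between cluster centroids. The data are one-dimensional to keep the example short.

```python
from itertools import combinations
from statistics import mean, pvariance


def compactness(clusters):
    """Mean within-cluster variance relative to the variance of the whole input.

    Assumed definition: lower values mean tighter clusters.
    """
    all_points = [x for cluster in clusters for x in cluster]
    total_variance = pvariance(all_points)
    if total_variance == 0:
        return 0.0
    return mean(pvariance(cluster) for cluster in clusters) / total_variance


def separation(clusters):
    """Mean absolute distance between cluster centroids.

    Assumed definition: higher values mean more dissimilar clusters.
    """
    centroids = [mean(cluster) for cluster in clusters]
    if len(centroids) < 2:
        return 0.0
    return mean(abs(a - b) for a, b in combinations(centroids, 2))
```

Under these assumed definitions, a good clustering has low compactness and high separation at the same time.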

System Description

Two modules:

1. Transformation module

2. Extraction module

Phases:

• Preparation Phase: 1. Validation, correction and XHTML generation. 2. Tree transformation and terminal node selection.

• Segmentation Phase: 1. Node comparison. 2. Hierarchical clustering. 3. Cluster evaluation and target area discovery. 4. Boundary selection.

• Information Retrieval Phase: 1. Information extraction component.
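
The phases above can be sketched roughly as follows. This is an illustration under stated assumptions, not the STAVIES implementation: terminal nodes are taken to be non-empty text nodes, node comparison uses a hypothetical distance based on the shared prefix of ancestor tag paths, and the clustering is plain single-linkage agglomeration with a fixed threshold. All names (TerminalNodeCollector, path_distance, hierarchical_clusters, segment) are made up for this example.

```python
from html.parser import HTMLParser


class TerminalNodeCollector(HTMLParser):
    """Preparation: collect text nodes together with their ancestor tag path."""

    VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.nodes = []   # list of (ancestor_path, text)

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; tolerates minor nesting errors.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def path_distance(a, b):
    """Node comparison (assumed): levels not shared as a common path prefix."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def hierarchical_clusters(nodes, threshold):
    """Single-linkage agglomerative clustering.

    Merges the closest pair of clusters until the closest pair is farther
    apart than `threshold`.
    """
    clusters = [[node] for node in nodes]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(path_distance(p[0], q[0])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return clusters


def segment(html, threshold=0):
    """Run preparation and segmentation; return the text of each cluster."""
    collector = TerminalNodeCollector()
    collector.feed(html)
    clusters = hierarchical_clusters(collector.nodes, threshold)
    return [[text for _, text in cluster] for cluster in clusters]
```

In this sketch, text nodes that share the same ancestor path (for example, the items of one list) end up in one cluster. Picking the largest cluster would be a crude stand-in for target area discovery; the actual cluster evaluation and boundary selection steps are not described in the slides.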

Working:

Experimental Results:

Types:

OMINI

MDR

Advantages:

Executes in less than 0.4 sec.

No human assistance is required.

High performance.

Disadvantage:

Hard to apply to free text and non-template pages.

Conclusion

STAVIES saves precious time and effort.

Tested successfully on more than 63,000 HTML pages from 50 different web data sources.

THANK YOU.

Queries?
