stavies
Post on 29-Nov-2014
STAVIES
BY K. RAJASEKHAR REDDY
(08Q61A0528)
Contents:
Introduction
Wrappers
Clustering
System Description
Working
Types
Advantages and Disadvantages
Conclusion
Introduction:
STAVIES is a system for information extraction through automatic web wrapper generation using clustering techniques.
STAVIES is used in:
Automatic information discovery.
Extraction of structured web data.
WRAPPERS
A wrapper is a piece of software that extracts useful information from web data sources.
The extracted data are referred to as structural tokens.
Categories of Wrappers:
Site-specific wrappers: Extract information from a single web page or a family of web pages.
Generic wrappers: Can be applied to almost any page, regardless of its structure.
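To make the site-specific category concrete, here is a minimal sketch of such a wrapper. It is not part of STAVIES; the class name `PriceWrapper`, the function `extract_prices`, and the `class="price"` markup are all hypothetical, chosen only to show how a wrapper can be tied to the layout of one family of pages.

```python
from html.parser import HTMLParser


class PriceWrapper(HTMLParser):
    """Hypothetical site-specific wrapper: collects text inside <span class="price">."""

    def __init__(self):
        super().__init__()
        self._inside = False   # True while we are inside a matching <span>
        self.tokens = []       # extracted text values

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._inside = False

    def handle_data(self, data):
        if self._inside and data.strip():
            self.tokens.append(data.strip())


def extract_prices(html):
    """Run the wrapper over one page and return the extracted tokens."""
    wrapper = PriceWrapper()
    wrapper.feed(html)
    wrapper.close()
    return wrapper.tokens
```

A wrapper like this stops working as soon as the site changes its markup, which is the limitation that generic, automatically generated wrappers try to avoid.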
CLUSTERING
The process of organizing an input data set so that data points in the same cluster are more similar to each other than to points in different clusters.
Quality Evaluation Measures:
Cluster Compactness: Evaluates how the subsets of the input are redistributed by the clustering system, compared with the whole input set.
Cluster Separation: Indicates the overall dissimilarity among the output clusters.
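The two measures can be illustrated with a small sketch. The slides give no formulas, so the ones below are an assumption: compactness is taken as the average cluster deviation divided by the deviation of the whole input set, and separation as the average Gaussian similarity between cluster centroids. The function names and the `sigma` parameter are illustrative, and the points are one-dimensional to keep the example short.

```python
import math


def deviation(points):
    """Root-mean-square distance of 1-D points from their mean."""
    mean = sum(points) / len(points)
    return math.sqrt(sum((p - mean) ** 2 for p in points) / len(points))


def compactness(clusters):
    """Average cluster deviation relative to the deviation of the whole input set.

    Assumes the input set is not constant (its deviation is non-zero).
    """
    whole = [p for cluster in clusters for p in cluster]
    total = deviation(whole)
    return sum(deviation(c) / total for c in clusters) / len(clusters)


def separation(clusters, sigma=1.0):
    """Average Gaussian similarity between centroids of distinct clusters.

    Assumes at least two clusters.
    """
    centroids = [sum(c) / len(c) for c in clusters]
    k = len(centroids)
    acc = 0.0
    for i in range(k):
        for j in range(k):
            if i != j:
                diff = centroids[i] - centroids[j]
                acc += math.exp(-(diff ** 2) / (2 * sigma ** 2))
    return acc / (k * (k - 1))
```

With these assumed definitions, smaller values are better for both measures: a low compactness value means each cluster is tight compared with the whole input, and a low separation value means the centroids are far apart.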
System Description
Two modules:
1. Transformation module
2. Extraction module
Phases:
• Preparation Phase:
1. Validation, correction and XHTML generation.
2. Tree transformation and terminal node selection.
• Segmentation Phase:
1. Node comparison.
2. Hierarchical clustering.
3. Cluster evaluation and target area discovery.
4. Boundary selection.
• Information Retrieval Phase:
1. Information extraction component.
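The phases above can be sketched end to end on a toy page. This is an illustration under stated assumptions, not the STAVIES implementation: the input is assumed to be well-formed XHTML already, terminal nodes are taken to be childless elements that carry text, nodes are compared by the number of ancestors they share, clustering is plain single-linkage merging, and the target area is simply the largest cluster. All names (`terminal_nodes`, `similarity`, `cluster`, `PAGE`, `min_shared`) are hypothetical.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<div><p>Home</p><p>About</p></div>
<ul><li>Item A</li><li>Item B</li><li>Item C</li></ul>
</body></html>"""


def terminal_nodes(xhtml):
    """Preparation: parse the tree and return (index path, text) for each text leaf."""
    root = ET.fromstring(xhtml)
    found = []

    def walk(node, path):
        children = list(node)
        if not children and node.text and node.text.strip():
            found.append((path, node.text.strip()))
        for index, child in enumerate(children):
            walk(child, path + (index,))

    walk(root, ())
    return found


def similarity(path_a, path_b):
    """Node comparison: number of leading path steps (ancestors) two nodes share."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return shared


def cluster(nodes, min_shared):
    """Single-linkage agglomerative clustering of terminal nodes.

    Repeatedly merges the two most similar clusters until no pair
    shares at least `min_shared` ancestors.
    """
    clusters = [[n] for n in nodes]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = max(similarity(a[0], b[0])
                        for a in clusters[i] for b in clusters[j])
                if s >= min_shared and (best is None or s > best[0]):
                    best = (s, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i].extend(clusters.pop(j))


# Target area discovery (assumed rule: the largest cluster) and extraction.
groups = cluster(terminal_nodes(PAGE), min_shared=2)
target = max(groups, key=len)
records = [text for _, text in target]
```

On the toy page the navigation links and the list items end up in separate clusters, and `records` holds the text of the three list items. A real page would need the validation and correction step first, since `ET.fromstring` rejects malformed markup.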
Working:
Experimental Results:
Types:
OMINI
MDR
Advantages:
Executes in less than 0.4 sec.
No human assistance is required.
High performance.
Disadvantage:
Hard to apply to free text and non-template pages.
Conclusion
STAVIES saves precious time and effort.
Tested successfully on more than 63,000 HTML pages from 50 different web data sources.
THANK YOU.
Queries?