STAVIES
BY K. RAJASEKHAR REDDY
(08Q61A0528)
Contents:
• Introduction
• Wrappers
• Clustering
• System Description
• Working
• Types
• Advantages and Disadvantages
• Conclusion
Introduction: STAVIES is a system for information extraction from web data sources through automatic web wrapper generation using clustering techniques.
STAVIES is used for:
• Automatic information discovery.
• Extraction of structured web data.
WRAPPERS
A wrapper is a piece of software that extracts useful information from web data sources.
The extracted data are referred to as structural tokens.
Categories of Wrappers:
• Site-specific wrappers: Extract information from a single web page or a family of web pages.
• Generic wrappers: Can be applied to almost any page, regardless of its structure.
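To make the two categories concrete, here is a minimal sketch built on Python's standard `html.parser`. It is not STAVIES code: the class names, the `run` helper, and the rule `<span class="price">` are hypothetical. The site-specific wrapper hard-codes one rule for a known page family, while the generic wrapper keeps every non-empty text node, whatever the page structure.

```python
from html.parser import HTMLParser


class GenericWrapper(HTMLParser):
    """Generic wrapper sketch: keeps every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.tokens.append(text)


class SiteSpecificWrapper(HTMLParser):
    """Site-specific wrapper sketch: one hard-coded rule for a known
    page family (hypothetically, <span class="price">)."""

    def __init__(self, tag="span", css_class="price"):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.inside = False
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == self.tag and ("class", self.css_class) in attrs:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside and data.strip():
            self.tokens.append(data.strip())


def run(wrapper, html):
    """Feed a page to a wrapper and return the tokens it extracted."""
    wrapper.feed(html)
    wrapper.close()
    return wrapper.tokens


page = '<div><b>Book A</b><span class="price">12.50</span></div>'
print(run(GenericWrapper(), page))       # ['Book A', '12.50']
print(run(SiteSpecificWrapper(), page))  # ['12.50']
```

The site-specific rule breaks as soon as the site changes its markup, which is the maintenance cost that automatic wrapper generation aims to remove.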
CLUSTERING
The process of organizing an input data set into groups (clusters) so that data points in the same cluster are more similar to each other than to points in different clusters.
Quality Evaluation Measures:
• Cluster compactness: Measures how tightly the data points within each output cluster are grouped, compared with the spread of the whole input set.
• Cluster separation: Indicates the overall dissimilarity among the output clusters.
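The slides do not give formulas for these measures. The sketch below uses one formulation found in the clustering-evaluation literature, assumed here and not confirmed as STAVIES's exact definition: compactness is the average ratio of each cluster's deviation to the deviation of the whole input, and separation is the average Gaussian similarity between cluster centres. With this formulation, lower values of both are better (a low separation value means the clusters are far apart, i.e. highly dissimilar). The data are 1-D and `sigma` is a hypothetical parameter, chosen for brevity.

```python
import math
from statistics import fmean


def deviation(points):
    """Population standard deviation of 1-D points."""
    mu = fmean(points)
    return math.sqrt(sum((p - mu) ** 2 for p in points) / len(points))


def compactness(clusters):
    """Mean of dev(cluster) / dev(whole input); lower = tighter clusters.
    Assumes the whole input is not constant (non-zero deviation)."""
    whole = deviation([p for c in clusters for p in c])
    return fmean(deviation(c) / whole for c in clusters)


def separation(clusters, sigma=1.0):
    """Mean Gaussian similarity between distinct cluster centres;
    lower = centres farther apart. Needs at least two clusters."""
    centres = [fmean(c) for c in clusters]
    k = len(centres)
    total = sum(
        math.exp(-((centres[i] - centres[j]) ** 2) / (2 * sigma ** 2))
        for i in range(k)
        for j in range(k)
        if i != j
    )
    return total / (k * (k - 1))


good = [[1.0, 1.2, 0.8], [5.0, 5.1, 4.9]]  # tight, well-separated groups
poor = [[1.0, 5.0, 3.0], [1.2, 4.9, 3.1]]  # loose, overlapping groups
print(compactness(good), separation(good))
print(compactness(poor), separation(poor))
```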
System Description: Two modules
1. Transformation module
2. Extraction module
Phases:
• Preparation Phase:
1. Validation, correction, and XHTML generation.
2. Tree transformation and terminal node selection.
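The slides name the preparation steps but not their implementation. As a minimal sketch under stated assumptions: Python's lenient `html.parser` stands in for validation and correction (it tolerates unclosed tags but does not actually generate XHTML), the open-tag stack plays the role of the document tree, and each non-empty text node is selected as a terminal node together with its tag path. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are not pushed on the stack.
VOID = {"br", "img", "hr", "input", "meta", "link", "area", "base",
        "col", "embed", "source", "track", "wbr"}


class TerminalNodeSelector(HTMLParser):
    """Records (tag_path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements, root first
        self.terminals = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Simple correction: an end tag closes any unclosed children too;
        # end tags with no matching open tag are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.terminals.append(("/".join(self.stack), text))


def select_terminal_nodes(html):
    parser = TerminalNodeSelector()
    parser.feed(html)
    parser.close()
    return parser.terminals


page = ("<html><body><ul><li><b>Item 1</b> 10<br></li>"
        "<li><b>Item 2</b> 20</li></ul></body></html>")
for path, text in select_terminal_nodes(page):
    print(path, "|", text)
```

Tag paths are a convenient representation here because nodes that play the same role in repeated records (every item name, every price) end up with identical or near-identical paths.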
• Segmentation Phase:
1. Node comparison.
2. Hierarchical clustering.
3. Cluster evaluation and target area discovery.
4. Boundary selection.
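The first three segmentation steps can be sketched as follows. The similarity measure and the target-area rule are assumptions, not STAVIES's published definitions: two terminal nodes are compared by the `difflib.SequenceMatcher` ratio of their tag paths, nodes are grouped by single-linkage agglomerative (hierarchical) clustering that stops once no pair of clusters reaches a hypothetical `threshold`, and "cluster evaluation" is reduced to picking the largest cluster as the target area.

```python
from difflib import SequenceMatcher


def node_similarity(a, b):
    """Hypothetical node comparison: similarity of two tag paths in [0, 1]."""
    return SequenceMatcher(None, a.split("/"), b.split("/")).ratio()


def agglomerative(paths, threshold=0.8):
    """Single-linkage agglomerative clustering of tag paths.

    Starts with one cluster per node and repeatedly merges the two most
    similar clusters until no pair reaches `threshold`.
    Returns clusters as lists of node indices.
    """
    clusters = [[i] for i in range(len(paths))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: similarity of the closest pair of members.
                s = max(node_similarity(paths[i], paths[j])
                        for i in clusters[x] for j in clusters[y])
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is still valid
    return clusters


def target_area(clusters):
    """Stand-in for cluster evaluation: take the largest cluster."""
    return sorted(max(clusters, key=len))


paths = [
    "html/body/div/h1",          # page title
    "html/body/table/tr/td/b",   # record 1: name
    "html/body/table/tr/td",     # record 1: value
    "html/body/table/tr/td/b",   # record 2: name
    "html/body/table/tr/td",     # record 2: value
    "html/body/div/p/a",         # footer link
]
clusters = agglomerative(paths)
print(clusters)
print(target_area(clusters))
```

Here the repeated table cells merge into one cluster while the title and footer stay apart, so the table becomes the target area; boundary selection would then decide where each record inside it begins.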
• Information Retrieval Phase:
1. Information Extraction component.
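Once a target area and a record boundary are known, the information extraction component can cut the target area into records. The sketch below assumes, hypothetically, that the boundary is expressed as a tag path and that every node with that path starts a new record. The function name and data layout are illustrative, not part of STAVIES.

```python
def extract_records(terminals, boundary_path):
    """Split target-area terminal nodes into records.

    `terminals` is an ordered list of (tag_path, text) pairs; a new record
    starts at every node whose path equals `boundary_path` (hypothetical
    boundary rule). Nodes before the first boundary are ignored.
    """
    records = []
    for path, text in terminals:
        if path == boundary_path:
            records.append([text])
        elif records:
            records[-1].append(text)
    return records


area = [
    ("html/body/table/tr/td/b", "Item 1"),
    ("html/body/table/tr/td", "10"),
    ("html/body/table/tr/td/b", "Item 2"),
    ("html/body/table/tr/td", "20"),
]
print(extract_records(area, "html/body/table/tr/td/b"))
# [['Item 1', '10'], ['Item 2', '20']]
```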
Working:
Experimental Results:
Types:
• OMINI
• MDR
Advantages:
• Executes in less than 0.4 sec.
• No human assistance is required.
• High performance.
Disadvantage: Hard to apply to free text and to pages that are not generated from a template.
Conclusion
STAVIES saves considerable time and effort.
It was tested successfully on more than 63,000 HTML pages from 50 different web data sources.
THANK YOU.
Queries?