Semantic Web Bootstrapping & Annotation
Hassan Sayyadi
sayyadi@ce.sharif.edu
Semantic Web Research Laboratory
Computer Department
Sharif University of Technology
Outline
• What is annotation?
• Why use annotation?
• Crawler
• Annotation model
• Annotation methods
• Our Implementation
What is annotation?
• People make notes to themselves in order to preserve ideas that arise during a variety of activities
• The purpose of these notes is often to summarize, criticize, or emphasize specific phrases or events
• Semantic annotation tags the instance data of ontology classes that appears in a document and maps it to the corresponding ontology classes
Why use annotation?
• Having the world's knowledge at one's fingertips seems possible
• The Internet is the platform for this information
• Unfortunately, most of the information is provided in an unstructured and non-standardized form, which machines cannot readily interpret
Why use annotation? (continued)
Crawler
• A crawler is a program that traverses the Web by following hyperlinks from one page to the next
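The traversal described above can be sketched as a breadth-first search. This is a minimal sketch under stated assumptions: the hypothetical `links` map stands in for downloading each page and extracting its hyperlinks, and breadth-first order is one common choice rather than something the slides specify. The `allow` predicate is where a focused crawler would plug in its relevance or URL test.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Breadth-first crawler over an in-memory link graph (hypothetical stand-in
// for fetching pages over HTTP and parsing their links).
class BfsCrawler {
    // Returns pages in the order they are visited, starting from seed.
    // links: URL -> URLs it links to; allow: which URLs may be crawled.
    static List<String> crawl(Map<String, List<String>> links, String seed,
                              Predicate<String> allow, int maxPages) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();        // URLs already queued, to avoid revisits
        Deque<String> frontier = new ArrayDeque<>();
        if (allow.test(seed)) {
            frontier.add(seed);
            seen.add(seed);
        }
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            visited.add(url);                      // a real crawler would fetch the page here
            for (String next : links.getOrDefault(url, List.of())) {
                if (allow.test(next) && seen.add(next)) {
                    frontier.add(next);            // FIFO queue gives breadth-first order
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> web = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("a", "d"),
                "c", List.of("d"));
        System.out.println(crawl(web, "a", url -> true, 10)); // [a, b, c, d]
    }
}
```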
Focused crawler
• Not all of the knowledge on the Internet is required for every query
• This assumption seems reasonable because most people work in a restricted domain and do not need knowledge of the whole Internet
• Searching the whole Internet in this case is very inefficient and expensive
• Free text on the Internet contains diverse information from many domains
Focused crawler (continued)
• The focus can be achieved by examining keywords
• Problems:
– "Understanding" the semantics of a document
– Focusing too narrowly on one topic
• Another way to focus is to use the Internet's connectivity (link) structure
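A keyword-based focus test might look like the following minimal sketch. The relevance measure (the fraction of topic keywords found in the page), the example keyword set, and the threshold are all assumptions for illustration. It also shows the first problem listed above: matching words is not understanding, so a page about a university that only uses a synonym such as "college" scores zero.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Keyword-based relevance test for a focused crawler (keywords and threshold are hypothetical).
class KeywordFocus {
    // Fraction of topic keywords that occur as whole words in the page (case-insensitive).
    static double relevance(String pageText, Set<String> keywords) {
        if (keywords.isEmpty()) return 0.0;
        Set<String> words = new HashSet<>(Arrays.asList(pageText.toLowerCase().split("\\W+")));
        long hits = keywords.stream().filter(k -> words.contains(k.toLowerCase())).count();
        return (double) hits / keywords.size();
    }

    // A focused crawler only follows the links of pages that reach the threshold.
    static boolean shouldExpand(String pageText, Set<String> keywords, double threshold) {
        return relevance(pageText, keywords) >= threshold;
    }

    public static void main(String[] args) {
        Set<String> topic = Set.of("university", "engineering", "sharif");
        System.out.println(relevance("Sharif University of Technology, engineering school", topic)); // 1.0
        System.out.println(relevance("Weather forecast for Tehran", topic)); // 0.0
    }
}
```

The same score could also order a priority-queue frontier (best-first crawling) instead of acting as a yes/no filter.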
Annotation models
• Inline markup in the web page
• Example:
– SUT is one of the largest engineering schools in the Islamic Republic of Iran
– <university>SUT</university> is one of the largest engineering schools in the <country>Islamic Republic of Iran</country>
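Inline markup of this kind can be produced by matching known entity names and wrapping each mention in a tag named after its ontology class. The sketch below assumes a small hand-built gazetteer (a hypothetical `Map` from mention to class name); it illustrates the output format only and is not the authors' method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Inline annotation: wrap known mentions in <class>...</class> tags (gazetteer is hypothetical).
class InlineAnnotator {
    static String annotate(String text, Map<String, String> gazetteer) {
        if (gazetteer.isEmpty()) return text;
        // Longest mentions first, so a long name wins over a shorter overlapping entry.
        List<String> mentions = new ArrayList<>(gazetteer.keySet());
        mentions.sort((a, b) -> Integer.compare(b.length(), a.length()));
        String alternation = mentions.stream().map(Pattern::quote).collect(Collectors.joining("|"));
        Matcher m = Pattern.compile("\\b(?:" + alternation + ")\\b").matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String tag = gazetteer.get(m.group());
            String marked = "<" + tag + ">" + m.group() + "</" + tag + ">";
            m.appendReplacement(out, Matcher.quoteReplacement(marked));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sentence = "SUT is one of the largest engineering schools in the Islamic Republic of Iran";
        Map<String, String> gazetteer = Map.of("SUT", "university", "Islamic Republic of Iran", "country");
        System.out.println(annotate(sentence, gazetteer));
        // <university>SUT</university> is one of the largest engineering schools in the <country>Islamic Republic of Iran</country>
    }
}
```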
Annotation models (continued)
• Generate RDF
• Example:
– SUT is one of the largest engineering schools in the Islamic Republic of Iran
– <rdf:Description rdf:about="http://sharif.edu/#SUT">
    <rdf:type>university</rdf:type>
    <SHARIF:be_in rdf:resource="http://sharif.edu/#Islamic+Republic+of+Iran"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://sharif.edu/#Islamic+Republic+of+Iran">
    <rdf:type>Country</rdf:type>
  </rdf:Description>
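A minimal sketch of generating such RDF/XML from extracted facts. Assumptions: resource URIs are built from labels with `URLEncoder` (which yields the `+`-for-space form used in the example), the `SHARIF:` prefix is bound to a hypothetical namespace URI, and class URIs such as `http://sharif.edu/#university` are invented for illustration. Unlike the example, the sketch writes `rdf:type` with `rdf:resource`, because in RDF/XML a text child of `rdf:type` is read as a plain literal rather than a class.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Writes rdf:Description blocks for extracted facts (namespace and class URIs are hypothetical).
class RdfWriter {
    static final String BASE = "http://sharif.edu/#";             // resource base from the example
    static final String SHARIF_NS = "http://sharif.edu/ontology#"; // assumed URI for the SHARIF: prefix

    // "Islamic Republic of Iran" -> http://sharif.edu/#Islamic+Republic+of+Iran
    // (URLEncoder also escapes & and " so the XML attribute stays well-formed)
    static String uri(String label) {
        return BASE + URLEncoder.encode(label, StandardCharsets.UTF_8);
    }

    // type: class name; relations: property name (must be a valid XML name) -> object label
    static String describe(String subject, String type, Map<String, String> relations) {
        StringBuilder sb = new StringBuilder();
        sb.append("<rdf:Description rdf:about=\"").append(uri(subject)).append("\">\n");
        sb.append("  <rdf:type rdf:resource=\"").append(uri(type)).append("\"/>\n");
        for (Map.Entry<String, String> e : relations.entrySet()) {
            sb.append("  <SHARIF:").append(e.getKey())
              .append(" rdf:resource=\"").append(uri(e.getValue())).append("\"/>\n");
        }
        return sb.append("</rdf:Description>\n").toString();
    }

    // Wraps descriptions in an rdf:RDF root that declares both prefixes.
    static String document(String... descriptions) {
        StringBuilder sb = new StringBuilder()
                .append("<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"\n")
                .append("         xmlns:SHARIF=\"").append(SHARIF_NS).append("\">\n");
        for (String d : descriptions) sb.append(d);
        return sb.append("</rdf:RDF>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> rel = new LinkedHashMap<>();
        rel.put("be_in", "Islamic Republic of Iran");
        System.out.print(document(
                describe("SUT", "university", rel),
                describe("Islamic Republic of Iran", "Country", Map.of())));
    }
}
```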
Annotation methods
• Manually
• Semi-automatically
• Automatically
Automatic Annotation
• The fully automatic creation of semantic annotations is an unsolved problem.
• Automatically annotating the natural-language sentences in web pages is a daunting task, and we are often forced to do it manually or semi-automatically using handwritten rules
Manual Annotation
• Manual annotation is more easily accomplished today using authoring tools, which provide an integrated environment for simultaneously authoring and annotating text
• However, human annotation is often error-prone because of factors such as the annotator's familiarity with the domain, amount of training, personal motivation, and schema complexity
• Manual annotation is also an expensive process
Semi-automatic Annotation
• To overcome the annotation acquisition bottleneck, semi-automatic annotation of documents has been proposed.
Semi-automatic annotation
• Assumptions:
– The vocabulary set is limited
– Word usage follows patterns
– Semantic ambiguities are rare
– Terms and jargon of the domain appear frequently
Semantic Annotation Platform (SAP)
Multistrategy SAPs
• Multistrategy SAPs are able to combine methods from both pattern-based and machine-learning-based systems.
• No SAP currently implements the multistrategy approach for semantic annotation, although it has been implemented in systems for ontology extraction (such as On-To-Knowledge)
Semi-automatic annotation (continued)
• Example:
– I go to Shanghai
• The link structure is more like an RDF graph
Accuracy of concepts and relations for different algorithms
Automatic annotation
Source preprocessing
• Document Object Model (DOM)
• Text Model
• Layout Model
• NLP Model
Information Identification
• Operators
– perform extraction actions on document access models
– Retrieval, Check, Execute
• Strategies
– build operator sequences according to user time and quality requirements
• Source Description
– build operator sequences according to user time and quality requirements
Ontology population
• The final stage of the overall process is to decide which hypothesis best represents the extracted information to be inserted into the ontology
• The module simulates each insertion and calculates its cost from the number of new instances created, instances modified, and inconsistencies found
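The hypothesis selection described above amounts to choosing the candidate with the lowest simulated insertion cost. In this sketch the simulation results are given as input, and the weights (1 per new instance, 2 per modification, 10 per inconsistency) are hypothetical; the slides say only that the cost depends on these three counts.

```java
import java.util.Comparator;
import java.util.List;

// Picks the cheapest hypothesis to insert into the ontology (weights are hypothetical).
class PopulationCost {
    // Result of simulating the insertion of one hypothesis.
    record Simulation(String hypothesis, int newInstances, int modifiedInstances, int inconsistencies) {}

    // Assumed weights: inconsistencies are penalized most heavily.
    static final double W_NEW = 1.0, W_MODIFIED = 2.0, W_INCONSISTENT = 10.0;

    static double cost(Simulation s) {
        return W_NEW * s.newInstances()
             + W_MODIFIED * s.modifiedInstances()
             + W_INCONSISTENT * s.inconsistencies();
    }

    // Throws NoSuchElementException if there are no candidates.
    static Simulation best(List<Simulation> candidates) {
        return candidates.stream().min(Comparator.comparingDouble(PopulationCost::cost)).orElseThrow();
    }

    public static void main(String[] args) {
        var a = new Simulation("SUT rdf:type university", 1, 0, 0);
        var b = new Simulation("SUT rdf:type Country", 0, 1, 1);
        System.out.println(best(List.of(a, b)).hypothesis()); // SUT rdf:type university
    }
}
```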
Our implementation
• Crawler:
– Crawl all links whose URL contains:
• sharif.ir
• sharif.edu
• sharif.ac.ir
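The crawler's URL rule is easy to express as a predicate, for example as the allow-test of a crawl loop. `containsRule` follows the rule as stated; `hostRule` is an assumed stricter variant (not from the slides) that checks only the host name, since a plain substring test also accepts URLs such as `http://example.com/?ref=sharif.edu`.

```java
import java.net.URI;
import java.util.List;

// URL filters for a crawler restricted to Sharif sites.
class SharifFilter {
    static final List<String> DOMAINS = List.of("sharif.ir", "sharif.edu", "sharif.ac.ir");

    // Rule as stated: the URL string contains one of the domain names.
    static boolean containsRule(String url) {
        return DOMAINS.stream().anyMatch(url::contains);
    }

    // Hypothetical stricter rule: the host equals a domain or is a subdomain of it.
    static boolean hostRule(String url) {
        String host;
        try {
            host = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // malformed URL
        }
        if (host == null) return false;
        String h = host.toLowerCase();
        return DOMAINS.stream().anyMatch(d -> h.equals(d) || h.endsWith("." + d));
    }

    public static void main(String[] args) {
        String sneaky = "http://example.com/?ref=sharif.edu";
        System.out.println(containsRule(sneaky) + " " + hostRule(sneaky)); // true false
    }
}
```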
Our implementation
• Source pre-processing
– HTML to text:
• text = text.replaceAll("\n", "*_newline_*");
• text = text.replaceAll("\\<script.*?\\</script\\>", "");
• text = text.replaceAll("\\<style.*?\\</style\\>", "");
• text = text.replaceAll("<\\!--.*?--\\>", "");
• text = text.replaceAll("\\<.*?\\>", "");
• text = text.replaceAll("&nbsp;", " ");
• text = text.replaceAll("&lt;", "<");
• …
• text = text.replaceAll("\\*_newline_\\*", "\n");
– Additional:
• text = text.replaceAll("\n\\s*\n", ".");
• text = text.replaceAll(",", " and ");
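Collected into one runnable method, the pre-processing steps look roughly like this. It is a sketch, not the authors' code. The "…" steps are not reproduced, so only `&nbsp;` and `&lt;` are decoded. Script and style blocks are matched case-insensitively (an added assumption). The style pattern ends at `</style>`, because a greedy `</style.*>` would run to the last `>` in the page once all newlines have been replaced. Runs of blank lines are collapsed with `\n\s*\n`.

```java
// Sketch of the HTML-to-text pipeline (entity decoding abbreviated).
class HtmlToText {
    static String convert(String html) {
        String text = html;
        // Hide newlines: "." does not match '\n', so multi-line tags would otherwise survive.
        text = text.replaceAll("\n", "*_newline_*");
        text = text.replaceAll("(?i)<script.*?</script>", ""); // drop scripts
        text = text.replaceAll("(?i)<style.*?</style>", "");   // drop stylesheets
        text = text.replaceAll("<!--.*?-->", "");              // drop comments
        text = text.replaceAll("<.*?>", "");                   // drop remaining tags
        text = text.replaceAll("&nbsp;", " ");                 // decode a few entities
        text = text.replaceAll("&lt;", "<");
        text = text.replaceAll("\\*_newline_\\*", "\n");       // restore newlines
        // Additional steps: a blank-line run ends a sentence; commas become "and".
        text = text.replaceAll("\n\\s*\n", ".");
        text = text.replaceAll(",", " and ");
        return text;
    }

    public static void main(String[] args) {
        System.out.println(convert("<p>A</p>\n\n<p>B&nbsp;C</p>")); // A.B C
    }
}
```

Decoding `&lt;` only after the tags are stripped keeps an escaped `<` in the page text from being mistaken for the start of a tag.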
Our implementation
• Information extraction:
– JMontyLingua
• SUT is one of the largest engineering schools in the Islamic Republic of Iran
• ("be" "SUT" "one" "of largest engineering school" "in Islamic Republic" "of Iran")
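To work with such output programmatically, the quoted fields can be pulled out of the printed tuple. This sketch only parses the text form shown above; how the tuple is obtained from JMontyLingua is not shown, and no JMontyLingua API is assumed. Following the example, field 0 is the verb, field 1 the subject, and the remaining fields are argument phrases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Splits a printed predicate tuple such as ("be" "SUT" ...) into its quoted fields.
class TupleParser {
    private static final Pattern FIELD = Pattern.compile("\"([^\"]*)\"");

    static List<String> parse(String tuple) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(tuple);
        while (m.find()) {
            fields.add(m.group(1)); // text between the quotes
        }
        return fields;
    }

    public static void main(String[] args) {
        String out = "(\"be\" \"SUT\" \"one\" \"of largest engineering school\" \"in Islamic Republic\" \"of Iran\")";
        System.out.println(parse(out));
        // [be, SUT, one, of largest engineering school, in Islamic Republic, of Iran]
    }
}
```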
Our implementation
• JMontyLingua problem:
– SUT has computer, mechanic and electric engineering departments
– Output: ("have" "SUT" "computer mechanic and electric engineering departments")
– Output after replacing "," with " and ": ("have" "SUT" "computer and mechanic and electric engineering departments")
Our implementation
• ("be" "SUT" "university" "in Islamic Republic" "of Iran")
• => ("be" "SUT" "university" "in Islamic Republic of Iran")
• => SUT,be,university & SUT,be_in,Islamic Republic of Iran
• <rdf:Description rdf:about="http://sharif.edu/#SUT">
    <rdf:type>university</rdf:type>
    <SHARIF:be_in rdf:resource="http://sharif.edu/#Islamic+Republic+of+Iran"/>
  </rdf:Description>
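The two rewriting steps above (merging the dangling "of Iran" fragment into the preceding phrase, then turning a leading preposition into a predicate suffix such as `be_in`) can be sketched as follows. The preposition list and the rule that only phrases starting with "of " are merged are assumptions inferred from this single example; the input is expected to hold at least a verb and a subject.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Turns a (verb, subject, args...) tuple into subject,predicate,object triples.
class TripleBuilder {
    record Triple(String subject, String predicate, String object) {
        @Override
        public String toString() {
            return subject + "," + predicate + "," + object;
        }
    }

    // Hypothetical prepositions that become predicate suffixes ("be" + "in" -> "be_in").
    static final Set<String> PREPOSITIONS = Set.of("in", "at", "on", "from", "to", "with");

    static List<Triple> triples(List<String> tuple) {
        String verb = tuple.get(0), subject = tuple.get(1);
        // Step 1: glue "of ..." fragments onto the preceding argument phrase.
        List<String> args = new ArrayList<>();
        for (String arg : tuple.subList(2, tuple.size())) {
            if (arg.startsWith("of ") && !args.isEmpty()) {
                int last = args.size() - 1;
                args.set(last, args.get(last) + " " + arg);
            } else {
                args.add(arg);
            }
        }
        // Step 2: a leading preposition moves into the predicate name.
        List<Triple> result = new ArrayList<>();
        for (String arg : args) {
            int space = arg.indexOf(' ');
            String first = space < 0 ? arg : arg.substring(0, space);
            if (space > 0 && PREPOSITIONS.contains(first)) {
                result.add(new Triple(subject, verb + "_" + first, arg.substring(space + 1)));
            } else {
                result.add(new Triple(subject, verb, arg));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        triples(List.of("be", "SUT", "university", "in Islamic Republic", "of Iran"))
                .forEach(System.out::println);
        // SUT,be,university
        // SUT,be_in,Islamic Republic of Iran
    }
}
```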
Any questions?