lecture semantic augmentation
![Page 1: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/1.jpg)
1
COMP3725 Knowledge Enriched Information Systems
Lecture 13: Semantic Augmentation
Dhavalkumar Thakker (Dhaval)
School of Computing, University of Leeds
![Page 2: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/2.jpg)
2
Outline
• Semantic Augmentation
– What
– Why
– How
• Existing systems & services for Semantic Augmentation
• Challenges
![Page 3: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/3.jpg)
3
Semantic Augmentation
• From:
(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
• To:
(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
– Apple Corps: http://dbpedia.org/Ontology/Apple_Corps
– New York: http://dbpedia.org/Ontology/New_York_City
![Page 4: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/4.jpg)
4
Semantic Augmentation
• Semantic augmentation is a process of attaching semantics to a selected part of a text to assist automatic interpretation of the meaning conveyed by the text.
• Also called semantic annotation, semantic tagging
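As a rough illustration of the definition above, an annotation can be stored separately from the text as a character span plus a concept identifier. The sketch below is only an assumption about one possible representation: the `Annotation` class, the `KNOWN_MENTIONS` table and the exact-string matching are hypothetical, and the URIs are copied from the earlier slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    uri: str    # identifier of the concept the mention refers to


TEXT = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")

# Hypothetical mention-to-URI table; the URIs are the ones shown on the slide.
KNOWN_MENTIONS = {
    "New York": "http://dbpedia.org/Ontology/New_York_City",
    "Apple Corps": "http://dbpedia.org/Ontology/Apple_Corps",
}


def annotate(text, mentions):
    """Attach a URI to every exact occurrence of a known mention."""
    annotations = []
    for surface, uri in mentions.items():
        start = text.find(surface)
        while start != -1:
            annotations.append(Annotation(start, start + len(surface), uri))
            start = text.find(surface, start + 1)
    return sorted(annotations, key=lambda a: a.start)


annotations = annotate(TEXT, KNOWN_MENTIONS)
for a in annotations:
    print(TEXT[a.start:a.end], "->", a.uri)
```

The text itself is left unchanged; the semantics live in the list of annotations, which is what lets a program follow the link from a mention to its concept.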
![Page 5: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/5.jpg)
5
It provides additional information about an existing piece of data.
![Page 6: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/6.jpg)
6
Why Semantic Augmentation?
• Links to complementary information
– “More about this”
• Show related or similar information
• Reasoning and inferencing offered by semantics
• Semantic annotation is the glue that ties ontologies into document spaces – remember that the existing web is a web of documents
• Manual metadata production cost is too high
![Page 7: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/7.jpg)
7
GATE for Semantic Augmentation
• GATE (General Architecture for Text Engineering) – see gate.ac.uk
• GATE Developer is a development environment that provides a rich set of graphical interactive tools for the creation, measurement and maintenance of software components for processing human language.
• See: http://gate.ac.uk/family/developer.html
![Page 8: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/8.jpg)
Overview of GATE Developer
• GATE Developer
• Resources Pane
– applications: groups of processes to run on a document or corpus
– language resources: corpus, ontologies, schemas
– processing resources: tools that operate on unstructured text
– datastores: saved documents and resources
• Display Pane: whatever you’re currently working with.
• See next slide
![Page 9: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/9.jpg)
9
GATE : Interface
Screenshot labels: Resources Pane, Display Pane
![Page 10: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/10.jpg)
Processing Resources: ANNIE
• A family of Processing Resources for language analysis included with GATE
• Stands for A Nearly-New Information Extraction system.
• Uses finite-state techniques to implement various tasks: tokenization, semantic tagging, verb phrase chunking, and so on.
![Page 11: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/11.jpg)
ANNIE IE Modules
http://gate.ac.uk/sale/tao/splitch6.html#chap:annie
![Page 12: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/12.jpg)
Some ANNIE Components
• Tokenizer
– Splits text into tokens: word, number, symbol, punctuation, and SpaceToken
• Sentence Splitter
– Segments text into sentences
• Part of Speech Tagger
– Produces a part-of-speech tag (noun, verb, etc.) as an annotation on each word or symbol
• GATE Morphological Analyser
– Detects morphemes in a piece of text (e.g. car, caring)
• OntoGazetteer
– Semantic tagging component; uses an ontology
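To make the division of labour between these components concrete, here is a toy sketch in plain Python. It is not ANNIE and does not use any GATE API: the function names, the regular expressions and the `GAZETTEER` entries are all assumptions made for illustration, and the real components are considerably more sophisticated.

```python
import re


def tokenise(text):
    """Return (kind, string) tokens: word, number, punctuation or symbol."""
    tokens = []
    for match in re.finditer(r"[A-Za-z]+|\d+|\S", text):
        s = match.group()
        if s.isalpha():
            kind = "word"
        elif s.isdigit():
            kind = "number"
        elif s in ".,;:!?\"'()":
            kind = "punctuation"
        else:
            kind = "symbol"
        tokens.append((kind, s))
    return tokens


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def lookup(tokens, gazetteer):
    """Tag each word token whose lower-cased form is in the gazetteer."""
    return [(s, gazetteer[s.lower()]) for kind, s in tokens
            if kind == "word" and s.lower() in gazetteer]


# Hypothetical gazetteer entries with made-up class labels.
GAZETTEER = {"lennon": "Person", "mccartney": "Person"}

text = "Lennon and McCartney went to New York. They announced Apple Corps."
tokens = tokenise(text)
sentences = split_sentences(text)
tags = lookup(tokens, GAZETTEER)
print(len(tokens), "tokens,", len(sentences), "sentences, tags:", tags)
```

Each step consumes the output of an earlier one, which is why the order of resources in a pipeline matters.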
![Page 13: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/13.jpg)
13
Demo:
• From:
(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
• To:
(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
– Apple Corps: http://dbpedia.org/Ontology/Apple_Corps
– New York: http://dbpedia.org/Ontology/New_York_City
![Page 14: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/14.jpg)
14
Step: Download & Start the GATE application
• Download GATE from: http://gate.ac.uk/download/
• Note: the demonstration is using GATE 6.0
![Page 15: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/15.jpg)
15
Step: From Language Resources, select
• GATE Document: make sure that “String content” is selected in the last field (see screenshot below). Name the file “Test”.
![Page 16: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/16.jpg)
16
Paste the following text into the file
• Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.
![Page 17: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/17.jpg)
17
Step: From Processing Resources, select the following resources
• ANNIE English Tokeniser
• ANNIE Sentence Splitter
• ANNIE POS Tagger
• GATE Morphological Analyser
• Note: For all of the above, leave the “Name” field empty
![Page 18: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/18.jpg)
18
Step: From Processing Resources, select the following resources
![Page 19: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/19.jpg)
19
Step: From Language Resources, select
• OWLIM Ontology
– Specify the location of the ontology you would like to use for semantic augmentation
– For example, we are using the DBpedia ontology
![Page 20: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/20.jpg)
20
OWLIM Ontology window
![Page 21: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/21.jpg)
21
From Processing Resources, select
• OntoRoot Gazetteer
• Specify its parameters as follows:
![Page 22: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/22.jpg)
22
Final steps: Create Corpus
• Go to Language Resources, click on GATE Corpus, and add the “Test” document created earlier
![Page 23: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/23.jpg)
23
Final steps: Create Corpus Pipeline
• From Applications, create a Corpus Pipeline
• Add the processing resources in the order shown below and press “Run this Application”
![Page 24: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/24.jpg)
24
Results: Go to the file and click on Annotation Sets, Annotations List, then Lookup
Semantic Augmentation
![Page 25: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/25.jpg)
Other features
• JAPE
– The Java Annotation Patterns Engine; provides regular-expression-based pattern/action rules over annotations
– Grammars to detect entities, validate detected entities, and do pre- and post-processing
– Example: “at the Carnegie Stadium”, “at the Emirates Stadium”, “at the O2 Arena”
– See Tutorial: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
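The venue example can be approximated with an ordinary regular expression. The sketch below is not JAPE syntax: JAPE rules match over annotations such as tokens and gazetteer lookups, whereas this pattern matches raw characters. `VENUE_RULE`, `find_venues` and the sample sentence are hypothetical names and data chosen for illustration.

```python
import re

# Pattern part: "at the", one or more capitalised or alphanumeric words,
# then a venue keyword. The captured group is the candidate venue name.
VENUE_RULE = re.compile(r"\bat the ((?:[A-Z0-9]\w*\s)+(?:Stadium|Arena))\b")


def find_venues(text):
    """Action part: return (start, end, name) for every venue match."""
    return [(m.start(1), m.end(1), m.group(1))
            for m in VENUE_RULE.finditer(text)]


sample = ("The match at the Emirates Stadium sold out, "
          "and the concert at the O2 Arena followed.")

for start, end, name in find_venues(sample):
    print(start, end, name)
```

The split into a pattern (what to match) and an action (what annotation to produce) mirrors the pattern/action structure mentioned above, even though the matching machinery is different.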
![Page 26: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/26.jpg)
Some Links
• Home page: http://gate.ac.uk/
• Some good short tutorial videos for getting started: http://gate.ac.uk/demos/developer-videos/ . These are only a few minutes each, so they are quick to watch.
• User Guide: http://gate.ac.uk/sale/tao/index.html . This is apparently for version 7.1, which is a development build, but it seems to be fine.
• Lots of documentation: http://gate.ac.uk/documentation.html
• The wiki: http://gate.ac.uk/wiki/
• JAPE grammar tutorial by Dhaval Thakker et al.: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
![Page 27: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/27.jpg)
27
Challenge: Term Ambiguity
• ...this apple on the palm of my hand...
• ...Apple tried to acquire Palm Inc....
• ...eating an apple sitting by a palm tree...
• What do “apple” and “palm” mean in each case?
• The objective is to recognize entities and disambiguate their meaning.
DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
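One simple way to see how context can resolve the “apple” cases is to count how many context words each candidate sense shares with the sentence. This is a toy sketch only, not the method used by DBpedia Spotlight or GATE; the sense labels and the hand-written cue words in `SENSES` are assumptions made for illustration.

```python
import re

# Hypothetical candidate senses for "apple", each with hand-picked cue words.
SENSES = {
    "Apple (company)": {"acquire", "company", "inc", "shares"},
    "apple (fruit)": {"eating", "hand", "tree", "fruit"},
}


def disambiguate(sentence, senses):
    """Pick the sense whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(senses, key=lambda sense: len(words & senses[sense]))


examples = [
    "this apple on the palm of my hand",
    "Apple tried to acquire Palm Inc.",
    "eating an apple sitting by a palm tree",
]
for sentence in examples:
    print(sentence, "->", disambiguate(sentence, SENSES))
```

Hand-written cue lists do not scale, which is one reason real systems learn the context of each entity from a large corpus instead.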
![Page 28: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/28.jpg)
Challenges
• Disambiguation
• Unknown entities
• Ontology learning
• Scale and speed
• Co-referencing
![Page 29: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/29.jpg)
Existing Services for Semantic Augmentation
![Page 30: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/30.jpg)
Existing Services for Semantic Augmentation
![Page 31: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/31.jpg)
31
DBpedia Spotlight
• DBpedia is a collection of entity descriptions extracted from Wikipedia & shared as linked data
• DBpedia Spotlight uses data from DBpedia and text from associated Wikipedia pages
• Learns how to recognize that a DBpedia resource was mentioned
• Given plain text as input, generates annotated text
http://dbpedia-spotlight.github.com/demo/
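The "plain text in, annotated text out" behaviour can also be used programmatically over HTTP. The sketch below is hedged: the endpoint URL, the `text` and `confidence` parameters, and the `Resources`, `@surfaceForm` and `@URI` response fields are assumptions based on the publicly hosted Spotlight web service and should be checked against its current documentation. The helper names are hypothetical, and nothing is sent over the network until `annotate` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed public endpoint; verify against the DBpedia Spotlight documentation.
ENDPOINT = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build (but do not send) a GET request asking for a JSON response."""
    query = urlencode({"text": text, "confidence": confidence})
    return Request(f"{ENDPOINT}?{query}",
                   headers={"Accept": "application/json"})


def extract_resources(payload):
    """Return (surface form, URI) pairs from a decoded JSON response.

    The field names used here are assumed, not guaranteed.
    """
    return [(r.get("@surfaceForm"), r.get("@URI"))
            for r in payload.get("Resources", [])]


def annotate(text, confidence=0.5):
    """Send the text to the service; requires network access."""
    with urlopen(build_request(text, confidence), timeout=30) as response:
        return extract_resources(json.load(response))
```

Calling `annotate("Lennon and McCartney went to New York.")` would then return whatever entities the service recognises at the chosen confidence threshold.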
![Page 33: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/33.jpg)
33
DBpedia Spotlight
![Page 34: Lecture semantic augmentation](https://reader033.vdocument.in/reader033/viewer/2022051322/546212e8b4af9f531c8b45de/html5/thumbnails/34.jpg)
34
References
• DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
• Introduction to GATE, Dr. Paula Matuszek
• Various resources from gate.ac.uk