Creating Knowledge out of Interlinked Data
LOD2 Presentation . 02.09.2010 . http://lod2.eu
AKSW, Universität Leipzig
Sebastian Hellmann
NIF – NLP Interchange Format
http://aksw.org/Projects/NIF
Outline:
• NLP Interchange Format
• Use Cases
– Integration of tools
– Meaning Representation Language
– Knowledge Extraction with SPARQL
– Machine Learning
• Related Projects
KAIST LOD2 17.8.2011
Problem:
• Currently, NLP software is organized in pipelines
• Integration is "hard-wired"
– For each tool and each framework, an adapter has to be created (n*m)
• Difficult to aggregate output
• Difficult to exchange single components
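The n*m growth can be made concrete with a little arithmetic. The sketch below compares the hard-wired case with a shared interchange format, under the assumption (not stated on the slide) that a shared format needs one wrapper per tool plus one reader per framework. The function names are illustrative only.

```python
def hardwired_adapters(n_tools: int, m_frameworks: int) -> int:
    """One adapter for every (tool, framework) pair, as in the n*m case."""
    return n_tools * m_frameworks


def shared_format_adapters(n_tools: int, m_frameworks: int) -> int:
    """Assumption: one wrapper per tool plus one reader per framework."""
    return n_tools + m_frameworks


# Compare both strategies for a few hypothetical setups.
for n, m in [(3, 2), (5, 4), (10, 6)]:
    print(f"{n} tools, {m} frameworks: "
          f"hard-wired={hardwired_adapters(n, m)}, "
          f"shared format={shared_format_adapters(n, m)}")
```

The gap widens quickly: the hard-wired count grows with the product of tools and frameworks, while the shared-format count grows with their sum.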
Overview:
• NLP tools can be integrated via a common output format (a common pattern in Enterprise Application Integration)
• For each tool, a wrapper needs to be created that reads NIF and produces NIF
• The combination of tools can be ad hoc, i.e. it is not a pipeline that needs to be configured
• Multi-layer and overlapping annotations are possible
• Ontologies provide interfaces for each layer and for applications
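A rough sketch of the wrapper idea, assuming annotations are modelled as a plain Python set of (subject, predicate, object) triples rather than real NIF. The two wrappers and the `ex:` property names are hypothetical stand-ins for actual NLP tools. Each wrapper reads the shared representation and returns it with its own annotations added, so the order of combination does not matter.

```python
def lowercase_wrapper(graph):
    """Toy wrapper: adds a hypothetical ex:lowercase annotation per string."""
    added = {(s, "ex:lowercase", o.lower())
             for s, p, o in graph if p == "ex:anchorOf"}
    return graph | added


def length_wrapper(graph):
    """Toy wrapper: adds a hypothetical ex:length annotation per string."""
    added = {(s, "ex:length", str(len(o)))
             for s, p, o in graph if p == "ex:anchorOf"}
    return graph | added


# A single annotated string; the URI is a made-up example.
document = {("http://example.org/doc.txt#char=4,16", "ex:anchorOf", "Semantic Web")}

# Both orders yield the same annotations: no fixed pipeline is required.
one_order = length_wrapper(lowercase_wrapper(document))
other_order = lowercase_wrapper(length_wrapper(document))
print(one_order == other_order)
```

Because every wrapper only adds triples about shared identifiers, several annotation layers can coexist on the same string without being aware of each other.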
• First Challenge: Representing Strings in RDF
• How to give a part of a document or text an identifier (URI)?
• What properties can such URIs have?
Example URIs for annotating "Semantic Web"
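One way to give a substring such as "Semantic Web" its own URI is an offset-based fragment identifier. The sketch below assumes RFC 5147-style `#char=begin,end` fragments on a plain-text document. The document URL, the sample sentence and the helper names are hypothetical, and the scheme is not claimed to be the recipe shown on the slide.

```python
def string_uri(document_uri: str, text: str, substring: str) -> str:
    """Build an offset-based URI for the first occurrence of `substring`."""
    begin = text.index(substring)      # position before the first character
    end = begin + len(substring)       # position after the last character
    return f"{document_uri}#char={begin},{end}"


def resolve(uri: str, text: str) -> str:
    """Recover the annotated string from the offsets in the fragment."""
    begin, end = uri.split("#char=")[1].split(",")
    return text[int(begin):int(end)]


text = "The Semantic Web is an extension of the Web."
uri = string_uri("http://example.org/doc.txt", text, "Semantic Web")
print(uri)                 # http://example.org/doc.txt#char=4,16
print(resolve(uri, text))  # Semantic Web
```

Any tool that applies the same rule to the same text arrives at the same URI, which is what lets independently produced annotations refer to one string.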
• URIs are used to integrate output. RDF merges naturally if the URIs are the same (or convertible using a certain recipe).
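The merging behaviour can be illustrated without an RDF library by treating a graph as a set of (subject, predicate, object) triples. This is a simplified sketch: the string URI, the `ex:` properties and the two tool outputs are invented for illustration and are not actual output of the stemmer or the parser.

```python
# Both tools are assumed to mint the same URI for the same string.
STRING_URI = "http://example.org/doc.txt#char=4,16"

stemmer_output = {
    (STRING_URI, "ex:anchorOf", "Semantic Web"),
    (STRING_URI, "ex:stem", "semant web"),
}
parser_output = {
    (STRING_URI, "ex:anchorOf", "Semantic Web"),
    (STRING_URI, "ex:category", "ex:NounPhrase"),
}

# Merging RDF graphs amounts to a set union of their triples.
merged = stemmer_output | parser_output

# All statements about the string end up attached to one subject.
for subject, predicate, obj in sorted(merged):
    print(subject, predicate, obj)
```

The shared `ex:anchorOf` triple appears only once after the union, and the annotations of both tools are attached to the same subject. Had the tools used different URIs for the same string, the union would contain two unrelated subjects.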
• Second challenge: Output of each layer is required to be stable.
• Components and layers can be interchanged
• Domain ontologies are needed to provide stable interfaces:
– OLiA provides an ontological interface for morpho-syntax http://nachhalt.sfb632.uni-potsdam.de/owl/
– DBpedia provides stable ids for Things
Demo - Integration
• http://nlp2rdf.lod2.eu/annotator-stanford/NIFStemmer?input=My%20favorite%20actress%20is%20Natalie%20Portman!&type=text
• http://nlp2rdf.lod2.eu/annotator-stanford/NIFStanfordCore?input=My%20favorite%20actress%20is%20Natalie%20Portman!&type=text
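The two demo URLs follow the same pattern: a service name plus `input` and `type` query parameters. The sketch below only rebuilds such URLs with the Python standard library and does not send a request, since the demo service may no longer be reachable. The helper name `demo_url` is an assumption.

```python
from urllib.parse import quote, urlencode

BASE = "http://nlp2rdf.lod2.eu/annotator-stanford"


def demo_url(service: str, text: str) -> str:
    """Build a demo request URL; spaces become %20 and '!' is left as is."""
    query = urlencode({"input": text, "type": "text"},
                      quote_via=quote, safe="!")
    return f"{BASE}/{service}?{query}"


sentence = "My favorite actress is Natalie Portman!"
for service in ("NIFStemmer", "NIFStanfordCore"):
    print(demo_url(service, sentence))
```

Since both services accept the same input, their outputs describe the same text and can be combined afterwards.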
Use Cases
• Integration of tools
• Meaning Representation Language
• Knowledge Extraction with SPARQL
• Machine Learning
Use Case – Integration of tools
Use Case – Meaning Representation Language
• RDF makes data integration easy: URIref, LinkedData
• OWL is based on Description Logics (Guarded Fragment)
• Availability of open data sets (access and licence)
• Diverse serializations for annotations: XML, Turtle, RDFa+XHTML
• Scalable tool support (Databases, Reasoning)
Use Case – Knowledge Extraction with SPARQL
• Classical approach:
– POS tag / Dependency parser (e.g. Stanford)
– create a rule/pattern language to extract knowledge
Johanna Völker – Learning Expressive Ontologies (LExO)
# Example:
# A fish is any aquatic vertebrate animal that is covered with scales, and equipped with two sets of paired fins and several unpaired fins.
# [fish] subClassOf [any aquatic vertebrate animal that is covered …]
Construct {?sub rdfs:subClassOf ?super}
{
  ?is a penn:BePresentTense .
  ?is nlp:superToken ?is_any_aquatic_ .
  ?is_any_aquatic_ a olia:VerbPhrase .
  ?is_any_aquatic_ nlp:syntacticSubToken [ nlp:normUri ?super ] .
  ?animal nlp:cop ?is .
  ?animal nlp:nsubj ?fish .
  ?fish nlp:superToken [ nlp:normUri ?sub ] .
}
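To show what the query does, the sketch below evaluates the same graph pattern with a tiny backtracking matcher over hand-written triples. It is an illustration only: the token identifiers (`t:…`) and class URIs (`ex:…`) are invented, the two blank nodes of the query are replaced by the variables `?b1` and `?b2`, and a real setup would run the SPARQL query against actual parser output.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


# Hypothetical annotations for "A fish is any aquatic vertebrate animal ...".
triples = {
    ("t:is", "rdf:type", "penn:BePresentTense"),
    ("t:is", "nlp:superToken", "t:vp"),
    ("t:vp", "rdf:type", "olia:VerbPhrase"),
    ("t:vp", "nlp:syntacticSubToken", "t:np2"),
    ("t:np2", "nlp:normUri", "ex:AquaticVertebrateAnimal"),
    ("t:animal", "nlp:cop", "t:is"),
    ("t:animal", "nlp:nsubj", "t:fish"),
    ("t:fish", "nlp:superToken", "t:np1"),
    ("t:np1", "nlp:normUri", "ex:Fish"),
}

# The WHERE part of the query, one tuple per triple pattern.
patterns = [
    ("?is", "rdf:type", "penn:BePresentTense"),
    ("?is", "nlp:superToken", "?vp"),
    ("?vp", "rdf:type", "olia:VerbPhrase"),
    ("?vp", "nlp:syntacticSubToken", "?b1"),
    ("?b1", "nlp:normUri", "?super"),
    ("?animal", "nlp:cop", "?is"),
    ("?animal", "nlp:nsubj", "?fish"),
    ("?fish", "nlp:superToken", "?b2"),
    ("?b2", "nlp:normUri", "?sub"),
]

# The CONSTRUCT part: one subClassOf triple per solution.
constructed = {(b["?sub"], "rdfs:subClassOf", b["?super"])
               for b in match(patterns, triples)}
print(constructed)
```

With these sample triples the pattern has a single solution, so one `rdfs:subClassOf` statement is produced, linking the normalized URI of the subject phrase to that of the predicate phrase.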
Use Case - Machine Learning
Workplan
• EU Deliverable almost finished
• Integration of SnowballStemming and the Stanford Parser
• Next step: Integration of Knowledge Extraction tools (Zemanta, DBpedia Spotlight, Alchemy, OpenCalais, FOX)
• Web services that read NIF and output NIF
• Google Code Project: http://code.google.com/p/nlp2rdf/
• Web Site: http://aksw.org/Projects/NIF
Summary
• NIF allows NLP output to be represented using Knowledge Representation Formalisms (RDF/OWL)
• It is possible to mix it with other knowledge (e.g. Wikipedia/DBpedia)
• Good foundation to optimize machine learning:
– Choose the best algorithms
– Choose the best data
Related Projects
• Wiktionary
• LLOD
• CKAN / Open Linguistics
NLP2RDF – http://aksw.org/Projects/NLP2RDF . http://lod2.eu
Creation of data sets: Wiktionary2RDF
http://en.wiktionary.org/wiki/house
• Covers 170 languages
• Total of 10 million pages
• 900,000 users
• RDF dump will increase number of editors
• Same properties as Wikipedia (stable identifiers)
• Hundreds of Wiktionary parsers (especially for English)
• Information is trapped in the Wiki
• Structure changes make software obsolete
Why try it again?
• DBpedia Extraction Framework is very mature (5 years, 15 developers)
• Configuration over Code, Templates will allow Wiktionarians to update Parsers
• Early contact with the community
Wiktionary, Wortschatz, OLiA can become the Crystallization point for a Linguistic Linked Data Web
Four major types:
• Lexical Semantic Resources
• Dictionaries
• Corpora
• Schemas/Ontologies
Open Licences – Focus of LOD2 and OKFN
http://ckan.net/
CKAN is an open registry of data and content packages. Harnessing the CKAN software, this site makes it easy to find, share and reuse content and data, especially in ways that are machine automatable.
Working Group on Open Data in Linguistics
http://linguistics.okfn.org
• Founded in Nov 2010
• 40 Members
• Membership open, please join
• Over 100 data sets in CKAN
Thank you for your attention!