Kyoto Architecture - Baselines

KYOTO (ICT-211423) Knowledge Yielding Ontologies for Transition-Based Organization Intelligent Content and Semantics The First KYOTO Workshop February 2-3 2009 Overall Kyoto Architecture and Kyoto Annotation Format Carlo Aliprandi - SyNTHEMA


Page 1: Kyoto Architecture - Baselines

KYOTO (ICT-211423): Knowledge Yielding Ontologies for Transition-Based Organization

Intelligent Content and Semantics

The First KYOTO Workshop, February 2-3 2009

Overall Kyoto Architecture and Kyoto Annotation Format

Carlo Aliprandi - SyNTHEMA

Page 2: Kyoto Architecture - Baselines

The First KYOTO Workshop, Amsterdam, February 2-3 2009 ICT-211423

Kyoto Architecture - Baselines

• KYOTO: an information sharing system that enables the extraction of deep semantics (Web 3.0) from texts, for a selected domain, anchoring meaning across cultures and languages

• KYOTO: a social platform (Web 2.0) for knowledge sharing and transfer supporting people and organization in building, maintaining and improving knowledge

• Baselines for KYOTO architecture:
  – Strong backbone for data exchange among components
  – Adopt and adapt existing standards
  – Open and public system
  – Synchronize across versions/languages/NLP tools/research groups
  – API to connect to sources and services
  – Services to plug and unplug different knowledge sources (Lexicon, Wordnets, Ontologies)
  – Tradeoff between generic and domain resources
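The "plug and unplug" baseline can be pictured as a small registry of interchangeable lookup services. The sketch below is only an illustration of that idea: the class `KnowledgeSourceRegistry`, its method names, and the toy entries are all hypothetical and are not taken from the KYOTO system.

```python
from typing import Callable, Dict, List

# A lookup takes a term and returns whatever entries a source holds for it.
Lookup = Callable[[str], List[str]]


class KnowledgeSourceRegistry:
    """Hypothetical registry; not part of any KYOTO API."""

    def __init__(self) -> None:
        self._sources: Dict[str, Lookup] = {}

    def plug(self, name: str, lookup: Lookup) -> None:
        """Make a knowledge source available under the given name."""
        self._sources[name] = lookup

    def unplug(self, name: str) -> None:
        """Remove a knowledge source; unknown names are ignored."""
        self._sources.pop(name, None)

    def lookup(self, term: str) -> Dict[str, List[str]]:
        """Ask every plugged-in source about the term."""
        return {name: fn(term) for name, fn in self._sources.items()}


# Toy sources with made-up entries, for illustration only.
toy_wordnet = {"population": ["toy-synset-1"]}
toy_ontology = {"population": ["ToyGroupConcept"]}

registry = KnowledgeSourceRegistry()
registry.plug("wordnet", lambda term: toy_wordnet.get(term, []))
registry.plug("ontology", lambda term: toy_ontology.get(term, []))
with_both = registry.lookup("population")

registry.unplug("ontology")
wordnet_only = registry.lookup("population")
print(with_both)
print(wordnet_only)
```

Callers only see the `lookup` result, so a source can be swapped without touching the components that consume it.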

Page 3: Kyoto Architecture - Baselines


System components

• Capture Server: system for selecting, converting and storing documents into the Kyoto document DB.
• Linguistic processors: produce KAF annotations.
• Wikyoto system: wiki system for yielding wordnets and ontologies. Main interface for concept and fact users.
  – Document Manager
  – Term Editor
  – Kybot Editor

Page 4: Kyoto Architecture - Baselines


System components

• Tybot Server: automatic term and relation extraction from KAF documents and population of the term database. Validation of terms, and population and mapping to D-WNs, via Wikyoto.
• Kybots Server: semi-automatic fact annotation on KAF documents, using patterns (Kybots).
• Kyoto Search system: main interface for end users.
  – Fact search system
  – Fact alert system

Page 5: Kyoto Architecture - Baselines


(Simplified) architecture: domain expert point-of-view

Page 6: Kyoto Architecture - Baselines


Overall architecture

[Figure: overall architecture diagram. Labels recoverable from the slide:]
• Servers: Capture Server, Indexing Server, Tybot Server, Kybot Server
• Data stores: Document Base, Kybots DB, Extracted Terms, Basque Term DB, Japan Term DB, Domain Wordnet, Domain Ontology
• Wordnets: Japanese, Dutch, Spanish, Chinese
• Ontologies: Kyoto Ontology, DOLCE, SUMO, FrameNet
• Linguistic Processor (L.P.): Dutch, English, Basque, Italian
• Wikyoto: Term Editor, Doc. Manager, Kybot Editor
• Kyoto System: Search App., Browse
• Users: Concept User, Fact User, End User
• Numbered callouts: [1], [2], [3]

Page 7: Kyoto Architecture - Baselines


Data formats: KAF

• Kyoto Annotation Format (Level 1): a multi-layered annotation format for:
  – Tokenization and word form segmentation
  – POS tagging
  – Lemmatization and term extraction
  – Constituency tagging
  – Dependency tagging

ENG-3.0-107695012-N

Page 8: Kyoto Architecture - Baselines


Semantic Annotation

• Semantic Annotation Format for:
  – Named Entity Recognition (time, events, quant. …)
  – Word Sense Disambiguation (D-WSD)
  – Semantic Role Labeling (SRL)

no synsets

KAF level 2 (SemKAF)

ENG-3.0-107630294-N

Page 9: Kyoto Architecture - Baselines


Data formats

Level of annotation            Standard format
1. Morpho-syntax annotation    KAF
2. Semantic annotation         KAF
3. Terms representation        TMF
4. Facts annotation            KAF
5. Wordnets                    LMF
6. Ontologies                  OWL

Page 10: Kyoto Architecture - Baselines


KAF annotation : words

<text>
  <wf wid="w1" sent="1" para="1">Tropical</wf>
  <wf wid="w2" sent="1" para="1">terrestrial</wf>
  <wf wid="w3" sent="1" para="1">species</wf>
  <wf wid="w4" sent="1" para="1">populations</wf>
  <wf wid="w5" sent="1" para="1">declined</wf>
  <wf wid="w6" sent="1" para="1">by</wf>
  <wf wid="w7" sent="1" para="1">55</wf>
  <wf wid="w8" sent="1" para="1">per</wf>
  <wf wid="w9" sent="1" para="1">cent</wf>
  <wf wid="w10" sent="1" para="1">on</wf>
  <wf wid="w11" sent="1" para="1">average</wf>
  <wf wid="w12" sent="1" para="1">from</wf>
  <wf wid="w13" sent="1" para="1">1970</wf>
  <wf wid="w14" sent="1" para="1">to</wf>
  <wf wid="w15" sent="1" para="1">2003</wf>
</text>

Tropical terrestrial species populations declined by 55 per cent on average from 1970 to 2003.
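As a minimal sketch of how the word-form layer could be consumed, the snippet below groups `wf` elements by their `sent` attribute using Python's standard `xml.etree.ElementTree`. The function name `sentences_from_text_layer` is hypothetical, and the sample is a shortened copy of the slide's fragment rather than a full KAF document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Shortened copy of the <text> layer shown on the slide.
KAF_TEXT = """
<text>
  <wf wid="w1" sent="1" para="1">Tropical</wf>
  <wf wid="w2" sent="1" para="1">terrestrial</wf>
  <wf wid="w3" sent="1" para="1">species</wf>
  <wf wid="w4" sent="1" para="1">populations</wf>
  <wf wid="w5" sent="1" para="1">declined</wf>
</text>
"""


def sentences_from_text_layer(xml_string):
    """Group (wid, token) pairs by sentence id, keeping document order."""
    root = ET.fromstring(xml_string.strip())
    sentences = defaultdict(list)
    for wf in root.iter("wf"):
        sentences[wf.get("sent")].append((wf.get("wid"), wf.text))
    return dict(sentences)


sentences = sentences_from_text_layer(KAF_TEXT)
first_sentence = " ".join(token for _, token in sentences["1"])
print(first_sentence)
```

Keeping the `wid` next to each token matters because the later layers (terms, chunks, dependencies) refer back to these identifiers rather than to character offsets.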

Page 11: Kyoto Architecture - Baselines


KAF annotation : terms

<term tid="t5" type="open" lemma="decline" pos="V">
  <spans>
    <target id="w5"/>
  </spans>
</term>

<term tid="t7" type="open" lemma="55 per cent" pos="N">
  <spans>
    <target id="w7"/>
    <target id="w8"/>
    <target id="w9"/>
  </spans>
</term>
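The term layer points at word forms through `target` ids, so a multi-word term such as "55 per cent" spans several `wf` elements. The following sketch resolves each term to its surface string. The `<fragment>` wrapper element and the function name `term_surface_forms` are assumptions made only so the example parses as a single XML document; they are not KAF names taken from the slides.

```python
import xml.etree.ElementTree as ET

# <fragment> is a hypothetical wrapper, used only to get one root element.
FRAGMENT = """
<fragment>
  <text>
    <wf wid="w5" sent="1" para="1">declined</wf>
    <wf wid="w6" sent="1" para="1">by</wf>
    <wf wid="w7" sent="1" para="1">55</wf>
    <wf wid="w8" sent="1" para="1">per</wf>
    <wf wid="w9" sent="1" para="1">cent</wf>
  </text>
  <term tid="t5" type="open" lemma="decline" pos="V">
    <spans><target id="w5"/></spans>
  </term>
  <term tid="t7" type="open" lemma="55 per cent" pos="N">
    <spans><target id="w7"/><target id="w8"/><target id="w9"/></spans>
  </term>
</fragment>
"""


def term_surface_forms(xml_string):
    """Map each term id to its lemma, POS and the words it spans."""
    root = ET.fromstring(xml_string.strip())
    words = {wf.get("wid"): wf.text for wf in root.iter("wf")}
    surfaces = {}
    for term in root.iter("term"):
        targets = [t.get("id") for t in term.iter("target")]
        surfaces[term.get("tid")] = {
            "lemma": term.get("lemma"),
            "pos": term.get("pos"),
            "surface": " ".join(words[wid] for wid in targets),
        }
    return surfaces


surfaces = term_surface_forms(FRAGMENT)
for tid, info in surfaces.items():
    print(tid, info)
```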


Page 12: Kyoto Architecture - Baselines


KAF annotation : constituents

<chunks>
  <!-- terrestrial species -->
  <chunk cid="2" head="t3" phrase="NP">
    <spans>
      <target id="t2"/>
      <target id="t3"/>
    </spans>
  </chunk>
  <!-- terrestrial species populations -->
  <chunk cid="3" head="t4" phrase="NP">
    <spans>
      <target id="t2"/>
      <target id="t3"/>
      <target id="t4"/>
    </spans>
  </chunk>
  <!-- Tropical terrestrial species -->
  <chunk cid="4" head="t3" phrase="NP">
    <spans>
      <target id="t1"/>
      <target id="t2"/>
      <target id="t3"/>
    </spans>
  </chunk>
</chunks>
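Chunks refer to term ids rather than to word forms, with one term marked as `head`. A small sketch of reading this layer follows. The lemma table is copied by hand from the term entries shown with the dependency example, and the function name `describe_chunks` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Lemmas for t1..t4, copied by hand from the slides' term entries.
TERM_LEMMAS = {
    "t1": "tropical",
    "t2": "terrestrial",
    "t3": "species",
    "t4": "population",
}

CHUNKS = """
<chunks>
  <chunk cid="2" head="t3" phrase="NP">
    <spans><target id="t2"/><target id="t3"/></spans>
  </chunk>
  <chunk cid="3" head="t4" phrase="NP">
    <spans><target id="t2"/><target id="t3"/><target id="t4"/></spans>
  </chunk>
  <chunk cid="4" head="t3" phrase="NP">
    <spans><target id="t1"/><target id="t2"/><target id="t3"/></spans>
  </chunk>
</chunks>
"""


def describe_chunks(xml_string, lemmas):
    """Return one record per chunk with its phrase type, head and lemmas."""
    root = ET.fromstring(xml_string.strip())
    described = []
    for chunk in root.iter("chunk"):
        span = [t.get("id") for t in chunk.iter("target")]
        described.append({
            "cid": chunk.get("cid"),
            "phrase": chunk.get("phrase"),
            "head": lemmas[chunk.get("head")],
            "lemmas": [lemmas[tid] for tid in span],
        })
    return described


chunks = describe_chunks(CHUNKS, TERM_LEMMAS)
for record in chunks:
    print(record)
```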


Page 13: Kyoto Architecture - Baselines


KAF annotation : dependencies

<deps>
  <dep from="t4" to="t5" rfunc="subj"/>
  <dep from="t4" to="t1" rfunc="mod"/>
  <dep from="t4" to="t2" rfunc="mod"/>
  <dep from="t4" to="t3" rfunc="mod"/>
</deps>

<term tid="t1" type="open" lemma="tropical" pos="G"> ..
<term tid="t2" type="open" lemma="terrestrial" pos="G"> ..
<term tid="t3" type="open" lemma="species" pos="N"> ..
<term tid="t4" type="open" lemma="population" pos="N"> ..
<term tid="t5" type="open" lemma="decline" pos="V"> ..
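Each `dep` element links two term ids with a relation label in `rfunc`. The sketch below turns the dependency layer into readable triples by substituting lemmas for ids. It makes no claim about which end of a `dep` is the head; it simply reports `from`, `rfunc` and `to` as given. The closing `</deps>` tag in the sample and the function name `dependency_triples` are assumptions.

```python
import xml.etree.ElementTree as ET

# Lemmas copied by hand from the term entries listed on the slide.
TERM_LEMMAS = {
    "t1": "tropical",
    "t2": "terrestrial",
    "t3": "species",
    "t4": "population",
    "t5": "decline",
}

DEPS = """
<deps>
  <dep from="t4" to="t5" rfunc="subj"/>
  <dep from="t4" to="t1" rfunc="mod"/>
  <dep from="t4" to="t2" rfunc="mod"/>
  <dep from="t4" to="t3" rfunc="mod"/>
</deps>
"""


def dependency_triples(xml_string, lemmas):
    """Return (from_lemma, rfunc, to_lemma) for every dep, in file order."""
    root = ET.fromstring(xml_string.strip())
    return [
        (lemmas[dep.get("from")], dep.get("rfunc"), lemmas[dep.get("to")])
        for dep in root.iter("dep")
    ]


triples = dependency_triples(DEPS, TERM_LEMMAS)
for triple in triples:
    print(triple)
```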


Page 14: Kyoto Architecture - Baselines


KAF annotation: WSD

<term tid="t4" type="open" lemma="population" pos="N">
  <spans>
    <target id="w4"/>
  </spans>
  <senseAlt>
    <sense sensecode="EN-17-00861095-n"/>
    <sense sensecode="EN-17-00859568-n"/>
    .......

<term tid="t4" type="open" lemma="population" pos="N">
  <spans>
    <target id="w4"/>
  </spans>
  <senseAlt>
    <sense sensecode="EN-17-00859568-n" confidence="0.80"/>
    <sense sensecode="EN-17-00257849-n" confidence="0.13"/>
    <sense sensecode="EN-17-00962397-n" confidence="0.07"/>
  </senseAlt>
</term>
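Once senses carry a `confidence` attribute, a consumer can rank them. The sketch below picks the highest-scoring sense for a term. Taking the maximum is one possible selection policy, not something the slides prescribe, and the function name `ranked_senses` is hypothetical. The sample uses the confidence values from the slide with the attribute quoting normalized so that it parses.

```python
import xml.etree.ElementTree as ET

TERM = """
<term tid="t4" type="open" lemma="population" pos="N">
  <spans><target id="w4"/></spans>
  <senseAlt>
    <sense sensecode="EN-17-00859568-n" confidence="0.80"/>
    <sense sensecode="EN-17-00257849-n" confidence="0.13"/>
    <sense sensecode="EN-17-00962397-n" confidence="0.07"/>
  </senseAlt>
</term>
"""


def ranked_senses(term_xml):
    """Return (sensecode, confidence) pairs, highest confidence first."""
    term = ET.fromstring(term_xml.strip())
    senses = [
        (sense.get("sensecode"), float(sense.get("confidence")))
        for sense in term.iter("sense")
    ]
    return sorted(senses, key=lambda pair: pair[1], reverse=True)


ranking = ranked_senses(TERM)
best_code, best_confidence = ranking[0]
print(best_code, best_confidence)
```

A threshold on the gap between the first and second candidates would be a natural refinement when the top scores are close.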

Page 15: Kyoto Architecture - Baselines


Kyoto open-ness

– The kernel of the system.
– Core components available as Open Source
– Integrating existing resources
– Usable by anybody in the 7 Kyoto languages
– Fast delivery: at M12, beta available for several components (Capture Server, LPs, Tybot server, Wikyoto …)

– Third-party resources as plug-ins
– Third-party (open source) linguistic processors
– New languages
– Search Interface
– Fact Alert System - News Monitoring System

Page 16: Kyoto Architecture - Baselines


Thanks