Sasaki – MLKRep – 10 July 2015 WWW.FREME-PROJECT.EU 1
Co-funded by the Horizon 2020 Framework Programme of the European Union
Grant Agreement Number 644771
MLKREP, 10 JULY 2015
Felix Sasaki, DFKI / W3C Fellow
APPROACHES AND APPLICATION SCENARIOS FOR INTEGRATING MULTILINGUAL KNOWLEDGE RESOURCES AND WEB CONTENT
www.freme-project.eu
THE FREME PROJECT
• Two-year H2020 Innovation Action; start: February 2015
• Industry partners leading four business cases around digital content and (linked) data
• Technology development bridging language and data
• Outreach and business modelling demonstrating monetization of the multilingual data value chain
CHALLENGE AND OPPORTUNITY: BIG DATA IS GROWING ACROSS LANGUAGES, SECTORS AND DOMAINS
• BC: Digital publishing
• BC: Translation and localisation
• BC: Agriculture and food domain data
• BC: Web site personalisation
Agriculture metadata, user content, news content, …
WHAT LIES AHEAD FOR SEVERAL INDUSTRIES? SEE THE FREME BUSINESS CASES
EN, ES, AR, JA, ZH, ...
CURRENT STATE OF SOLUTIONS
Machine translation, terminology annotation, ...
Linked data creation & processing
GAPS THAT HINDER BUSINESS:
• Plethora of formats
• Adaptability and platform dependency
• Language coverage
• Usability
“The right tool for the right person in given and new enterprises”: technology influences job profiles
FREME TO THE RESCUE: ENRICHING DIGITAL CONTENT
Machine translation, terminology annotation, ...
Linked data creation & processing
LT and LD as first class citizens on the Web
A SET OF INTERFACES* - DESIGN DRIVEN BY BUSINESS CASES
LT and LD for various user types: (application) developer, content architect, content author, …
* Graphical interfaces
* Software interfaces
EACH SERVICE IN ONE SENTENCE
• e-Translation: “Translate from Dutch to English”
• e-Terminology: “Add terminology annotations”
• e-Entity: “Identify unique entities”
• e-Link: “Add information from (linked open) data sources”
• e-Publishing: “Publish as digital book content”
• e-Internationalisation: “Use standardised metadata for multilingual content production”
A KEY ASPECT OF FREME: FREME will make it possible to combine data and language technologies via suitable software interfaces (APIs) and graphical user interfaces (GUIs)
CHALLENGES FOR MULTILINGUAL KNOWLEDGE RESOURCES AND SOLUTIONS PROVIDED BY FREME
CHALLENGE: INTEGRATION OF KNOWLEDGE RESOURCES INTO CONTENT
• Content comes in a plethora of formats
• There is no standardised way to represent knowledge-related information in widely used content formats
• Keynote from Michael Wetzel: too many competing formats!
◦ SKOS, OWL, TBX, …
• Solution by FREME:
◦ Using NIF to represent natural language processing workflows
◦ Enrich with interlinked information
◦ Linking => benefit from the network effect on the Web
WHAT IS NIF?
• Natural Language Processing Interchange Format
• See http://nlp2rdf.org/
• Linked Data format to store annotations & to organize NLP pipelines
• API specification to create NIF workflows
• Following slides: main roles for NIF
EXAMPLE (PARTIAL; JSON-LD SYNTAX)
{
  "@graph" : [
    {
      "@id" : "p:char=0,18",
      "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ],
      "anchorOf" : "Welcome to Prague.",
      "beginIndex" : "0",
      "endIndex" : "18",
      "isString" : "Welcome to Prague.",
      "referenceContext" : "p:char=0,18"
    },
    {
      "@id" : "p:char=11,17",
      "@type" : [ "nif:RFC5147String", "nif:Word" ],
      …
      "referenceContext" : "p:char=0,18",
      "taIdentRef" : "http://dbpedia.org/resource/Prague"
    },
    …
  ]
}
• Identifying and typing annotations
• Identifying annotation offsets
• Adding additional knowledge, e.g. named entity identifier
• Interrelating annotations
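The four roles listed above can be traced in the example with a few lines of Python. This is a minimal sketch and not part of the FREME tooling: it assumes only the two nodes shown in the example (the elided parts are left out), it reads the `char=begin,end` part of each identifier as character offsets, and the helper name `offsets` is made up for illustration.

```python
import json
import re

# The two nodes from the example above; the elided ("…") parts are left out.
NIF_EXAMPLE = json.loads("""
{ "@graph": [
  { "@id": "p:char=0,18",
    "@type": ["nif:Context", "nif:Sentence", "nif:RFC5147String"],
    "beginIndex": "0", "endIndex": "18",
    "isString": "Welcome to Prague.",
    "referenceContext": "p:char=0,18" },
  { "@id": "p:char=11,17",
    "@type": ["nif:RFC5147String", "nif:Word"],
    "referenceContext": "p:char=0,18",
    "taIdentRef": "http://dbpedia.org/resource/Prague" }
] }
""")


def offsets(node_id):
    """Hypothetical helper: read (begin, end) from an id like 'p:char=11,17'."""
    match = re.search(r"char=(\d+),(\d+)$", node_id)
    if match is None:
        raise ValueError(f"no character offsets in {node_id!r}")
    return int(match.group(1)), int(match.group(2))


# Identifying annotations: index every node by its "@id".
nodes = {node["@id"]: node for node in NIF_EXAMPLE["@graph"]}

annotated = []
for node in nodes.values():
    if "taIdentRef" not in node:
        continue  # only nodes that carry additional knowledge
    context = nodes[node["referenceContext"]]  # interrelating annotations
    begin, end = offsets(node["@id"])          # annotation offsets
    annotated.append((context["isString"][begin:end], node["taIdentRef"]))

print(annotated)
```

The identifier of the second node is enough to recover the annotated string from the context node, which is why a consumer does not need the original document to interpret an annotation.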
A POTENTIAL NIF WORKFLOW
Existing content
Content analytics, e.g. named entity recognition
Conversion to NIF
Deploying knowledge from the Linguistic Linked Data (LLD) cloud
Integrating world knowledge and terminological knowledge
INTEGRATING WORLD KNOWLEDGE AND TERMINOLOGICAL KNOWLEDGE
{
  "@graph" : [
    {
      "@id" : "p:char=0,21",
      …
      "isString" : "I have a screwdriver.",
      "referenceContext" : "p:char=0,21"
    },
    …
  ]
}
• Step 1: creating NIF from existing content
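Step 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the FREME implementation: the function name `make_context` and the `p:` prefix default are hypothetical, only the properties visible in the slides are emitted, and offsets are plain Python string indices.

```python
import json


def make_context(text, prefix="p:"):
    """Hypothetical helper: build one NIF context node covering all of `text`."""
    node_id = f"{prefix}char=0,{len(text)}"
    return {
        "@id": node_id,
        "@type": ["nif:Context", "nif:RFC5147String"],
        "beginIndex": "0",
        "endIndex": str(len(text)),
        "isString": text,
        # As in the slides, the context node refers to itself.
        "referenceContext": node_id,
    }


document = {"@graph": [make_context("I have a screwdriver.")]}
print(json.dumps(document, indent=2))
```

Later steps only append further nodes to `@graph`, each pointing back to this context through `referenceContext`.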
INTEGRATING WORLD KNOWLEDGE AND TERMINOLOGICAL KNOWLEDGE
{
  "@graph" : [
    {
      "@id" : "p:char=0,21",
      …
      "isString" : "I have a screwdriver.",
      "referenceContext" : "p:char=0,21"
    },
    {
      "@id" : "p:char=9,20",
      …
      "taIdentRef" : "http://dbpedia.org/resource/screwdriver"
    },
    …
  ]
}
• Step 1: creating NIF from existing content
• Step 2: adding world knowledge based on DBpedia
INTEGRATING WORLD KNOWLEDGE AND TERMINOLOGICAL KNOWLEDGE
{
  "@graph" : [
    {
      "@id" : "p:char=0,21",
      …
      "isString" : "I have a screwdriver.",
      "referenceContext" : "p:char=0,21"
    },
    {
      "@id" : "p:char=9,20",
      …
      "taIdentRef" : "http://dbpedia.org/resource/screwdriver",
      "termInfoRef" : "http://tbx2rdf.lider-project.eu/…/query=schraubendreher"
    },
    …
  ]
}
• Step 1: creating NIF from existing content
• Step 2: adding world knowledge based on DBpedia
• Step 3: adding terminological knowledge from IATE
• IATE is accessed in its linked data version, via http://tbx2rdf.lider-project.eu
• The query to IATE uses the translation suggested by DBpedia
• The network effect: interlinking adds value
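Steps 2 and 3 can be sketched in Python to show how the entity link feeds the terminology query. Everything service-related here is an assumption: the two lookup tables stand in for the real DBpedia and IATE services, the function name `enrich` is hypothetical, and the query path stays elided ("…") exactly as on the slide.

```python
# Stand-ins for the real services; both tables are made up for this sketch.
ENTITY_LINKS = {"screwdriver": "http://dbpedia.org/resource/screwdriver"}
GERMAN_LABELS = {"http://dbpedia.org/resource/screwdriver": "schraubendreher"}
# The path is elided ("…") on the slide and is kept elided here.
TERM_QUERY_PREFIX = "http://tbx2rdf.lider-project.eu/…/query="


def enrich(text, surface_form, prefix="p:"):
    """Return a NIF-style annotation for the first occurrence of `surface_form`."""
    begin = text.find(surface_form)
    if begin < 0:
        raise ValueError(f"{surface_form!r} not found in text")
    end = begin + len(surface_form)
    annotation = {
        "@id": f"{prefix}char={begin},{end}",
        "referenceContext": f"{prefix}char=0,{len(text)}",
    }
    # Step 2: world knowledge, i.e. the entity link.
    entity = ENTITY_LINKS.get(surface_form.lower())
    if entity is not None:
        annotation["taIdentRef"] = entity
        # Step 3: the translation attached to the entity drives the term query.
        label = GERMAN_LABELS.get(entity)
        if label is not None:
            annotation["termInfoRef"] = TERM_QUERY_PREFIX + label
    return annotation


annotation = enrich("I have a screwdriver.", "screwdriver")
print(annotation)
```

The terminology lookup is only possible because the entity link was added first, which is the network effect the slide refers to.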
AUTHORING AND PUBLISHING MULTILINGUALLY AND SEMANTICALLY ENRICHED EBOOKS
• Example: Integration into ePub editing mode of oXygen XML Editor
e-Entity: annotate named entities
INTEGRATING SEMANTIC ENRICHMENT INTO MULTILINGUAL CONTENT IN TRANSLATION AND LOCALISATION
• Example: Integration into XLIFF 2.0 editing mode of oXygen XML Editor
• Combination of services
◦ e-Entity: annotate named entities
◦ e-Terminology: fetch terminological information
◦ e-Link: fetch additional information from a linked data source like DBpedia, specific to the type of entities (places, persons, …)
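The idea of fetching information "specific to the type of entities" can be sketched as a query template chosen per entity type. This is an assumption about how such a lookup might be organised, not the e-Link API: the mapping `PROPERTIES_BY_TYPE`, the choice of properties, and the function name `build_query` are illustrative. The sketch only builds a SPARQL query string and sends nothing over the network.

```python
# Hypothetical mapping from entity type to the DBpedia ontology properties
# worth fetching for it; the selection is illustrative only.
PROPERTIES_BY_TYPE = {
    "person": ["dbo:birthDate", "dbo:birthPlace"],
    "place": ["dbo:country", "dbo:populationTotal"],
}


def build_query(entity_uri, entity_type):
    """Build a SPARQL query asking for the type-specific properties of one entity."""
    properties = PROPERTIES_BY_TYPE.get(entity_type, [])
    if not properties:
        raise ValueError(f"no properties configured for type {entity_type!r}")
    values = " ".join(properties)
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?property ?value WHERE {\n"
        f"  VALUES ?property {{ {values} }}\n"
        f"  <{entity_uri}> ?property ?value .\n"
        "}"
    )


query = build_query("http://dbpedia.org/resource/Prague", "place")
print(query)
```

Keeping the type-to-property mapping in data rather than in code means that supporting a new entity type is a configuration change.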
INTEGRATING SEMANTIC ENRICHMENT INTO MULTILINGUAL CONTENT IN TRANSLATION AND LOCALISATION
• Enriching content with machine-readable information, represented as JSON-LD
◦ Input: “Welcome to Berlin … Marlene Dietrich!”
◦ Output:
[
  { "@id": "dbpedia:Marlene_Dietrich", "@type": "person", "born": "1901-12-27" }
]
May serve as a basis for further processing, e.g. multilingual generation:
• “… born 1901”
• “… geboren 1901”
• “…1901年生まれ”
• …
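The multilingual generation mentioned above can be sketched in Python. The templates are taken from the three phrases on the slide; the function name `describe_birth` and the language keys are assumptions made for this illustration.

```python
import json

# The output record from the slide.
records = json.loads("""
[ { "@id": "dbpedia:Marlene_Dietrich", "@type": "person", "born": "1901-12-27" } ]
""")

# Per-language templates, following the phrases shown on the slide.
TEMPLATES = {
    "en": "born {year}",
    "de": "geboren {year}",
    "ja": "{year}年生まれ",
}


def describe_birth(record, language):
    """Render the year of birth of `record` in the given language."""
    year = record["born"].split("-")[0]
    return TEMPLATES[language].format(year=year)


phrases = {language: describe_birth(records[0], language) for language in TEMPLATES}
print(phrases)
```

Because the date is stored once in a language-neutral form, adding a language only requires a new template.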
DEMO
• Generating translation suggestions
• Knowledge being used
◦ World knowledge: DBpedia
◦ Terminological knowledge: IATE
• Storage in ePub based on Internationalization Tag Set (ITS) 2.0
◦ Standardised markup for multilingual content production
◦ Translation suggestions are stored here as ITS “Localization Note” annotations
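Storing a translation suggestion as a Localization Note can be sketched as follows. The sketch assumes the HTML attribute form of ITS 2.0 (`its-loc-note` and `its-loc-note-type`); whether the demo uses exactly this serialisation in ePub content is not stated in the slides, and the function name `annotate_with_note` and the note wording are hypothetical.

```python
import xml.etree.ElementTree as ET


def annotate_with_note(text, term, suggestion):
    """Wrap the first occurrence of `term` in a span carrying a Localization Note."""
    begin = text.find(term)
    if begin < 0:
        raise ValueError(f"{term!r} not found in text")
    paragraph = ET.Element("p")
    paragraph.text = text[:begin]
    span = ET.SubElement(
        paragraph,
        "span",
        {
            "its-loc-note": f"Translation suggestion: {suggestion}",
            "its-loc-note-type": "description",
        },
    )
    span.text = term
    span.tail = text[begin + len(term):]  # text that follows the annotated term
    return ET.tostring(paragraph, encoding="unicode")


markup = annotate_with_note("I have a screwdriver.", "screwdriver", "Schraubendreher")
print(markup)
```

Keeping the suggestion in an attribute leaves the visible text of the book unchanged while making the note available to translation tools that understand ITS 2.0.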
WANT TO TRY THINGS OUT?
• Go to http://api.freme-project.eu/doc/0.1/
• Check out API demo calls
• Timeline for next prototypes
◦ 0.2: mid July
◦ 0.3: end of August
◦ Feedback to GitHub: https://github.com/freme-project
  - The repository will be made public in mid July
CONTACTS
Felix Sasaki, on behalf of the FREME consortium
E-mail: [email protected]
CONSORTIUM