
Sasaki – MLKRep – 10 July 2015 WWW.FREME-PROJECT.EU 1

Co-funded by the Horizon 2020 Framework Programme of the European Union, Grant Agreement Number 644771

MLKREP, 10 JULY 2015

Felix Sasaki, DFKI / W3C Fellow

APPROACHES AND APPLICATION SCENARIOS FOR INTEGRATING MULTILINGUAL KNOWLEDGE RESOURCES AND WEB CONTENT

www.freme-project.eu


BACKGROUND: THE FREME PROJECT


THE FREME PROJECT

• Two-year H2020 Innovation Action, started in February 2015

• Industry partners leading four business cases around digital content and (linked) data

• Technology development bridging language and data

• Outreach and business modelling demonstrating monetization of the multilingual data value chain


CHALLENGE AND OPPORTUNITY: BIG DATA IS GROWING ACROSS LANGUAGES, SECTORS AND DOMAINS

• BC: Digital publishing

• BC: Translation and localisation

• BC: Agriculture and food domain data

• BC: Web site personalisation

Agriculture metadata, user content, news content, …

WHAT LIES AHEAD FOR SEVERAL INDUSTRIES? SEE THE FREME BUSINESS CASES

Languages: EN, ES, AR, JA, ZH, ...


CURRENT STATE OF SOLUTIONS

Machine translation, terminology annotation, ...

Linked data creation & processing

GAPS THAT HINDER BUSINESS:

• Plethora of formats
• Adaptability and platform dependency
• Language coverage
• Usability

“The right tool for the right person in given and new enterprises”: technology influences job profiles


FREME TO THE RESCUE: ENRICHING DIGITAL CONTENT

Machine translation, terminology annotation, ...

Linked data creation & processing

LT (language technology) and LD (linked data) as first-class citizens on the Web

A SET OF INTERFACES* - DESIGN DRIVEN BY BUSINESS CASES

LT and LD for various user types: (application) developer, content architect, content author, …

* Graphical interfaces
* Software interfaces


EACH SERVICE IN ONE SENTENCE

• e-Translation: “Translate from Dutch to English”

• e-Terminology: “Add terminology annotations”

• e-Entity: “Identify unique entities”

• e-Link: “Add information from (linked open) data sources”

• e-Publishing: “Publish as digital book content”

• e-Internationalisation: “Use standardised metadata for multilingual content production”

A KEY ASPECT OF FREME: FREME will allow users to combine data and language technologies via adequate software interfaces (APIs) and graphical user interfaces (GUIs)


CHALLENGES FOR MULTILINGUAL KNOWLEDGE RESOURCES AND SOLUTIONS PROVIDED BY FREME


CHALLENGE: INTEGRATION OF KNOWLEDGE RESOURCES INTO CONTENT

• Content comes in a plethora of formats

• There is no standardised way to represent knowledge-related information in widely used content formats

• Keynote from Michael Wetzel: too many competing formats!
  ◦ SKOS, OWL, TBX, …

• Solution by FREME:
  ◦ Using NIF to represent natural language processing workflows
  ◦ Enriching content with interlinked information
  ◦ Linking => benefit from the network effect on the Web


WHAT IS NIF?

• Natural Language Processing Interchange Format

• See http://nlp2rdf.org/

• Linked Data format to store annotations & to organize NLP pipelines

• API specification to create NIF workflows

• Following slides: main roles for NIF


EXAMPLE (PARTIAL; JSON-LD SYNTAX)

{
  "@graph" : [
    {
      "@id" : "p:char=0,18",
      "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ],
      "anchorOf" : "Welcome to Prague.",
      "beginIndex" : "0",
      "endIndex" : "18",
      "isString" : "Welcome to Prague.",
      "referenceContext" : "p:char=0,18"
    },
    {
      "@id" : "p:char=11,17",
      "@type" : [ "nif:RFC5147String", "nif:Word" ],
      …
      "referenceContext" : "p:char=0,18",
      "taIdentRef" : "http://dbpedia.org/resource/Prague"
    },
    …
  ]
}

• Identifying and typing annotations

• Identifying annotation offsets

• Adding further knowledge, e.g. a named entity identifier

• Interrelating annotations
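The offset-based identifiers and the referenceContext link can be resolved mechanically. The following is a minimal Python sketch, not FREME code: it uses only a subset of the properties shown in the example (the elided "…" properties are left out), treats "p:" as an opaque prefix, and assumes RFC 5147 "char=begin,end" offsets with an exclusive end. The names GRAPH, offsets and resolve are hypothetical.

```python
import re

# Two nodes from the example above; elided ("…") properties are left out.
GRAPH = {
    "@graph": [
        {"@id": "p:char=0,18",
         "@type": ["nif:Context", "nif:Sentence", "nif:RFC5147String"],
         "isString": "Welcome to Prague.", "beginIndex": "0", "endIndex": "18"},
        {"@id": "p:char=11,17",
         "@type": ["nif:RFC5147String", "nif:Word"],
         "referenceContext": "p:char=0,18",
         "taIdentRef": "http://dbpedia.org/resource/Prague"},
    ]
}

OFFSETS = re.compile(r"char=(\d+),(\d+)$")


def offsets(node_id: str) -> tuple[int, int]:
    """Read the RFC 5147 'char=begin,end' fragment of a NIF identifier (end is exclusive)."""
    match = OFFSETS.search(node_id)
    if match is None:
        raise ValueError(f"no character offsets in {node_id!r}")
    return int(match.group(1)), int(match.group(2))


def resolve(graph: dict) -> dict[str, str]:
    """Map each linked entity to the text span it annotates, via its reference context."""
    nodes = {node["@id"]: node for node in graph["@graph"]}
    spans = {}
    for node in graph["@graph"]:
        if "taIdentRef" not in node:
            continue  # the context node itself carries no entity link
        context = nodes[node["referenceContext"]]
        begin, end = offsets(node["@id"])
        spans[node["taIdentRef"]] = context["isString"][begin:end]
    return spans


print(resolve(GRAPH))  # {'http://dbpedia.org/resource/Prague': 'Prague'}
```

Because the offsets live in the identifier, any tool that knows the reference context can recover the annotated string without a separate lookup table.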


A POTENTIAL NIF WORKFLOW

Existing content

Content analytics, e.g. named entity recognition

Conversion to NIF

Deploying knowledge from the Linguistic Linked Data (LLD) cloud


Integrating world knowledge and terminological knowledge
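One way to picture such a workflow is a chain of steps that each consume and return a NIF graph, so that services can be combined in any order. The sketch below is an illustration under that assumption only: the lookup tables ENTITIES and ENTITY_TYPES are hypothetical stand-ins for a real named entity recognizer and a real LLD source, the type value is mock data, and attaching it as "taClassRef" is a choice made for this example.

```python
from functools import reduce
from typing import Callable

Graph = dict
Step = Callable[[Graph], Graph]

# Hypothetical stand-ins for a named entity recognizer and a linked data source (mock data).
ENTITIES = {"Prague": "http://dbpedia.org/resource/Prague"}
ENTITY_TYPES = {"http://dbpedia.org/resource/Prague": "dbo:City"}


def from_content(text: str) -> Graph:
    """Conversion to NIF: wrap existing content as a context (offsets are end-exclusive)."""
    return {"@graph": [{"@id": f"p:char=0,{len(text)}",
                        "@type": ["nif:Context"], "isString": text}]}


def recognise(graph: Graph) -> Graph:
    """Content analytics: toy entity recognition linking the first match of each known name."""
    context = graph["@graph"][0]
    text = context["isString"]
    for name, iri in ENTITIES.items():
        begin = text.find(name)
        if begin >= 0:
            end = begin + len(name)
            graph["@graph"].append({
                "@id": f"p:char={begin},{end}",
                "anchorOf": name,
                "referenceContext": context["@id"],
                "taIdentRef": iri,
            })
    return graph


def enrich(graph: Graph) -> Graph:
    """Deploying LLD knowledge: attach information about each linked entity."""
    for node in graph["@graph"]:
        iri = node.get("taIdentRef")
        if iri in ENTITY_TYPES:
            node["taClassRef"] = ENTITY_TYPES[iri]
    return graph


def run_pipeline(text: str, steps: list[Step]) -> Graph:
    """Apply each step in order; every step consumes and returns a NIF graph."""
    return reduce(lambda graph, step: step(graph), steps, from_content(text))


result = run_pipeline("Welcome to Prague.", [recognise, enrich])
for node in result["@graph"][1:]:
    print(node["@id"], node["anchorOf"], node["taClassRef"])  # p:char=11,17 Prague dbo:City
```

Keeping NIF as the common input and output format is what makes the steps interchangeable: a different recognizer or knowledge source only has to respect the same graph shape.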


INTEGRATING WORLD KNOWLEDGE AND TERMINOLOGICAL KNOWLEDGE


{
  "@graph" : [
    {
      "@id" : "p:char=0,21",
      …
      "isString" : "I have a screwdriver.",
      "referenceContext" : "p:char=0,21"
    },
    {
      "@id" : "p:char=9,20",
      …
      "taIdentRef" : "http://dbpedia.org/resource/screwdriver",
      "termInfoRef" : "http://tbx2rdf.lider-project.eu/…/query=schraubendreher"
    },
    …
  ]
}

• Step 1: creating NIF from existing content

• Step 2: adding world knowledge based on DBpedia

• Step 3: adding terminological knowledge from IATE

• IATE is used in its linked data version, available via http://tbx2rdf.lider-project.eu

• The query to IATE uses the translation suggested by DBpedia (here the German term “Schraubendreher”)
• The network effect: interlinking adds value
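Step 3 depends on step 2: the translation found via the world-knowledge link becomes the terminology query. A minimal Python sketch of that chaining, under clearly stated assumptions: LABELS is mock data standing in for DBpedia's multilingual labels, TERM_QUERY is a hypothetical endpoint (the real tbx2rdf path is elided on the slide and is not reconstructed here), and add_term_info is a hypothetical helper.

```python
from urllib.parse import quote

# Mock multilingual labels, standing in for what a source such as DBpedia provides.
LABELS = {
    "http://dbpedia.org/resource/screwdriver": {"en": "Screwdriver", "de": "Schraubendreher"},
}

# Hypothetical terminology query endpoint (the real tbx2rdf path is elided on the slide).
TERM_QUERY = "http://example.org/terminology/query="


def add_term_info(annotation: dict, target_lang: str) -> dict:
    """Use the translation suggested by the world-knowledge link to query terminology."""
    labels = LABELS.get(annotation.get("taIdentRef"), {})
    translation = labels.get(target_lang)
    if translation is not None:
        # Interlinking at work: the DBpedia label drives the terminology lookup.
        annotation["termInfoRef"] = TERM_QUERY + quote(translation.lower())
    return annotation


word = {"@id": "p:char=9,20", "taIdentRef": "http://dbpedia.org/resource/screwdriver"}
print(add_term_info(word, "de")["termInfoRef"])
# http://example.org/terminology/query=schraubendreher
```

Annotations without an entity link, or without a label in the target language, are returned unchanged, so the step can run safely over a whole graph.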


SAMPLE APPLICATION SCENARIOS


AUTHORING AND PUBLISHING MULTILINGUALLY AND SEMANTICALLY ENRICHED EBOOKS

• Example: integration into the ePub editing mode of the oXygen XML Editor

e-Entity: annotate named entities


INTEGRATING SEMANTIC ENRICHMENT INTO MULTILINGUAL CONTENT IN TRANSLATION AND LOCALISATION

• Example: integration into the XLIFF 2.0 editing mode of the oXygen XML Editor

• Combination of services
  ◦ e-Entity: annotate named entities; e-Terminology: fetch terminological information
  ◦ e-Link: fetch additional information from a linked data source like DBpedia, specific to the type of entities (places, persons, …)
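Fetching information "specific to the type of entities" amounts to choosing a query template per entity type. The sketch below only builds a SPARQL query string and does not send it; the mapping PROPERTY_TEMPLATES and the helper link_query are hypothetical, and although the listed properties exist in the DBpedia ontology, which properties e-Link actually requests is an assumption.

```python
# Hypothetical mapping from entity type to the DBpedia ontology properties worth fetching.
# The properties exist in the DBpedia ontology; their use by e-Link is an assumption.
PROPERTY_TEMPLATES = {
    "person": ["dbo:birthDate", "dbo:birthPlace"],
    "place": ["dbo:populationTotal", "dbo:country"],
}


def link_query(entity_iri: str, entity_type: str) -> str:
    """Build (but do not send) a SPARQL query for the properties relevant to an entity type."""
    properties = PROPERTY_TEMPLATES.get(entity_type)
    if not properties:
        raise KeyError(f"no template for entity type {entity_type!r}")
    variables = " ".join(f"?v{i}" for i in range(len(properties)))
    # OPTIONAL keeps the result row even when an entity lacks one of the properties.
    patterns = " ".join(
        f"OPTIONAL {{ <{entity_iri}> {prop} ?v{i} . }}" for i, prop in enumerate(properties)
    )
    return f"PREFIX dbo: <http://dbpedia.org/ontology/> SELECT {variables} WHERE {{ {patterns} }}"


print(link_query("http://dbpedia.org/resource/Marlene_Dietrich", "person"))
```

Keeping the templates as data rather than code means a new entity type (organisations, events, …) only needs a new entry in the mapping.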


INTEGRATING SEMANTIC ENRICHMENT INTO MULTILINGUAL CONTENT IN TRANSLATION AND LOCALISATION

• Enriching content with machine-readable information, represented as JSON-LD
  ◦ Input: “Welcome to Berlin … Marlene Dietrich!”
  ◦ Output:

[
  { "@id": "dbpedia:Marlene_Dietrich", "@type": "person", "born": "1901-12-27" }
]

This may serve as the basis for further processing, e.g. multilingual generation:
• “… born 1901”
• “… geboren 1901”
• “…1901年生まれ”
• …
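Because the output is structured data rather than text, generating the phrases above can be as simple as filling a template per language. A minimal sketch, assuming one hand-written template per language; BIRTH_TEMPLATES and generate are hypothetical names, and the templates merely reproduce the phrases listed above.

```python
# The enrichment result shown above.
RECORD = {"@id": "dbpedia:Marlene_Dietrich", "@type": "person", "born": "1901-12-27"}

# Hypothetical per-language templates reproducing the phrases on the slide.
BIRTH_TEMPLATES = {
    "en": "born {year}",
    "de": "geboren {year}",
    "ja": "{year}年生まれ",
}


def generate(record: dict, lang: str) -> str:
    """Fill the language-specific template with the birth year from an ISO date."""
    year = record["born"].split("-")[0]
    return BIRTH_TEMPLATES[lang].format(year=year)


for lang in ("en", "de", "ja"):
    print(lang, generate(RECORD, lang))
```

The same record feeds every language; only the template differs, which is why language-neutral, machine-readable enrichment is a useful intermediate step.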


DEMO

• Generating translation suggestions

• Knowledge being used
  ◦ World knowledge: DBpedia
  ◦ Terminological knowledge: IATE

• Storage in ePub based on the Internationalization Tag Set (ITS) 2.0
  ◦ Standardised markup for multilingual content production
  ◦ Translation suggestions are stored here as ITS “Localization Note” markup
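In XML-based content such as ePub's XHTML, an ITS 2.0 Localization Note is expressed with the local attributes its:locNote and its:locNoteType ("alert" or "description"). The following Python sketch wraps a word in a span carrying such a note. The helper with_loc_note, the choice of a span element and the note text are assumptions for illustration, and the ITS version declaration and the rest of the ePub document are omitted.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
ITS = "http://www.w3.org/2005/11/its"
ET.register_namespace("", XHTML)
ET.register_namespace("its", ITS)


def with_loc_note(word: str, suggestion: str) -> str:
    """Wrap a word in an XHTML span carrying an ITS 2.0 Localization Note."""
    span = ET.Element(f"{{{XHTML}}}span", {
        f"{{{ITS}}}locNote": suggestion,
        f"{{{ITS}}}locNoteType": "description",
    })
    span.text = word
    return ET.tostring(span, encoding="unicode")


print(with_loc_note("screwdriver", "Translation suggestion (de): Schraubendreher"))
```

Storing the suggestion as a localization note keeps it out of the rendered text while remaining visible to ITS-aware translation tools.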


WANT TO TRY THINGS OUT?

• Go to http://api.freme-project.eu/doc/0.1/

• Check out API demo calls

• Timeline for the next prototypes
  ◦ 0.2: mid July
  ◦ 0.3: end of August
  ◦ Feedback via GitHub: https://github.com/freme-project (the repository will be made public in mid July)


CONTACTS

Felix Sasaki, on behalf of the FREME consortium

E-mail: [email protected]

CONSORTIUM