sasaki mlkrep-20150710

Sasaki – MLKRep – 10 July 2015 WWW.FREME-PROJECT.EU

Co-funded by the Horizon 2020 Framework Programme of the European Union
Grant Agreement Number 644771

MLKREP, 10 JULY 2015

Felix Sasaki, DFKI / W3C Fellow

APPROACHES AND APPLICATION SCENARIOS FOR INTEGRATING MULTILINGUAL KNOWLEDGE RESOURCES AND WEB CONTENT

http://www.freme-project.eu/


BACKGROUND: THE FREME PROJECT


THE FREME PROJECT

• Two year H2020 Innovation action; start February 2015

• Industry partners leading four business cases around digital content and (linked) data

• Technology development bridging language and data

• Outreach and business modelling demonstrating monetization of the multilingual data value chain


CHALLENGE AND OPPORTUNITY: BIG DATA IS GROWING ACROSS LANGUAGES, SECTORS AND DOMAINS

• BC: Digital publishing

• BC: Translation and localisation

• BC: Agriculture and food domain data

• BC: Web site personalisation

Agriculture metadata, user content, news content, …

WHAT LIES AHEAD FOR SEVERAL INDUSTRIES? SEE THE FREME BUSINESS CASES

Languages: EN, ES, AR, JA, ZH, ...


CURRENT STATE OF SOLUTIONS

Machine translation, terminology annotation, ...

Linked data creation & processing

GAPS THAT HINDER BUSINESS:

• Plethora of formats

• Adaptability and platform dependency

• Language coverage

• Usability

“The right tool for the right person in given and new enterprises”: technology influences job profiles


FREME TO THE RESCUE: ENRICHING DIGITAL CONTENT

Machine translation, terminology annotation, ...

Linked data creation & processing

LT and LD as first class citizens on the Web

A SET OF INTERFACES* - DESIGN DRIVEN BY BUSINESS CASES

LT and LD for various user types: (application) developer, content architect, content author, …

* Graphical interfaces
* Software interfaces


EACH SERVICE IN ONE SENTENCE

• e-Translation: “Translate from Dutch to English”

• e-Terminology: “Add terminology annotations”

• e-Entity: “Identify unique entities”

• e-Link: “Add information from (linked open) data sources”

• e-Publishing: “Publish as digital book content”

• e-Internationalisation: “Use standardised metadata for multilingual content production”

A KEY ASPECT OF FREME: FREME will make it possible to combine data and language technologies via adequate software interfaces (APIs) and graphical user interfaces (GUIs)


CHALLENGES FOR MULTILINGUAL KNOWLEDGE RESOURCES AND SOLUTIONS PROVIDED BY FREME


CHALLENGE: INTEGRATION OF KNOWLEDGE RESOURCES INTO CONTENT

• Content comes in a plethora of formats

• There is no standardised way to represent knowledge-related information in widely used content formats

• Keynote from Michael Wetzel: too many competing formats!
  ◦ SKOS, OWL, TBX, …

• Solution by FREME:
  ◦ Using NIF to represent natural language processing workflows
  ◦ Enrich with interlinked information
  ◦ Linking => benefit from the network effect on the Web


WHAT IS NIF?

• Natural Language Processing Interchange Format

• See http://nlp2rdf.org/

• Linked Data format to store annotations & to organize NLP pipelines

• API specification to create NIF workflows

• Following slides: main roles for NIF


EXAMPLE (PARTIAL; JSON-LD SYNTAX)

{
  "@graph" : [
    {
      "@id" : "p:char=0,18",
      "@type" : [ "nif:Context", "nif:Sentence", "nif:RFC5147String" ],
      "anchorOf" : "Welcome to Prague.",
      "beginIndex" : "0",
      "endIndex" : "18",
      "isString" : "Welcome to Prague.",
      "referenceContext" : "p:char=0,18"
    },
    {
      "@id" : "p:char=11,17",
      "@type" : [ "nif:RFC5147String", "nif:Word" ],
      …
      "referenceContext" : "p:char=0,18",
      "taIdentRef" : "http://dbpedia.org/resource/Prague"
    },
    …
  ]
}




• Identifying and typing annotations

• Identifying annotation offsets

• Adding additional knowledge, e.g. named entity identifier

• Interrelating annotations
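
The four roles listed above can be illustrated with a small sketch. The Python below is not taken from the slides or from the FREME code base: the function names `nif_id` and `annotate_entity` and the default `p:` prefix are assumptions made for this example. It only shows how character offsets lead to identifiers of the form `p:char=begin,end` and how a context node and an entity node are related to each other.

```python
import json


def nif_id(prefix: str, begin: int, end: int) -> str:
    """Build an offset-based identifier such as 'p:char=0,18' (hypothetical helper)."""
    return f"{prefix}char={begin},{end}"


def annotate_entity(text: str, mention: str, entity_uri: str, prefix: str = "p:") -> dict:
    """Return a NIF-like JSON-LD structure with one context node and one entity node."""
    begin = text.index(mention)   # raises ValueError if the mention is not in the text
    end = begin + len(mention)    # end offset is exclusive
    context_id = nif_id(prefix, 0, len(text))

    # Identifying and typing the annotation that covers the whole string.
    context = {
        "@id": context_id,
        "@type": ["nif:Context", "nif:Sentence", "nif:RFC5147String"],
        "anchorOf": text,
        "beginIndex": "0",
        "endIndex": str(len(text)),
        "isString": text,
        "referenceContext": context_id,
    }
    # Offsets identify the mention; taIdentRef adds the named entity identifier;
    # referenceContext interrelates the two annotations.
    word = {
        "@id": nif_id(prefix, begin, end),
        "@type": ["nif:RFC5147String", "nif:Word"],
        "anchorOf": mention,
        "beginIndex": str(begin),
        "endIndex": str(end),
        "referenceContext": context_id,
        "taIdentRef": entity_uri,
    }
    return {"@graph": [context, word]}


if __name__ == "__main__":
    doc = annotate_entity(
        "Welcome to Prague.", "Prague", "http://dbpedia.org/resource/Prague"
    )
    print(json.dumps(doc, indent=2))
```

For the sentence used on the slide, the entity node receives the identifier `p:char=11,17`, matching the partial example.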










A POTENTIAL NIF WORKFLOW

Existing content

Content analytics, e.g. named entity recognition

Conversion to NIF

Deploying knowledge from the Linguistic Linked Data (LLD) cloud

Integrating world knowledge and terminological knowledge


INTEGRATING WORLD KNOWLEDGE AND TERMINOLOGICAL KNOWLEDGE

{
  "@graph" : [
    {
      "@id" : "p:char=0,21",
      …
      "isString" : "I have a screwdriver.",
      "referenceContext" : "p:char=0,21"
    },
    …
  ]
}

• Step 1: creating NIF from existing content



{
  "@graph" : [
    {
      "@id" : "p:char=0,21",
      …
      "isString" : "I have a screwdriver.",
      "referenceContext" : "p:char=0,21"
    },
    {
      "@id" : "p:char=9,20",
      …
      "taIdentRef" : "http://dbpedia.org/resource/screwdriver"
    },
    …
  ]
}


• Step 2: adding world knowledge based on DBpedia



{
  "@graph" : [
    {
      "@id" : "p:char=0,21",
      …
      "isString" : "I have a screwdriver.",
      "referenceContext" : "p:char=0,21"
    },
    {
      "@id" : "p:char=9,20",
      …
      "taIdentRef" : "http://dbpedia.org/resource/screwdriver",
      "termInfoRef" : "http://tbx2rdf.lider-project.eu/…/query=schraubendreher"
    },
    …
  ]
}



• Step 3: adding terminological knowledge from IATE




• IATE is used in its linked data version, via http://tbx2rdf.lider-project.eu

• The query to IATE uses the translation suggested from DBpedia

• The network effect: interlinking adds value
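
The three steps can be sketched as a chain of small functions. This is a minimal illustration under stated assumptions, not the FREME implementation: the function names, the in-memory `GERMAN_LABELS` table (standing in for a label lookup against DBpedia) and `term_info_url` (standing in for a query against the linked data version of IATE) are all hypothetical. The term URL keeps the path elided exactly as on the slide.

```python
# Hypothetical stand-in for the translation that DBpedia suggests for an entity.
GERMAN_LABELS = {
    "http://dbpedia.org/resource/screwdriver": "Schraubendreher",
}


def term_info_url(label: str) -> str:
    """Hypothetical stand-in for the IATE query; the path is elided as on the slide."""
    return f"http://tbx2rdf.lider-project.eu/…/query={label.lower()}"


def create_nif(text: str, prefix: str = "p:") -> dict:
    """Step 1: create a NIF-like context node from existing content."""
    context_id = f"{prefix}char=0,{len(text)}"
    return {"@graph": [
        {"@id": context_id, "isString": text, "referenceContext": context_id}
    ]}


def add_world_knowledge(doc: dict, mention: str, entity_uri: str, prefix: str = "p:") -> dict:
    """Step 2: link a mention in the text to a DBpedia resource."""
    context = doc["@graph"][0]
    begin = context["isString"].index(mention)
    end = begin + len(mention)
    doc["@graph"].append({
        "@id": f"{prefix}char={begin},{end}",
        "referenceContext": context["@id"],
        "taIdentRef": entity_uri,
    })
    return doc


def add_terminology(doc: dict, labels: dict) -> dict:
    """Step 3: use the translated label of each linked entity to attach term information."""
    for node in doc["@graph"]:
        label = labels.get(node.get("taIdentRef"))
        if label is not None:
            node["termInfoRef"] = term_info_url(label)
    return doc


if __name__ == "__main__":
    doc = create_nif("I have a screwdriver.")
    doc = add_world_knowledge(doc, "screwdriver", "http://dbpedia.org/resource/screwdriver")
    doc = add_terminology(doc, GERMAN_LABELS)
    print(doc["@graph"][1])
```

Step 3 depends on the link created in step 2: without the entity link there is no translated label to query with, which is the interlinking benefit the slide refers to.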


SAMPLE APPLICATION SCENARIOS


AUTHORING AND PUBLISHING MULTILINGUALLY AND SEMANTICALLY ENRICHED EBOOKS

• Example: Integration into ePub editing mode of oXygen XML Editor

e-Entity: annotate named entities


INTEGRATING SEMANTIC ENRICHMENT INTO MULTILINGUAL CONTENT IN TRANSLATION AND LOCALISATION

• Example: Integration into XLIFF 2.0 editing mode of oXygen XML Editor

• Combination of services
  ◦ e-Entity: annotate named entities; e-Terminology: fetch terminological information
  ◦ e-Link: fetch additional information from a linked data source like DBpedia, specific to the type of entities (places, persons, …)


INTEGRATING SEMANTIC ENRICHMENT INTO MULTILINGUAL CONTENT IN TRANSLATION AND LOCALISATION

• Enriching content with machine-readable information – represented as JSON-LD
  ◦ Input: “Welcome to Berlin … Marlene Dietrich!”
  ◦ Output:

[
  { "@id": "dbpedia:Marlene_Dietrich", "@type": "person", "born": "1901-12-27" }
]

May serve as a basis for further processing, e.g. multilingual generation:

• “… born 1901”

• “… geboren 1901”

• “…１９０１年生まれ”

• …
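
A minimal sketch of such multilingual generation, assuming simple per-language templates: the `TEMPLATES` table and the function `birth_phrase` are hypothetical names introduced for this example, and the entity record is the one from the output above. The Japanese variant converts the year to full-width digits to match the phrase on the slide.

```python
# Map ASCII digits to full-width digits, as used in the Japanese example.
FULLWIDTH_DIGITS = str.maketrans("0123456789", "０１２３４５６７８９")

# Hypothetical per-language templates for the "born" property.
TEMPLATES = {
    "en": "born {year}",
    "de": "geboren {year}",
    "ja": "{year}年生まれ",
}


def birth_phrase(entity: dict, lang: str) -> str:
    """Render the year of the ISO date in entity['born'] in the given language."""
    year = entity["born"].split("-")[0]
    if lang == "ja":
        year = year.translate(FULLWIDTH_DIGITS)
    return TEMPLATES[lang].format(year=year)


if __name__ == "__main__":
    dietrich = {
        "@id": "dbpedia:Marlene_Dietrich",
        "@type": "person",
        "born": "1901-12-27",
    }
    for lang in ("en", "de", "ja"):
        print(lang, birth_phrase(dietrich, lang))
```

The point of the sketch is that the structured value is language-neutral; only the final rendering step is language-specific.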



DEMO

• Generating translation suggestions

• Knowledge being used
  ◦ World knowledge: DBpedia
  ◦ Terminological knowledge: IATE

• Storage in ePub based on Internationalization Tag Set (ITS) 2.0
  ◦ Standardised markup for multilingual content production
  ◦ Translation suggestions are stored here as ITS “Localization Note”
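
To illustrate the storage idea: in HTML5 content, ITS 2.0 expresses a local Localization Note with the attributes `its-loc-note` and `its-loc-note-type`. The sketch below wraps a term in a `span` carrying such a note. The function name `with_loc_note` and the wording of the note text are assumptions; the slides do not show the exact markup written by the demo.

```python
from html import escape


def with_loc_note(term: str, suggestion: str) -> str:
    """Wrap a term in a span that stores a translation suggestion as an ITS 2.0 note."""
    # escape() with quote=True makes the text safe inside a double-quoted attribute.
    note = escape(f"Translation suggestion: {suggestion}", quote=True)
    return (
        f'<span its-loc-note="{note}" its-loc-note-type="description">'
        f"{escape(term)}</span>"
    )


if __name__ == "__main__":
    print(with_loc_note("screwdriver", "Schraubendreher"))
```

Because the note lives in standard ITS markup, a localisation tool that understands ITS 2.0 can surface the suggestion without knowing how it was produced.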


WANT TO TRY THINGS OUT?

• Go to http://api.freme-project.eu/doc/0.1/

• Check out API demo calls

• Timeline for next prototypes
  ◦ 0.2: mid July
  ◦ 0.3: end of August
  ◦ Feedback to GitHub: https://github.com/freme-project
    - The repository will be made public in mid July



CONTACTS

Felix Sasaki, on behalf of the FREME consortium


CONSORTIUM

http://www.tilde.com/

http://www.iminds.be/

http://www.agroknow.gr/agroknow

http://wripl.com/

http://www.vistatec.com/index.html

http://infai.org/de/Aktuelles

http://www.ismb.it/

http://www.dfki.de/web
