DBpedia as Gaeilge Chapter Meeting, 9th January 2015

Uploaded by: bianca-pereira

Posted on 15 July 2015


[Figure: @en LOD Cloud (DBpedia) compared with @ga LOD Cloud (???)]

Creating the DBpedia as Gaeilge Chapter

Wikipedia Infoboxes → DBpedia Mappings → DBpedia Triples → Applications

DBpedia as Gaeilge

Chapter Workflow


Wikipedia Structure & Infoboxes

How Data is Structured in Wikipedia

SOURCE: http://stats.wikimedia.org/EN/SummaryGA.htm

SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett

Terry Pratchett Vicipéid Page

SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett

Infobox (Bosca Sonraí Scríbhneoir, the Irish-language "writer" infobox)

Terry Pratchett

SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett

Infobox (Bosca Sonraí Scríbhneoir)

Vicipéid Infobox Template

Editing the Infobox: Visual UI

SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett

From Wikipedia to DBpedia…


•  What are DBpedia mappings?
•  How are those mappings created?
•  What is the current status?

What is the DBpedia Ontology?

Terry Pratchett is a Writer, and Writer is a subclass of Artist …

The Writer class has labels in several languages: @en: writer, @ga: scríbhneoir, @fr: écrivain

SOURCE: http://mappings.dbpedia.org/server/ontology/classes/
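A minimal sketch of the two ideas on this slide, class inheritance and per-language labels, using hand-written Python dictionaries in place of the real ontology. The Writer → Artist → Person chain is an assumption about the ontology at the time; only the Writer labels come from the slide; `superclasses` and `label` are hypothetical helpers, not part of any DBpedia API.

```python
from typing import Dict, List, Optional

# Toy, in-memory slice of the DBpedia ontology. The Writer -> Artist -> Person
# chain is ASSUMED to mirror rdfs:subClassOf in the real ontology; only the
# Writer labels are taken from the slide.
SUBCLASS_OF: Dict[str, str] = {
    "Writer": "Artist",
    "Artist": "Person",
}

LABELS: Dict[str, Dict[str, str]] = {
    "Writer": {"en": "writer", "ga": "scríbhneoir", "fr": "écrivain"},
}


def superclasses(cls: str) -> List[str]:
    """Follow subclass links upwards and return every ancestor, nearest first."""
    chain: List[str] = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def label(cls: str, lang: str, fallback: str = "en") -> Optional[str]:
    """Return the class label in `lang`, or the `fallback` label if no translation exists."""
    labels = LABELS.get(cls, {})
    return labels.get(lang, labels.get(fallback))


print(superclasses("Writer"))  # ['Artist', 'Person']
print(label("Writer", "ga"))   # scríbhneoir
print(label("Writer", "de"))   # writer (no German label in this toy table)
```

The fallback shows why translated labels matter: without an @ga label, an Irish-language application can only show the English one.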

Ontology Labels & Comments

•  Current ontology: 685 classes, 2795 properties
•  Starting point: label & comment translations for the classes?

•  These labels and comments are available to all DBpedia queries, not just queries against ga.dbpedia
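Translation work of this kind can be driven by a query. The sketch below uses a hypothetical Python helper, `missing_label_query`, to build a SPARQL query listing ontology classes that have an English label but no Irish one. It assumes the endpoint it is sent to has the DBpedia ontology loaded, with classes typed as owl:Class under http://dbpedia.org/ontology/.

```python
import re

# Rough shape of a BCP 47 language tag such as "ga" or "pt-br" (simplified).
LANG_TAG = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]+)*")


def missing_label_query(lang: str = "ga", limit: int = 100) -> str:
    """Build a SPARQL query for DBpedia ontology classes that have an English
    rdfs:label but no label in `lang`: a to-do list for translators."""
    # Reject anything that is not a plausible language tag, since it is pasted into the query.
    if not LANG_TAG.fullmatch(lang):
        raise ValueError(f"not a language tag: {lang!r}")
    return f"""PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?class ?enLabel WHERE {{
  ?class a owl:Class ;
         rdfs:label ?enLabel .
  FILTER(STRSTARTS(STR(?class), "http://dbpedia.org/ontology/"))
  FILTER(LANG(?enLabel) = "en")
  FILTER NOT EXISTS {{
    ?class rdfs:label ?other .
    FILTER(LANG(?other) = "{lang}")
  }}
}}
ORDER BY ?class
LIMIT {int(limit)}"""


print(missing_label_query())
```

Because the labels live in the shared ontology, the same query with `lang="fr"` or any other tag would serve other chapters too.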

 

Mapping Process


SOURCE: http://mappings.dbpedia.org/server/statistics/ga/
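A mapping tells the extraction framework which ontology class an infobox template corresponds to and which ontology property each template parameter fills. The sketch below models that idea in Python under stated assumptions: `WRITER_MAPPING`, `apply_mapping` and the parameter names `dáta_breithe` and `náisiúntacht` are hypothetical, and real mappings are written in the DBpedia Mappings Wiki's own template syntax, not in code.

```python
from typing import Dict, List, Tuple

DBO = "http://dbpedia.org/ontology/"
RESOURCE = "http://ga.dbpedia.org/resource/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]

# HYPOTHETICAL mapping for illustration: the parameter names are invented, and this
# is a Python model of a mapping, not Mappings Wiki syntax.
WRITER_MAPPING = {
    "template": "Bosca Sonraí Scríbhneoir",
    "class": DBO + "Writer",
    "properties": {
        "dáta_breithe": DBO + "birthDate",
        "náisiúntacht": DBO + "nationality",
    },
}


def apply_mapping(page_title: str, params: Dict[str, str], mapping: dict) -> List[Triple]:
    """Turn one infobox instance into triples using a template-to-class mapping."""
    subject = RESOURCE + page_title.strip().replace(" ", "_")
    # Every page using the mapped template becomes an instance of the mapped class.
    triples: List[Triple] = [(subject, RDF_TYPE, mapping["class"])]
    for name, value in params.items():
        prop = mapping["properties"].get(name.strip())
        # Unmapped or empty parameters are skipped; values stay as plain strings
        # here, whereas a real extractor would also parse dates, numbers and links.
        if prop is not None and value.strip():
            triples.append((subject, prop, value.strip()))
    return triples
```

Parameters with no entry in the mapping produce no ontology triples, which is why the share of mapped templates and properties on the statistics page is a useful measure of progress.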


•  DBpedia Extraction Framework
•  SPARQL Endpoint

Extracting Triples

Extracted Terry Pratchett Triples

SOURCE: http://ga.wikipedia.org/wiki/Terry_Pratchett
SOURCE: http://ga.dbpedia.org/sparql

Raw Extractions (not from Ontology Mappings)

SOURCE: http://ga.dbpedia.org/sparql
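Raw extractions bypass the ontology: each infobox parameter becomes a property named after the parameter itself. The sketch below shows the idea on a single, simply nested infobox. The `http://ga.dbpedia.org/property/` namespace is an assumption by analogy with other DBpedia chapters, the helper names are hypothetical, and the real extraction framework also cleans up values (links, units, dates) instead of copying wikitext verbatim.

```python
from typing import Dict, List, Tuple

RESOURCE = "http://ga.dbpedia.org/resource/"
PROPERTY = "http://ga.dbpedia.org/property/"  # ASSUMED namespace for raw infobox properties


def split_top_level(body: str) -> List[str]:
    """Split on '|' characters that are not nested inside [[...]] or {{...}}."""
    parts: List[str] = []
    current: List[str] = []
    depth = 0
    i = 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}"):
            depth = max(depth - 1, 0)
            current.append(pair)
            i += 2
        else:
            if body[i] == "|" and depth == 0:
                parts.append("".join(current))
                current = []
            else:
                current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(wikitext: str) -> Tuple[str, Dict[str, str]]:
    """Parse a single '{{Name | key = value | ...}}' call into (name, params)."""
    text = wikitext.strip()
    if not (text.startswith("{{") and text.endswith("}}")):
        raise ValueError("expected one {{...}} template call")
    name, *fields = split_top_level(text[2:-2])
    params: Dict[str, str] = {}
    for field in fields:
        if "=" in field:  # positional (unnamed) parameters are ignored in this sketch
            key, value = field.split("=", 1)
            params[key.strip()] = value.strip()
    return name.strip(), params


def raw_triples(page_title: str, wikitext: str) -> List[Tuple[str, str, str]]:
    """Emit one triple per non-empty parameter, with no ontology mapping involved."""
    subject = RESOURCE + page_title.strip().replace(" ", "_")
    _, params = parse_infobox(wikitext)
    return [(subject, PROPERTY + key.replace(" ", "_"), value)
            for key, value in params.items() if value]
```

The depth counter matters because links such as `[[Target|text]]` contain a pipe that must not be read as a parameter separator.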

SPARQL Endpoint Interface

SOURCE: http://ga.dbpedia.org/sparql

Query Result

SPARQL Endpoint

SOURCE: http://ga.dbpedia.org/sparql
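The endpoint can also be queried from code using the standard SPARQL Protocol: an HTTP GET with the query in the `query` parameter and the desired result format in the `Accept` header. A minimal sketch using only the Python standard library; `build_request` and `run_query` are hypothetical helper names, and this sketch cannot guarantee that http://ga.dbpedia.org/sparql is still online.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://ga.dbpedia.org/sparql"

# All triples whose subject is the Terry Pratchett resource.
QUERY = """SELECT ?p ?o WHERE {
  <http://ga.dbpedia.org/resource/Terry_Pratchett> ?p ?o .
}
LIMIT 50"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 30.0) -> list:
    """Send the query and return the list of result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]
```

Each binding returned by `run_query(QUERY)` is a dictionary such as `row["p"]["value"]` and `row["o"]["value"]`, following the SPARQL JSON results format.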


DBpedia as Gaeilge Use Cases


A Linked Data proof-of-concept

SOURCE: http://apps.dri.ie/locationLODer/

Video: http://www.bbc.com/news/entertainment-arts-25324655
Article: http://www.irishtimes.com/culture/books/david-butler-great-writers-enrich-experience-even-the-mundane-1.2082759
Image: http://tvtropes.org/pmwiki/pmwiki.php/Creator/TerryPratchett

http://ga.dbpedia.org/resource/Terry_Pratchett
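This resource URI mirrors the Vicipéid page title (http://ga.wikipedia.org/wiki/Terry_Pratchett corresponds to http://ga.dbpedia.org/resource/Terry_Pratchett). A rough sketch of that correspondence, assuming the simple rule "keep the title, turn spaces into underscores". DBpedia's actual escaping rules differ in details (which characters are percent-encoded, and whether non-ASCII characters appear as IRI characters or as %-escapes), so `resource_iri` and `resource_uri` are hypothetical helpers.

```python
from urllib.parse import quote

BASE = "http://ga.dbpedia.org/resource/"


def resource_iri(title: str, base: str = BASE) -> str:
    """IRI form: spaces in the page title become underscores; other characters are kept."""
    return base + title.strip().replace(" ", "_")


def resource_uri(title: str, base: str = BASE) -> str:
    """URI form: as above, but non-ASCII and reserved characters are percent-encoded (UTF-8)."""
    return base + quote(title.strip().replace(" ", "_"), safe="()',:")


print(resource_iri("Terry Pratchett"))  # http://ga.dbpedia.org/resource/Terry_Pratchett
print(resource_uri("Vicipéid"))         # http://ga.dbpedia.org/resource/Vicip%C3%A9id
```

The distinction matters for Irish titles, where accented letters such as é and á are common.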

Archives

What can the Chapter do?


•  Adding Infoboxes to Vicipéid
•  Translating the DBpedia Ontology (Labels & Properties)
•  Creating Wikipedia Infobox to Ontology Mappings
•  Creating the ga.dbpedia.org instance (Insight)
•  Creating applications

What can we do?

Who can help?

•  Linked Data specialists
•  Irish-speaking software developers
•  Vicipéid community, students, translators-in-training
•  Translators, editors, linguists, cultural scholars

[Figure: Groups & Possible Tasks. Potential contributors placed along two axes, Cumas Gaeilge (Irish-language ability) and Linked Data Knowledge: Editors, Translators, Cultural Scholars, Linguists, Translators in training, Students, Vicipéid Editors, Irish-speaking Software Developers, Data Scientists, Linked Data Academics]

Initial Work on DBpedia Gaeilge

✓ Instance ga.dbpedia created
✓ Chapter website created
✓ Experiments with automatic translations of ontology class labels

       

[Figure: English labels → SMT system → Irish labels]

English label → Irish label (SMT output): comments
•  canadian football team → foireann sacair cheanada: correct grammatical case, but "football" translated as "soccer"
•  database → bunachar sonraí: correct
•  television personality → pearsantacht teilifíse: correct grammatical case for 2 nouns together
•  broadcast network → craoladh líonra: specific term required here ('líonra craolacháin' on focal.ie, but 'stáisiún' in general use); English word order
•  government agency → rialtas an ghníomhaireacht: English word order, changes the meaning entirely, incorrect case used
•  military structure → struchtúr míleata: correct word order, but ambiguous description
•  radio program → raidió ríomhchlár: English word order; "program" translated in the computing domain instead of the media domain

Next Steps & Discussion?

Contacts

http://ga.dbpedia.org

Bianca Pereira

Caoilfhionn Lane