LOD2 Webinar Series: DBpedia Spotlight

LOD2 Webinar . 26.02.2013 . Page 1 http://lod2.eu Creating Knowledge out of Interlinked Data

Post on 01-Nov-2014

DESCRIPTION

DBpedia Spotlight is a tool used in the Extraction stage of the LOD Life Cycle, performing Entity Recognition and Linking. Although the tool currently specializes in English, support for other languages is being tested, and demos for German, Dutch and other languages are available or underway. The tool can be used to enable faceted browsing and semantic search, among other applications. In this webinar we describe what DBpedia Spotlight is, how it works, and how you can benefit from it in your application. If you are interested in Linked (Open) Data principles and mechanisms, LOD tools and services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series! http://lod2.eu/BlogPost/webinar-series

TRANSCRIPT

Page 1: LOD2 Webinar Series: DBpedia Spotlight

Page 2: LOD2 Webinar Series: DBpedia Spotlight

LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers. The partners come from 12 countries and are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany.

LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project demonstrates the benefits in three scenarios: Media and Publishing, Corporate Data Intranets, and eGovernment.

Page 3: LOD2 Webinar Series: DBpedia Spotlight

Once per month, the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle.

Stay with us and learn more about acquisition, editing, composing, connected applications – and finally publishing Linked Open Data.

Page 4: LOD2 Webinar Series: DBpedia Spotlight

Agenda

Profiles: Pablo N Mendes and the DBpedia Spotlight team

Linked Data life cycle and role of DBpedia Spotlight within LOD2

What is DBpedia Spotlight

Demonstration

Lessons Learned and Next steps

Q&A

Page 5: LOD2 Webinar Series: DBpedia Spotlight

Pablo N. Mendes and the DBpedia Spotlight team

Pablo N. Mendes

Research Associate at the Open Knowledge Foundation, Germany (http://okfn.de)

Interests: Information Extraction, Integration, Retrieval and Exploration

More info: http://pablomendes.com

Contributors

• Sandro Coelho (BS student at UFJF, Brazil)

• Chris Hokamp (PhD student at University of North Texas, USA)

• Dirk Weissenborn (MS student at University of Dresden, Germany)

• Liu Zhengzhong (now PhD student at Carnegie Mellon University, USA)

• Marcus Nitschke (student at U. Leipzig)

• ... Full list on GitHub.

Co-maintainers

• Max Jakob (Neofonie GmbH)

• Joachim Daiber (MS student at the Rijksuniversiteit Groningen)

Funding: LOD2, DICODE, Google Summer of Code 2012, IKS

Hosting: U. Mannheim, MTA SZTAKI, Globo.com, RNP.br

Page 6: LOD2 Webinar Series: DBpedia Spotlight

Linked Data Life Cycle

[Figure: the Linked Data Life Cycle. Stages shown: Extraction; Storage/Querying; Manual revision/authoring; Interlinking/Fusing; Classification/Enrichment; Quality Analysis; Evolution/Repair; Search/Browsing/Exploration]

Page 8: LOD2 Webinar Series: DBpedia Spotlight

Shedding Light on the Web of Documents

Page 12: LOD2 Webinar Series: DBpedia Spotlight

Named Entity Recognition/Disambiguation

• Automatically add Wikipedia links to (plain) text.

• 1. Recognition: find “interesting” strings

– surface forms

• 2. Disambiguation: choose the appropriate Wikipedia page

– Each Wikipedia page represents an entity

– Every surface form can have multiple candidate entities for linking
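
The two steps can be sketched with a toy dictionary lookup. This is a minimal illustration, not DBpedia Spotlight's actual spotter: the `LEXICON` contents, the entity labels, and the function names `spot` and `candidates` are all hypothetical, and a real lexicon would be derived from Wikipedia rather than written by hand.

```python
# Toy lexicon mapping surface forms to candidate entities.
# Hypothetical data: the entity labels are illustrative, not real DBpedia URIs.
LEXICON = {
    "Michael Jackson": ["Michael_Jackson_(singer)", "Michael_Jackson_(journalist)"],
    "Paris": ["Paris_(city)", "Paris_(mythology)"],
}


def spot(text, lexicon):
    """Step 1 (recognition): return (start, end, surface form) for each lexicon match."""
    spans = []
    for surface_form in lexicon:
        start = text.find(surface_form)
        while start != -1:
            spans.append((start, start + len(surface_form), surface_form))
            start = text.find(surface_form, start + 1)
    return sorted(spans)


def candidates(surface_form, lexicon):
    """Input to step 2 (disambiguation): the candidate entities for a surface form."""
    return lexicon.get(surface_form, [])


text = "Michael Jackson came to Paris."
for start, end, surface_form in spot(text, LEXICON):
    print(surface_form, (start, end), candidates(surface_form, LEXICON))
```

Each spotted surface form comes back with more than one candidate, which is exactly why a separate disambiguation step is needed.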

Page 13: LOD2 Webinar Series: DBpedia Spotlight

Michael Jackson died in 2007.

Page 14: LOD2 Webinar Series: DBpedia Spotlight

Michael Jackson died in 2007.

• Recognition: Find surface forms

Page 15: LOD2 Webinar Series: DBpedia Spotlight

[Michael Jackson] died in 2007.

• Recognition: Find surface forms

Page 16: LOD2 Webinar Series: DBpedia Spotlight

[Michael Jackson] died in 2007.

• Disambiguation: Choose correct entity

Page 17: LOD2 Webinar Series: DBpedia Spotlight

[Michael Jackson] died in 2007.

• Disambiguation: Choose correct entity

• Candidates for [Michael Jackson]

Page 19: LOD2 Webinar Series: DBpedia Spotlight

[Michael Jackson] died in 2007.

• Disambiguation: Choose correct entity

• Candidates for [Michael Jackson]

• The context (“died in 2007”) helps choose among the candidates

Page 20: LOD2 Webinar Series: DBpedia Spotlight

[Michael Jackson] came to Paris.

• Disambiguation: Choose correct entity

• Candidates for [Michael Jackson]: Singer, Journalist

• “came to Paris” is a less distinctive context
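
To make the role of context concrete, here is a minimal sketch that scores each candidate by word overlap with the sentence and falls back to a popularity prior when the context does not discriminate. It is not DBpedia Spotlight's scoring model: the `CONTEXT_WORDS` bags, the `PRIOR` values, and the `disambiguate` function are toy assumptions chosen only to reproduce the two example sentences.

```python
import re

# Toy context words per candidate (hypothetical, hand-picked for illustration).
CONTEXT_WORDS = {
    "Michael Jackson (singer)": {"pop", "music", "album", "died", "2009"},
    "Michael Jackson (journalist)": {"beer", "whisky", "writer", "died", "2007"},
}

# Toy prior: who is usually meant by the name when the context is ignored.
PRIOR = {
    "Michael Jackson (singer)": 0.98,
    "Michael Jackson (journalist)": 0.02,
}


def disambiguate(sentence):
    """Pick the candidate with the largest context overlap; break ties by prior."""
    tokens = set(re.findall(r"\w+", sentence.lower()))

    def score(entity):
        # Tuples compare element by element: overlap first, then prior.
        return (len(tokens & CONTEXT_WORDS[entity]), PRIOR[entity])

    return max(CONTEXT_WORDS, key=score)


print(disambiguate("Michael Jackson died in 2007."))
print(disambiguate("Michael Jackson came to Paris."))
```

In this toy setup "died in 2007" overlaps more with the journalist's context words, while "came to Paris" overlaps with neither candidate, so the decision falls back to the prior.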

Page 22: LOD2 Webinar Series: DBpedia Spotlight

Probabilities

• P(entity | surface form)

• Who is typically meant by a name?

• For example, given [Michael Jackson] (and ignoring the context), what are the probabilities of the candidates?

• Michael Jackson (singer) 0.98

• Michael Jackson (journalist) 0.02

• Other useful probabilities:

– P(surface form | entity), P(entity), P(surface form)

• Maximum likelihood estimates are computed from Wikipedia page links

Page 23: LOD2 Webinar Series: DBpedia Spotlight

Data Processing

• Two pipelines

− Single machine with Scala

− MapReduce-style with Apache Pig

• Apache Pig for analyzing large datasets on top of Hadoop

− Data-flow language

− Think in tuples, bags and maps

− load, filter, join, group by, store, …

− from which Pig derives a MapReduce plan

• We build on pignlproc, started by Olivier Grisel (Stanbol)
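
The data-flow style can be imitated in plain Python to show what a load, filter, group by, store sequence does to tuples and bags. This is only an in-memory sketch of the idea, not Pig Latin and not the pignlproc scripts: the `links` tuples and their field layout are hypothetical stand-ins for page links extracted from a Wikipedia dump.

```python
from collections import defaultdict

# load: toy page-link tuples (source page, anchor text, target page).
# Hypothetical data standing in for links extracted from a Wikipedia dump.
links = [
    ("Thriller", "Michael Jackson", "Michael_Jackson_(singer)"),
    ("Pop_music", "Michael Jackson", "Michael_Jackson_(singer)"),
    ("Beer", "Michael Jackson", "Michael_Jackson_(journalist)"),
    ("Beer", "", "Ale"),  # empty anchor text, dropped by the filter
]

# filter: keep only links that carry anchor text
filtered = [t for t in links if t[1]]

# map: emit ((surface form, entity), 1) pairs
mapped = [((anchor, target), 1) for _, anchor, target in filtered]

# group by: collect the values for each key into a bag (the shuffle step)
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# reduce: sum each bag to get one count per (surface form, entity) pair
pair_counts = {key: sum(bag) for key, bag in groups.items()}

# store: here we simply print tab-separated rows
for (surface_form, entity), n in sorted(pair_counts.items()):
    print(surface_form, entity, n, sep="\t")
```

On a cluster, the group-by step is the part that gets distributed: all tuples with the same key are routed to the same reducer.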

Page 24: LOD2 Webinar Series: DBpedia Spotlight

Probability estimation

• P( entity | surface form ) = count( surface form, entity ) / count( surface form )

• P( Michael Jackson (singer) | Michael Jackson ) = 0.98

• P( Michael Jackson (journalist) | Michael Jackson ) = 0.02

• Check the project website for the estimation of other scores

– Other probabilities...

– TF*ICF (modification of TF*IDF) and others...
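
The count ratio above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `estimate` is a hypothetical helper, count( surface form ) is taken here to be the number of links using that anchor text, the estimators for P(surface form | entity) and P(entity) are the analogous count ratios rather than formulas taken from the slides, and the link counts are invented so that the numbers match the example.

```python
from collections import Counter


def estimate(links):
    """Maximum likelihood estimates from a list of (surface form, entity) link pairs."""
    pair = Counter(links)                           # count(surface form, entity)
    surface = Counter(s for s, _ in links)          # count(surface form)
    entity = Counter(e for _, e in links)           # count(entity)
    total = len(links)
    p_e_given_sf = {(s, e): n / surface[s] for (s, e), n in pair.items()}
    p_sf_given_e = {(s, e): n / entity[e] for (s, e), n in pair.items()}
    p_e = {e: n / total for e, n in entity.items()}
    return p_e_given_sf, p_sf_given_e, p_e


# Toy link data (invented counts): 50 links with anchor "Michael Jackson",
# plus 10 links with anchor "King of Pop".
links = (
    [("Michael Jackson", "singer")] * 49
    + [("Michael Jackson", "journalist")]
    + [("King of Pop", "singer")] * 10
)

p_e_given_sf, p_sf_given_e, p_e = estimate(links)
print(p_e_given_sf[("Michael Jackson", "singer")])
print(p_e_given_sf[("Michael Jackson", "journalist")])
```

With these toy counts, 49 of the 50 "Michael Jackson" links point to the singer, giving the 0.98 versus 0.02 split used in the example.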

Page 26: LOD2 Webinar Series: DBpedia Spotlight

Annotate

http://dbpedia.org/resource/LSU_Tigers

Page 27: LOD2 Webinar Series: DBpedia Spotlight

Annotate

http://dbpedia.org/resource/LSU_Tigers

http://dbpedia.org/resource/No._4_(album)

Page 28: LOD2 Webinar Series: DBpedia Spotlight

Top K Candidates

LSU_Tigers

Louisiana State University

Page 29: LOD2 Webinar Series: DBpedia Spotlight

Demo:

– http://spotlight.dbpedia.org/demo/

Web Service:

– http://spotlight.dbpedia.org/rest/{API}

– APIs:

• Phrase Recognition (/spot), Disambiguation (/disambiguate)

• Top K disambiguations (/candidates)

• Annotation (/annotate)
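
A call to the annotation API might be built as follows. This is a sketch under assumptions rather than a reference client: the base URL is the one given above and may no longer be live, the `/annotate` path and the `text`, `confidence` and `support` query parameters follow the service's public documentation rather than these slides, the example sentence is invented, and the helper names are hypothetical. The request is only constructed here; `annotate` would need network access to run.

```python
import json
import urllib.parse
import urllib.request

# Base URL as given in the webinar; the host may have changed since then.
BASE_URL = "http://spotlight.dbpedia.org/rest"


def build_annotate_request(text, confidence=0.2, support=20):
    """Build (but do not send) a GET request asking for JSON annotations."""
    query = urllib.parse.urlencode(
        {"text": text, "confidence": confidence, "support": support}
    )
    return urllib.request.Request(
        f"{BASE_URL}/annotate?{query}",
        headers={"Accept": "application/json"},
    )


def annotate(text, **params):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(build_annotate_request(text, **params)) as response:
        return json.load(response)


request = build_annotate_request("The LSU Tigers finished No. 4.")
print(request.full_url)
```

Keeping request construction separate from sending makes it easy to inspect or log the exact URL before hitting the service.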

Source code:

– https://github.com/dbpedia-spotlight/dbpedia-spotlight/

Apache V2 License!

Page 30: LOD2 Webinar Series: DBpedia Spotlight

Lessons learned

A generic solution to the problem is tough

– Most of the research focuses on solving very specialized cases

– Some entity types are harder than others

– Some types of text are harder than others

Yet, users expect it to “just work”.

We are focusing on a generic core that can be easily customized.

Page 31: LOD2 Webinar Series: DBpedia Spotlight

Next steps

More experiments with DBpedia Spotlight in the context of LOD2 Use Case packages: Wolters Kluwer (legal domain, German language), Emergency Response,

Automating build process and release to LOD2 Stack

Expanding to other languages

Easier adaptation to other knowledge bases beyond DBpedia

New algorithms, collective disambiguation, etc.

Page 32: LOD2 Webinar Series: DBpedia Spotlight

Credits

Jingle: R.E.M., Martin Kaltenböck, Florian Kondert

Coordination: Thomas Thurner, Martin Kaltenböck

Moderation: Martin Kaltenböck

Presented by: Pablo N. Mendes

Slides from: Pablo N. Mendes, Max Jakob, Joachim Daiber

Page 33: LOD2 Webinar Series: DBpedia Spotlight

Hope you enjoyed staying with us – if you need more detailed information, visit us at www.lod2.eu and let us know how we can improve to meet your expectations!

Don’t forget to register for our next webinar

27.03.2013 – CKAN and PublicData.eu (OKFN)

April – Virtuoso 7 (OpenLink Software)

Have a great day and don’t forget ...
