Introducing JSONpedia

Upload: spaziodati

Post on 15-Jan-2015

Category:

Technology


DESCRIPTION

Introduction to JSONpedia, a JSON version of Wikipedia

TRANSCRIPT

Page 1: Introducing JSONpedia

JSONpedia

Facilitating consumption of MediaWiki content.

WWW.SPAZIODATI.EU

Michele Mostarda <[email protected]>, TW: @micmos

Wednesday, 10 October 2012

Page 2: Introducing JSONpedia

What is JSONpedia?

Page 3: Introducing JSONpedia

“JSONpedia is a library and a web service meant to read WikiText markup as JSON.”

Page 4: Introducing JSONpedia

‣ Initially conceived as a tool to produce data to train Machine Learning models.

‣ The REST service, inspired by Sweble CrystalBall, produces JSON, HTML and (coming soon) RDF data.

‣ Built on a context-dependent, event-based parser, intended to be faster than a regex-based parser (like the wikiparser) or a DOM-based parser (like Sweble).

Page 5: Introducing JSONpedia

Differences with Sweble

Page 6: Introducing JSONpedia

‣ Lightweight event-based parser.

‣ More tolerant of the frequent syntax errors present within WikiText pages.

‣ Serializes to JSON output, which is easier to consume!

Page 7: Introducing JSONpedia

Differences with DBpedia

Page 8: Introducing JSONpedia

‣ JSONpedia does not add any semantics to the extracted data.

‣ JSONpedia could integrate the current DBpedia regex-based parser.

‣ JSONpedia is not a competitor of DBpedia but rather a complement.

Page 9: Introducing JSONpedia

JSONpedia Internals

Page 10: Introducing JSONpedia

Architecture

[Diagram: Input WikiText, Parser, processors (Structure, Validator, Extractor, Splitter, Linker + DBpedia API/Freebase), Output JSON]

Page 11: Introducing JSONpedia

WikiText Parser Events

// Document bounding.
void beginDocument(URL document);
void endDocument();

// Error handling.
void parseWarning(String msg, ParserLocation location);
void parseError(Exception e, ParserLocation location);

// Tag handling.
void beginTag(String node, Attribute[] attributes);
void endTag(String node);
void inlineTag(String node, Attribute[] attributes);
void commentTag(String comment);

// Sections
void section(String title, int level);

// References
void beginReference(String label);
void endReference(String label);

// Links
void beginLink(String url);
void endLink(String url);

// Lists
void beginList();
void listItem();
void endList();

// Templates
void beginTemplate(String name);
void endTemplate(String name);

// Tables
void beginTable();
void headCell(int row, int col);
void bodyCell(int row, int col);
void endTable();

// Generic parameter
void parameter(String param);

// parameter / text value
void text(String content);
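To give a feel for how such callbacks might be consumed, here is a minimal sketch of a handler that records each event as a readable line. The interface name `WikiTextEvents`, the class `EventLogger`, and the hand-written event sequence in `demo()` are assumptions for illustration; the slide lists the methods but not the type that declares them, and only four of them are reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical name: the slide shows the methods, not the declaring interface.
// Only a subset of the listed events is reproduced to keep the sketch short.
interface WikiTextEvents {
    void section(String title, int level);
    void beginLink(String url);
    void endLink(String url);
    void text(String content);
}

// Records every received event as a line of text, which is handy for debugging.
class EventLogger implements WikiTextEvents {
    final List<String> log = new ArrayList<>();

    @Override public void section(String title, int level) {
        log.add("section(" + title + ", " + level + ")");
    }
    @Override public void beginLink(String url) { log.add("beginLink(" + url + ")"); }
    @Override public void endLink(String url) { log.add("endLink(" + url + ")"); }
    @Override public void text(String content) { log.add("text(" + content + ")"); }

    // Stand-in for the parser: fires an illustrative sequence by hand.
    // The order and arguments a real parser would emit are an assumption.
    static EventLogger demo() {
        EventLogger logger = new EventLogger();
        logger.section("History", 2);
        logger.beginLink("Rome");
        logger.text("the city");
        logger.endLink("Rome");
        return logger;
    }
}
```

Because the handler never sees a whole document tree, it can react to each event as it arrives, which is the property an event-based design relies on.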

Page 12: Introducing JSONpedia

WikiText Processors

‣ Structure

‣ Extractors

‣ Linkers

‣ Splitters

‣ Validator

Processors receive the stream of events generated by the parser and perform data construction and transformation.
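One way to picture several processors sharing a single parse is a fan-out that forwards each event to every registered processor. This is only a sketch: the reduced `Processor` interface and the classes `FanOut`, `SectionCounter` and `TextLength` are hypothetical names, not taken from JSONpedia.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reduced event interface: two of the events from the parser slide.
interface Processor {
    void section(String title, int level);
    void text(String content);
}

// Forwards every event to all registered processors, so one parse feeds many of them.
class FanOut implements Processor {
    private final List<Processor> targets = new ArrayList<>();

    FanOut add(Processor p) {
        targets.add(p);
        return this;
    }

    @Override public void section(String title, int level) {
        for (Processor p : targets) p.section(title, level);
    }

    @Override public void text(String content) {
        for (Processor p : targets) p.text(content);
    }
}

// Example processor: counts the sections it sees.
class SectionCounter implements Processor {
    int count = 0;
    @Override public void section(String title, int level) { count++; }
    @Override public void text(String content) { }
}

// Example processor: accumulates the total length of plain text.
class TextLength implements Processor {
    int length = 0;
    @Override public void section(String title, int level) { }
    @Override public void text(String content) { length += content.length(); }
}
```

Each processor keeps only the state it needs, so adding a new kind of output does not require a second pass over the input.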

Page 13: Introducing JSONpedia

Structure

The Structure Processor receives a stream of WikiText parsing events and builds a one-to-one JSON representation of the document DOM.
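The idea of turning begin/end events into a tree can be sketched with a stack: each begin event opens a node and pushes its child list, each end event pops it. The class name `StructureSketch` and the field names `type`, `name` and `content` are assumptions; the actual JSON layout produced by JSONpedia is not shown on the slide, and serialization to a JSON string is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds a JSON-like tree (maps, lists, strings) from template and text events.
class StructureSketch {
    private final List<Object> root = new ArrayList<>();
    // Top of the stack is the child list of the node currently being filled.
    private final Deque<List<Object>> stack = new ArrayDeque<>();

    StructureSketch() {
        stack.push(root);
    }

    void beginTemplate(String name) {
        Map<String, Object> node = new LinkedHashMap<>();
        List<Object> content = new ArrayList<>();
        node.put("type", "template");   // assumed field names
        node.put("name", name);
        node.put("content", content);
        stack.peek().add(node);
        stack.push(content);
    }

    void endTemplate(String name) {
        // Never pop the root, so a stray end event cannot corrupt the tree.
        if (stack.size() > 1) stack.pop();
    }

    void text(String content) {
        stack.peek().add(content);
    }

    List<Object> result() {
        return root;
    }
}
```

Nested templates fall out naturally: a second `beginTemplate` before the first `endTemplate` attaches the new node to the inner child list.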

Page 14: Introducing JSONpedia

Extractors

Extractors are specialized Processors that collect a certain type of data from the event stream. For example, the SectionsExtractor collects the list of all sections detected in the document stream.
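A section extractor along these lines might look as follows. This is a sketch, not the actual SectionsExtractor: the class name `SectionsExtractorSketch` and the `Section` holder are hypothetical, and only the `section` and `text` events from the parser slide are handled.

```java
import java.util.ArrayList;
import java.util.List;

// Collects every section event and ignores everything else.
class SectionsExtractorSketch {

    // Simple immutable holder for one detected section (hypothetical type).
    static final class Section {
        final String title;
        final int level;

        Section(String title, int level) {
            this.title = title;
            this.level = level;
        }
    }

    private final List<Section> sections = new ArrayList<>();

    void section(String title, int level) {
        sections.add(new Section(title, level));
    }

    void text(String content) {
        // Not relevant for this extractor, so the event is dropped.
    }

    List<Section> sections() {
        return sections;
    }
}
```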

Page 15: Introducing JSONpedia

Linkers

A Linker is a Processor that links the current document entity to other information acquired from external sources. An example of a Linker is the FreebaseLinker, which connects an entity to its corresponding representation in Freebase, if any.
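The "if any" part can be illustrated by modelling the external source as a lookup that may or may not return an identifier. Everything here is an assumption for illustration: the class `LinkerSketch`, the field names `entity` and `link`, and the injected lookup function stand in for whatever the real FreebaseLinker does; no real Freebase or DBpedia API call is shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Enriches an entity with an external identifier when the lookup finds one.
class LinkerSketch {
    // Stand-in for an external service: title in, optional identifier out.
    private final Function<String, Optional<String>> lookup;

    LinkerSketch(Function<String, Optional<String>> lookup) {
        this.lookup = lookup;
    }

    // Returns a JSON-like object; the "link" field is present only on a match.
    Map<String, Object> link(String entityTitle) {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("entity", entityTitle);
        lookup.apply(entityTitle).ifPresent(id -> out.put("link", id));
        return out;
    }
}
```

Injecting the lookup keeps the sketch testable with an in-memory map and leaves the choice of external source open.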

Page 16: Introducing JSONpedia

Splitters

A Splitter is a Processor able to cut subtrees out of the JSON document built by the Structure processor. An example of a Splitter is the TableSplitter, which extracts the JSON structures representing the tables declared in the document.
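Cutting subtrees can be sketched as a recursive walk over a tree of maps and lists that collects every node of a given type. The class `TableSplitterSketch` and the convention that a table node is a map whose `type` field equals `"table"` are assumptions; the real document layout is not given on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Walks a JSON-like tree and returns every subtree that represents a table.
class TableSplitterSketch {

    static List<Map<?, ?>> split(Object node) {
        List<Map<?, ?>> found = new ArrayList<>();
        collect(node, found);
        return found;
    }

    private static void collect(Object node, List<Map<?, ?>> found) {
        if (node instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) node;
            if ("table".equals(map.get("type"))) {   // assumed node convention
                found.add(map);
                return; // the table is cut whole; nested tables stay inside it
            }
            for (Object child : map.values()) collect(child, found);
        } else if (node instanceof List) {
            for (Object child : (List<?>) node) collect(child, found);
        }
        // Strings, numbers and other leaves carry no subtrees to inspect.
    }
}
```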

Page 17: Introducing JSONpedia

Validator

A Validator is a Processor that checks the data structures parsed from a document.
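The slide does not say which checks the Validator performs. As one plausible example, the sketch below verifies that template begin and end events are balanced and collects a message for each mismatch. The class name `BalanceValidatorSketch` and the wording of the messages are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Checks that every beginTemplate has a matching endTemplate, in order.
class BalanceValidatorSketch {
    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> problems = new ArrayList<>();

    void beginTemplate(String name) {
        open.push(name);
    }

    void endTemplate(String name) {
        if (open.isEmpty()) {
            problems.add("endTemplate(" + name + ") without a matching begin");
            return;
        }
        String expected = open.pop();
        if (!expected.equals(name)) {
            problems.add("expected end of " + expected + " but got " + name);
        }
    }

    // Anything still open when the document ends was never closed.
    void endDocument() {
        while (!open.isEmpty()) {
            problems.add("template " + open.pop() + " never closed");
        }
    }

    List<String> problems() {
        return problems;
    }
}
```

Collecting problems instead of throwing fits the deck's point about tolerating the syntax errors that are frequent in WikiText pages.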

Page 18: Introducing JSONpedia

Forthcoming Features

‣ JSONpedia DB (based on MongoDB + ElasticSearch), queryable online. JSONpedia dumps will also be available.

‣ Online data model Exporter Tool (CSV).

‣ RDF output.

Page 19: Introducing JSONpedia

Release

JSONpedia will be fully released as open source by the end of the year.

Page 20: Introducing JSONpedia

Live Demo

http://bit.ly/jsonpedia

or

http://json.it.dbpedia.org/frontend/form.html

Page 21: Introducing JSONpedia

Thanks!

Michele Mostarda <[email protected]>, TW: @micmos

WWW.SPAZIODATI.EU
