Posted on 12-Apr-2017


Big Data and semantic technologies - Using Linked Data as a driver of data integration

Giuseppe Futia, Nexa Center for Internet and Society, Politecnico di Torino (DAUIN)

27 July 2016

Outline

• Information management challenges and Big Data

• Linked Data framework (explained with examples)

• Linked Data approach for Big Data community

• The impact of Big Structured Data

Enterprise/Research Information Management Challenges

• Disparate data sources and data silos

• Data sources with similar/inconsistent information

• Most of the knowledge is hidden in texts (unstructured data)

• Difficult to integrate and analyse structured and unstructured data

The 3 V’s of Big Data

• Velocity

• Volume

• Variety (plus Veracity and Value, often added as further V's)

From Big Linked Data to Linked Big Data

Big Linked Data

Linked Data Cloud Diagram (2014)

Linked Data Vision (W3C)

• Extend the principles of the Web from documents to data

• Data should be accessed using the general Web architecture (e.g., URIs, HTTP, …)

• Data should be linked to each other, just as documents are

• Creation of a common framework that allows:
– Data to be shared and reused across applications
– Data to be processed automatically
– New relationships between pieces of data to be inferred
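A minimal sketch of the "access data through the general Web architecture" principle, assuming the server supports HTTP content negotiation: a client dereferences a resource URI and sends an `Accept` header asking for an RDF serialisation rather than an HTML page. The DBpedia URI is only an example, and the helper names `build_dereference_request` and `dereference` are hypothetical; only Python's standard `urllib.request` is used.

```python
import urllib.request

# Preferred RDF media types, in order (assumption: the server negotiates among these)
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/n-triples;q=0.8"


def build_dereference_request(resource_uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET asking for an RDF description of a URI."""
    return urllib.request.Request(resource_uri, headers={"Accept": RDF_ACCEPT}, method="GET")


def dereference(resource_uri: str, timeout: float = 10.0) -> tuple[str, bytes]:
    """Fetch the description over the network; urlopen follows any redirects.

    Returns (content type chosen by the server, raw body).
    """
    request = build_dereference_request(resource_uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()


if __name__ == "__main__":
    req = build_dereference_request("http://dbpedia.org/resource/Stanley_Kubrick")
    print(req.get_method(), req.full_url)
    print(req.get_header("Accept"))
```

Building the request is kept separate from `dereference`, which performs a real network call, so the request can be inspected offline.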

Resource Description Framework

• Everything is a triple: Subject (resource), Predicate (relation), Object (resource or literal)

• A Resource Description Framework (RDF) graph is a collection of triples (diagram: subject → predicate → object)
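To make the triple model concrete, the sketch below represents an RDF graph as a Python set of (subject, predicate, object) tuples and writes it out in N-Triples syntax. The `IRI`/`Literal` classes and `to_ntriples` are illustrative assumptions, not a real RDF library (in practice a library such as rdflib would be used); literals carry no datatype or language tag and blank nodes are left out. The example facts about the film Eyes Wide Shut use DBpedia-style URIs.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class IRI:
    """A resource identified by an IRI (URI)."""
    value: str

    def n3(self) -> str:
        return f"<{self.value}>"


@dataclass(frozen=True)
class Literal:
    """A plain string literal (no datatype or language tag, to keep the sketch short)."""
    value: str

    def n3(self) -> str:
        # N-Triples string escapes: backslash first, then quotes and line breaks
        escaped = (
            self.value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'"{escaped}"'


Term = Union[IRI, Literal]
Triple = tuple[IRI, IRI, Term]  # (subject, predicate, object)

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDFS_LABEL = IRI("http://www.w3.org/2000/01/rdf-schema#label")

# An RDF graph is a set of triples: duplicate statements collapse automatically
graph: set[Triple] = {
    (IRI(DBR + "Eyes_Wide_Shut"), IRI(DBO + "director"), IRI(DBR + "Stanley_Kubrick")),
    (IRI(DBR + "Eyes_Wide_Shut"), IRI(DBO + "starring"), IRI(DBR + "Tom_Cruise")),
    (IRI(DBR + "Stanley_Kubrick"), RDFS_LABEL, Literal("Stanley Kubrick")),
}


def to_ntriples(triples: set[Triple]) -> str:
    """Serialise as N-Triples: one '<s> <p> <o> .' statement per line, sorted for stable output."""
    lines = [f"{s.n3()} {p.n3()} {o.n3()} ." for s, p, o in triples]
    return "\n".join(sorted(lines))


if __name__ == "__main__":
    print(to_ntriples(graph))
```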

SPARQL

• SQL-like query language for RDF data

• Simple protocol for querying remote databases over HTTP

• Query types:
– select: query data by complex graph patterns
– ask: whether a query returns results (result is true/false)
– describe: returns all triples about a particular resource
– construct: create new triples based on query results
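The four query forms can be illustrated with a toy, in-memory evaluator for basic graph patterns (lists of triple patterns in which terms starting with `?` are variables). This is a hedged sketch, not a SPARQL engine: there is no parser and no FILTER/OPTIONAL, terms are plain strings with a hypothetical `ex:` prefix, and `describe` simply returns every triple in which the resource is subject or object (real engines decide for themselves what DESCRIBE returns).

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]  # variable (e.g. "?film") -> bound term


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, extending an existing binding."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different term
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def solutions(graph: set[Triple], patterns: list[Triple],
              binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding satisfying all patterns (nested-loop join over the graph)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solutions(graph, patterns[1:], extended)


def select(graph, variables, patterns):
    """SELECT: project the requested variables from each solution."""
    return [tuple(b[v] for v in variables) for b in solutions(graph, patterns)]


def ask(graph, patterns) -> bool:
    """ASK: true if at least one solution exists."""
    return next(solutions(graph, patterns), None) is not None


def describe(graph, resource):
    """DESCRIBE (simplified): all triples where the resource is subject or object."""
    return {t for t in graph if resource in (t[0], t[2])}


def construct(graph, template, patterns):
    """CONSTRUCT: instantiate the template triples with each solution's bindings."""
    result = set()
    for b in solutions(graph, patterns):
        for tpl in template:
            if all(not is_var(t) or t in b for t in tpl):  # skip triples with unbound variables
                result.add(tuple(b[t] if is_var(t) else t for t in tpl))
    return result


graph = {
    ("ex:Kubrick", "ex:directed", "ex:EyesWideShut"),
    ("ex:Kubrick", "ex:directed", "ex:TheShining"),
    ("ex:EyesWideShut", "ex:basedOn", "ex:Traumnovelle"),
    ("ex:Traumnovelle", "ex:author", "ex:Schnitzler"),
}

if __name__ == "__main__":
    print(sorted(select(graph, ["?film"], [("ex:Kubrick", "ex:directed", "?film")])))
    print(ask(graph, [("?film", "ex:basedOn", "?book"), ("?book", "ex:author", "ex:Schnitzler")]))
    print(describe(graph, "ex:Traumnovelle"))
    print(construct(graph, [("?director", "ex:adapted", "?writer")],
                    [("?director", "ex:directed", "?film"), ("?film", "ex:basedOn", "?book"),
                     ("?book", "ex:author", "?writer")]))
```

The CONSTRUCT call derives a new `ex:adapted` relationship that no single triple states, which is the kind of result the "construct" bullet refers to.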

Nexa projects

Contratti pubblici (public contracts)

Which certified e-mail addresses (PEC) belong to Italian municipalities with more than 100,000 inhabitants that publish contracts with anomalies?

TellMeFirst: A Knowledge Discovery Application

TellMeFirst Architecture http://tellmefirst.polito.it

“The final work of legendary director Stanley Kubrick, who died within a week of completing the edit, is based upon a novel by Arthur Schnitzler. Tom Cruise and Nicole Kidman play William and Alice Harford, a physician and a gallery manager who are wealthy, successful, and travel in a sophisticated social circle.”
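One basic building block of knowledge discovery on a text like the review above is linking mentions to Linked Data resources. The sketch below is a deliberately naive stand-in and not TellMeFirst's actual method: it spots names from a small hand-written dictionary (the hypothetical `SURFACE_FORMS` table) with a regular expression and maps each match to a DBpedia URI.

```python
import re

# Hypothetical, hand-written dictionary: surface form -> DBpedia resource URI
SURFACE_FORMS = {
    "Stanley Kubrick": "http://dbpedia.org/resource/Stanley_Kubrick",
    "Arthur Schnitzler": "http://dbpedia.org/resource/Arthur_Schnitzler",
    "Tom Cruise": "http://dbpedia.org/resource/Tom_Cruise",
    "Nicole Kidman": "http://dbpedia.org/resource/Nicole_Kidman",
}


def link_entities(text: str, surface_forms: dict[str, str] = SURFACE_FORMS) -> list[tuple[str, str, int]]:
    """Return (mention, URI, character offset) for each dictionary match, in reading order."""
    # Try longer labels first so a longer name wins over a shorter one overlapping it
    labels = sorted(surface_forms, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, labels)) + r")\b")
    return [(m.group(0), surface_forms[m.group(0)], m.start()) for m in pattern.finditer(text)]


QUOTE = (
    "The final work of legendary director Stanley Kubrick, who died within a week of "
    "completing the edit, is based upon a novel by Arthur Schnitzler. Tom Cruise and "
    "Nicole Kidman play William and Alice Harford, a physician and a gallery manager who "
    "are wealthy, successful, and travel in a sophisticated social circle."
)

if __name__ == "__main__":
    for mention, uri, offset in link_entities(QUOTE):
        print(offset, mention, uri)
```

A real system would need disambiguation (the same name can denote several resources) and would not rely on a fixed list of names; the sketch only shows the text-to-URI step.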

Linked Big Data

Linked Data approach adopted by the Big Data community

• RDF data model for Variety
– Flexible, easy-to-evolve data model
– Efficiently integrate structured and unstructured data

• Enrich Big Data with metadata and semantics
– More powerful analytics on top of it
– Discover implicit links and relationships

• Interlink Big Data sets
– Information interchange across a value chain
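Interlinking data sets is commonly done with `owl:sameAs` statements declaring that two URIs denote the same thing. A minimal sketch, assuming both data sets are already available as sets of (subject, predicate, object) strings: a union-find structure groups URIs connected by `owl:sameAs`, and every other triple is rewritten to one canonical URI per group (a simplification sometimes called "smushing"), so facts from the two sources join up. The `example.org` URIs stand for a hypothetical internal catalogue, and `smush` is an illustrative helper name.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def smush(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Merge URIs linked by owl:sameAs; rewrite other triples to one canonical URI per group."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Deterministic choice: the lexicographically smallest URI represents the group
            keep, drop = sorted((ra, rb))
            parent[drop] = keep

    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(s, o)
    return {(find(s), p, find(o)) for s, p, o in triples if p != OWL_SAME_AS}


# Hypothetical internal catalogue using its own URIs
catalogue = {
    ("http://example.org/film/ews", "http://example.org/vocab/director", "http://example.org/person/kubrick"),
}
# Illustrative facts from a public Linked Data source (DBpedia-style URIs)
public = {
    ("http://dbpedia.org/resource/Stanley_Kubrick", "http://dbpedia.org/ontology/birthPlace",
     "http://dbpedia.org/resource/Manhattan"),
}
# The interlink: both URIs denote the same person
links = {("http://example.org/person/kubrick", OWL_SAME_AS, "http://dbpedia.org/resource/Stanley_Kubrick")}

merged = smush(catalogue | public | links)
```

After merging, "where was the director of this film born?" can be answered by following two triples that originally lived in different data sets. Collapsing aliases discards the original URIs; a production system would usually keep them and apply `owl:sameAs` reasoning instead.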

Semantic technologies for Big Data

Blazegraph and DASL

• Blazegraph is a high-performance graph database platform that supports RDF/SPARQL APIs

• In 2016 Blazegraph introduced a programming environment called DASL

• DASL supports the development of GPU-optimised graph algorithms within the Apache Spark ecosystem

• It targets complex graph analytics, especially where relationships are not known in advance
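As a small illustration of analytics "where relationships are not known in advance", the sketch below runs a breadth-first search over RDF-style triples to discover how two resources are connected, traversing predicates in both directions (an inverse step is marked with `^`, borrowing SPARQL property-path notation). It is single-machine Python for illustration only and does not use Blazegraph, DASL, Spark or GPUs; the `ex:` data is hypothetical.

```python
from collections import deque
from typing import Optional

Triple = tuple[str, str, str]


def build_adjacency(triples: set[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Index each triple in both directions; inverse steps get a '^' prefix."""
    adjacency: dict[str, list[tuple[str, str]]] = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((p, o))
        adjacency.setdefault(o, []).append(("^" + p, s))
    return adjacency


def shortest_path(triples: set[Triple], start: str, goal: str) -> Optional[list[Triple]]:
    """Breadth-first search: shortest chain of (node, predicate, node) steps, or None."""
    adjacency = build_adjacency(triples)
    previous: dict[str, Optional[tuple[str, str]]] = {start: None}  # node -> (parent, predicate)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the parents to rebuild the path
            path = []
            while previous[node] is not None:
                parent, predicate = previous[node]
                path.append((parent, predicate, node))
                node = parent
            return path[::-1]
        for predicate, neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, predicate)
                queue.append(neighbour)
    return None


triples = {
    ("ex:Kubrick", "ex:directed", "ex:EyesWideShut"),
    ("ex:TomCruise", "ex:starredIn", "ex:EyesWideShut"),
    ("ex:TomCruise", "ex:starredIn", "ex:TopGun"),
}

if __name__ == "__main__":
    print(shortest_path(triples, "ex:Kubrick", "ex:TopGun"))
```

BFS guarantees the fewest hops; on billions of edges the same traversal is what GPU or distributed graph engines aim to parallelise.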

EP-SPARQL

• Event processing provides on-the-fly analysis of event streams, but cannot combine streams with background knowledge and cannot perform reasoning tasks

• Semantic tools can effectively handle background knowledge and perform reasoning tasks, but cannot deal with the rapidly changing data provided by event streams

• Event Processing SPARQL (EP-SPARQL) is proposed as a new language for complex event processing and stream reasoning
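The idea of combining a stream with background knowledge can be sketched without EP-SPARQL itself. In this hypothetical scenario, a "smoke" event followed within a 60-second window by a "heat" event raises an alert only if background triples place both sensors in the same room. The event kinds, the `ex:locatedIn` predicate and the `detect_fire` helper are assumptions for illustration; EP-SPARQL expresses such patterns declaratively with its own temporal operators rather than in Python.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds
    sensor: str
    kind: str  # hypothetical event kinds: "smoke" or "heat"


# Background knowledge (static triples): where each sensor is installed
BACKGROUND = {
    ("ex:sensor1", "ex:locatedIn", "ex:room101"),
    ("ex:sensor2", "ex:locatedIn", "ex:room101"),
    ("ex:sensor3", "ex:locatedIn", "ex:room202"),
}


def location(sensor, background):
    """Look up the room of a sensor in the background triples."""
    for s, p, o in background:
        if s == sensor and p == "ex:locatedIn":
            return o
    return None


def detect_fire(stream, background, window=60.0):
    """Yield (room, smoke_event, heat_event) when 'smoke' is followed by 'heat'
    in the same room (per background knowledge) within `window` seconds.
    Events are assumed to arrive in timestamp order."""
    recent_smoke = deque()
    for event in stream:
        # Evict smoke events that have fallen out of the time window
        while recent_smoke and event.timestamp - recent_smoke[0].timestamp > window:
            recent_smoke.popleft()
        if event.kind == "smoke":
            recent_smoke.append(event)
        elif event.kind == "heat":
            room = location(event.sensor, background)
            for smoke in recent_smoke:
                if room is not None and location(smoke.sensor, background) == room:
                    yield room, smoke, event


if __name__ == "__main__":
    stream = [
        Event(0, "ex:sensor1", "smoke"),
        Event(30, "ex:sensor3", "heat"),   # different room: no alert
        Event(45, "ex:sensor2", "heat"),   # same room, within 60 s: alert
        Event(200, "ex:sensor2", "heat"),  # smoke event has expired: no alert
    ]
    for alert in detect_fire(stream, BACKGROUND):
        print(alert)
```

Neither the stream alone (which does not know where sensors are) nor the triples alone (which do not change over time) can produce the alert; the join between them is the gap the slide describes.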

The impact of Big Structured Data

Google Knowledge Graph: the Freebase-to-Wikidata transition

Facebook’s Social Graph (in 2013)

The Graph API is the primary way to get (our) data in and out of Facebook's social graph

The Facebook Web is becoming progressively smarter than the Web of data…

Open source, libre content, and Linked Data as a framework to build an open, linked Big Data graph

Thank you!

Mail: giuseppe.futia@polito.it

GitHub repository: https://github.com/giuseppefutia/
