Posted on 12 April 2017
Big Data and Semantic Technologies: Using Linked Data as a Driver for Data Integration
Giuseppe Futia, Nexa Center for Internet and Society, Politecnico di Torino (DAUIN)
27 July 2016
Outline
• Information management challenges and Big Data
• Linked Data framework (explained with examples)
• Linked Data approach for the Big Data community
• The impact of Big Structured Data
Enterprise/Research Information Management Challenges
• Disparate data sources and data silos
• Data sources with similar/inconsistent information
• Most of the knowledge is hidden in texts (unstructured data)
• Difficult to integrate and analyse structured and unstructured data
The 3 V’s of Big Data
• Velocity
• Volume
• Variety (often extended with Veracity and Value)
From Big Linked Data to Linked Big Data
Big Linked Data
Linked Data Cloud Diagram (2014)
Linked Data Vision (W3C)
• Extend the principles of the Web from documents to data
• Data should be accessed using the general Web architecture (e.g., URIs, HTTP, …)
• Data should be linked to each other, just as documents are
• Creation of a common framework that allows:
  – Data to be shared and reused across applications
  – Data to be processed automatically
  – New relationships between pieces of data to be inferred
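To make "new relationships to be inferred" concrete, here is a minimal forward-chaining sketch of two standard RDFS entailment rules (rdfs9 and rdfs11), applied until no new triple appears. Triples are plain string tuples and the `ex:` identifiers are hypothetical; this is an illustration of the idea, not how any particular triple store implements reasoning.

```python
# Forward-chaining sketch of two RDFS entailment rules:
#   rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
#   rdfs9:  (x type A), (A subClassOf B)      => (x type B)
# The "ex:" names below are hypothetical examples.

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Return the input triples plus everything derivable by rdfs9/rdfs11."""
    closure = set(triples)
    while True:
        new = set()
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        # rdfs11: transitivity of subClassOf
        for a, b in subclass:
            for b2, c in subclass:
                if b == b2:
                    new.add((a, SUBCLASS, c))
        # rdfs9: instances of a subclass are instances of the superclass
        for s, p, o in closure:
            if p == RDF_TYPE:
                for a, b in subclass:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= closure:  # fixpoint: nothing new was derived
            return closure
        closure |= new


data = {
    ("ex:Turin", RDF_TYPE, "ex:City"),
    ("ex:City", SUBCLASS, "ex:Settlement"),
    ("ex:Settlement", SUBCLASS, "ex:Place"),
}
inferred = infer(data) - data
for triple in sorted(inferred):
    print(triple)
```

Nobody stated that Turin is a Place; the fact follows from the data plus the class hierarchy, which is the kind of implicit relationship the Linked Data vision refers to.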
Resource Description Framework
• Everything is a triple: Subject (resource), Predicate (relation), Object (resource or literal)
• A Resource Description Framework (RDF) graph is a collection of triples
[Figure: a triple drawn as subject → predicate → object]
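The triple model can be sketched in a few lines of plain Python: resources are URIs, literals may only appear in object position, and a graph is just a set of triples. The DBpedia and FOAF URIs below are used for illustration only; a real application would normally use an RDF library rather than these hand-rolled classes.

```python
from dataclasses import dataclass
from typing import NamedTuple, Union


@dataclass(frozen=True)
class URIRef:
    """A resource identified by a URI."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A literal value (string, number, date, ...); allowed only as object."""
    value: object


class Triple(NamedTuple):
    subject: URIRef
    predicate: URIRef
    object: Union[URIRef, Literal]


DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
FOAF = "http://xmlns.com/foaf/0.1/"

# An RDF graph is simply a set of triples; shared URIs connect them.
graph = {
    Triple(URIRef(DBR + "Eyes_Wide_Shut"), URIRef(DBO + "director"),
           URIRef(DBR + "Stanley_Kubrick")),
    Triple(URIRef(DBR + "Stanley_Kubrick"), URIRef(FOAF + "name"),
           Literal("Stanley Kubrick")),
}


def objects(g, subject, predicate):
    """All objects reachable from `subject` via `predicate`."""
    return {t.object for t in g if t.subject == subject and t.predicate == predicate}


# Follow two edges: film -> director (a resource) -> name (a literal).
director = next(iter(objects(graph, URIRef(DBR + "Eyes_Wide_Shut"), URIRef(DBO + "director"))))
print(objects(graph, director, URIRef(FOAF + "name")))
```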
SPARQL
• SQL-like query language for RDF data
• Simple protocol for querying remote databases over HTTP
• Query types:
  – select: query data by a complex graph pattern
  – ask: check whether a query returns any results (the result is true/false)
  – describe: return all triples about a particular resource
  – construct: create new triples based on query results
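The four query forms all rest on basic graph pattern matching: find every binding of the `?variables` that makes all triple patterns true. The toy matcher below is an illustration only (no parser, no FILTER or OPTIONAL), and its `describe` is deliberately simplified, since the output of a real SPARQL DESCRIBE is implementation-dependent. The `ex:` data is hypothetical.

```python
def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match(pattern, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in graph:
        b = dict(binding)  # fresh copy per candidate triple
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                if b.setdefault(term, value) != value:  # conflicting binding
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, graph, b)


def select(graph, pattern, variables):
    return [tuple(b[v] for v in variables) for b in match(pattern, graph)]


def ask(graph, pattern):
    return any(True for _ in match(pattern, graph))


def describe(graph, resource):
    # Simplified: the triples in which the resource is the subject.
    return {t for t in graph if t[0] == resource}


def construct(graph, pattern, template):
    out = set()
    for b in match(pattern, graph):
        out |= {tuple(b.get(t, t) for t in tp) for tp in template}
    return out


g = {
    ("ex:EyesWideShut", "ex:director", "ex:Kubrick"),
    ("ex:EyesWideShut", "ex:starring", "ex:TomCruise"),
    ("ex:EyesWideShut", "ex:starring", "ex:NicoleKidman"),
}
print(sorted(select(g, [("?film", "ex:starring", "?actor")], ["?actor"])))
print(ask(g, [("?film", "ex:director", "ex:Kubrick")]))
print(len(describe(g, "ex:EyesWideShut")))
print(construct(g,
                [("?film", "ex:director", "?d"), ("?film", "ex:starring", "?a")],
                [("?a", "ex:workedWith", "?d")]))
```

The `construct` call shows how new triples (`ex:workedWith`, a hypothetical predicate) can be derived from a join over existing ones.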
Nexa projects
Public contracts
What are the certified email (PEC) addresses of Italian municipalities with more than 100,000 inhabitants that publish contracts with anomalies?
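A question like this becomes a SPARQL query sent over HTTP, as the SPARQL protocol describes. In the sketch below the endpoint URL and the whole `ex:` vocabulary are hypothetical placeholders (the project's real data model is not shown on the slide); the request shape follows the SPARQL 1.1 Protocol "query via GET" form, asking for results in the SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint and vocabulary: placeholders, not a real dataset.
ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX ex: <http://example.org/contracts#>
SELECT DISTINCT ?municipality ?pec WHERE {
  ?municipality ex:population ?pop ;
                ex:pec ?pec .
  ?contract ex:publishedBy ?municipality ;
            ex:hasAnomaly true .
  FILTER(?pop > 100000)
}
"""


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run(endpoint, query):
    """Send the request (needs network access and a real endpoint)."""
    with urlopen(build_request(endpoint, query)) as resp:
        data = json.load(resp)
    # Flatten each binding {"var": {"type": ..., "value": ...}} to {"var": value}.
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


req = build_request(ENDPOINT, QUERY)
print(req.full_url[:40], req.get_header("Accept"))
```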
TellMeFirst: A Knowledge Discovery Application
TellMeFirst Architecture (http://tellmefirst.polito.it)
“The final work of legendary director Stanley Kubrick, who died within a week of completing the edit, is based upon a novel by Arthur Schnitzler. Tom Cruise and Nicole Kidman play William and Alice Harford, a physician and a gallery manager who are wealthy, successful, and travel in a sophisticated social circle.”
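As a toy illustration of classifying a text like the one above against Linked Data topics, the sketch ranks a few DBpedia-style topic URIs by bag-of-words cosine similarity. This is an assumption for illustration, not TellMeFirst's actual pipeline: the topic descriptions are hypothetical, hand-written stand-ins for the much richer Wikipedia-derived models a real system would use.

```python
import math
import re
from collections import Counter

# Hypothetical topic descriptions keyed by DBpedia-style URIs.
TOPICS = {
    "http://dbpedia.org/resource/Stanley_Kubrick": "stanley kubrick film director",
    "http://dbpedia.org/resource/Tom_Cruise": "tom cruise american actor producer",
    "http://dbpedia.org/resource/Arthur_Schnitzler": "arthur schnitzler austrian author dramatist novel",
    "http://dbpedia.org/resource/Apache_Spark": "apache spark open source cluster computing engine",
}

TEXT = ("The final work of legendary director Stanley Kubrick, who died within a week "
        "of completing the edit, is based upon a novel by Arthur Schnitzler. Tom Cruise "
        "and Nicole Kidman play William and Alice Harford, a physician and a gallery "
        "manager who are wealthy, successful, and travel in a sophisticated social circle.")


def bag(text):
    """Lower-cased word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_topics(text, topics, k=3):
    """Return the k topic URIs most similar to the text, with their scores."""
    q = bag(text)
    scores = {uri: cosine(q, bag(desc)) for uri, desc in topics.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


for uri, score in rank_topics(TEXT, TOPICS):
    print(f"{score:.3f} {uri}")
```

The output is a list of URIs rather than plain keywords, so each detected topic can be linked directly to further data in DBpedia.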
Linked Big Data
Linked Data approach adopted by the Big Data community
• RDF data model for Variety
  – Flexible, easy-to-evolve data model
  – Efficiently integrate structured and unstructured data
• Enrich Big Data with metadata and semantics
  – More powerful analytics on top of it
  – Discover implicit links and relationships
• Interlink Big Data sets
  – Information interchange across a value chain
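The integration argument can be sketched with two hypothetical silos that describe the same municipalities with different formats and field names. Once each record is mapped to triples with one shared URI per real-world entity, integration reduces to set union, and a query can span both sources. The CSV content, field names and `example.org` namespace are assumptions made up for illustration.

```python
import csv
import io

# Silo 1: a structured CSV export (hypothetical content).
CONTRACTS_CSV = """contract_id,municipality,amount_eur
C1,Torino,120000
C2,Milano,80000
"""

# Silo 2: another system, with different field names for the same entities.
MUNICIPALITIES = [
    {"name": "Torino", "region": "Piemonte"},
    {"name": "Milano", "region": "Lombardia"},
]

EX = "http://example.org/"  # hypothetical namespace


def uri(kind, key):
    """Mint one shared URI per real-world entity so both silos can refer to it."""
    return f"{EX}{kind}/{key}"


def contracts_to_triples(text):
    for row in csv.DictReader(io.StringIO(text)):
        c = uri("contract", row["contract_id"])
        yield (c, EX + "publishedBy", uri("municipality", row["municipality"]))
        yield (c, EX + "amountEUR", int(row["amount_eur"]))


def municipalities_to_triples(records):
    for rec in records:
        yield (uri("municipality", rec["name"]), EX + "region", rec["region"])


# Integration is set union: the shared municipality URIs join the two sources.
graph = set(contracts_to_triples(CONTRACTS_CSV)) | set(municipalities_to_triples(MUNICIPALITIES))


def contracts_in_region(g, region):
    """A cross-silo question: which contracts were published in `region`?"""
    towns = {s for s, p, o in g if p == EX + "region" and o == region}
    return sorted(s for s, p, o in g if p == EX + "publishedBy" and o in towns)


print(contracts_in_region(graph, "Piemonte"))
```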
Semantic technologies for Big Data
Blazegraph and DASL
• Blazegraph is a high-performance graph database platform that supports RDF/SPARQL APIs
• In 2016 Blazegraph introduced a programming environment called DASL
• DASL supports the development of graph algorithms within the Apache Spark ecosystem, specifically optimised for GPUs
• Suited to complex graph analytics, especially where relationships are not known in advance
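One typical graph algorithm for discovering relationships that are not known in advance is a path search between two resources. The breadth-first search below runs over triples treated as undirected edges; it is a plain single-machine sketch of the technique and does not use or imitate the DASL API. The `ex:` data is hypothetical.

```python
from collections import deque


def shortest_path(triples, start, goal):
    """Breadth-first search over triples treated as undirected edges.

    Returns the list of (subject, predicate, object) hops linking start to
    goal, or None if they are not connected.
    """
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((o, (s, p, o)))
        adj.setdefault(o, []).append((s, (s, p, o)))
    prev = {start: None}  # node -> (previous node, triple used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while prev[node] is not None:
                node_before, triple = prev[node]
                path.append(triple)
                node = node_before
            return path[::-1]
        for nxt, triple in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, triple)
                queue.append(nxt)
    return None


triples = [
    ("ex:EyesWideShut", "ex:director", "ex:StanleyKubrick"),
    ("ex:EyesWideShut", "ex:basedOn", "ex:Traumnovelle"),
    ("ex:Traumnovelle", "ex:author", "ex:ArthurSchnitzler"),
]
for hop in shortest_path(triples, "ex:StanleyKubrick", "ex:ArthurSchnitzler"):
    print(hop)
```

No triple links Kubrick to Schnitzler directly; the search surfaces the indirect three-hop connection through the film and the novella.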
EP-SPARQL
• Event processing provides on-the-fly analysis of event streams, but cannot combine streams with background knowledge and cannot perform reasoning tasks
• Semantic tools can effectively handle background knowledge and perform reasoning tasks, but cannot deal with rapidly changing data provided by event streams
• Event Processing SPARQL (EP-SPARQL) is proposed as a new language for complex event processing and stream reasoning
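The gap EP-SPARQL targets can be illustrated by joining a time-windowed event stream with static triples. In the sketch below, a pattern fires when two distinct sensors located in the same server room report a high temperature within 60 seconds. Whether a sensor is relevant depends on background knowledge, not on the events themselves. This is plain Python under hypothetical data, not EP-SPARQL syntax or semantics.

```python
from collections import deque

# Static background knowledge (hypothetical), as triples.
BACKGROUND = {
    ("ex:s1", "ex:locatedIn", "ex:room42"),
    ("ex:s2", "ex:locatedIn", "ex:room42"),
    ("ex:s3", "ex:locatedIn", "ex:office7"),
    ("ex:room42", "rdf:type", "ex:ServerRoom"),
}


def room_of(sensor):
    return next((o for s, p, o in BACKGROUND if s == sensor and p == "ex:locatedIn"), None)


def is_server_room(room):
    return (room, "rdf:type", "ex:ServerRoom") in BACKGROUND


def detect(events, window=60, threshold=30.0):
    """Yield (time, room) whenever two distinct sensors in the same server room
    report a temperature above `threshold` within `window` seconds."""
    recent = deque()  # (time, sensor, room) of hot server-room readings in the window
    for t, sensor, temp in events:  # events must arrive in time order
        while recent and t - recent[0][0] > window:
            recent.popleft()  # slide the window forward
        if temp <= threshold:
            continue
        room = room_of(sensor)  # join with background knowledge
        if room is None or not is_server_room(room):
            continue
        if any(r == room and s != sensor for _, s, r in recent):
            yield (t, room)
        recent.append((t, sensor, room))


stream = [(0, "ex:s1", 35.0), (20, "ex:s3", 40.0), (45, "ex:s2", 33.0), (200, "ex:s2", 36.0)]
print(list(detect(stream)))
```

The reading from `ex:s3` is hot but ignored, because the background knowledge places that sensor in an office; the reading at t=200 is ignored because the earlier readings have left the window.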
The impact of Big Structured Data
Google Knowledge Graph: the Freebase-to-Wikidata transition
Facebook’s Social Graph (in 2013)
The Graph API is the primary way to get (our) data in and out of Facebook's social graph
The Facebook Web is becoming progressively smarter than the Web of Data…
Open source, libre content, and Linked Data as a framework to build an open Linked Big Data graph
Thank you!
Mail: giuseppe.futia@polito.it
GitHub repository: https://github.com/giuseppefutia/