data integration and visualization

Upload: ivan-ermilov

Post on 16-Apr-2017

Category:

Education


TRANSCRIPT

Data integration and visualization using RDF

Ivan Ermilov

University of Leipzig

Agenda

Data discovery

Data conversion

Data integration

Linked Data Lifecycle

http://stack.lod2.eu/blog/

DATA DISCOVERY

Data Discovery

Ontologies

Vocabularies

Documents

Data Discovery: Ontologies

Specification of a conceptualization

http://swoogle.umbc.edu/

http://watson.kmi.open.ac.uk/WatsonWUI/

Data Discovery: Vocabularies

FOAF (Friend of a Friend): a Semantic Web vocabulary used to describe people, their activities, and their relationships to one another.

It is becoming popular for people who discover FOAF to set up their own FOAF profile.

FOAF is the base that other vocabularies extend.

Data Discovery: Vocabularies

http://xmlns.com/foaf/spec/
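As a rough illustration of what a FOAF profile contains, the sketch below writes a few FOAF statements as N-Triples using plain Python. The `example.org` URIs and the names are made-up placeholders; the terms `Person`, `name`, `homepage`, and `knows` come from the FOAF specification.

```python
# Minimal sketch: a FOAF profile serialized as N-Triples with plain Python.
# The example.org URIs and the names are hypothetical placeholders.
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def uri(value):
    """Wrap an absolute URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def foaf_profile(person, name, homepage, knows):
    """Return the N-Triples lines describing one person."""
    triples = [
        (uri(person), uri(RDF_TYPE), uri(FOAF + "Person")),
        (uri(person), uri(FOAF + "name"), literal(name)),
        (uri(person), uri(FOAF + "homepage"), uri(homepage)),
    ]
    for friend in knows:
        triples.append((uri(person), uri(FOAF + "knows"), uri(friend)))
    return [" ".join(t) + " ." for t in triples]


profile = foaf_profile(
    "http://example.org/people/alice#me",
    "Alice Example",
    "http://example.org/~alice/",
    ["http://example.org/people/bob#me"],
)
print("\n".join(profile))
```

Publishing such a file at a stable URL is what makes the profile discoverable by crawlers and other applications.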

Data Discovery: Vocabularies

http://lov.okfn.org/dataset/lov/

Data Discovery: Documents

"Tim Berners-Lee - LinkedIn"@en ._:node0 ._:node0 "Greater Boston Area" . ._:node1 ._:node1 "MIT" ._:node1 "Director, World Wide Web Consortium\n\nAlso, part time Prof in ECS at Southampton University, UK" .

Data Discovery: Documents

http://sindice.com/

Data Catalogs

Community-maintained registry exists

Contains 362 data catalogs (growing)

Based on CKAN data catalog platform

http://datacatalogs.org/

What is CKAN?

Metadata repository with crowd-sourcing enabled

Everybody can register and publish data about their datasets

Developer-friendly web application

Provides a well-documented API

Easy to install, easy to use as your own metadata repository

CKAN Architecture

Packages contain resources, and you can search for them
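The package/resource relationship can be pictured with two small data structures. This is only a sketch: the class names, fields, and the naive substring search are illustrative assumptions, not CKAN's actual schema or search implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    """One downloadable file or endpoint (hypothetical fields)."""
    url: str
    format: str


@dataclass
class Package:
    """A dataset description that groups several resources."""
    name: str
    title: str
    resources: List[Resource] = field(default_factory=list)


def search(packages, term):
    """Return packages whose name or title contains the term, ignoring case."""
    term = term.lower()
    return [p for p in packages
            if term in p.name.lower() or term in p.title.lower()]


catalog = [
    Package("city-budget", "City Budget 2012",
            [Resource("http://example.org/budget.csv", "CSV")]),
    Package("bus-stops", "Bus Stops",
            [Resource("http://example.org/stops.xml", "XML")]),
]
hits = search(catalog, "budget")
print([p.name for p in hits])
```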

The Data Hub

Hub of Data

CKAN API

Well-documented

http://docs.ckan.org/en/latest/api.html

Covers everything you can do with the web interface

You can write your own web interface

OKFN-maintained library for accessing the API: ckanclient (Python)

CKAN API: Methods

Retrieving data

Creating new data

Updating existing data

Deleting existing data

Data means packages, resources, groups, tags, users, etc.
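To make the four kinds of methods concrete, the sketch below builds (but does not send) one HTTP request per operation. It assumes a CKAN instance that exposes the Action API under `/api/3/action/`; the base URL, API key, and dataset name are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://ckan.example.org"  # hypothetical CKAN instance
API_KEY = "my-api-key"                # hypothetical API key


def read_request(action, **params):
    """Build a GET request for a read-only action such as package_show."""
    url = "%s/api/3/action/%s" % (BASE_URL, action)
    if params:
        url += "?" + urlencode(params)
    return Request(url)


def write_request(action, payload):
    """Build an authenticated POST request for create, update, or delete."""
    url = "%s/api/3/action/%s" % (BASE_URL, action)
    body = json.dumps(payload).encode("utf-8")
    headers = {"Authorization": API_KEY, "Content-Type": "application/json"}
    return Request(url, data=body, headers=headers)


built = {
    "retrieve": read_request("package_show", id="city-budget"),
    "create": write_request("package_create", {"name": "city-budget"}),
    "update": write_request("package_update",
                            {"id": "city-budget", "title": "City Budget"}),
    "delete": write_request("package_delete", {"id": "city-budget"}),
}
for label, req in built.items():
    # Nothing is sent; we only show which method and URL each call would use.
    print(label, req.get_method(), req.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would perform the call against a real instance.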

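CKAN API: Examples

    from ckanclient import CkanClient

    def list_resource_formats(ckan_api_url, ckan_api_key):
        # Collect the distinct resource formats used across all packages.
        ckan = CkanClient(base_location=ckan_api_url, api_key=ckan_api_key)
        formats = []
        # Assumption: package_list() returns package names, so each full
        # package record is fetched separately with package_entity_get().
        for package_name in ckan.package_list():
            package = ckan.package_entity_get(package_name)
            for resource in package['resources']:
                if resource['format'] not in formats:
                    formats.append(resource['format'])
        return sorted(formats)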

https://github.com/okfn/ckanclient

Use Case: CSV2RDF Conversion

Framework for CSV2RDF conversion

Crowd-sourcing enabled

RDF Visualizations

https://github.com/earthquakesan/CSV2RDF-WIKI

CSV2RDF Conversion: Why CSV?

CSV2RDF Conversion: Data Quality

Data conversion

Data Conversion

Structured: relational databases

Semi-structured: XML, HTML, XLS, CSV, APIs

Unstructured: Raw text

PublicData.eu Statistics

XML

RDB

Spreadsheet

?

How does government spending in certain sectors relate to my company's earnings?

How does historic spending relate to the current figures?

Give me a report about all of my customers across the whole organization

Data Conversion

Custom scripts

XML

RDB

Spreadsheet

?

Data Conversion

XPath

SQL

Result aggregation
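The custom-script approach can be sketched as follows: one access method per source (an XPath-style lookup for XML, SQL for the relational database, row reads for the spreadsheet export) and hand-written glue to aggregate the results. All data, table names, and column names below are invented for illustration.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# All data below is made up for illustration.
XML_DATA = ("<spending><item sector='health' amount='120'/>"
            "<item sector='transport' amount='80'/></spending>")
CSV_DATA = "sector,earnings\nhealth,15\ntransport,9\n"


def spending_from_xml(text):
    """XPath-style lookup over the XML source."""
    root = ET.fromstring(text)
    return {i.get("sector"): float(i.get("amount"))
            for i in root.findall("./item")}


def customers_from_db(conn):
    """SQL query over the relational source."""
    rows = conn.execute(
        "SELECT sector, COUNT(*) FROM customers GROUP BY sector")
    return dict(rows.fetchall())


def earnings_from_csv(text):
    """Row-by-row read of the spreadsheet export."""
    reader = csv.DictReader(io.StringIO(text))
    return {r["sector"]: float(r["earnings"]) for r in reader}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, sector TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Acme", "health"), ("Bolt", "health"), ("Cargo", "transport")])

# Result aggregation: glue code that joins the three answers by sector.
spending = spending_from_xml(XML_DATA)
customers = customers_from_db(conn)
earnings = earnings_from_csv(CSV_DATA)
report = {s: (spending[s], customers.get(s, 0), earnings.get(s, 0.0))
          for s in spending}
print(report)
```

Every new source or question needs another access function and more glue, which is the cost the RDF-based approach tries to avoid.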

Merging Data with RDF

XML

RDB

Spreadsheet

Once in RDF:

Easily integrate your data

Concepts can be mapped to one another

Query everything with one W3C standard language (SPARQL)

Merging Data with RDF: Example

Blue App has model

Red App has model

Need to integrate Red & Blue models

Merging Data with RDF: Example

Step 1: Merge RDF

Same nodes (URIs) join automatically
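The merge step can be sketched with plain Python sets, treating each triple as a `(subject, predicate, object)` tuple. The URIs and values are hypothetical, and blank nodes (which need renaming in a real merge) are ignored here.

```python
# Triples are (subject, predicate, object) tuples; all URIs are hypothetical.
CUSTOMER = "http://example.org/customer/42"

blue = {
    (CUSTOMER, "http://example.org/blue#name", "Acme Ltd"),
    (CUSTOMER, "http://example.org/blue#city", "Leipzig"),
}
red = {
    (CUSTOMER, "http://example.org/red#revenue", "1200000"),
    ("http://example.org/customer/7", "http://example.org/red#revenue", "300000"),
}

# Merging two graphs (without blank nodes) is a set union of their triples.
merged = blue | red


def describe(graph, subject):
    """Collect every predicate/object pair attached to one node."""
    return sorted((p, o) for s, p, o in graph if s == subject)


# The shared URI now carries statements from both models.
for predicate, obj in describe(merged, CUSTOMER):
    print(predicate, obj)
```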

Merging Data with RDF: Example

Step 2: Add relationships and rules

(Relationships are also RDF)
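A small sketch of this step, under stated assumptions: the relationship between a Blue term and a Red term is stored as an ordinary triple using `owl:equivalentProperty`, and one hand-written rule copies statements across equivalent properties. The Blue and Red property URIs are hypothetical, and a real rule engine would do considerably more than this single pass.

```python
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"
BLUE_NAME = "http://example.org/blue#name"        # hypothetical Blue term
RED_LABEL = "http://example.org/red#clientName"   # hypothetical Red term

graph = {
    ("http://example.org/customer/42", BLUE_NAME, "Acme Ltd"),
    ("http://example.org/customer/7", RED_LABEL, "Bolt GmbH"),
    # The relationship between the two models is itself a triple.
    (BLUE_NAME, OWL_EQUIVALENT_PROPERTY, RED_LABEL),
}


def apply_equivalences(triples):
    """Rule: if p1 is equivalent to p2, each p1 statement also holds for p2."""
    pairs = {(s, o) for s, p, o in triples if p == OWL_EQUIVALENT_PROPERTY}
    pairs |= {(b, a) for a, b in pairs}  # equivalence works in both directions
    inferred = set(triples)
    for source, target in pairs:
        inferred |= {(s, target, o) for s, p, o in triples if p == source}
    return inferred


enriched = apply_equivalences(graph)
# An application that only knows the Blue term now sees both customers.
blue_view = sorted(o for s, p, o in enriched if p == BLUE_NAME)
print(blue_view)
```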

Merging Data with RDF: Example

Step 3: Define Green model

(Making use of Red & Blue models)

Merging Data with RDF: Example

What the Blue app sees:

No difference!

Merging Data with RDF: Example

What the Red app sees:

No difference!

Merging Data with RDF: Example

RDF helps bridge other formats/models

Producers and consumers may use different formats/models

Rules can specify transformations

Inference engine finds path to desired result model
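One way to picture "finding a path" is a breadth-first search over transformation rules. This is a deliberately simplified sketch, not how any particular inference engine works; the model names echo the labels on the slide, while the rules connecting them are invented.

```python
from collections import deque

# Hypothetical rules: each pair means "model A can be transformed into model B".
RULES = [("A1", "RDF"), ("B1", "RDF"), ("RDF", "X"), ("RDF", "Y"), ("C1", "B1")]


def find_path(rules, source, target):
    """Breadth-first search for the shortest chain of transformations."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for start, end in rules:
            if start == path[-1] and end not in seen:
                seen.add(end)
                queue.append(path + [end])
    return None  # no chain of rules leads to the target model


print(find_path(RULES, "C1", "X"))
```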

Diagram: RDF model transform between models A1, A2, A3, B1, B2, C1, C2 and models X, Y, Z, driven by ontologies & rules

RDB2RDF

Extract, Transform, Load (ETL)

Automatic Mapping

Semi-Automatic Mapping

R2RML
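A minimal sketch of the automatic-mapping idea, assuming the simplest possible convention: one subject URI per row and one predicate per column. The base URI, table, and data are hypothetical, and literal escaping and datatypes are left out for brevity. R2RML, by contrast, lets you declare such mappings explicitly instead of hard-coding them.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base URI


def table_to_triples(conn, table, key):
    """Automatic mapping: one subject per row, one predicate per column."""
    # The table name is interpolated directly, so it must come from a trusted source.
    cursor = conn.execute("SELECT * FROM %s" % table)
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = "<%s%s/%s>" % (BASE, table, record[key])
        for column, value in record.items():
            if value is None:
                continue  # NULL cells produce no triple
            predicate = "<%s%s#%s>" % (BASE, table, column)
            # Simplification: values are emitted as unescaped plain literals.
            triples.append('%s %s "%s" .' % (subject, predicate, value))
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme', 'Leipzig')")
conn.execute("INSERT INTO customer VALUES (2, 'Bolt', NULL)")
triples = table_to_triples(conn, "customer", "id")
print("\n".join(triples))
```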

Sparqlify: Examples

Sparqlify: CSV2RDF

    Prefix pdd:
    Prefix pdo:
    Create View Template DefaultMapping As
      Construct {
        ?s ?p1 ?o1 ;
           ?p2 ?o2 ...
      }
      With
        ?s = uri(concat(pdd:, csv-path/, ?rowId))
        ?p1 = uri(concat(pdo:, ?headingName1))
        ?o1 = plainLiteral(?1)
        ?p2 = ...

http://sparqlify.org/
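The same default mapping idea can be sketched in plain Python: one subject per row, one predicate per column heading, and the cell value as a plain literal. This is not Sparqlify itself; the two namespace URIs are hypothetical stand-ins for the `pdd:` and `pdo:` prefixes, and numbering rows from 1 is an assumption.

```python
import csv
import io

PDD = "http://example.org/data/"      # hypothetical stand-in for the pdd: prefix
PDO = "http://example.org/ontology/"  # hypothetical stand-in for the pdo: prefix


def csv_to_triples(text, csv_path):
    """One subject per row, one predicate per heading, plain literals as objects."""
    reader = csv.reader(io.StringIO(text))
    headings = next(reader)
    triples = []
    # Assumption: row ids start at 1 for the first data row.
    for row_id, row in enumerate(reader, start=1):
        subject = "<%s%s/%d>" % (PDD, csv_path, row_id)
        for heading, cell in zip(headings, row):
            predicate = "<%s%s>" % (PDO, heading.strip().replace(" ", "_"))
            obj = '"%s"' % cell.replace('"', '\\"')
            triples.append("%s %s %s ." % (subject, predicate, obj))
    return triples


sample = "city,population\nLeipzig,520000\nDresden,525000\n"
result = csv_to_triples(sample, "cities.csv")
for line in result:
    print(line)
```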

Raw Text Processing: ConTEXT

No installation and configuration required.

Access content from a variety of sources

Instantly show the results of text analysis to users in a variety of visualizations.

Allow refinement of automatic annotations and take feedback into account

Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together.

http://rdface.aksw.org/nlp/hub.php

Processing Raw Text: ConTEXT

Data Integration

Definition

In general, integration of multiple information systems aims at combining selected systems so that they form a unified new whole and give users the illusion of interacting with one single information system.

Semantic Data Integration

Federated SPARQL Queries

Query processing involving multiple distributed data sources, e.g. the Linked Open Data cloud

Example scenario

DBpedia and New York Times collections

DBpedia as a structured knowledge base

New York Times as a news provider

Query both data collections in an integrated way
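One way to express such an integrated query is the `SERVICE` keyword of SPARQL 1.1 Federated Query. The sketch below only builds the query string. The DBpedia endpoint URL is the public one; the news endpoint and the `ex:topicPage` property are hypothetical placeholders, since the slides do not name them.

```python
DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"
NEWS_ENDPOINT = "http://news.example.org/sparql"  # hypothetical placeholder


def build_federated_query(dbpedia_class):
    """Return a SPARQL 1.1 query that joins results from two endpoints."""
    template = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX ex:  <http://example.org/news#>
SELECT ?entity ?article WHERE {
  SERVICE <%s> { ?entity rdf:type <%s> . }
  SERVICE <%s> { ?newsEntity owl:sameAs ?entity ;
                             ex:topicPage ?article . }
}
"""
    # ex:topicPage is an invented property standing in for the news vocabulary.
    return template.strip() % (DBPEDIA_ENDPOINT, dbpedia_class, NEWS_ENDPOINT)


query = build_federated_query("http://dbpedia.org/ontology/Politician")
print(query)
```

The shared variable `?entity` is what joins the two result sets. Some federation engines choose the sources themselves, so explicit `SERVICE` clauses are not always needed.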

Federated Query Processing

Federation mediator at the server

Virtual integration of (remote) data sources

Communication via SPARQL protocol

Diagram: a query is sent to the federation mediator, which talks to several SPARQL data sources
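The mediator idea can be sketched with in-memory stand-ins for the remote endpoints. The class names, URIs, and the simple join are illustrative assumptions; real engines add source selection, query planning, and the SPARQL protocol on top of this.

```python
class Source:
    """Stand-in for a remote SPARQL endpoint holding a set of triples."""

    def __init__(self, name, triples):
        self.name = name
        self.triples = set(triples)

    def match(self, s=None, p=None, o=None):
        """Answer a single triple pattern; None acts as a variable."""
        return [t for t in self.triples
                if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])]


class Mediator:
    """Sends each triple pattern to every source and joins the answers locally."""

    def __init__(self, sources):
        self.sources = sources

    def match(self, s=None, p=None, o=None):
        results = []
        for source in self.sources:
            results.extend(source.match(s, p, o))
        return results

    def join_on_subject(self, predicate_a, predicate_b):
        """Pair up objects of two predicates that share a subject."""
        left = self.match(p=predicate_a)
        right = self.match(p=predicate_b)
        return sorted((a[0], a[2], b[2])
                      for a in left for b in right if a[0] == b[0])


# Hypothetical URIs and data for two separate sources.
PARTY = "http://example.org/kb#party"
TOPIC = "http://example.org/news#topicPage"
P1 = "http://example.org/id/p1"
P2 = "http://example.org/id/p2"

kb = Source("knowledge base", [(P1, PARTY, "Example Party")])
news = Source("news", [(P1, TOPIC, "http://news.example.org/topics/p1"),
                       (P2, TOPIC, "http://news.example.org/topics/p2")])

mediator = Mediator([kb, news])
rows = mediator.join_on_subject(PARTY, TOPIC)
print(rows)
```

The client asks one question of the mediator and never needs to know which source held which part of the answer.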

Federated Query Engines

    Engine name    Implementation language    License
    FedX           Java                       GNU AGPL
    SPLENDID       Java                       LGPL
    LHD            Java                       MIT
    DARQ           Java                       GPL
    ANAPSID        Python                     GNU GPL
    ADERIS         Java                       Apache

Data Visualization

LD Visualization Techniques

Classification of Visualization Techniques

Comparison of Values/Attributes

http://goo.gl/IvsGbU

http://goo.gl/JeFhlM

Analysis of Relationships and Hierarchies

http://rhizomik.net/dbpedia/treemap.jsp

http://lov.okfn.org/dataset/lov/

Analysis of Temporal and Geographical Events

http://lov.okfn.org/dataset/lov/details/vocabulary_dcterms.html

Analysis of Multidimensional Data

http://mbostock.github.io/protovis/ex/cars.html

Other Visualization Techniques

Applications of LD Visualization Techniques

Tool Types

CubeViz

Facete

Thank you for your attention

Ivan Ermilov

University of Leipzig