data integration and visualization
TRANSCRIPT
Data integration and visualization
Ivan Ermilov
University of Leipzig
USING RDF
Agenda
Data discovery
Data conversion
Data integration
Linked Data Lifecycle
http://stack.lod2.eu/blog/
DATA DISCOVERY
Data Discovery
Ontologies
Vocabularies
Documents
Data Discovery: Ontologies
Specification of a conceptualization
Data Discovery: Ontologies
http://swoogle.umbc.edu/
http://watson.kmi.open.ac.uk/WatsonWUI/
Data Discovery: Vocabularies
FOAF (Friend of a Friend): a Semantic Web vocabulary used to describe people, their activities and their relationships to one another.
It is becoming popular for people who discover it to set up their own FOAF profile.
This vocabulary is the base from which other vocabularies are extended.
Data Discovery: Vocabularies
http://xmlns.com/foaf/spec/
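A FOAF profile of the kind described here can be sketched in a few lines. The example below is an illustration only and is not taken from the slides: the people and the example.org URIs are made up, while the namespace and the terms foaf:Person, foaf:name and foaf:knows come from the FOAF specification. It writes the profile as N-Triples using only the standard library and assumes the name contains no quotes or backslashes that would need escaping.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def foaf_profile(person_uri, name, knows=()):
    """Return N-Triples lines describing one person and who they know."""
    lines = [
        f"<{person_uri}> <{RDF_TYPE}> <{FOAF}Person> .",
        f'<{person_uri}> <{FOAF}name> "{name}" .',
    ]
    for friend_uri in knows:
        lines.append(f"<{person_uri}> <{FOAF}knows> <{friend_uri}> .")
    return lines


# Hypothetical people; only the FOAF terms are real.
profile = foaf_profile(
    "http://example.org/people/alice#me",
    "Alice Example",
    knows=["http://example.org/people/bob#me"],
)
print("\n".join(profile))
```

Publishing such a file at a stable URL is what lets crawlers and other profiles link to it.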
Data Discovery: Vocabularies
http://lov.okfn.org/dataset/lov/
Data Discovery: Documents
"Tim Berners-Lee - LinkedIn"@en ._:node0 ._:node0 "Greater Boston Area" . ._:node1 ._:node1 "MIT" ._:node1 "Director, World Wide Web Consortium\n\nAlso, part time Prof in ECS at Southampton University, UK" .
Data Discovery: Documents
http://sindice.com/
Data Catalogs
A community-maintained registry exists
Contains 362 data catalogs (growing)
Based on CKAN data catalog platform
http://datacatalogs.org/
What is CKAN?
Metadata repository with crowd-sourcing enabled
Everybody can register and publish data about their datasets
Developer-friendly web application
Provides a well-documented API
Easy to install, easy to use as your own metadata repository
CKAN Architecture
Packages contain Resources
And you can search for them
The Data Hub
Hub of Data
CKAN API
Well-documented
http://docs.ckan.org/en/latest/api.html
Covers everything you can do with the web interface
You can write your own web interface
OKFN-maintained library for accessing the API
ckanclient (Python)
CKAN API: Methods
Retrieving data
Creating new data
Update existing data
Delete existing data
Data is: packages, resources, groups, tags, users etc.
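Retrieval over plain HTTP can be sketched as follows. This is a minimal illustration and not from the slides: it assumes a CKAN instance exposing the Action API (version 3) under /api/3/action/, the host ckan.example.org and the dataset name are hypothetical, and the response body is hand-written to mirror the documented envelope with its "success" and "result" fields. No network call is made.

```python
import json
from urllib.parse import urlencode


def action_url(base_url, action, **params):
    """Build the URL for a CKAN Action API (v3) GET request."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    return f"{url}?{urlencode(params)}" if params else url


def parse_result(body):
    """Return the 'result' field of an Action API response, or raise on failure."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    return payload["result"]


# Hypothetical CKAN instance; with a real one, fetch the URL via
# urllib.request.urlopen(url).read() to obtain the response body.
url = action_url("http://ckan.example.org", "package_show", id="my-dataset")

# Hand-written sample body, shaped like an Action API response.
sample_body = '{"success": true, "result": {"name": "my-dataset", "resources": []}}'
package = parse_result(sample_body)
print(url)
print(package["name"])
```

Creating, updating and deleting follow the same pattern with POST requests and an API key, which is why a client library is usually more convenient.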
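CKAN API: Examples
from ckanclient import CkanClient

def list_resource_formats(ckan_api_url, ckan_api_key):
    # Collect the distinct resource formats used across all packages.
    ckan = CkanClient(base_location=ckan_api_url, api_key=ckan_api_key)
    formats = set()
    # package_register_get() returns package names only,
    # so each package entity is fetched before reading its resources.
    for package_name in ckan.package_register_get():
        package = ckan.package_entity_get(package_name)
        for resource in package['resources']:
            formats.add(resource['format'])
    return sorted(formats)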
https://github.com/okfn/ckanclient
Use Case: CSV2RDF Conversion
Framework for CSV2RDF conversion
Crowd-sourcing enabled
RDF Visualizations
https://github.com/earthquakesan/CSV2RDF-WIKI
CSV2RDF Conversion: Why CSV?
CSV2RDF Conversion: Data Quality
Data conversion
Data Conversion
Structured: Relational Databases
Semi structured: XML, HTML, XLS, CSV, APIs
Unstructured: Raw text
PublicData.eu Statistics
XML
RDB
Spreadsheet
?
How does government spending in certain sectors relate to my company's earnings?
How does the historic spending relate to the current figures?
Give me a report about all of my customers across the whole organization
Data Conversion
Custom scripts
XML
RDB
Spreadsheet
?
Data Conversion
XPath
SQL
Result aggregation
Merging data with RDF
XML
RDB
Spreadsheet
Once in RDF:
Easily integrate your data
Concepts can be mapped to one another
Query everything with one W3C standard language (SPARQL)
Merging Data with RDF: Example
Blue App has model
Red App has model
Need to integrate Red & Blue models
Merging Data with RDF: Example
Step 1: Merge RDF
Same nodes (URIs) join automatically
Merging Data with RDF: Example
Step 2: Add relationships and rules
(Relationships are also RDF)
Merging Data with RDF: Example
Step 3: Define Green model
(Making use of Red & Blue models)
Merging Data with RDF: Example
What the Blue app sees:
No difference!
Merging Data with RDF: Example
What the Red app sees
No difference!
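The three steps can be sketched without any RDF library by treating each model as a set of triples. Everything concrete here is an assumption for illustration: the example.org namespace, the blue/, red/ and green/ vocabularies and the single rule are invented, and the sketch ignores blank nodes, which need renaming in a real merge.

```python
EX = "http://example.org/"  # hypothetical namespace shared by both apps

WORKS_FOR = EX + "blue/worksFor"
LOCATED_IN = EX + "red/locatedIn"
WORKS_IN = EX + "green/worksIn"

# Each model is a set of (subject, predicate, object) triples.
blue = {
    (EX + "alice", EX + "blue/name", "Alice"),
    (EX + "alice", WORKS_FOR, EX + "acme"),
}
red = {
    (EX + "acme", EX + "red/label", "ACME"),
    (EX + "acme", LOCATED_IN, EX + "leipzig"),
}

# Step 1: merging is a set union; the shared URI for "acme" joins the graphs.
merged = blue | red


def derive_works_in(graph):
    """Step 2: a rule (worksFor + locatedIn => worksIn) producing new triples."""
    return {
        (person, WORKS_IN, place)
        for person, p1, company in graph if p1 == WORKS_FOR
        for org, p2, place in graph if p2 == LOCATED_IN and org == company
    }


# Step 3: the Green model builds on the Red and Blue triples.
green = merged | derive_works_in(merged)


def view(graph, namespace):
    """Triples whose predicate belongs to one application's vocabulary."""
    return {t for t in graph if t[1].startswith(namespace)}


print(view(green, EX + "green/"))
print(view(green, EX + "blue/") == blue)  # the Blue app sees no difference
```

Because each application only looks at predicates from its own vocabulary, adding Red and Green triples leaves the Blue and Red views unchanged.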
Merging Data with RDF: Example
RDF helps bridge other formats/models
Producers and consumers may use different formats/models
Rules can specify transformations
Inference engine finds path to desired result model
[Figure: RDF model transform bridging formats A1, A2, A3, B1, B2, C1, C2 and X, Y, Z via ontologies & rules]
RDB2RDF
Extract, Transform, Load (ETL)
Automatic Mapping
Semi-Automatic Mapping
R2RML
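The automatic mapping option can be illustrated with a small sketch, loosely following the idea behind the W3C Direct Mapping: one subject per row, one predicate per column. It is not the author's tooling; the base URI, the table and its rows are hypothetical, and SQLite from the standard library stands in for the relational database. The table name is interpolated into the SQL, so it must come from a trusted source.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base URI


def direct_map(conn, table, key):
    """Map every row of `table` to triples: one subject per row, one predicate per column."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{key}={record[key]}"
        for column, value in record.items():
            if value is not None:  # NULL cells produce no triple
                triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Alice', 'Leipzig'), (2, 'Bob', NULL)")

triples = direct_map(conn, "customer", "id")
for triple in triples:
    print(triple)
```

Semi-automatic approaches and R2RML start from a mapping like this and let a person adjust the generated URIs and vocabulary terms.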
Sparqlify: Examples
Sparqlify: CSV2RDF
Prefix pdd:
Prefix pdo:
Create View Template DefaultMapping As
  Construct {
    ?s ?p1 ?o1 ;
       ?p2 ?o2 ...
  }
  With
    ?s = uri(concat(pdd:, csv-path/, ?rowId))
    ?p1 = uri(concat(pdo:, ?headingName1))
    ?o1 = plainLiteral(?1)
    ?p2 = ...
http://sparqlify.org/
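What this default mapping does can be mimicked in plain Python: each row becomes a subject URI built from the file path and row number, each column heading becomes a predicate, and each cell becomes a plain literal. The sketch below is not Sparqlify itself. The two namespace URIs are hypothetical stand-ins for the pdd: and pdo: prefixes, whose values are not given in the slide, and numbering rows from 1 is an assumption.

```python
import csv
import io

PDD = "http://example.org/data/"      # hypothetical stand-in for the pdd: prefix
PDO = "http://example.org/ontology/"  # hypothetical stand-in for the pdo: prefix


def csv_to_triples(text, csv_path):
    """Apply the default mapping: row -> subject, heading -> predicate, cell -> literal."""
    reader = csv.reader(io.StringIO(text))
    headings = next(reader)
    triples = []
    for row_id, row in enumerate(reader, start=1):
        subject = f"{PDD}{csv_path}/{row_id}"
        for heading, cell in zip(headings, row):
            predicate = PDO + heading.strip().replace(" ", "_")
            triples.append((subject, predicate, cell))
    return triples


# Hypothetical sample file contents.
sample = "name,city\nAlice,Leipzig\nBob,Berlin\n"
triples = csv_to_triples(sample, "people.csv")
for triple in triples:
    print(triple)
```

All values stay plain literals here, as in the slide's template; typing cells or linking them to other resources is left to a refined mapping.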
Raw Text Processing: ConTEXT
No installation and configuration required.
Access content from a variety of sources
Instantly show the results of text analysis to users in a variety of visualizations.
Allow refinement of automatic annotations and take feedback into account
Provide a generic architecture where different modules for content acquisition, natural language processing and visualization can be plugged together.
http://rdface.aksw.org/nlp/hub.php
Processing Raw Text: ConTEXT
Data Integration
Definition
In general, integration of multiple information systems aims at combining selected systems so that they form a unified new whole and give users the illusion of interacting with one single information system.
Semantic Data Integration
Federated SPARQL Queries
Query processing involving multiple distributed data sources, e.g. the Linked Open Data cloud
Example scenario: DBpedia and New York Times collections
DBpedia as structured knowledge base
New York Times as a news provider
Query both data collections in an integrated way
Federated Query Processing
Federation mediator at the server
Virtual integration of (remote) data sources
Communication via SPARQL protocol
[Figure: a query is sent to the federation mediator, which talks to three SPARQL data sources]
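The role of the mediator can be sketched with in-memory data in place of remote endpoints. This is a toy illustration under strong assumptions, not how any particular engine works: the sources are plain sets of triples rather than SPARQL endpoints, every identifier and value is made up, and there is no source selection or query optimization. Each triple pattern is sent to every source and the partial results are joined on shared variables.

```python
def match(triples, pattern):
    """Yield bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


def federated_query(sources, patterns):
    """Mediator: send each pattern to every source and join the partial results."""
    solutions = [{}]
    for pattern in patterns:
        joined = []
        for solution in solutions:
            # Substitute variables that earlier patterns have already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for triples in sources.values():
                for binding in match(triples, bound):
                    joined.append({**solution, **binding})
        solutions = joined
    return solutions


# Toy stand-ins for two remote endpoints; all identifiers are made up.
sources = {
    "dbpedia": {("db:Leipzig", "rdfs:label", "Leipzig")},
    "nytimes": {
        ("nyt:leipzig", "owl:sameAs", "db:Leipzig"),
        ("nyt:leipzig", "nyt:articleCount", "42"),
    },
}
query = [
    ("?city", "rdfs:label", "?name"),
    ("?topic", "owl:sameAs", "?city"),
    ("?topic", "nyt:articleCount", "?count"),
]
results = federated_query(sources, query)
print(results)
```

The caller sees a single result set and never needs to know which source answered which pattern, which is the virtual integration the slide refers to.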
Federated Query Engines
Engine Name   Implementation language   License
FedX          Java                      GNU AGPL
SPLENDID      Java                      LGPL
LHD           Java                      MIT
DARQ          Java                      GPL
ANAPSID       Python                    GNU GPL
ADERIS        Java                      Apache
Data Visualization
LD Visualization Techniques
Classification of Visualization Techniques
Comparison of Values/Attributes
http://goo.gl/IvsGbU
http://goo.gl/JeFhlM
Analysis of Relationships and Hierarchies
http://rhizomik.net/dbpedia/treemap.jsp
http://lov.okfn.org/dataset/lov/
Analysis of Temporal and Geographical Events
http://lov.okfn.org/dataset/lov/details/vocabulary_dcterms.html
Analysis of Multidimensional Data
http://mbostock.github.io/protovis/ex/cars.html
Other Visualization Techniques
Applications of LD Visualization Techniques
Tool Types
CubeViz
Facete
Thank you for your attention
Ivan Ermilov
[email protected]
University of Leipzig