Post on 07-Jan-2017
Tomas Knap
Semantic Web Company
RDF Data Processing and Integration Tasks in UnifiedViews
Use Cases & Lessons Learned
1
Agenda
▸ UnifiedViews
  ▹ Introduction of the Tool
▸ UnifiedViews Use Cases
  ▹ 3 Use Cases
  ▹ Benefits/Lessons Learned
2
UnifiedViews Motivation
▸ Maintaining RDF data processing tasks is challenging
  ▹ Different tools
  ▹ Different configurations
  ▹ Tens of data processing tasks sharing parts of the data processing
▸ Debugging
4
UnifiedViews Approach
▸ UnifiedViews is an ETL tool for RDF data processing
  ▹ Allows users to manage RDF data processing tasks
  ▹ Natively supports the RDF data format
5
UnifiedViews Approach
▸ Standard maintenance interface
  ▹ Define, execute, monitor, schedule, and share data processing tasks
  ▹ Predefined and customizable building blocks (plugins) to set up the individual data processing tasks
▸ Debugging features
▸ Simplified documentation
  ▹ Visualizations of the prepared tasks
    ■ Plugins
    ■ Data flow
6
UnifiedViews Core Components
▸ Web administration interface
  ▹ Define and maintain pipelines
  ▹ Validate, execute, monitor pipelines
  ▹ Possibility to schedule pipelines
    ■ Notifications
  ▹ Possibility to debug pipelines
  ▹ Possibility to share pipelines and plugins
  ▹ Define and maintain plugins
  ▹ Multi-user environment, SSO support
▸ Robust engine running the tasks
▸ API to work with tasks, executions, scheduled events
8
UnifiedViews Core Plugins
▸ Set of Core plugins available
  ▹ Extractors
    ■ Obtaining external sources (CSV, DBF, XLS, XML files, RDF data, or relational tables)
  ▹ Transformers
    ■ Transforming them between various formats (e.g. CSV files to RDF data, relational tables to RDF data)
    ■ Executing typical transformations such as SPARQL Update queries, or XSL transformations
  ▹ Loaders
    ■ Loading the transformed and curated data to external systems, repositories
▸ 35+ plugins
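The extractor / transformer / loader split above can be illustrated with a minimal Python sketch. This is not UnifiedViews plugin code and does not use its API: the function names, the `http://example.org/` namespace, and the in-memory list standing in for a triple store are all assumptions made for illustration. It also assumes that identifiers and column names are safe to place inside an IRI without further encoding.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def extract(csv_text):
    """Extractor: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def transform(rows, id_column):
    """Transformer: turn each non-empty CSV cell into one N-Triples statement."""
    triples = []
    for row in rows:
        subject = f"<{BASE}resource/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column and empty cells
            predicate = f"<{BASE}property/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def load(triples, store):
    """Loader: append triples to a target; a list stands in for a triple store."""
    store.extend(triples)
    return len(triples)


def run_pipeline(csv_text, id_column, store):
    """Chain the three stages in the order a pipeline would run them."""
    return load(transform(extract(csv_text), id_column), store)
```

Calling `run_pipeline("id,name\n1,Alice\n2,Bob\n", "id", store)` with an empty list `store` leaves two N-Triples statements in it, one per data row. Keeping each stage as a separate function mirrors the idea of reusable building blocks: any stage can be swapped without touching the others.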
9
UnifiedViews Custom Plugins
▸ Easy way to extend UnifiedViews with your own plugins
  ▹ Guide for creating new plugins
  ▹ Tutorials
10
PoolParty Semantic Integrator and UnifiedViews
▸ UnifiedViews is part of PoolParty Semantic Integrator
▸ A semantic technology suite
  ▹ Organize and maintain company knowledge
  ▹ Annotate documents with concepts from the knowledge base
  ▹ Provide focused search on top of the annotated document space
▸ https://www.poolparty.biz/
  ▹ Or visit the PoolParty booth
12
UnifiedViews Availability
▸ Available under an open source license (GPL + LGPL v3)
  ▹ Commercial license also available as part of PoolParty Semantic Integrator
▸ Hosted on GitHub
  ▹ https://github.com/UnifiedViews
▸ Latest release (June 2016):
  ▹ UnifiedViews Core 2.3.1
▸ http://unifiedviews.eu
13
3 Use Cases
1. Aligned Project
  ▹ Extraction/Annotation of data from Atlassian Confluence/JIRA
2. Boehringer Ingelheim
  ▹ Publication tracker
3. World Bank
  ▹ Annotation of World Bank docs
  ▹ Integration with MarkLogic
15
About
▸ Aligned project:
  ▹ H2020, http://aligned-project.eu/
▸ One of the goals:
  ▹ Integrate outputs from commercial tools such as Atlassian Confluence and JIRA to bring a data-centric approach to governance of software and data engineering
17
UnifiedViews Use Case
▸ UnifiedViews pipeline
  ▹ Extracting data from Atlassian Confluence, JIRA
  ▹ Annotating textual content with a taxonomy maintained in PoolParty
  ▹ Loading everything to a remote triple store
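The annotation step in this pipeline can be pictured with a small Python sketch. It is only a stand-in: the slides do not describe how PoolParty matches concepts, so the naive case-insensitive label matching below, the sample taxonomy, and the `example.org` URIs are all assumptions. The only real vocabulary term used is the Dublin Core `dcterms:subject` property, chosen here as one plausible way to link a document to a concept.

```python
import re

# Hypothetical taxonomy: concept URI -> labels that identify the concept.
TAXONOMY = {
    "http://example.org/concept/triple-store": ["triple store", "RDF store"],
    "http://example.org/concept/etl": ["ETL"],
}

DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def annotate(text, taxonomy):
    """Return the sorted concept URIs whose labels occur in the text."""
    found = set()
    for concept, labels in taxonomy.items():
        for label in labels:
            # Whole-word, case-insensitive match of the label.
            pattern = r"\b" + re.escape(label) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(concept)
                break  # one matching label is enough for this concept
    return sorted(found)


def annotation_triples(document_uri, text, taxonomy):
    """Emit one N-Triples statement per concept found in the document text."""
    return [
        f"<{document_uri}> <{DCTERMS_SUBJECT}> <{concept}> ."
        for concept in annotate(text, taxonomy)
    ]
```

For a hypothetical JIRA issue text such as "Load the issue data into a triple store.", `annotate` returns only the triple-store concept, and `annotation_triples` produces the statement that a loader could then push to the remote store.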
18
Benefits, Lessons Learned
▸ Predefined plugins which may be used out of the box
  ▹ No heavy programming
▸ Easy pipeline management via user interface
▸ Further support when preparing the pipeline
  ▹ Pipeline validation
  ▹ Pipeline debugging
20
About
▸ Boehringer Ingelheim wanted to get a better overview of worldwide research activities
▸ Extract and annotate articles published at PubMed
  ▹ http://www.ncbi.nlm.nih.gov/pubmed
▸ Linking of unstructured and structured / internal and external information
22
Demo
https://workingontologist.poolparty.biz/GraphSearch/
24
Benefits, Lessons Learned
▸ Pipelines in UnifiedViews may be easily
  ▹ scheduled
  ▹ extended in the future
▸ Detailed information about the pipeline executions is available
  ▹ Events, logs
▸ Maintenance simplified
25
Benefits, Lessons Learned
▸ Missing
  ▹ Long running pipelines
    ■ Tighter integration of UnifiedViews and PoolParty Semantic Integrator
  ▹ Loops, conditional execution of plugins
26
About
▸ Goal: Search over annotated World Bank documents
  ▹ World Bank topical taxonomy
  ▹ Geo taxonomy
▸ Demo:
  ▹ http://marklogic-demo.poolparty.biz
28
UnifiedViews Use Case
▸ UnifiedViews pipeline to annotate portions of the World Bank documents
  ▹ Country & region information annotated with Geo taxonomy
  ▹ Full text, topics annotated with World Bank topical taxonomy
29
Benefits, Lessons Learned
▸ Easy pipeline management via user interface
  ▹ Easy pipeline configuration
▸ Reusing already existing plugins
  ▹ Pipeline prepared quickly
31
Summary
▸ UnifiedViews
  ▹ UnifiedViews and PoolParty Semantic Integrator
▸ UnifiedViews Use Cases
  ▹ Conversion of sources to RDF data
  ▹ Annotation of sources
  ▹ Enrichment of the data
  ▹ Publication of the curated data to the target store
▸ UnifiedViews 2.0 in 5 mins
33
Summarized Lessons Learned
▸ Easy pipeline management via user interface
▸ Predefined plugins which may be used out of the box
  ▹ No heavy programming
  ▹ Simplified pipeline creation
▸ Further support when preparing pipeline
  ▹ Pipeline validation
  ▹ Pipeline debugging
▸ Pipeline scheduling
34
Contact
Tomas Knap, PhD
Technical Consultant, Researcher
Semantic Web Company
▸ t.knap@semantic-web.at
▸ https://www.semantic-web.at/
▸ https://twitter.com/semwebcompany
35
© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/