Semantically Enhanced Quality Assurance in the
JURION Business Use Case
Dimitris Kontokostas, Christian Mader, Christian Dirschl, Katja Eck, Michael Leuthold,
Jens Lehmann, Sebastian Hellmann
ESWC 2016
Overview
● Wolters Kluwer overview
● Use Case Tools
● Challenges
● Solutions
● Evaluation
● Future Work
Wolters Kluwer
Wolters Kluwer provides solutions to customers in over 170 countries and content in at least a dozen languages, focusing on the legal, tax, finance and health industries.
Wolters Kluwer Transformation
Quality
WKD in the LOD2 Project
WKD in the ALIGNED Project
RDF in the publishing industry
Use Case Tools
● TDDD: Test Driven (Data) Development
○ Methodology, definitions & tools
● SPARQL
● Reusable unit tests for
○ vocabularies
○ datasets
○ applications
● Test Auto Generators
○ OWL
○ IBM Shapes
○ DSP (Dublin Core Description Set Profiles)
○ W3C Shapes (in progress)
● Open Source (Apache license)
● Stable tool, used in many research & industrial settings
http://rdfunit.aksw.org
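As a rough illustration of the test auto-generator idea listed above, the sketch below turns simple schema axioms into SPARQL queries that select violating resources. The axiom tuples, the function name `generate_test_queries`, the `example.org` terms and the query templates are all assumptions made for illustration; they are not RDFUnit's actual pattern library or API.

```python
# Hypothetical sketch: derive SPARQL test queries from schema axioms.
# Names and templates are illustrative, not RDFUnit's real patterns.

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Each template selects the resources that VIOLATE the axiom.
TEMPLATES = {
    # rdfs:domain: every subject of the property must have the class.
    "domain": (
        "SELECT DISTINCT ?s WHERE {{ ?s <{prop}> ?o . "
        "FILTER NOT EXISTS {{ ?s {rdf_type} <{cls}> }} }}"
    ),
    # rdfs:range: every object of the property must have the class.
    "range": (
        "SELECT DISTINCT ?s WHERE {{ ?s <{prop}> ?o . "
        "FILTER NOT EXISTS {{ ?o {rdf_type} <{cls}> }} }}"
    ),
}


def generate_test_queries(axioms):
    """Return one (description, SPARQL query) pair per supported axiom."""
    tests = []
    for kind, prop, cls in axioms:
        template = TEMPLATES.get(kind)
        if template is None:
            continue  # unsupported axiom kinds are skipped in this sketch
        query = template.format(prop=prop, cls=cls, rdf_type=RDF_TYPE)
        tests.append((f"{kind} of {prop} must be {cls}", query))
    return tests


# Made-up schema axioms in a hypothetical example namespace.
EXAMPLE_AXIOMS = [
    ("domain", "http://example.org/court", "http://example.org/Decision"),
    ("range", "http://example.org/court", "http://example.org/Court"),
    ("cardinality", "http://example.org/title", "1"),  # not handled here
]

if __name__ == "__main__":
    for description, query in generate_test_queries(EXAMPLE_AXIOMS):
        print(description)
        print("  " + query)
```

Each generated query returns the offending resources, so an empty result means the test passed. The same queries can be reused against any dataset that uses the vocabulary.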
https://www.poolparty.biz
● Commercial product developed by Semantic Web Company
● Thesauri development in a collaborative way
○ From scratch / by extraction of terms from a document corpus
● Compliance with the 5-star Open Data principles (RDF & SKOS)
● Automatically retrieve potential additional concepts for inclusion into the thesauri by querying SPARQL endpoints (e.g. DBpedia)
● Identify and link to related resources from local / remote projects
● Simple ontology editing (rdf:type, rdfs:subClassOf, rdfs:domain/range, ...)
● Automated quality assurance mechanisms
○ Conformance to SKOS or a custom schema
○ Enforcement level of some quality metrics can be configured by the user so that it is, e.g., possible to get an alert on circular hierarchical relations
○ Check a taxonomy “as a whole” against a set of potential quality violations
Challenges
Metadata RDF Conversion Verification
Existing Infrastructure
● Platform Content Interface (PCI) ontology
○ proprietary schema that describes legal documents and metadata in OWL
● PCI revisions => verify data conforms to PCI
● Proprietary SOAP-based validation service
○ Package-based validation => errors are hard to detect
○ Asynchronous & complex web service => hard to use
○ Network dependency => potentially unstable
Metadata RDF Conversion Verification
Continuous, high-quality triplification of semi-structured data is a common problem in the information industry. Schema changes and enhancements are routine tasks, but ensuring data quality is still very often a purely manual effort, so any automation will support many real-life use cases in different domains.
Goal: Test cases should be created automatically from the schema and run on a regular basis against the data that needs to be transformed. The errors detected lead to refinements and changes of the XSLT scripts and sometimes also to schema changes, which in turn produce new automatically created test cases.
RDFUnit / JUnit Integration
Quality Control in Thesaurus Management
● WKD develops multiple controlled vocabularies for annotating documents (e.g., court decisions, labour law, ...) using PoolParty
● Interconnected to each other
● Consistency and quality must be ensured over all vocabularies
● Various quality issues, e.g.,
○ Duplicates
○ Links to deprecated (deleted) concepts
○ Unresolvable links
● Up to now curated manually in the deployed system; errors regularly reach production versions
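One of the issues listed above, duplicates across vocabularies, can be checked with a short script. The sketch assumes that a duplicate means two concepts sharing the same label after trimming whitespace and ignoring case; that definition, the vocabulary names and `find_duplicate_labels` are assumptions, not how PoolParty defines or detects duplicates.

```python
# Hypothetical sketch: find concepts that share a label across vocabularies.
from collections import defaultdict


def find_duplicate_labels(vocabularies):
    """Map each normalized label to the concepts sharing it (two or more)."""
    by_label = defaultdict(list)
    for vocab_name, concepts in vocabularies.items():
        for concept_id, label in concepts.items():
            # Normalize so that "Labour Court" and "labour court " collide.
            by_label[label.strip().casefold()].append((vocab_name, concept_id))
    return {label: hits for label, hits in by_label.items() if len(hits) > 1}


# Made-up vocabularies: {vocabulary name: {concept id: preferred label}}.
VOCABULARIES = {
    "courts": {"c1": "Federal Court", "c2": "Labour Court"},
    "labour-law": {"l1": "labour court ", "l2": "Dismissal"},
}

if __name__ == "__main__":
    for label, hits in find_duplicate_labels(VOCABULARIES).items():
        print(label, "->", hits)
```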
Quality Control in Thesaurus Management
The creation and maintenance of knowledge models is gaining importance in the Web of Data. These tasks are increasingly being executed by subject-matter experts (SMEs) in the domain rather than by specialists in knowledge modelling and IT. Therefore, better automatic support of these processes will directly help achieve quality and efficiency gains.
● Automated quality checks over multiple vocabularies
● Improved notifications: email on changes performed by users
● Additional statistics on, e.g., vocabulary dependencies, changes, etc.
Quality Control in Thesaurus Management
Vocabulary link validation (PoolParty)
● Uses project metadata to identify linked vocabularies
● Link is invalid if target concept is either deprecated or deleted
● Creates a report for human curators
● Vocabulary repair is still a manual process
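The validation rule can be sketched as follows, assuming each vocabulary is a mapping from concept id to a status and that a missing id means the concept was deleted. The data layout and the function `validate_links` are hypothetical; they do not reflect PoolParty's data model or API.

```python
# Hypothetical sketch of vocabulary link validation; not PoolParty's API.


def validate_links(project, vocabularies):
    """Return (source, vocabulary, target, problem) for every invalid link."""
    report = []
    linked = set(project["linked_vocabularies"])  # taken from project metadata
    for source, vocab, target in project["links"]:
        if vocab not in linked:
            continue  # only vocabularies named in the metadata are checked
        status = vocabularies.get(vocab, {}).get(target)
        if status is None:
            report.append((source, vocab, target, "deleted"))
        elif status == "deprecated":
            report.append((source, vocab, target, "deprecated"))
    return report


# Made-up data: concept status per vocabulary, and one project's links.
VOCABULARIES = {"courts": {"c1": "active", "c2": "deprecated"}}
PROJECT = {
    "linked_vocabularies": ["courts"],
    "links": [
        ("l1", "courts", "c1"),  # valid
        ("l2", "courts", "c2"),  # target is deprecated
        ("l3", "courts", "c9"),  # target no longer exists
    ],
}

if __name__ == "__main__":
    for row in validate_links(PROJECT, VOCABULARIES):
        print(row)
```

The returned list plays the role of the report for human curators; repairing the links is deliberately left out, since that step is still manual.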
Results & Evaluation
The analysis is based on measured metrics and the qualitative feedback of experts and users.
Participants of the evaluation study were selected from WKD staff in the fields of software development and data development. There were seven participants in total: four involved in the expert evaluation and three content experts involved in the usability/interview evaluation.
● Productivity
● Quality
● Agility
Productivity (RDFUnit)
● Total time for quality checks and error detection
● The time needed for manual interaction
What we measured:
● 1 ms to 50 ms per single test (depending on the document / ontology size)
○ as close to real-time as possible, currently a couple of minutes
● Quality checks can be triggered by manual execution, but they are always verified automatically by the CI build system
● A total of 44,000 tests with a total duration of 11 minutes
○ may scale up easily when parallelized or clustered
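A quick arithmetic check ties the two measurements together: 44,000 tests in 11 minutes is an average of 15 ms per test, which sits inside the reported 1 ms to 50 ms range. The parallel estimate below assumes perfect scaling and a hypothetical worker count, so it is an optimistic bound, not a measured result.

```python
# Arithmetic check of the reported figures; the worker count is hypothetical.

TOTAL_TESTS = 44_000
TOTAL_DURATION_MS = 11 * 60 * 1000  # 11 minutes in milliseconds

# Average time per test over the whole run.
average_ms_per_test = TOTAL_DURATION_MS / TOTAL_TESTS


def ideal_parallel_minutes(workers):
    """Best-case duration in minutes, assuming perfect scaling."""
    return TOTAL_DURATION_MS / workers / 60_000


if __name__ == "__main__":
    print(f"average per test: {average_ms_per_test} ms")
    print(f"ideal duration with 4 workers: {ideal_parallel_minutes(4)} min")
```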
Quality (RDFUnit)
What kind of errors can be detected and is categorization possible?
● Experts concluded that it is helpful to spot errors introduced by changes, since issues spotted in this way can be assumed to point to actual errors whose causes can be identified and addressed.
● Successful tests are less significant: they do not point to concrete errors, and we are not yet able to evaluate whether and how the measurements taken correspond to target measures.
○ Coverage & other metrics needed
Agility (RDFUnit)
… time to include new requirements
● Including new constraints or adapting existing constraints works by adding new reference documents to the input dataset to make the test environment as representative as possible.
● The process of generating tests and testing is fully automated, so it adapts very easily to changed parameters.
● Adding more documents to the input dataset increases the total runtime
Productivity (PoolParty)
● The number of checked links
● The number of violations
● The total time
What we measured:
The presentation of the results was well understood. In general, the tool was received well by the experts, which was reflected by their feedback in the interviews.
Quality (PoolParty)
● No links were falsely reported as broken
● Prototype still lacks some usability
Agility (PoolParty)
… integration, configuration time and extension
● Very useful for getting an overview
● In some cases it is desirable to limit the link lookups and adapt the way links to external datasets are detected
○ Use custom base URI or regular expression-based techniques
● Re-configuration is possible but recompiling the application might be needed
○ Plans to delegate this process to UnifiedViews
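The two detection strategies mentioned above, a custom base URI or a regular expression, could be made configurable along these lines. This is a minimal sketch: `make_external_link_filter` and the example URIs are hypothetical, and the rule that a regular expression takes precedence over the base URI is an assumption of this sketch.

```python
# Hypothetical sketch: decide which links count as external and get checked.
import re


def make_external_link_filter(base_uri=None, pattern=None):
    """Return a predicate deciding whether a URI counts as an external link."""
    regex = re.compile(pattern) if pattern else None

    def is_external(uri):
        if regex is not None:
            # Regex mode: only URIs matching the pattern are looked up.
            return regex.search(uri) is not None
        if base_uri is not None:
            # Base-URI mode: anything outside the local namespace is external.
            return not uri.startswith(base_uri)
        return True  # no configuration: treat every link as external

    return is_external


if __name__ == "__main__":
    by_base = make_external_link_filter(base_uri="http://example.org/thesaurus/")
    print(by_base("http://dbpedia.org/resource/Court"))
    print(by_base("http://example.org/thesaurus/c1"))
```

Passing such a predicate in as configuration, rather than hard-coding it, is one way to avoid recompiling the application when the detection rule changes.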
Future Work
● Error analysis (statistics, time to fix an issue, regressions)
● Test coverage and better metrics
● Improve the UI of the Link Validation tool
● Provide more advanced settings
● Inter-repository Link Validation
Thank You!
Questions?
(You might want to) take a look at… RDF and XML Interoperability W3C Community Group: https://www.w3.org/community/rax/