ontology-driven business intelligence for comparative data analysis · 2013-07-07 · business...
TRANSCRIPT
JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering 1
Michael SCHREFL
& the SemCockpit Team: Neuböck, Neumayr, Schütz, …
26.06.2013
Ontology-drivenBusiness Intelligence for
Comparative Data Analysis
2JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Overview
1. Introduction: Comparative Data Analysis2. Multi-Dimensional Ontology3. Ontology-based Measures and Scores4. Comparative OLAP5. BI Analysis Graphs6. Guidance, Judgment, and Analysis Rules7. Conclusion
3JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
1. Introduction
■ Setting: SemCockpit („Semantic Cockpit“)■ Is-reporting vs. is-to-target-comparison vs. is-to-is-comparison■ Use case: DM2-Analysis project (of an Austrian Health Insurer)■ Comparative Data Analysis: Aims & Characteristics■ The wish list of analysts – and how to meet it■ The technical „ingredients“: md-ontology, (generic) measures &
scores, BI analysis graphs, guidance/judgment/analysis rules ■ Steps to run a comparative analysis project („The SemCockpit
process“)■ Underlying DWH schema (MDO-base)
4JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Setting: The SemCockpit Project
An□ ontology-driven□ interactive□ Business Intelligence tool□ for comparative data analysis
Partners:Solvistas (BI solutions developer and provider)Johannes Kepler University of LinzOÖGKK, DAK (Austrian and German Health Insurers)
Funded by: Austrian Ministry of Transport, Innovation & Industry
5JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
OLAP (Online Analytical Processing) in BI
■ Is-reporting: business monitoring■ Is-to-target comparison: performance management■ Is-to-Is comparison: comparative data analysis
6JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: DM2-Analysis Project (ÖOGKK)
Our insurance organisation pays a high amount for insurants with
Diabetes Mellitus of Type 2 (DM2).
Could costs be reduced?
What can we learn by comparison? What should we
compare with what?
What specific questions to ask statisticians or data miners?
7JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: DM2-Analysis (ÖOGKK, simplified)
8JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Data Warehouse (simple, nothing new)
■ Schema□ Dimension: Levels, rolls-up relation between levels (lattice)□ Fact class: Dimension roles, Measures
■ Extension:□ Fact: Multi-dimensional point (of dimension nodes) + measures□ Dimension node: Each node belongs to one level□ Rolls-up relation rollsUp(sub,sup); transitive,reflexive: rollsUp*
■ Analysis:□ Roll-up facts and cubes with “derived” measures□ Granularity: Levels of nodes of fact, denoted by [L1,…Ln]
9JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Search for meaningful comparisonsAny striking
differences in comparing costs for
DM2 patients?
Avg drug cost per patient, Drug costs prescribed byGPs for regular patients
What drugs to consider? … oral antidiabtic drugs, insulin,
metaformin, …
What are meaningfulcomparison groups?
Patients of different insurersPatients of different provinces
Patients of rural vs. urban areas
10JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
What is a rural district?
What is a regular patient of a doctor?
What is a country doctor?
What are “cheap“oral antidiabetic
drugs?
What is an urban district?
What are DM2-Patients?
Example: Relevant concepts … meaning?
11JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
SemCockpit – An Ontology-Driven, Interactive Business Intelligence Tool for Comparative Data Analysis
Comparative Data Analysis targetedin business terminology
Data Warehouse
(DWH)
BusinessIngelligence
Tool
> 300.000 medical termsand their relationships (terminology)
Snomed CT
MaternalDM2
Diabetes
IA 1B
DM1
Common BI-Tool■ „Diagnoses Codes“ in DWH,
associated medical terminologynot available in BI tool
12JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
SemCockpit – An Ontology-Driven, Interactive Business Intelligence Tool for Comparative Data Analysis
SemCockpit■ Meaning (semantics) of
„Diagnosis Codes“ available inBI tool via ontology
Comparative Data Analysistargetedin business terminology
+ simpler, more efficient
SnomedOntologyRepository
Terminologiy• in OWL• machine readable
BusinessIngelligence
Tool
> 300.000 medical termsand their relationships (terminology)
Snomed CT
MaternalDM2
Diabetes
IA 1B
DM1
Data Warehouse
(DWH)
13JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Comparative Data Analysis in BI: Aims
■ Understand your business■ Know where to investigate further■ Detect striking differences between comparable groups of
entities/facts■ Suggest plausible explanations■ Conjecture or refute possible cause-effect relationships through
further comparisons■ Pre-cursor to data mining
14JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Comparative Data Analysis: Characteristics
■ Initially vague analysis question■ What are relevant measures and comparison groups?■ Specific analysis questions crystallise over time■ Interactive, exploratory, iterative■ Applications of OLAP-operations (on two groups in parallel)■ Repetitive: once questions clear, comparative analysis process
re-applied for similar comparison situations (e.g. one year later)■ Two phases:
1. Search for relevant measures and comparison groups2. Proper comparative analysis (repeated)
15JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: From vague to more specific
■ Specific analysis questions crystallise over time□ Compare the prescription costs for oral anti-diabetic drugs (OAD)
of DM2 patients who are treated by doctors in rural areas with ones of urban areas.
■ Interactive, exploratory, iterative□ Compare patients in Upper Austria with patients in Austria
compare patients in urban districts of Upper Austria with patients in urban districts of Austria …
□ Hundreds of light variations of simple measures (>300 pages list)
■ Applications of OLAP-operations (on two groups in parallel)□ Compare costs of Upper Austria with Austria per year compare
costs of Upper Austria with Austria per month
16JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: DM2-Analysis, Measure Catalogue
17JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Two Phases
1. Search for relevant measures and comparison groups□ Which patients to include when comparing groups of doctors?
- regular patients only (e.g., minimum of 4 contacts per year)but: regular is different for GPs (6 contacts) and oculists (2 contacts)
□ Analyst recognizes that it is meaningful to compare prescription costs of oral anti-diabetic drugs of patients in urban versus patients in rural areas
2. Proper comparative analysis (repeated)□ Once questions clear: comparative analysis process re-applied for
similar comparison situations (e.g. one year later)□ Compare drug prescription costs of urban and rural patients: apply
same comparisons in other years or for other diseases
18JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
The wish list of business analysts
1. “Business terms” as first-class citizens in BI tool2. Support in definition and organisation of business terms3. Use business terms in defining
a) measuresb) comparison groupsc) analysis steps and analysis paths
4. Reuse of existing terminologies of the domain5. “Comparison” (between a group of interest and group of
comparison) as first-class modelling primitive
19JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
The wish list of business analysts (ctd.)
6. Comparison explicitly captured and not left to the “human eye” through visual comparison by diagram inspection
7. Intelligent “guidance” on how to proceed in analysis8. Capture and exploit “judgement knowledge”, i.e., knowledge
about striking differences between compared groups9. Context-sensitive support on which business terms,
measures, comparison groups, or analysis steps are applicable in a given (i.e., already partially specified) analysis situation
10. Automatic mapping of high-level analysis to data warehouse
20JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
What current BI tools provide
“Surf and save” BI tools, e.g., Tableau+ perfect for is-reporting, multi-granular, multi-perspective- no support for business terms- no direct support for comparison- no guidance
Data Warehouse Products of mega vendors+ perfect for complex analysis tasks- no direct support for comparison (but workarounds)- low IT-skilled analysts complain about complex use and
rely on IT-department “programming” cycles
21JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
What SemCockpit provides
Modelling and reasoning support to meet the analysts wish list1. Multi-dimensional ontology (MDO) to capture “business terms”
in data warehouse context 2. MDO-concepts are organised in subsumption hierarchies
along levels, dimension hierarchies, and dimension spaces through reasoning; concepts may be defined for some context and be “overridden” in a subcontext
3. Use of MDO-concepts to define measures and comparison groups
4. Re-use of existing domain ontology as “semantic dimension”
22JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
What SemCockpit provides (ctd.)
5. Comparative analysis situations to express the comparison of a group of interest against a group of comparison
6. Scores (complementing measures) to explicitly capture comparison results
7. Guidance through analysis graphs and associated guidance rules
8. Judgment rules capture judgment knowledge9. Reasoning support to determine concepts, measures, and
comparisons applicable in a current analysis situation10. Automatic mapping of high-level analysis to SQL.
23JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Complementary Work (Selection)
■ Ontology-based Data Warehouse Design□ Multi-dimensional design from ontologies (Oscar Romero et.al.)□ Concept-based design of data warehouses (Jarke et. al.)
■ Ontology-based Integration□ Ontology-based information extraction (e.g. Saggion, et.al.)□ Ontology-based querying (of multiple sources) (e.g. Spahn, et.al.)
■ Extracting semantic web data into a multi-dimensional data warehouse□ Multi-dimensional Integrated Ontologies: Designing Semantic Data
Warehouses (e.g., Nebot/Berlanga/Perez/Aramburu/Pederson)■ SemCockpit
□ a-posterior enrichment of given DWH□ specific support for comparative data analysis
24JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Fact■ md-point
□ Point of interest
■ Measure□ describes a md-point
□ <ins:Vienna,time:2010>.totalCosts
.
Comparative Fact■ 2 md-points
□ Point of interest□ Point of comparison
■ Score□ describes relationship
between 2 md-points□ (<ins:Tyrol,time:2011>,
<ins:Austria,time:2011>).MeanPercentilRkOfCosts-PerPatient
Comparison: Comparative Facts & Scores
25JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Kinds of Comparisons
■ Group of Interest (GoI) and Group of Comparison (GoC):□ Identified by point of interest & comparison (or more, see later)□ Members: Facts
■ Groups involved:□ Group vs. Peer Group□ Group vs. Super Group
■ Scores: Typically based on comparing measure of 2 groups by□ Simple arithmetic: Ratio or percentage difference□ Median/Mean PercentileRank (typically for group vs. super group)
- of median/average “member” of group of interest- considered as “member” of group of comparison
26JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
BI: Conventional vs. SemCockpit
General ExpertisePrevious Insights
Judgement Knowledge
Query-Templates
Business Analyst Manager
DWH
OLAP Tool
General Expertise
Business Analyst Manager
MDO-DWH Mapper
semCockpit Frontend
Multi-Dimensional Ontology (MDO) Engine
Ontology Reasoner
Ontology Manager
Measures & Scores
Domain Ontology
Analysis GraphGuidance Rules
Judgement RulesAnalysis Rules
Conventional Comparative Data Analysis Comparative Data Analysis with semCockpit
sem
Coc
kpit
DWH
Dom
ain-
Kno
wle
dge
Rule Engine
Analysis Graph (AG)
Engine
27JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
SemCockpit: StackContext-SensitiveSemantic Guidance
Analysis Graphs &Judgement Rules
Judgement Rule:FOR „comp. of (Internist,DM2-Pat; GP, DM2-Pat)“IF ratioDrugCosts > 1.4DO Explanation: Internists better trained for DM2
AB
C
Measure & Score Application
<ins:Austria,time:2010>.avgDrugCosts(Rural) (<ins:Austria,time:2010,<ins:Austria,time,2010>).r(Rural,Urban)
multi-layered(stratified, 1st level, 2nd level)
ratioDrugCostsRural-vs-urbanDM2-P
Measures & ScoresavgDrugCostsRuralDM2-P
Generic Measures & ScoresavgDrugCosts(%q) ratioDrugCosts(%q1,%q2)
MD-OntologyDM2-Pat
Urban Rural
District.PopDensity≥ 400 pers/km²
avgDrugCostsUrbanDM2-P
External Ontology (Semantic Dimension)
MaternalDM2
Diabetes
IA 1B
DM1
28JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
The SemCockpit Process
DWH Schema
MDO Base
MDO Concepts, Semantic Dimensions
Measures & Scores
Analysis Graphs
Guidance Rules, Judgement Rules, Analysis Rules
SemanticEnrichment
CalculationDefinition
Model Tranformation
AnalysisDesign
RuleDesign
29JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Development strategy for tool support
■ Focus on primary analysis needs of business partners■ Provide support for most frequent analysis tasks■ Keep it simple (for users and to make reasoning tractable)■ Possibly ignore specific, complicated cases (seldom anyway!)■ Provide extension hooks such that complicated concepts,
measures, scores, and analysis queries can, if needed, still be defined by IT-department and be provided as “built ins”
■ “Built-ins” should be usable like non-built-ins (but can excluded from certain reasoning tasks)
■ Initially basic, e.g., no ragged hierarchies, etc.
30JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
MDO-Base: Canonical Model for DWH
■ Background: Data Cube□ Comprises all „roll-up“ points + derived measures□ Analyst selects points (of some granularity) + measure□ Point: exactly one node for each dimension role of data cube,
(Motivation: avoid „nested structures“ in dimension columns)■ MDO-Base: Like Dimensional-Fact-Model, but
□ Non-dimensional attributes extracted into „entity classes“ (for re-use and for later definition of entity concepts)
□ Unique role assumption for dimension roles to facilitate „drill across“, and (explained later) concept definition and use
□ Named hierarchies and hierarchy-specific dimension roles□ Dimension space: set of dimension roles + granularity/range
31JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
DWH schema (MDO-Base)
drugATC
ins
leadDoc
time
doctor
drugPrescription
quantitycosts
district
province
insurantdistrict
province
monthquarter
year
drugatcChemSubGr
atcPharm
atcTherap
atcAnatomweek
day
e_district
districtinhPerSqkm
e_doctor
doctorage
e_drug
drugprice
e_insurant
insurantageincome
medSec
actDoc
DoctorLoc
MedSec
Insurant
Time
DrugATC
32JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Model Transformation: DWH -> MDO-Base
■ Transformation steps1. If the intended analysis within a dimension is „multi-dimensional“
(e.g., doctor location, doctor medical-section), use hierarchies2. Extract non-dimensional attributes of a level into entity classes
(for sharing across dimensions and for later definition of entity concepts)
3. Define a universal dimension space consisting of the dimension roles (of dimensions) used to define fact classes and data cubes
4. Define dimension spaces, each consisting of a set of dimension roles and a granularity (or granularity range), that are used to define the domain of fact classes (or cubes), concepts, …
33JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
2. Multi-dimensional Ontology (md-ontology)
■ Enriches given dimensional model from underlying data warehouse with “business terms” in analysis context
■ Concepts defined over entities, nodes, points, pairs of points■ Flat vs. hierarchical concepts■ Context-specific and contextualized concepts■ Semantic dimensions (incorporating domain ontology)■ Mapping into OWL: for concept organization■ Mapping into SQL: for execution of OLAP in DWH
34JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
MDO Concepts
■ Defined by: Signature + concept expression■ Signature: Set of individuals for which concept is defined■ Concept expression: Boolean expressions over “properties” of
1. Entities (entity concept)2. Nodes of a dimension (dimensional concept)3. Points of a dimension space (md-concept)4. Pairs of points (comparative md-concept)
■ For later use in□ Definitions of measures and scores□ Specifications of groups of interest & comparison (via
parameters, explained later) in measure & score applications
35JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Entity Concepts of e_district
e_district
districtinhPerSqkm
ruralDistrict
inhPerSqkm <= 400
urbanDistrict
inhPerSqkm > 400
highRuralDistrict
inhPerSqkm <= 50
highUrbanDistrict
inhPerSqkm > 1000
defined for
subsumed by(derived)
36JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example. Entity Concepts of e_insurant
37JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Dimensional Concepts
■ Signature□ Dimension □ Level (flat concept) or level range (hierarchical concept)
Naming convention: flat with small initial, hierarchical with big or ‘In’■ Concept Expression
□ entity concept reference□ hierarchical expansion□ level or level-range restriction□ union, intersection, complement (Prequisite: same level ranges)□ <node> concept [levelOrLevelRange] – expression□ SQL built-in
38JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Hierarchical Concept
■ Motivation:□ Make concepts available for multiple levels / granularities□ Avoid need for roll-ups of nodes or points in use of concepts
■ The general principle of hierarchical concepts□ Individuals organized in a hierarchy (semi-lattice)□ Hierarchy property: If an individual is in the interpretation of a
hierarchical concept, all descendants are in the interpretation as well.
■ Definition:□ By hierarchical expansion: *
39JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Flat vs. Hierarchical Concept
A10BA A10BB
OAD
A10BB02A10BB01A10BA02A10BA01
starterOad(flat)
InStarterOad(hierarchical)
atcTherap
atcPharm
atChemSubGr
A10BC
40JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Node & Level Restriction
Doc_Kufst. Doc_Innsbrk
Doc_Tyrol
Doc_FischerDoc_BauerDoc_MeierDoc_Huber
province
district
doctor
Doc_Graz
Doc_Alltop
<Doc:Tyrol>UrbanDistrict[doctor]
urbanDistrict =inhPerSqkm
> 400
UrbanDistrict=urbanDistrict*
Doc_Vorarlb Doc_Styria
41JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Dimensional Concepts
42JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Multi-Dimensional Concepts (md-Concepts)
■ Signature (or „domain“ of concept)□ Dimension space (set of dimension roles + granularity/range)
■ Concept Expression□ dimensional concept reference□ hierarchical expansion□ granularity or granularity-range restriction□ union, intersection, complement (Prereq.: same granurarity/range)□ <point> concept [granulartiyOrRange] – expression□ fact-based□ SQL-built-in
43JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Multi-Dimensional Concepts
ins
Insurant Doctor
insurant
district
province
doctor
district
province
InsInRuralDistrLeadDocInUrbanDistr= (ins:ruralDistrict AND leadDoc:urbanDistrict)*
InsInHighRuralDistrLeadDocInUrbanDistr= (ins:highRuralDistrict AND
leadDoc:urbanDistrict)*leadDoc
44JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Fact-Based md-Concepts
Insurant
ins
insurant
district
province
day
month
quarter
patWithRegularDocVisitsInYear
NumOfDocVisits >= 7
patWithFrequentDocVisitsInYear
NumOfDocVisits >= 14
year
time
Time
45JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Corner-stone concept expression: <d>(c)[g]
■ <dice-point>(external-slice-cond)[roll-up-granularity/range]■ A point p is in the interpretation iff
□ p rolls-up to the dice-point d (if specified)□ p satisfies the external slice condition c (if specified)□ p is at the indicated granularity g or within the indicated
granularity range g■ Used later to
□ Define cube space (i.e., points of a cube)□ To define or query a cube
■ Analogous for comparison (see later)
46JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: <p>(c)[g]-expression
■ <actDoc:GP, time:2012>(ins:Youth)[time:month]■ <actDoc:GP, time:2012,leadDoc:all,ins:all,drugAtc:all>
(ins:Youth) [time:month,leadDoc:medSection,ins:top,drugAtc:top]
Note: For simplicity, shorthand notation used with missing dimension roles – relative to considered dimension space – in point assumed to be “all”; missing dimension roles in granularity assumed to be at level of point in <p>(c)[g]-expression (supporting tool completes missing info by default at frontend)
47JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Context-Specific Concepts
■ Problem:□ Concepts have different definitions for different nodes / points□ Ex.: Who is regarded to be a regular patient of a doctor in some
year? (Motivation: only those should be considered in analysis)
■ Context-specific concepts□ Signature is extended by a “context specification” (indicating a
subset of nodes of a dimension or points of a dimension space)□ Context is specified by a concept, typically in form of a <p>(c)[g]-
expression
48JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Context-Specific Concept
■ Context-specific concept: regularPatient (Who counts as regular patient of an acting GP in a rural district in a year?)
■ Signature□ Dimension Space: [ins:insurant, actDoc:doctor, time: year]□ Context: @<actDoc: GP>(actDoc:RuralDistrict)[actDoc:doctor]
■ Concept Expression: □ noOfvisits ≥ 3
49JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Contextualized Concepts
■ Aims and issues□ Polymorphic use of a concept signature (like methods in object-
oriented systems)□ Inheritance and overriding along concept hierarchies□ Consistency checking: unique definition
■ Approach:□ Contextualized concepts defines a signature (without concept
expression)□ Several context-specific concepts of the same name provide
different concept expressions for different nodes / points in the domain of the contextualized concept
50JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Contextualized Concept
■ Contextualized concept: regularPatient (Who counts as regular patient of a doctor in a year?)
■ Signature:[patLoc:insurant,actDoc:doctor,time:year]
■ Context-specific concepts.□ regularPatient@<actDoc:GP>(actDoc:RuralDistrict)[actDoc:doctor]
:= noOfvisits ≥ 3□ regularPatient@<actDoc:Oculist>[actDoc:doctor]:= noOfvisits ≥ 2
51JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Context-Specific Contexts: Unique Definition
■ Problem□ Identify relevant “context concept” for concept evaluation
■ General approach (like for “cooperation contracts”)□ Use most specific “context concept”□ Inconsistency: If more than one most specific contexts exists
■ Check for global unique definition of concept1. Consider subsumption hierarchy of “context concepts”2. Apply “abstract super class rule” from oo-modelling:
For each inner context concept A with sub contexts A1 to An add a context concept A0 = A \ (A1 ⋃ … ⋃ An)
3. Check if no two leaf nodes “overlap”, i.e., concepts are disjoint
52JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Context-Specific Concepts: Evaluation
■ Aim□ Polymorphic use: 1 concept name□ Evaluate concept by a single predicate (or SQL view as built-in)
■ Approach1. Ensure unique definition2. For each inner context concept A change c@A to c@A03. Let X be the set of contexts for which c is defined:
Define c as union over x in X: c@x.
53JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Context-Specific Concepts
AllDocs
Urban
Rural
Rural
Oculist GP
≥ 6
≥ 5
≥ 4
≥ 3
≥ 2
Number of contacts to count as regular patient of a doctor in a year
54JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Context-Specific Concepts
AllDocs
Urban
Rural
Rural
Oculist
AllDocsOther
RuralOtherGP
>5
>3
>6
>2 >4
↯
↯ ↯
55JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Comparative Concepts
■ Signature: Comparative dimension space (2 dimension spaces)□ group-of-interest (GoI) dimension space□ group-of-comparison (GoC) dimension space
■ Concept expression:□ Basic form of expression:
– md-concept for GoI, md-concept for GoC– join-condition: relating points of GoI and GoC by some
comparison function (e.g., equality, next, previous)□ Optionally fact-based: Boolean expression of score-value
comparisons□ Plus operators known for md-concepts, such as union, …, built-in
56JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Semantic Dimensions
■ Reuse existing terminologies of domain, e.g, SNOMED□ Terminology: subsumption hierarchy of concepts (OWL)□ Facts refer to these concepts in “semantic dimension”□ Meaning “c only” if non-leaf concept c is referred to
■ Use of semantic dimension [ER2012]□ existing concepts are considered nodes of dimension hierarchy□ subsumption hierarchy is interpreted as “rollsUp”□ new concepts may be defined upon existing ones (post-coordination)□ levels may be introduced “locally” (summarizability checked)□ pre-coordinated concepts can be used like “nodes”□ post-coordinated concepts can be used like “mdo-concepts”
57JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: DM2-Analysis (ÖOGKK, simplified)
drugATC
ins
leadDoc
time
doctor
drugPrescription
quantitycosts
district
province
insurantdistrict
province
monthquarter
year
drugatcChemSubGr
atcPharm
atcTherap
atcAnatomweek
day
medSec
actDoc
Doctor
Insurant
Time
DrugATC
disease
58JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Semantic Attribute SNOMED CT Concept
Clinical Finding
Disease
Metabolic disease
Disease by body site
Disorder of carbohydrate metabolism
Disorder ofbody system
Disorder of glucose metabolism Disorder of
nervous system
Disorder ofendocrine system
Diabetes mellitus
Diabetes mellitus type 1
Diabetes mellitus type 2
Chronic nervous system disorder
Chronic progressive paraparesis Type I diabetes mellitus
with hypoglycemic coma
Type II diabetes mellituswith hypoglycemic coma
Structure of nervous system
Body system structure
Anatomical structure
Anatomical or aquired body
structure
Body structure
Finding site
Finding site
Finding site
MrHuber27/04/2011Problem3500
qty = 27MrHuber
27/04/2011costs = 3500qty = 27
MrHuberDiabetis
27/04/2011costs = 3500qty = 27
Transactional System:Medical Records
Query#1: Get all medical records for Diabetis?
Query#2: Get all medical records for Diabetis without further information?
59JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: SNOMED ontology (selection)SNOMED CT Concept
Clinical Finding
Disease
Metabolic disease
Disease by body site
Disorder of carbohydrate metabolism
Disorder ofbody system
Disorder of glucose metabolism Disorder of
nervous system
Disorder ofendocrine system
Diabetes mellitus
Diabetes mellitus type 1
Diabetes mellitus type 2
Chronic nervous system disorder
Chronic progressive paraparesis Type I diabetes mellitus
with hypoglycemic coma
Type II diabetes mellituswith hypoglycemic coma
Structure of nervous system
Body system structure
Anatomical structure
Anatomical or aquired body
structure
Body structure
Finding site
Finding site
Finding site
60JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: Semantic Dimension augmented by “-only”-nodes
Chronic nervous system disorder
only
Disorder of endoctrine
system only
SNOMED CT Concept
Clinical Finding
Disease
Metabolic disease
Disease by body site
Disorder of carbohydrate metabolism
Disorder ofbody system
Disorder of glucose metabolism Disorder of
nervous system
Disorder ofendocrine system
Diabetes mellitus
Diabetes mellitus type 1
Diabetes mellitus type 2
Chronic nervous system disorder
Chronic progressive paraparesis Type I diabetes mellitus
with hypoglycemic coma
Type II diabetes mellituswith hypoglycemic coma
Disease only
Metabolic disease only
Disorder ofbody
system only
Disorder of carbohydrate
metabolism only
Disorder of body
system only
Disorder of glucose metabolism only
Disorder of nervous
system only
Diabetes mellitus only
Diabetes mellitus type 1 only
Diabetes mellitus type 2 only
Structure of nervous system
Body system structure
Anatomical structure
Anatomical or aquired body
structure
Body structure
Finding site
Finding site
Finding site
61JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: DM
Chronic nervous system disorder
only
Disorder of endoctrine
system only
SNOMED CT Concept
Clinical Finding
Disease
Metabolic disease
Disease by body site
Disorder of carbohydrate metabolism
Disorder ofbody system
Disorder of glucose metabolism Disorder of
nervous system
Disorder ofendocrine system
Diabetes mellitus
Diabetes mellitus type 1
Diabetes mellitus type 2
Chronic nervous system disorder
Chronic progressive paraparesis Type I diabetes mellitus
with hypoglycemic coma
Type II diabetes mellituswith hypoglycemic coma
Disease only
Metabolic disease only
Disorder ofbody
system only
Disorder of carbohydrate
metabolism only
Disorder of body
system only
Disorder of glucose metabolism only
Disorder of nervous
system only
Diabetes mellitus only
Diabetes mellitus type 1 only
Diabetes mellitus type 2 only
DICE
Structure of nervous system
Body system structure
Anatomical structure
Anatomical or aquired body
structure
Body structure
Finding site
Finding site
Finding site
<disease: Diabetes_mellitus>
62JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: DM found in „nervous system”
Chronic nervous system disorder
only
Disorder of endoctrine
system only
SNOMED CT Concept
Clinical Finding
Disease
Metabolic disease
Disease by body site
Disorder of carbohydrate metabolism
Disorder ofbody system
Disorder of glucose metabolism Disorder of
nervous system
Disorder ofendocrine system
Diabetes mellitus
Diabetes mellitus type 1
Diabetes mellitus type 2
Chronic nervous system disorder
Chronic progressive paraparesis Type I diabetes mellitus
with hypoglycemic coma
Type II diabetes mellituswith hypoglycemic coma
Disease only
Metabolic disease only
Disorder ofbody
system only
Disorder of carbohydrate
metabolism only
Disorder of body
system only
Disorder of glucose metabolism only
Disorder of nervous
system only
Diabetes mellitus only
Diabetes mellitus type 1 only
Diabetes mellitus type 2 only
DICE
SLICE
Structure of nervous system
Body system structure
Anatomical structure
Anatomical or aquired body
structure
Body structure
Finding site
Finding site
Finding site
<disease: Diabetes_mellitus>(Finding_site: some
Structure_of_nervous_system)
63JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: DM found in „nervous system“, by DM1 and DM2
Chronic nervous system disorder
only
Disorder of endoctrine
system only
SNOMED CT Concept
Clinical Finding
Disease
Metabolic disease
Disease by body site
Disorder of carbohydrate metabolism
Disorder ofbody system
Disorder of glucose metabolism Disorder of
nervous system
Disorder ofendocrine system
Diabetes mellitus
Diabetes mellitus type 1
Diabetes mellitus type 2
Chronic nervous system disorder
Chronic progressive paraparesis Type I diabetes mellitus
with hypoglycemic coma
Type II diabetes mellituswith hypoglycemic coma
Disease only
Metabolic disease only
Disorder ofbody
system only
Disorder of carbohydrate
metabolism only
Disorder of body
system only
Disorder of glucose metabolism only
Disorder of nervous
system only
Diabetes mellitus only
Diabetes mellitus type 1 only
Diabetes mellitus type 2 only
DICE
SLICE
ROLLUP
Structure of nervous system
Body system structure
Anatomical structure
Anatomical or aquired body
structure
Body structure
Finding site
Finding site
Finding site
<disease: Diabetes_mellitus>(Finding_site: some
Structure_of_nervous_system)[disease: {Diabetes_mellitus_only,
Diabetes_mellitus_type_1,Diabetes_mellitus_type_2}]
Summarizability: Disjointness and completness (with respect to dice point) of listed concepts used as “roll-up level instances”
checked by reasoner
64JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Unifying Native & Semantic Dimensions
■ Nodes:□ Native nodes: correspond to „entity“□ Concept nodes: correspond to „entity concept“ (domain concept)
■ Levels: □ Native level: consists of native nodes (from given dimensional-fact model)□ Concept level: consists of concept nodes
■ Overall picture:□ Native level relates to entity class (maybe an external ontology)□ May add concept levels (for concepts of the same entity class) above, and
local to some other concept node□ Which entity concepts (domain concepts) are included as concept nodes is
a design decision; nodes are potential entries in „data cube“
65JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Subsumption Hierarchies and Reasoning
■ Aim: Subsumption hierarchy (⊑) of concepts□ Guide users in narrowing analysis along concept hierarchies
■ Mapping to OWL: open world reasoning, md-concepts + individuals (used in concept definitions) + domain concepts□ Fact-based concepts and built-ins considered as “primitive”□ Limited expressiveness: rollsUpTo* cannot be “functional”□ Workaround: redundant “rollsUpToLevel*_X”
■ Mapping to SQL: Closed world reasoning for selected dimensions (time, drugAtc); “subset checking”, built-ins considered
66JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: MDO concept in OWL
■ MDO: InsInRuralDistrictLeadDocInUrbanDistr[leadDoc:doctor] (see above).
■ OWL: Ǝins.ƎrollsUpTo_district.Ǝentity.ruralDistrict ∏ ƎleadDoc.(ƎatLevel.{doctor} ∏ ƎrollsUpTo_district. Ǝentity.urbanDistrict)
67JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Concept Organisation
Formal Definition and Reasoning
drugATC
ins
leadDoc
doctor
drugPrescription
quantitycosts
district
province
insurantdistrictprovince
drug
atcChem-SubGr
atcPharm
atcTherap
InOAD
InStarterOAD
InAD
starterOADDrug
cheapDrug
drug
ruralDistrict urbanDistrict
highRuralDistrict highUrbanDistrict
district
youngInsurant
oldInsurant
insurant
middleAgedInsurant
babychild
youthrichInsurant
richAndOldInsurant
OrganizedMDO Concepts
ruralDistrict urbanDistrict
highRuralDistrict highUrbanDistrict
district
urbanProvince ruralProvince
province
PatInHighRuralDistrLeadDocInUrbanDistr
PatInRuralDistrLeadDocInUrbanDistr
urbanProvince ruralProvince
province
time
actDoc
PatInRuralDistrLeadDocInUrbanDistr
PatInHighRuralDistrLeadDocInUrbanDistr
InOADInStarterOAD
InAD
starterOADDrug
cheapDrug
drugurbanProvince
ruralProvince
province
ruralDistrict
urbanDistrict
highRuralDistrict
highUrbanDistrict
district
youngInsurant
oldInsurant typicalDM2AgedInsurant
insurant
middleAgedInsurant
baby
child
youth
richInsurantrichAndOldInsurant
Unorganized Business Terms
68JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
MDO concept in SQL: Relational View
■ Predefined views over DWH:□ dimRoleXrollUp(dimRoleX,dimRoleXSup) Note:transtive,reflexiv rollUp
□ dimRoleXatLevel(dimRoleX,level)□ dimSpaceXRollUp(dimRole1,dimRole1Sup,.. dimRolek,dimRolekSup)□ ….
■ MDO-concepts as “concept views”, naming conventions□ entity-concept(entityClassName)□ dimConceptName(dimName)□ mdConceptName(dimRole1Name,…,dimRolekName)□ compConcName(GoI_dimRole1Name,.., GoI_dimRolekName,
GoC_dimRole1Name,…GoC_dimRolekName)■ MDO-concepts as SQL-built-in:
□ Use predefined-views and concept views to define new concept view
69JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: MDO to SQL (1)
■ Mapping rule for INTERSECTION OF:□ MDO: INTERSECTION OF (C1, C2)□ SQL: SELECT * FROM C1 NJ C2
■ Example: □ MDO:
INTERSECTION OF (insRuralDistrict, leadDocUrbanDistrict)□ SQL:
SELECT * FROM insRuralDistrict NJ leadDocUrbanDistrict
NJ … NATURAL JOIN
70JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: <p>(c)[g] in SQL (2)
■ <dim1: n1, … dimk: nk>(concept)[dim1:l1,…dimk:lk]□ SQL-Interpretation:
SELECT * FROM (SELECT dim1,..,dimkFROM dim1RollUp NJ …. dimnRollUpWHERE dim1Sup=´n1´ AND … dimkSup=´nk´) NJconcept NJ (SELECT dim1,…, dimkFROM dim1RollUp NJ dim1AtLevel … NJ dimkAtLevelWHERE dim1AtLevel.level=`l1` AND …. =`lk`)
71JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
3. Ontology-based Measures & Scores
■ Md-concepts used □ to select the data to be included in the calculation of derived
measures & scores□ to define the context of context-specific measures & scores
■ Generic measures & scores□ to avoid repeated definitions of similar measures & scores□ „generic parameters“ for md-concepts
■ Templates for measure & score definitions, and SQL-builtIns■ Applied to point or set of points (defined by md-concepts)
□ <point>.measure or <point>(concept)[granularity]..measure□ (PntGoI,PntGoC).score, …
■ Measure and scores represented by measure and score views
72JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Measure & Score DefinitionsMeasure Score■ Signature
□ Set of points for which measure is defined
□ Indicated by (granularity-restricted) dimension space
■ Instructions□ Function: point to value□ Pre-defined templates □ Built-in (by SQL view)
■ Signature□ Set of pairs of points for
which score is defined□ Indicated by (granularity-
restricted) comparative dimension space
■ Instructions□ Function: 2 points to value□ Pre-defined templates□ Built-in (by SQL view)
73JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Measure and score application
■ md-concept.measure□ SQL-interpretation:
SELECT measure FROM concept NATURAL JOIN measure_view
■ concept..measure□ SQL-Interpretation:
SELECT *FROM concept NATURAL JOIN measure_view
measure_view (data cube):contains each point for which measure value exists + measure value
■ comparativeConcept.score□ SQL-interpretation:
SELECT scoreFROM comparativeConceptNATURAL JOIN score_view
■ compConcept..score□ SQL-interpretation
SELECT *FROM comparativeConceptNATURAL JOIN score_view
score_view (data cube):contains each point-pair (flattened) for which score value exits + score value
74JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: Sample Measure / Score Applications
■ <ins:Tyrol,time:2011>.drugCostsOfRuralIns■ <ins:Austria,time:2011>[ins:province].drugCostsOfRuralIns■ (<actDoc:Vienna,ins:NÖ,time:2010>(drugAtc:InOAD)[time:month]
(GoI.time=GoC.time)<actDoc:NÖ,ins:Vienna,time:2010>(drugAtc:InOAD)[time:month]..ratioOfTotalDrugCosts
Note: For simplicity, shorthand notation used with missing dimension roles – with respect to signature of measure/score – in point assumed to by “all”; missing dimension roles in granularity assumed to be at level of point in <p>(c)[g]-expression (supporting tool completes missing info by default at frontend)
75JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Measures■ Simple Arithmetic
□ +,-, (,), *, / over md-point
■ Aggregation□ Base measure m□ Qualifier q (a concept)□ Aggregation function f
■ 2-Step Aggregation□ Roll-up granularity g□ First level aggr. measure m
Scores■ Ratio,
MeanPercentileRank□ GoI base measure mi□ GoC base measure mc
■ 2 Step: Aggregation + Score□ Roll-up granularity g□ GoI aggr. measure mi□ GoC aggr. measure mc
Measure & Score Instructions: Templates
76JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example : Aggregation Measure
MDO Notation:sum(<self>(ins:RuralDistrict).. drugPrescriptionCosts)
SQL-Interpretation (measure view)SELECT dimRole1Sup,…dimRoleKSup, sum(drugPrescriptionCosts)FROM drugPrescriptionCosts_view NJ “ins:RuralDistrict”NJ drugPrescriptionSpaceRollUpGROUP BY dimRole1Sup,…, dimRolekSup
Aggregation Measure: drugCostsOfRuralInsSignature: drugPrescriptionSpaceMeasure: drugPresciptionCostsQualifier: ins:RuralDistrictAggregation function: sum
77JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: 2-Step Aggregation Measure
■ MDO notation:avg(<self>DrugPrescriptionSpace[ins:insurant, leadDoc:doctor, time:month,drugAtc:drug]..drugCostsOfRuralIns)
■ Sample application:<ins:Austria,leadDoc:GP,time:2012,drugAtc:all>..avgDrugCostsPerRuralIns
2-Step-Aggregation: avgDrugCostsPerRuralInsSignature: DrugPrescriptionSpace[ins:insurant..top]Roll-up granularity: (ins:insurant,leadDoc:doctor,time:month,drugAtc:drug) Measure: drugCostsOfRuralInsAggregation function: avg
Note: Granularity indications omitted, if not restricted
78JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example Score: Ratio
■ MDO notation:□ RATIO(<GoI.self>.drugCostsOfRuralIns,
<GoC.self>.drugCostsPrescByRuralDoctor)
■ Sample application:□ (<actDoc:Vienna,time:2010>,<ins:Vienna,time:2010>).r□ Rslt:= (<actDoc:Vienna,time:2010>.drugCostsOfRuralIns /
<ins:Vienna,time:2010>.drugCostsPrescByRuralDoctor)
Ratio: costRatioOfInsurantsMigrationSignature: DrugPrescriptionCompSpaceGoI measure: drugCostsOfRuralInsGoC measure: drugCostsPrescByRuralDoctor
79JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Generic Measures & Scores
■ Aims□ Avoid repeated definitions of same calculations□ Provide for flexibility in measure & score use□ Enable reasoning by providing common structures
■ Parameters for measures & scores□ Set on “instantiating” generic measure
■ Generic measure□ Qualifier as generic parameter
■ Generic score□ Parameter for qualifier of generic measure of GoI□ Parameter for qualifier of generic measure of GoC
80JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Generic Aggregation Measure
■ MDO notation:□ sum(<self>(%q)..drugPrescriptionCosts)
■ Sample application and interpretation: □ <ins: Tyrol,actDoc: GP,time: 2012>.totalCosts(ins:RuralDistrict)□ totalCosts(ins:RuralDistrict) == drugCostsOfRuralIns (see before)
Aggregation Measure: totalCosts( %q) Signature: DrugPrescriptionCompSpaceMeasure: drugPrescriptionCostsQualifier: %qAggregation function: sum
81JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Aggregation-AvgPercentileRank
Sample application and interpretation:□ (<actDoc:Tyrol,time:2011>,<actDoc:Austria,time:2011>.
s(%qi=(ins:RuralDistrict,%qc=(actDoc:UrbanDistrict))□ percentileRank ( avg
(<actDoc:Tyrol,time:2011>[ins:insurant]..tc(actDoc:RuralDistrict),<actDoc:Austria,time:2011>[ins:insurant]..tc(actDoc:UrbanDistrict).
AggregationAvgPercentileRank: MnPctLRkCostsPerIns(%qi,%qt) sSignature: DrugPrescriptonCompSpace[ins:insurant …
top]Roll-up granularity: ins:insurantGoI measure: totalCosts(%qi) tcGoC measure: totalCosts(%qc) tc
82JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Measure & Score Drivers Hierarchy
■ Change of value of one score may change value of other score□ Known from definition of measures and scores□ Background knowledge (e.g., relationships of base measures)
■ “Influence hierarchy” captured explicitly□ drives(measure1,measure2)Eg.: drives(ambulantTreatmentCosts,overallCosts)□ drives(measure, score)□ drives(score1,score2)
■ Knowledge used later to guide analysis process
83JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
SemCockpit: Built-in Measures & Scores
■ Principles□ Function defined as view table with columns for each dimension
role of md-point + measure-value□ For scores: view table with 2 points + score-value□ Based on dynamic SQL□ Point, qualifier, and possible other parameters are %placeholders□ Cubes, levels, dimension nodes , rollsUp* , mdo-concepts
(column names are dimensions) assumed to exist as tables/views(naming conventions: e.g., ‘dimRole:dimConceptName’)
■ Use of built-in measures & scores□ The same way as non-built-ins
84JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
4. Comparative OLAP
■ Build cubes for selected roll-up facts, using mdo-concepts to define points of cubes and to instantiate generic measures
■ Build likewise comparative cubes for comparative facts■ Define (comparative) analysis situations with some elements of
a (comparative cube) such as node of a point, level of a granularity, or qualifier, fixed and some elements variable
■ Explorative analysis („try and see“) by varying variables of (comparative) analysis situations
85JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Cubes and Facts
■ Cube defined by “MDO-query”:□ Cube space: <point>(concept)[granularityOrRange]□ List of measures m1(q1),…mn(qn), whose measure domains
subsume cube space, with qi instantiated (i=1..n)□ Optionally: m(q)/v to indicate that a NULL-value is to be
substituted by v□ Cube contains all facts for which one measure is not NULL
■ Fact: (<x1,…xn, v1.. vn)■ Comparative cubes and comparative facts:
□ Cube: 2 cube spaces (GoI, GoC) + join condition + score □ Join condition: GoC.Di =GoI.Dj in general: any predefined
comparison function / comparative concept
86JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Cube and comparative cube
(<time:2012, ins:UpperAustria>, <time:2011, ins:UpperAustria>)
..RatioOfDrugCosts(InOAD,InOAD)
(<time:2012, ins:UpperAustria>[ins:district],GoI:ins:district = GoC:ins:district,
<time:2011, ins:UpperAustria>[ins:district])..RatioOfDrugCosts(InOAD.InOAD)
87JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Comparative Analysis Situation
Groups of Interest (GoI) Groups of Comparison (GoC)Point (pntGoI) Point (pntGoC)Condition (conceptGoI) Condition(conceptGoC)Granularity (granGoI) Granularity (granGoC)
Join Condition (joinCond)Qualifier (qGoI) Qualifier (qGoC)
Score (%qGoI, %qGoC)
One comparison per (pntGoI,pntGoC)-pair according to Join Condition.
E.g:(<actDoc:GP,time: 2012>[actDoc:medSection,time:month], GoI.time=GoC.time<actDoc:Oculist,time:2012>[actDoc:medSection,time:month]).ratioAvgDrugCostsPerIns(InOAD,InOAD)
88JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Correlated Change■ Coordinates of points
□ Move down, up, aside, …
■ Granularities□ Drill down, roll up, …
■ Qualifiers□ Narrow, broaden, aside
■ Score□ Refocus
■ Dimension□ Split, merge
Change of GoI or GoC ■ Coordinates of points■ Granularities■ Qualifiers
Change GoI-GoC pairing ■ Join conditions
□ Shrink, expand, re-join
■ External slice□ Filter, extend, shift
Variation along “semantic relationships”
89JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Explorative Analysis
(<time:2012, ins:Linz-Stadt>, <time:2011, ins:Linz-Stadt>)
.RatioOfDrugCosts
A3
(<time:2012, ins:UpperAustria>, <time:2011, ins:UpperAustria>)
.RatioOfDrugCosts(ins:UrbanDistrict, ins:UrbanDistrict)
A2
(<time:2012, ins:UpperAustria>, <time:2011, ins:UpperAustria>)
.RatioOfDrugCosts
A1
correlated( narrow(ins:urbanDistrict) )
correlated( moveDownToFirst(ins) )
90JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Composite Analysis Situation
■ Consists of multiple analysis and comparative analysis situations (e.g., show scores, drill down to raw measures)
■ Linked by semantic relationships (e.g, drill-down to city)■ Relationships form a tree; successors depend on predecessors■ Analysis situations are parameterized■ Changes in parameters propagated down along relationships■ Composite analysis situation may be depicted in one diagram
or in multiple dependent diagrams■ Global “picture” of aggregation & details; coherent changeUsed in analysis phase (1): Search for measures and groups.
91JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Composite Analysis Situation
(<leadDoc:UpperAustria>[time:month],GoI.time:month= GoC.time:month,
<leadDoc:all>[time:month])..RatioOfDrugCosts(
ins:RuralDistrict, ins:RuralDistrict )
A1.3(<leadDoc:UpperAustria>[time:month],GoI.time:month= GoC.time:month,
<leadDoc:all>[time:month])..RatioOfDrugCosts(
ins:UrbanDistrict, ins:UrbanDistrict )
A1.2
(<leadDoc:UpperAustria>[time:year],GoI.time:year = GoC.time:year,
<leadDoc:all>[time:year])..RatioOfDrugCosts
A1.1
correlated( compose(narrow(ins:ruralDistrict),
drillDownTo(time:month) ) )
correlated( compose(narrow(ins:urbanDistrict),
drillDownTo(time:month) ) )
A1
92JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Analysis History
2nd STEP
4th STEP
(<time:2012, ins:Linz-Stadt>,<time:2011, ins:Linz-Stadt>)
.RatioOfDrugCosts
A3
(<time:2012, ins:UpperAustria>, <time:2011, ins:UpperAustria>)
.RatioOfDrugCosts(ins:UrbanDistrict, ins:UrbanDistrict)
A2
(<time:2012, ins:UpperAustria>, <time:2011, ins:UpperAustria>)
.RatioOfDrugCosts
A1
( ‹%pntGoI› %conceptGoI [%granGoI], %joinCond,
‹%pntGoC› %conceptGoC [%granGoC] )..%score( %qGoI, %qGoC )
A0
1st STEP
3rd STEP
correlated( narrow(ins:UrbanDistrict) )
backtrack
correlated( moveDownToFirst(ins) )
93JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
5. BI Analysis Graphs
■ Represent promising analysis paths (sequence of OLAP-steps, variations of controls on initial comparative analysis situation)
■ Nodes□ Analysis situations or comparative analysis situations□ Parameterized□ Simple or composite analysis situations
■ Directed Arcs: □ OLAP navigation between source and target analysis situation□ Express how parameters of source and target analysis situation
relate in terms of “semantic relationships” (roll-up, subsumption, but also “sibling”, “next”, …)
□ Optionally parameterized
94JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
WebML■ Content Model
□ Classes & attributes
■ Navigation Model: Unit□ Associated to class□ Contains object(s) of class□ Parameterized SQL query
over class
■ Navigation Model: Links□ Navigation between units□ Transport information,
navigate relationships
Analysis Graphs (AGs)■ Content Model
□ Cubes & measures
■ AGs: Analysis Situation□ Associated to cube□ Contains (rollUp) facts□ Parameterized MDO-query
over cube
■ AGs: Arcs□ Navigation between ASs□ Transport information,
navigate “relationships”
Analysis Graphs: Inspired by WebML [Ceri,et.al.]
95JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Analysis Graphs: Design
■ Typical analysis session is characterised by1. Determining an initial comparative analysis situation2. Visually inspect the result3. Modify parameters (usually based on semantic relationships) to
move to a new individual analysis situation4. Inspect result: (a) stop if satisfied, or (b) continue with (3), or (c)
backtrack to previous analysis situation, or (d) explore new steps
■ Analysis graphs capture this “analyses process knowledge” for later analysis to provide guidance □ Design graph incrementally, capturing each promising path□ Generalize/restrict variable domains, if variation is helpful or not
96JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: BI Analysis SessionAnalysis Graph
AG0 correlated(moveToPrev(time))
( ‹%pntGoI› %conceptGoI [%granGoI], %joinCond,
‹%pntGoC› %conceptGoC [%granGoC] )..%score( %qGoI, %qGoC )
A0
refocusScore( RatioOfAmbTreatmentCosts )
refocusScore( RatioOfDrugCosts )
correlated( moveDownTo(
actDocMedSec:oculist ) )
Invididual Analysis GraphAG0'
correlated(moveToPrev(time))
(‹time:2012›, ‹time:2011›).RatioOfTotalCosts
A0(1)
(‹time:2012, actDocMedSec:oculist›, ‹time:2011, actDocMedSec:oculist›)
.RatioOfTotalCosts
A0(2)
(‹time:2011›, ‹time:2010›).RatioOfTotalCosts
A0(3)
(‹time:2012›, ‹time:2011›).RatioOfAmbTreatmentCosts
A0(4)(‹time:2012›, ‹time:2011›).RatioOfDrugCosts
A0(5)
97JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Specialization of Analysis Situations
■ Based on variable domains of analysis situations:□ Point-variables (pntGoI, pntGoC): md-concept□ Concept-variables (conceptGoI, conceptGoC, qGoI, qGoC): set of
concepts, ANY, ^ concept□ Level-variables (granGoI, granGoC): level or range□ Join-condition: set of join-conditions (comparative concepts)□ Score variable: set of scores, ANY, ^ score
■ An analysis situation specializes another analysis situation, if each variable domain is the same or more restrictive
■ Individual analysis situation: all variables bound
98JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: BI Analyis Graph (Design)
refocusScore( RatioOfDrugCosts )
AG1
refocusScore( RatioOfAmbTreatmentCosts )
( ‹%pntGoI› %conceptGoI [%granGoI], %joinCond,
‹%pntGoC› %conceptGoC [%granGoC] )..%score( %qGoI, %qGoC )
A0
(‹time:%yearGoI є year›, %yearGoC = %yearGoI.PREVIOUS,
‹time:%yearGoC є year›).RatioOfTotalCosts
A1
(‹time:%yearGoI є year›, %yearGoC =
%yearGoI.PREVIOUS,‹time:%yearGoC є year›)
.RatioOfAmbTreatmentCosts
A2 (‹time:%yearGoI є year›, %yearGoC =
%yearGoI.PREVIOUS‹time:%yearGoC є year›)
.RatioOfDrugCosts
A3
correlated(moveToPrev(time))
refocusScore( RatioOfAmbTreatmentCosts )
refocusScore( RatioOfDrugCosts )
correlated( moveDownTo(
actDocMedSec:oculist ) )
AG0'
correlated(moveToPrev(time))A0(1) A1A0(4) A2A0(5) A3
(‹time:2012›, ‹time:2011›).RatioOfTotalCosts
A0(1)
(‹time:2012, actDocMedSec:oculist›, ‹time:2011, actDocMedSec:oculist›)
.RatioOfTotalCosts
A0(2)
(‹time:2011›, ‹time:2010›).RatioOfTotalCosts
A0(3)
(‹time:2012›, ‹time:2011›).RatioOfAmbTreatmentCosts
A0(4)(‹time:2012›, ‹time:2011›).RatioOfDrugCosts
A0(5)
99JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Analysis Graphs: Use
■ Typical use in analysis phase (2): proper analysis (repeated)1. Jump to predefined initial analysis situation or any other one2. Set open parameters for this analysis situation3. Evaluate the analysis situation and inspect the result4. Choose an outgoing arc to move to another analysis situation5. Set open parameter of the arc, if any6. Evaluate & inspect the result; stop if satisfied or continue with 4.
■ SemCockpit support□ Reasoning to check for specilisation of analysis situations and to
determine possible values for parameters □ Guidance rules to open/close paths and choose parameters
100JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
refocusScore( RatioOfDrugCosts )
AG1
refocusScore( RatioOfAmbTreatmentCosts )
( ‹%pntGoI› %conceptGoI [%granGoI], %joinCond,
‹%pntGoC› %conceptGoC [%granGoC] )..%score( %qGoI, %qGoC )
A0
(‹time:%yearGoI є year›, %yearGoC = %yearGoI.PREVIOUS,
‹time:%yearGoC є year›).RatioOfTotalCosts
A1
(‹time:%yearGoI є year›, %yearGoC =
%yearGoI.PREVIOUS,‹time:%yearGoC є year›)
.RatioOfAmbTreatmentCosts
A2 (‹time:%yearGoI є year›, %yearGoC =
%yearGoI.PREVIOUS‹time:%yearGoC є year›)
.RatioOfDrugCosts
A3
correlated(moveToPrev(time))correlated( moveDownTo(
ins:UpperAustria ) )
correlated( narrow( ins:ruralDistrict ) )
correlated( pick( ins:Linz-Land) )
refocusScore( RatioOfDrugCosts )
AG1'(‹time:2012›, ‹time:2011›).RatioOfTotalCosts
A1(1)
(‹time:2012›, ‹time:2011›)
.RatioOfDrugCosts
A3(1)(‹time:2012, ins:UpperAustria›, ‹time:2011, ins:UpperAustria›).RatioOfDrugCosts
A3(2)
(‹time:2012, ins:UpperAustria›, ‹time:2011, ins:UpperAustria›)
.RatioOfDrugCosts(ins:ruralDistrict, ins:ruralDistrict )
A3(3)
(‹time:2012, ins:UpperAustria›[ins:district], GoI.ins.district = GoC.ins.district,
‹time:2011, ins:UpperAustria›[ins:district]).RatioOfDrugCosts(
ins:ruralDistrict, ins:ruralDistrict )
A3(4)
(‹time:2012, ins:Linz-Land›, ‹time:2011, ins:Linz-Land›)
.RatioOfDrugCosts( ins:ruralDistrict, ins:ruralDistrict )
A3(5)
correlated( drillDownInHier(ins) )
Example: BI Analysis Graph (Use)
101JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
correlated( moveDownTo( ins:UpperAustria ) )
correlated( narrow( ins:RuralDistrict ) )
correlated( pick( ins:Linz-Land) )
refocusScore( RatioOfDrugCosts )
AG1'(‹time:2012›, ‹time:2011›).RatioOfTotalCosts
A1(1)
(‹time:2012›, ‹time:2011›)
.RatioOfDrugCosts
A3(1)(‹time:2012, ins:UpperAustria›, ‹time:2011, ins:UpperAustria›).RatioOfDrugCosts
A3(2)
(‹time:2012, ins:UpperAustria›, ‹time:2011, ins:UpperAustria›)
.RatioOfDrugCosts(ins:RuralDistrict, ins:RuralDistrict )
A3(3)
(‹time:2012, ins:UpperAustria›[ins:district], GoI.ins.district = GoC.ins.district,
‹time:2011, ins:UpperAustria›[ins:district]).RatioOfDrugCosts(
ins:RuralDistrict, ins:RuralDistrict )
A3(4)
(‹time:2012, ins:Linz-Land›, ‹time:2011, ins:Linz-Land›)
.RatioOfDrugCosts( ins:RuralDistrict, ins:RuralDistrict )
A3(5)
A3(2) A4A3(3) A5A3(4) A6A3(5) A7
correlated( drillDownInHier(ins) )
refocusScore( RatioOfDrugCosts )
A2
A1
correlated( moveDownTo( ins:?prov є province) )
correlated( narrow( ?qual є { ins:RuralDistrict, ins:UrbanDistrict } ) )
correlated( drillDownInHier(ins) )
correlated( pick( ins:?distr є district) )
AG2... (‹time:%yearGoI є year›,
%yearGoC = %yearGoI.PREVIOUS
‹time:%yearGoC є year›).RatioOfDrugCosts
A3
(‹time:%yearGoI є year, ins:?prov є province›, %yearGoC = %yearGoI.PREVIOUS,
‹time:%yearGoC є year, ins:?prov є province›).RatioOfDrugCosts
A4
(‹time:%yearGoI є year, ins:?prov є province›, %yearGoC = %yearGoI.PREVIOUS,
‹time:%yearGoC є year, ins:?prov є province›).RatioOfDrugCosts( ?qual, ?qual )
A5
i
( ‹time:%yearGoI є year, ins:?prov є province›[ins:district], %yearGoC = %yearGoI.PREVIOUS and
GoI.ins.district = GoC.ins.district, ‹time:%yearGoC є year, ins:?prov є province›[ins:district] )
..RatioOfDrugCosts( ?qual, ?qual )
A6
i
( ‹time:%yearGoI є year, ins:?distr є district›, %yearGoC = %yearGoI.PREVIOUS
‹time:%yearGoC є year, ins:?distr є year› ).RatioOfDrugCosts( ?qual, ?qual )
A7
102JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
AG2'
correlated( moveDownTo( ins:LowerAustria ) )
correlated( narrow( ins:UrbanDistrict ) )
refocusScore( RatioOfDrugCosts )
(‹time:2013›, ‹time:2012›).RatioOfTotalCosts
A1(1)
(‹time:2013›, ‹time:2012›)
.RatioOfDrugCosts
A3(1) (‹time:2013, ins:LowerAustria›, ‹time:2012, ins:LowerAustria›)
.RatioOfDrugCosts
A4(1)
(‹time:2013, ins:LowerAustria›, ‹time:2012, ins:LowerAustria›).
RatioOfDrugCosts(ins:UrbanDistrict, ins:UrbanDistrict)
A5(1)
......
refocusScore( RatioOfDrugCosts )
A2
A1
correlated( moveDownTo( ins:?prov є province) )
correlated( narrow( ?qual є { ins:RuralDistrict, ins:UrbanDistrict } ) )
correlated( drillDownInHier(ins) )
correlated( pick( ins:?distr є district) )
AG2... (‹time:%yearGoI є year›,
%yearGoC = %yearGoI.PREVIOUS
‹time:%yearGoC є year›).RatioOfDrugCosts
A3
(‹time:%yearGoI є year, ins:?prov є province›, %yearGoC = %yearGoI.PREVIOUS,
‹time:%yearGoC є year, ins:?prov є province›).RatioOfDrugCosts
A4
(‹time:%yearGoI є year, ins:?prov є province›, %yearGoC = %yearGoI.PREVIOUS,
‹time:%yearGoC є year, ins:?prov є province›).RatioOfDrugCosts( ?qual, ?qual )
A5
i
( ‹time:%yearGoI є year, ins:?prov є province›[ins:district], %yearGoC = %yearGoI.PREVIOUS and
GoI.ins.district = GoC.ins.district, ‹time:%yearGoC є year, ins:?prov є province›[ins:district] )
..RatioOfDrugCosts( ?qual, ?qual )
A6
i
( ‹time:%yearGoI є year, ins:?distr є district›, %yearGoC = %yearGoI.PREVIOUS
‹time:%yearGoC є year, ins:?distr є year› ).RatioOfDrugCosts( ?qual, ?qual )
A7
103JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: BI Analyis Graph (Design)
AG2'
correlated( moveDownTo( ins:LowerAustria ) )
correlated( narrow( ins:UrbanDistrict ) )
refocusScore( RatioOfDrugCosts )
(‹time:2013›, ‹time:2012›).RatioOfTotalCosts
A1(1)
(‹time:2013›, ‹time:2012›)
.RatioOfDrugCosts
A3(1) (‹time:2013, ins:LowerAustria›, ‹time:2012, ins:LowerAustria›)
.RatioOfDrugCosts
A4(1)
(‹time:2013, ins:LowerAustria›, ‹time:2012, ins:LowerAustria›).
RatioOfDrugCosts(ins:UrbanDistrict, ins:UrbanDistrict)
A5(1)
.........AGn
104JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
6. Guidance, Judgment, and Analysis Rules
■ Representation of former tacit knowledge of analyst□ Process-oriented: Guidance in use of BI analysis graph
(guidance rules)□ Static: Possible explanations of striking score values (judgment
rules)□ Analysis: Analyse selected comparative facts (analysis rules) and
report (reporting rules) interesting facts or initiate some action (action rules), e.g., start an analysis in some BI analysis graph
■ Defined upon analysis situations or comparative cubes■ Inheritance and overriding along „specialization“ hierarchies of
analysis situations and comparative cubes■ „Multi-granular“ evaluation along roll-up hierarchies of points
105JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Guidance, Judgement and Analysis Rules
■ Guidance rule□ Process-oriented knowledge on how to best proceed in analysis□ Tied to generic analysis situation□ Action: recommendation where to look next
■ Judgement rule□ Static knowledge about comparative fact, process-independent□ Applies to all facts of indicated cube, “semantic attachment”□ Action: judgement why a score may be low/high, recommendation
■ Analysis rule□ Evaluated for part of DWH, e.g. for data loaded in last ETL-cycle□ Action: “Management Summary”, or “Action invocation”
106JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Definition & Evaluation of Guidance Rules
■ A guidance rule is defined by□ a name□ a domain, given by a generic analysis situation □ an action, given by a navigation operator (plus variable binding or
domain restriction)□ a condition over a comparative fact (fact-oriented rule) or over all
facts of an analysis situation (set-oriented rule)
■ A guidance rule is evaluated□ when an individual analysis situation in the rule´s domain is met□ for each fact / once for the set of facts□ and the action is recommended, if the condition is met
107JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Guidance Rule
Rule applies to:(<ins:UpperAustria,actDoc:GP>,<ins:Styria,actDoc:GP>).s(ins:RuraDistrict,ins:Rural
District)=1.4
GuidanceRule: AvgCostRatioPerRuralInsInAustriaPoint of Interest: (ins: Austria, actDoc: GP)*Point of Comparison: (ins: Austria, actDoc: GP)*GoI Qualifier: ^ (ins:RuralDistrict)GoC Qualfier: ^ (ins:RuralDistrict)Join condition: GoI.time=GoI.timeScore: avgCostRatioPerInsurant sRule Condition: > 1.3Rule Action: Recommend correlated narrow “actDoc:RuralDistrict”
108JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Guidance Rules of Use Case
Guidance RulesFOR A0IF val < 0.8 or val > 1.2RECOMMEND correlated(
moveToPrev(time))
GR0 FOR A1IF val > 1.1RECOMMEND refocusScore(
RatioOfAmbTreatmentCosts)
GR1
FOR A3IF val < 0.9 or val > 1.1RECOMMEND correlated(
moveToPrev(time))
GR0 FOR A1IF val > 1.1DISADVISE refocusScore(
RatioOfDrugCosts)
GR2
FOR A4IF val > 1.1RECOMMEND correlated( narrow( ?qual є { ins:RuralDistrict,
ins:UrbanDistrict } ) )
GR3
FOR A6IF val > 1.2RECOMMEND correlated( pick(
ins:?distr := AutoSelect ))
GR5FOR A5IF val > 1.1RECOMMEND correlated( drillDownInHier(ins) )
GR4
109JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Guidance Rules: Conflict Resolution
■ Conflicts: Multiple guidance rules with different guidancea) Apply for an analysis situation (set-oriented rule)b) Apply for a comparative fact (fact-oriented rule)
■ Strategies for rules with different names:□ Follow all, parallel in cloned tasks or iteratively in a work stack□ Use rule priorities□ Use priorities for guidance actions
■ Strategies for rules with the same name:□ Use inheritance and overriding, based on subsumption of
rule domains (i.e., specialization of analysis situations)
110JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Guidance Rules over composite ASs
■ Considers multiple, related facts in rule condition■ Tied to the root AS, e.g., such a rule may guide from A1 to A2
(<actDoc:Tyrol>[time:year] GoI.time=GoC.time<acDoc:Austria>[time:year]).s(actDoc:Rural,actDoc:Rural)
A2
correlated (narrow(actDoc:Rural))
(<actDoc:Tyrol>[time:year] GoI.time=GoC.time<actDoc:Austria>[time:year]).s
A1_1
(<actDoc:Tyrol>[time:month]GoI.time=GoC.time <actDoc:Austria>[time:month]).s(ins:Rural,ins:Rural)
A1_2(<actDoc:Tyrol>[time:month],GoI.time=GoC.time <actDoc:Austria>[time:month]).s(ins:Urban,ins:Urban)
A1_3
A1
111JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Judgement Rules
■ Capture knowledge what a comparative fact may mean■ Defined for generic comparative cubes, i.e. comparative cubes
with a generic score; condition over score, action is some text■ Specialization hierarchy for generic comparative cubes: C´ is
more specific then C, if□ the comparative cube space of C´ is subsumed by that of C□ each qualifier of the generic score at C´ has the same or less
restrictive domain than it has at C
■ A judgment rule r is evaluated for a comparative fact f□ when f in the domain of r is retrieved and no more specific
judgment rule of the same name exists for f □ and the fact is annotated by the judgment if the condition is met
112JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Judgement Rules
Judgement RulesCubes for Analysis and Judgement Rules
([time:year, ins:district..province], GoI.year = GoC.year.PREVIOUS and
GoI.ins = GoC.ins and %qoi=%qoc[time:year, ins:district..province])
.RatioOfDrugCosts( %qoi ↑ InDrug,
%qoc ↑ InDrug )
C1
(‹time:2012›[ins:district..province], GoI.ins = GoC.ins and %qoi=%qoc‹time:2011›[ins:district..province])
.RatioOfDrugCosts(%qoi ↑ InOAD,
%qoc ↑ InOAD )
C2
FOR C1IF val > 0JUDGE “There is on average a general
increase of drug costs of about 5% per year.“
JR1
FOR C2IF val ≥ 1.07JUDGE „In 2012 a new more expensive
but also more effecitive starter OAD drug came into the market such that the total OAD drug costs has been increased above-average of about 11%.“
JR1
113JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Analysis Rules
■ Defined like judgment rules, but□ Evaluated only for indicated set of facts (e.g., those of last ETL-cycle)□ Rule action is report (reporting rules) or “real” (actionable knowledge)□ Positive and negative activation condition (“decision scope approach”)
■ Multi-granular evaluation□ Consider roll-up relationships between facts in rule evaluation□ Rule evaluation strategies: prerogative vs. presumed□ Explained for non-comparative case first
114JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: Data Warehouse SALES & LOSS
ProductLine
ProductCategory
Product
PRODUCT
food
long life short life
milk cream
SALES & LOSSlosssale
allAllProduct
……
State
AllLoc
LOCATION
City
all
AustriaGermany
Vienna
Month
Year
TIME
Day
2012
12/2012
2011
12/12/2012
AllTime
… …
all
…
(food)[Product, State]delist↯:l/s 40%–↯:l/s 10%
*
(Milk,Vienna,28-11-2012,5,20)
(shortLife,Austria,2012,800,2700)
* set of descendent points ofpoint (product:food, location:all)at granularity [Product, State]
Ex.: Base fact and roll-up fact
Ex.: Analysis rule
115JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: Mono-Granular Analysis Rule „Delist“
(food)[Product, State]delist↯:l/s 40%–↯:l/s 10%
* set of descendent points of point(product:food, location:all)
at granularity [Product, State]
*
(longLife)[Product, State]delist↯:l/s 15%–↯:l/s 10%
(shortLife)[Product, State]delist↯:l/s 40%–↯:l/s 20%
116JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: Multi-Granular Analysis Rule „Delist“
rolls-up-to
(food)[Product, State]delist↯:l/s 40%–↯:l/s 10%
(shortLife)[Product, State]delist↯:l/s 40%–↯:l/s 20%
(longLife)[Product, State]delist↯:l/s 15%–↯:l/s 10%
(food)[Product, City]delist↯:l/s 45%–↯:l/s 10%
(shortLife)[Product, City]delist↯:l/s 45%–↯:l/s 20%
(longLife)[Product, City]delist↯:l/s 25%–↯:l/s 10%
117JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Evaluation Strategies for Multi-Granular Rules
■ Two rules□ Same name (regarded together as “one” analysis rule)□ Domains at different granularities in “roll-up”-relationship□ Rules evaluated “top-down”
■ Prerogative Evaluation (for “action rules”)□ If rule applies positively or negatively for a roll-up fact, the rule is
not checked for its drill-down facts.
■ Presumed Evaluation (for “reporting rules”)□ If a rule applies positively or negatively for a roll-up fact, the rule is
checked for its drill-down facts, but only reported if the decision would be different (e.g., “Beer sales are down, except Heineken”)
118JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Evaluation Strategies: Comparison
prerogative evaluation of AR
district
province + o
+ -
-
+-o
pos. activation condition ist trueneg. activation condition ist truepos. and neg. activation condition is falseactivation condition is not evaluatedaction is performed
district
province +
+ -
o
+ -
-
+ -
+-o
pos. activation condition ist trueneg. activation condition ist truepos. and neg. activation condition is falseactivation condition is not evaluatedpoint is reported
presumed evaluation of AR
119JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: Multi-Granular, Prerogative Evaluation
(milk, Austria, l/s: 20%)
(milk, Vienna, l/s: 48%)
(cream, Austria, l/s: 50%)
(cream, Vienna, l/s: 15%)
instance-ofrolls-up-to
(cream, Salzburg, l/s: 52%)↯
↯
(shortLife)[Product, State]delist↯:l/s 40%–↯:l/s 20%
(shortLife)[Product, City]delist↯:l/s 45%–↯:l/s 20%
120JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Ex.: Multi-Granular, Presumed Evaluation
(milk, Austria, l/s: 20%)
(milk, Vienna, l/s: 48%)
(cream, Austria, l/s: 50%)
(cream, Vienna, l/s: 15%)
instance-ofrolls-up-to
(cream, Salzburg, l/s: 52%)↯–↯
↯
(shortLife)[Product, State]alertMg↯:l/s 40%–↯:l/s 20%
(shortLife)[Product, City]alertMg↯:l/s 45%–↯:l/s 20%
121JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Example: Analysis Rules (Use Case)
Analysis Rules
FOR C1ACTION start AG2 at A3IF val ≥ 1.1UNLESS val < 1.05)
AR1
FOR C1REPORT FACTIF val ≥ 1.1UNLESS val < 1.05
AR2
FOR C2ACTION start AG2 at A3IF val ≥ 1.2UNLESS val < 1.01
AR1
Action Rules
Reporting Rules
Cubes for Analysis and Judgement Rules
([time:year, ins:district..province], GoI.year = GoC.year.PREVIOUS and
GoI.ins = GoC.ins and %qoi=%qoc[time:year, ins:district..province])
.RatioOfDrugCosts( %qoi ↑ InDrug,
%qoc ↑ InDrug )
C1
(‹time:2012›[ins:district..province], GoI.ins = GoC.ins and %qoi=%qoc‹time:2011›[ins:district..province])
.RatioOfDrugCosts(%qoi ↑ InOAD,
%qoc ↑ InOAD )
C2
122JKU Linz Institut für Wirtschaftsinformatik – Data & Knowledge Engineering
Conclusion
■ Comparative Data Analysis□ Is-to-is comparison, exploratory, pre-cursor to data mining□ Hundreds of light variations of simple measures / scores□ Previous experience (knowledge) guides later analysis
■ Approach□ Define business terms through ontology; reuse existing ontology□ Define generic measures & scores based on concepts of ontology□ Map concepts and measures & scores to SQL□ Explorative analysis: use concepts and instantiated scores to compare
particular group of interest vs. group of comparison□ Capture process knowledge by analysis graphs and guidance rules,
and previous insights on comparisons by judgment and analysis rules