euler: a logic‐based toolkit for aligning & reconciling multiple taxonomic perspectives
DESCRIPTION
CIRSS (Center for Informatics Research in Science and Scholarship) Seminar talk given on Sept. 19, 2014 at GSLIS, UIUC. http://cirssweb.lis.illinois.edu/Events/eventDetails.php?id=214
TRANSCRIPT
Euler: A Logic-Based Toolkit for Aligning and Reconciling Multiple Taxonomic Perspectives
Mingmin Chen1 Shizhuo Yu1 Parisa Kianmajd1 Nico Franz2 Shawn Bowers3 Bertram Ludäscher 4
1 Dept. of Computer Science, University of California, Davis 2 School of Life Sciences, Arizona State University 3 Dept. of Computer Science, Gonzaga University
4 GSLIS & NCSA, University of Illinois at Urbana-Champaign
Outline • Meet Nico, Curator of Insects
• TAP: The Taxonomy Alignment Problem
• Euler/X – Logic Inside! (X in FOL, RCC, ASP)
• Related Projects
B. Ludäscher Euler: Reasoning about Taxonomies CIRSS Seminar 9/19/2014 2
Meet Prof. Nico Franz: Curator of Insects @ ASU
What Nico et al. do for a living …
Perelleschus salpinflexus sec. Franz & Cardona-Duque (2013) DOI:10.1080/14772000.2013.806371
1 Input articulations: Franz & Cardona-Duque. 2013. Description of two new species and phylogenetic reassessment of Perelleschus Wibmer & O'Brien, 1986 (Coleoptera: Curculionidae), with a complete taxonomic concept history of Perelleschus sec. Franz & Cardona-Duque, 2013. Systematics and Biodiversity 11: 209–236. Merge analyses: Franz et al. 2014. Reasoning over taxonomic change: exploring alignments for the Perelleschus use case. PLoS ONE. (in press)
Use Case: Perelleschus sec. 2001 & 2006 1
T1: Perelleschus sec. 2001 • Phylogenetic revision • 8 ingroup species concepts • 2 outgroup concepts • 18 concepts total
T2: Perelleschus sec. 2006 • Exemplar analysis • 2 ingroup species concepts • 1 outgroup concept • 7 concepts total
Goal: Align two phylogenies with differential taxon sampling
Source: Nico Franz. Explaining taxonomy's legacy to computers – how and why? The Meaning of Names: Naming Diversity in the 21st Century, Museum of Natural History, U of Colorado, 9/30/2014.
What Nico does for a living (cont’d): The Indoors Part
• Go fun places, find new bugs, study them … – “Bugs-R-Us” (see taxonbytes.org)
• Now: Compare, align and revise taxonomies, based on careful observation, “character” data, expertise …
• Formally: – Input: T1 + T2 (taxonomies) + A (expert articulations)
– Output: revised, “merged” taxonomy (-ies) T3
Taxonomy Alignment Problem (TAP)
• Given: – Taxonomies T1, T2, incl. constraints (coverage, disjointness) – Set of articulations (an alignment) A
• Find: – Combined (“merged”) taxonomy T3 (= T1 + T2 + A) – Is it a taxonomy? Or a DAG? – Optional: final alignment (should be minimal)
[Figure: T1 and T2, linked by the alignment A, are combined into T3]
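A TAP instance can be written down as plain data. The sketch below is a hypothetical Python representation (the class names `Taxonomy`, `Articulation`, and `TAP` are assumptions, not Euler/X's input format), filled in with the small example used later in the talk (T1: 1.a with children 1.b, 1.c; T2: 2.d with children 2.e, 2.f) and the three articulations A1–A3 spelled out on the provenance slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Taxonomy:
    """A single-rooted taxonomy given as a child -> parent (is-a) map."""
    name: str
    parent: dict[str, str | None]

    def taxa(self) -> set[str]:
        return set(self.parent)

    def isa_edges(self) -> list[tuple[str, str]]:
        return [(child, par) for child, par in self.parent.items() if par is not None]


@dataclass
class Articulation:
    """A (possibly disjunctive) RCC-5 constraint between a T1 and a T2 taxon."""
    label: str
    left: str
    right: str
    relations: frozenset[str]  # e.g. frozenset({"<", "=="}) for "< OR =="


@dataclass
class TAP:
    t1: Taxonomy
    t2: Taxonomy
    alignment: list[Articulation] = field(default_factory=list)

    def validate(self) -> None:
        """Every articulation must link a T1 taxon to a T2 taxon."""
        for art in self.alignment:
            if art.left not in self.t1.taxa() or art.right not in self.t2.taxa():
                raise ValueError(f"{art.label} does not link T1 to T2")


t1 = Taxonomy("T1", {"1.a": None, "1.b": "1.a", "1.c": "1.a"})
t2 = Taxonomy("T2", {"2.d": None, "2.e": "2.d", "2.f": "2.d"})
tap = TAP(t1, t2, [
    Articulation("A1", "1.a", "2.d", frozenset({"=="})),
    Articulation("A2", "1.b", "2.e", frozenset({"<"})),
    Articulation("A3", "1.c", "2.f", frozenset({"<"})),
])
tap.validate()  # raises ValueError on a malformed alignment
```

A single-rooted tree with n taxa has n − 1 is-a edges, which matches the counts reported for the Perelleschus input (18 taxa and 17 is-a edges in T1, 21 and 20 in T2).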
Real Example: Turn this …
[Figure: input alignment between T1 (taxa 1.11–1.27, incl. 1.12L) and T2 (taxa 2.35–2.54, incl. 2.36L), with expert articulations such as ==, <, !, “< OR ==” and “> OR !”]
Nodes: T1 = 18, T2 = 21. Edges: is-a (T1) = 17, is-a (T2) = 20, articulations = 20.
… into this! (Perelleschus Alignment Result)
• T3 := T1 and T2 are “merged” – Blue dashed: overlaps ⇒ resolve via “zoom-in view”
[Figure: merged taxonomy T3. Merged (congruent) pairs: 1.12/2.36, 1.12L/2.36L, 1.13/2.37, 1.15/2.39, 1.17/2.41, 1.18/2.42, 1.19/2.43, 1.21/2.45, 1.22/2.46, 1.24/2.51, 1.25/2.48, 1.26/2.49, 1.27/2.50; from T1 only: 1.11, 1.14, 1.16, 1.20, 1.23; from T2 only: 2.35, 2.38, 2.40, 2.44, 2.47, 2.52, 2.53, 2.54]
Nodes: Taxonomy 1: 5, Taxonomy 2: 8, MERGED taxa: 13. Edges: overlaps: 10, input: 24, INFERRED: 5.
So how does it work? • If you have 3 concepts A, B, and C. • Assume you know something about
– A R1 B (e.g., R1: A is a subset of B) – B R2 C (e.g., R2: B is disjoint from C)
• Now what can you say about this: – A R3 C
• Yes?? • … it follows that R3: A is disjoint from C!
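This inference is an instance of composing RCC-5 relations. A minimal way to check such compositions, assuming taxa are non-empty sets, is brute force: place A, B, C in a three-set Venn diagram, try every combination of occupied regions, and collect the A–C relations that actually occur. This is an illustrative sketch, not the reasoning engine Euler/X uses (its FOL, RCC, or ASP back ends); the relation symbols follow the slides (`==`, `<`, `>`, `><`, `!`).

```python
from itertools import product

def rcc5(x: frozenset, y: frozenset) -> str:
    """Unique RCC-5 base relation between two non-empty sets."""
    if x == y:
        return "=="
    if x < y:
        return "<"
    if x > y:
        return ">"
    if x & y:
        return "><"
    return "!"

# The 7 non-empty regions of a 3-set Venn diagram, as (in A, in B, in C) flags.
REGIONS = [r for r in product((0, 1), repeat=3) if any(r)]

def compose(r1: str, r2: str) -> set[str]:
    """All R3 such that A r1 B and B r2 C are satisfiable together with A R3 C."""
    found = set()
    for mask in range(1 << len(REGIONS)):
        occupied = [reg for i, reg in enumerate(REGIONS) if mask >> i & 1]
        a, b, c = (frozenset(reg for reg in occupied if reg[k]) for k in range(3))
        if not (a and b and c):
            continue  # taxa are assumed to be non-empty
        if rcc5(a, b) == r1 and rcc5(b, c) == r2:
            found.add(rcc5(a, c))
    return found

print(compose("<", "!"))  # {'!'}: A < B and B ! C entail A ! C
print(sorted(compose("><", "><")))  # all five relations: nothing can be concluded
```

Only the occupancy of the seven regions matters for the relations, which is why 2^7 = 128 configurations suffice.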
Articulation Language (RCC-5) • How does the expert express the known (or assumed) relationship between taxa A and B?
• How can A and B be related? • Use basic set constraints (B5):
– A = B (equals, EQ) (==) – A < B (proper part of, PP) (<) – A > B (inverse proper part of, IPP) (>) – A o B (partially overlaps, PO) (><) – A ! B (disjoint “region”, DR) (!)
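The five relations are jointly exhaustive and pairwise disjoint: for any two non-empty sets, exactly one holds. The sketch below tests each relation's set-theoretic condition independently and checks that claim by brute force over all non-empty subsets of a small universe; the specimen sets in the example are made up for illustration.

```python
from itertools import combinations

def base_relations(x: set, y: set) -> list[str]:
    """Every B5 relation whose defining set condition holds for x and y."""
    conditions = {
        "==": x == y,                                          # EQ
        "<": x < y,                                            # PP: proper subset
        ">": x > y,                                            # IPP: proper superset
        "><": bool(x & y) and not (x <= y) and not (y <= x),   # PO
        "!": not (x & y),                                      # DR: no shared members
    }
    return [name for name, holds in conditions.items() if holds]

# Exhaustive check over all non-empty subsets of a 4-element universe.
universe = [1, 2, 3, 4]
nonempty = [set(c) for k in range(1, 5) for c in combinations(universe, k)]
assert all(len(base_relations(x, y)) == 1 for x in nonempty for y in nonempty)

# Hypothetical specimen sets standing in for the extensions of two taxa.
taxon_a = {"s1", "s2"}
taxon_b = {"s2", "s3"}
print(base_relations(taxon_a, taxon_b))  # ['><']
```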
Taxonomies and Articulations in Euler
There are 32 (= 2^5) possible disjunctions for representing partial information.
A taxonomy T is a triple (N, ≼, ϕ) with names (taxa) N, a partial order (is-a) ≼, and taxonomic constraints ϕ.
• Sibling Disjointness: sibling taxa do not overlap • (Parent) Coverage: the union of the children “covers” the parent ⇒ no “missing” children
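Given concrete extensions (member sets) for each taxon, the two taxonomic constraints in ϕ, together with the is-a order, can be checked directly. A minimal sketch, assuming a taxonomy is given as a parent → children map and extensions as Python sets; the specimen IDs are hypothetical.

```python
def check_taxonomy(children: dict[str, list[str]], ext: dict[str, set[str]]) -> list[str]:
    """Report violations of is-a, sibling disjointness, and parent coverage."""
    problems = []
    for parent, kids in children.items():
        for kid in kids:
            if not ext[kid] <= ext[parent]:
                problems.append(f"is-a: {kid} is not contained in {parent}")
        for i, x in enumerate(kids):
            for y in kids[i + 1:]:
                if ext[x] & ext[y]:
                    problems.append(f"disjointness: {x} and {y} overlap")
        covered = set().union(*(ext[k] for k in kids))
        if not ext[parent] <= covered:
            problems.append(f"coverage: children of {parent} miss some members")
    return problems

children = {"1.a": ["1.b", "1.c"]}
ext = {"1.a": {"s1", "s2", "s3"}, "1.b": {"s1"}, "1.c": {"s2", "s3"}}
print(check_taxonomy(children, ext))  # []

ext["1.c"] = {"s2"}  # s3 now belongs to 1.a but to none of its children
print(check_taxonomy(children, ext))  # ['coverage: children of 1.a miss some members']
```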
An articulation is a relation (set constraint) between taxa A and B. One, and only one, of the following base relations B5 must hold:
[Figure: the five base relations between A and B: (i) congruence, (ii) proper part, (iii) inverse proper part, (iv) partial overlap, (v) disjointness]
R32 lattice of 32 (= 2^5) disjunctions over B5
[Figure: Hasse diagram of R32, from ∅ (FALSE) at the bottom, through all disjunctions of base relations, to = < > o ! (TRUE) at the top]
Legend: = EQ(x,y) Equals; < PP(x,y) Proper Part of; > iPP(x,y) Inverse Proper Part; o PO(x,y) Partially Overlaps; ! DR(x,y) Disjoint from
Levels: Level 0 (contradiction); Level 1 (BASE-5 relations); Level 2; Level 3; Level 4; Level 5 (tautology)
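Each node of R32 is a disjunction, i.e. a subset of B5, and level k holds the disjunctions with k base relations. The sketch below (plain Python sets, not Euler/X's internal representation) enumerates the lattice, counts its levels, and shows that conjoining two pieces of partial information is set intersection, the meet in this lattice.

```python
from itertools import combinations

B5 = ["==", "<", ">", "><", "!"]

# R32: every subset of B5; frozenset() is FALSE, the full set is TRUE.
R32 = [frozenset(c) for k in range(len(B5) + 1) for c in combinations(B5, k)]
level_sizes = [sum(1 for d in R32 if len(d) == k) for k in range(len(B5) + 1)]
print(len(R32), level_sizes)  # 32 [1, 5, 10, 10, 5, 1]

# Two partial facts about the same pair of taxa combine by intersection.
expert = frozenset({"<", "=="})         # "< OR ==", as in the input alignment
derived = frozenset({"==", ">", "!"})   # hypothetical result of other inferences
print(expert & derived)  # frozenset({'=='})
```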
• … Aristotle … • … Euler … • … • … Greg Whitbread …
• [BPB93] J. H. Beach, S. Pramanik, and J. H. Beaman. Hierarchic taxonomic databases. In Advances in Computer Methods for Systematic Biology: Artificial Intelligence, Databases, Computer Vision, 1993.
• [Ber95] Walter G. Berendsohn. The concept of “potential taxa” in databases. Taxon, 44:207–212, 1995.
• [Ber03] Walter G. Berendsohn. MoReTax – Handling Factual Information Linked to Taxonomic Concepts in Biology. No. 39 in Schriftenreihe für Vegetationskunde. Bundesamt für Naturschutz, 2003.
• [GG03] M. Geoffroy and A. Güntsch. Assembling and navigating the potential taxon graph. In [Ber03], pages 71–82, 2003.
• [TL07] D. Thau and B. Ludäscher. Reasoning about taxonomies in first-order logic. Ecological Informatics, 2(3):195–209, 2007.
• [FP09] N. M. Franz and R. K. Peet. Perspectives: towards a language for mapping relationships among taxonomic concepts. Systematics and Biodiversity, 7(1):5–20, 2009.
• …
Some History
What’s in a name? Euler Diagrams
• Project named after Euler diagrams: IF A is-a B AND C and B are disjoint, THEN A and C are disjoint!
Euler Diagrams as Trees (or Graphs): a containment hierarchy (taxonomy) and an equivalent graph (w/ transitive edges) carry the same information.
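The “same information” claim can be made concrete: the tree is the transitive reduction of the containment graph, and adding all transitive is-a edges gives its transitive closure; for an acyclic hierarchy, reducing the closure recovers the tree. A minimal sketch with hypothetical taxa A–D and (child, parent) edges:

```python
def transitive_closure(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing changes."""
    closure = set(edges)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new

def transitive_reduction(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Drop every edge implied by a two-step path (valid for acyclic graphs)."""
    closure = transitive_closure(edges)
    nodes = {n for edge in closure for n in edge}
    return {(x, z) for (x, z) in closure
            if not any((x, m) in closure and (m, z) in closure for m in nodes)}

tree = {("C", "B"), ("B", "A"), ("D", "A")}  # C is-a B is-a A; D is-a A
full = transitive_closure(tree)
print(sorted(full - tree))  # [('C', 'A')]
assert transitive_reduction(full) == tree
```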
Represent Phylogenies as Trees …
[Figure: T1 (taxa 1.11–1.27, incl. 1.12L) and T2 (taxa 2.35–2.54, incl. 2.36L) drawn as trees]
… for all taxonomies of interest …
[Figure: the T1 and T2 trees]
… ready, rotate by 90°, set …
[Figure: the T1 and T2 trees rotated by 90° and placed side by side]
Go! An expert input alignment! Just add some Euler Reasoning …
[Figure: the input alignment: T1 and T2 linked by 20 expert articulations]
Euler/X toolkit in a single screenshot (desktop version, IX-2014)
… et voilà! The merged T3 (=T1 & T2 & A)
The Euler reasoner(s) infer: – Grey: “perfect match” (congruences) – Green, Yellow: “keepers” from T1, T2 – Red edges: deduced subset/“sub-class” relations – Blue edges: deduced overlaps
[Figure: merged taxonomy T3, shown in two views]
But wait: PW1 …
[Figure: possible world PW1]
… PW2
[Figure: possible world PW2]
… PW3
[Figure: possible world PW3]
… PW4
[Figure: possible world PW4]
… PW5
[Figure: possible world PW5]
Hmmm… depending on input alignment: PW1
[Figure: resulting possible world]
… and PW2 are the only solutions!
What happened?
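TAP: Possible Outcomes
[Figure: input alignment. T1: 1.a with children 1.b and 1.c (is-a); T2: 2.d with children 2.e and 2.f (is-a); four articulations A1–A4 (one “=”, three “<”), including A1: a = d, A2: b < e, A3: c < f]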
{A1, A2, A3, A4}
{A1, A2, A3} {A1, A2, A4} {A1, A3, A4} {A2, A3, A4}
{A1, A2} {A1, A3} {A2, A3} {A1, A4} {A2, A4} {A3, A4}
{A1} {A2} {A3} {A4}
{ }
Inconsistent! ⇒ Diagnosis (Reiter) = Black-Box Provenance
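For the small example, the black-box diagnosis can be reproduced by brute force. The sketch below is a hypothetical re-implementation, not Euler/X: it models every member of the universe by the T1 leaf and T2 leaf it falls in (or neither), enforces is-a, sibling disjointness, coverage, and non-empty taxa, and then tests each subset of articulations for satisfiability. It uses only the three articulations stated on the provenance slides (A1: 1.a = 2.d, A2: 1.b < 2.e, A3: 1.c < 2.f); A4 is omitted because the slides do not say which taxa it relates.

```python
from itertools import combinations, product

T1_LEAVES, T2_LEAVES = ["1.b", "1.c"], ["2.e", "2.f"]
# An "atom" is a kind of member: (T1 leaf or None, T2 leaf or None).
ATOMS = [(x, y) for x, y in product(T1_LEAVES + [None], T2_LEAVES + [None])
         if (x, y) != (None, None)]

def worlds():
    """Yield taxon extensions for every admissible choice of non-empty atoms."""
    for mask in range(1 << len(ATOMS)):
        occupied = [atom for i, atom in enumerate(ATOMS) if mask >> i & 1]
        ext = {leaf: frozenset(a for a in occupied if leaf in a)
               for leaf in T1_LEAVES + T2_LEAVES}
        if not all(ext.values()):
            continue  # every taxon must be non-empty
        ext["1.a"] = ext["1.b"] | ext["1.c"]  # is-a + coverage in T1
        ext["2.d"] = ext["2.e"] | ext["2.f"]  # is-a + coverage in T2
        yield ext

ARTICULATIONS = {
    "A1": lambda e: e["1.a"] == e["2.d"],
    "A2": lambda e: e["1.b"] < e["2.e"],
    "A3": lambda e: e["1.c"] < e["2.f"],
}

def consistent(subset) -> bool:
    return any(all(ARTICULATIONS[a](w) for a in subset) for w in worlds())

lattice = {frozenset(s): consistent(s)
           for k in range(len(ARTICULATIONS) + 1)
           for s in combinations(sorted(ARTICULATIONS), k)}
for subset, ok in sorted(lattice.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(subset), "consistent" if ok else "INCONSISTENT")
```

Sibling disjointness needs no explicit check here, since every atom lies in at most one leaf of each taxonomy. In this restricted lattice only the full set {A1, A2, A3} is inconsistent.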
[Figure: one possible world, combining 1.a with 2.d and 1.b with 2.e]
Ambiguous! ⇒ Multiple Possible Worlds
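Ambiguity can be illustrated on the same toy example. In this brute-force sketch (an assumption for illustration, not Euler/X's ASP encoding), a possible world is taken to be one assignment of RCC-5 base relations to all T1 × T2 pairs that some admissible set model realizes; the code counts how many remain as articulations A1–A3 (as stated on the provenance slides) are added.

```python
from itertools import product

T1, T2 = ["1.a", "1.b", "1.c"], ["2.d", "2.e", "2.f"]
ATOMS = [(x, y) for x, y in product(["1.b", "1.c", None], ["2.e", "2.f", None])
         if (x, y) != (None, None)]

def rcc5(x: frozenset, y: frozenset) -> str:
    """Unique RCC-5 base relation between two non-empty sets."""
    if x == y:
        return "=="
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "><" if x & y else "!"

def worlds():
    """Extensions of all six taxa for every admissible set model."""
    for mask in range(1 << len(ATOMS)):
        occupied = [atom for i, atom in enumerate(ATOMS) if mask >> i & 1]
        ext = {leaf: frozenset(a for a in occupied if leaf in a)
               for leaf in ["1.b", "1.c", "2.e", "2.f"]}
        if all(ext.values()):  # non-empty taxa only
            ext["1.a"] = ext["1.b"] | ext["1.c"]  # is-a + coverage
            ext["2.d"] = ext["2.e"] | ext["2.f"]
            yield ext

ARTICULATIONS = {
    "A1": lambda e: e["1.a"] == e["2.d"],
    "A2": lambda e: e["1.b"] < e["2.e"],
    "A3": lambda e: e["1.c"] < e["2.f"],
}

def possible_worlds(selected: list[str]) -> set[tuple[str, ...]]:
    """Distinct T1 x T2 relation assignments consistent with the articulations."""
    pws = set()
    for ext in worlds():
        if all(ARTICULATIONS[a](ext) for a in selected):
            pws.add(tuple(rcc5(ext[s], ext[t]) for s in T1 for t in T2))
    return pws

for selected in (["A1"], ["A1", "A2"], ["A1", "A2", "A3"]):
    print(selected, len(possible_worlds(selected)))
```

With A1 alone, seven worlds remain (the four leaf pairs can be arranged in seven distinct ways); adding A2 pins down a single world; adding A3 as well leaves none, i.e. the input becomes inconsistent.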
[Figure: another possible world, combining 1.a with 2.d and 1.c with 2.f]
[Figure: a further outcome diagram over taxa 1.a–1.c and 2.d–2.f]
Euler/X Toolkit and Workflow
• FO reasoning about taxonomies (MFOL)
• Earlier: CleanTax – Prover9/Mace4
• Now: Euler – ASP reasoners (DLV, Clingo) – Specialized reasoners (PyRCC) – … – X = ASP, RCC, …
Reducing Ambiguity
Possible Worlds (PWs) View
Aggregate View (AV)
Cluster View (CV)
Explore!
Common Outcome: Inconsistency!
Inconsistent! ⇒ Diagnosis (Reiter) = Black-Box Provenance
• Need to debug the input articulations ⇒ (black-box) diagnosis!
• Focus: – How do we efficiently compute the diagnostic lattice?
• Also: – How do we visualize it?
A Hybrid Diagnosis Approach Combining Black-Box and White-Box Reasoning
Mingmin Chen1 Shizhuo Yu1 Nico Franz2 Shawn Bowers3 Bertram Ludäscher 4
1 Department of Computer Science, University of California, Davis 2 School of Life Sciences, Arizona State University
3 Department of Computer Science, Gonzaga University 4 GSLIS & NCSA, University of Illinois at Urbana-Champaign
Example Instance (from synthetic benchmark suite)
• Here: N = 10 taxa in T1, T2 • Euler/X finds: inconsistent! • ⇒ diagnostic lattice of 2^10 = 1024 nodes ⇒ Find minimal inconsistent subsets (MIS) ⇒ maximal consistent subsets (MCS) .. ⇒ show to user!
Visualizing Diagnoses
N = 10 articulations ⇒ 2^10 = 1024-node diagnostic lattice
Better Idea: Just show MIS, MCS
N = 4 articulations ⇒ 2^4 = 16-node diagnostic lattice, but 3 MCS and 2 MIS are enough!
Visualizing Diagnoses
.. but 4 MCS and 1 MIC tell it all!
[Figure: 1024-node lattice]
Visualizing Diagnoses: Example from RuleML’14 paper: N = 12 ⇒ 2^12 = 4096 nodes .. but 7 MCS and 5 MIC tell it all!
Black-Box Inconsistency Analysis (Diagnostic Lattice)
• Then: – Repair: find & revise minimal inconsistent subsets (Min-Incons) – Expand: find maximal consistent subsets (Max-Cons) & revise the articulations left out
What happens if you can’t have all (here: 4) articulations together?
• Black-box analysis (Hitting Set algorithm) yields a diagnosis (lattice) – for n = 4 articulations, there are 168 possible diagnoses – depending on the expected “red/green areas” ⇒ explore the space differently
• |articulations| = n ⇒ |possible diagnoses| = |monotonic Boolean functions of n variables| = Dedekind number M(n), for n = 0, 1, 2, …: 2, 3, 6, 20, 168, 7581, 7828354, ...
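The Dedekind numbers quoted above can be checked for small n. Because adding articulations to an inconsistent set keeps it inconsistent, a diagnosis (which subsets are inconsistent) is a monotone Boolean function of n variables. The brute-force sketch below counts those functions; it is only feasible up to n = 4 (2^16 truth tables) and is not how Dedekind numbers are computed in general.

```python
def count_monotone(n: int) -> int:
    """Count monotone Boolean functions of n variables by brute force."""
    points = 1 << n  # inputs are subsets of the n articulations, as bitmasks
    count = 0
    for table in range(1 << points):  # bit s of table = f(subset s)
        # Monotone: f(s) = 1 implies f(s plus one more element) = 1.
        monotone = all(
            not (table >> s & 1) or (table >> (s | (1 << i)) & 1)
            for s in range(points) for i in range(n)
        )
        count += monotone
    return count

print([count_monotone(n) for n in range(5)])  # [2, 3, 6, 20, 168]
```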
Inconsistency Analysis (Diagnostic Lattice)
• The Min-Incons (MIS) and Max-Cons (MCS) sets determine all others
⇒ Repair MIS and/or Expand MCS
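That MIS and MCS determine the whole lattice follows from monotonicity: a subset is inconsistent exactly when it contains some MIS, and consistent exactly when it is contained in some MCS. The sketch below computes both families by brute force from a consistency oracle and checks that characterization; the oracle and its two conflict sets are hypothetical, standing in for calls to a reasoner.

```python
from itertools import combinations

def all_subsets(items):
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

def mis_and_mcs(items, inconsistent):
    """Minimal inconsistent and maximal consistent subsets, by exhaustive search."""
    subsets = all_subsets(items)
    bad = [s for s in subsets if inconsistent(s)]
    good = [s for s in subsets if not inconsistent(s)]
    mis = [s for s in bad if not any(t < s for t in bad)]
    mcs = [s for s in good if not any(s < t for t in good)]
    return mis, mcs

# Hypothetical oracle: a set is inconsistent iff it contains one of these conflicts.
CONFLICTS = [frozenset({"A1", "A2", "A3"}), frozenset({"A3", "A4"})]

def inconsistent(s):
    return any(c <= s for c in CONFLICTS)

ITEMS = ["A1", "A2", "A3", "A4"]
mis, mcs = mis_and_mcs(ITEMS, inconsistent)
print("MIS:", [sorted(s) for s in mis])
print("MCS:", [sorted(s) for s in mcs])

# Every subset's status is recoverable from MIS alone, or from MCS alone.
for s in all_subsets(ITEMS):
    assert inconsistent(s) == any(m <= s for m in mis)
    assert inconsistent(s) == (not any(s <= c for c in mcs))
```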
Improving Diagnosis • Reiter’s “black-box” (model-based) diagnosis helps debug the articulations
• Limited scalability (inherent complexity) • But every bit helps:
– Hitting Set Algorithm (“logarithmic extraction”)
• Our idea: – Exploit “white-box” reasoning information ⇒ RULES to the rescue
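In Reiter's framework the minimal diagnoses are the minimal hitting sets of the conflict sets (here, the minimal inconsistent subsets): the smallest sets of articulations whose removal breaks every conflict. Reiter's HS-tree computes them incrementally with calls to the consistency checker; the sketch below instead enumerates them by brute force for two hypothetical conflict sets, and shows that removing each diagnosis leaves a maximal consistent subset.

```python
from itertools import combinations

def minimal_hitting_sets(conflicts, items):
    """All minimal sets that intersect every conflict set (brute force)."""
    candidates = [frozenset(c) for k in range(len(items) + 1)
                  for c in combinations(items, k)]
    hitting = [s for s in candidates if all(s & c for c in conflicts)]
    return [s for s in hitting if not any(t < s for t in hitting)]

ITEMS = ["A1", "A2", "A3", "A4"]
CONFLICTS = [frozenset({"A1", "A2", "A3"}), frozenset({"A3", "A4"})]  # hypothetical MIS
diagnoses = minimal_hitting_sets(CONFLICTS, ITEMS)
print([sorted(d) for d in diagnoses])

# Removing a minimal diagnosis leaves a maximal consistent subset (MCS).
mcs = [frozenset(ITEMS) - d for d in diagnoses]
print([sorted(m) for m in mcs])
```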
Key Idea: exploit white-box info • We use Answer Set Programming (ASP) to solve the Taxonomy Alignment Problem (TAP)
• Inconsistency = “False” is derived in the head: False :- <denial of integrity constraint>
• Apply the provenance trick from databases – What articulations contribute to a derivation of “False”? – Eliminate those that don’t! ⇒ an example of reusing inferences across separate black-box tests!
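The provenance trick can be sketched on the derivation from the small example: annotate each base fact with the articulations it depends on (taxonomy constraints depend on none), propagate the union of annotations through every rule firing, and read off which articulations can reach “False”. The ground rules below are hand-written to mirror the slide's derivation; they are an assumption for illustration, not Euler/X's ASP encoding or its provenance machinery.

```python
from itertools import product

# Base facts with lineage: articulations depend on themselves,
# taxonomy constraints (r3, r4, r7, r8) on no articulation at all.
FACTS = {
    "a = d": {frozenset({"A1"})},
    "b < e": {frozenset({"A2"})},
    "c < f": {frozenset({"A3"})},
    "a = b ∪ c": {frozenset()},  # r3
    "b ∩ c = ∅": {frozenset()},  # r4
    "d = e ∪ f": {frozenset()},  # r7
    "e ∩ f = ∅": {frozenset()},  # r8
}

# Hypothetical ground rules (body -> head) mirroring the slide's derivation.
RULES = [
    (("a = d", "d = e ∪ f"), "a = e ∪ f"),
    (("a = e ∪ f", "a = b ∪ c", "b ∩ c = ∅", "e ∩ f = ∅", "b < e"), "f < c"),
    (("f < c", "c < f"), "False"),  # denial: f < c and c < f cannot both hold
]

def derive_with_lineage(facts, rules):
    """Forward-chain to a fixpoint, recording every set of articulations used."""
    prov = {fact: set(lineages) for fact, lineages in facts.items()}
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if not all(atom in prov for atom in body):
                continue
            for combo in product(*(prov[atom] for atom in body)):
                lineage = frozenset().union(*combo)
                if lineage not in prov.setdefault(head, set()):
                    prov[head].add(lineage)
                    changed = True
    return prov

prov = derive_with_lineage(FACTS, RULES)
print([sorted(lin) for lin in prov["False"]])  # [['A1', 'A2', 'A3']]
```

Only articulations that occur in some lineage of “False” can contribute to this conflict; any other articulation can be set aside before the black-box (HST) search.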
The Provenance “Trick”
Hybrid Provenance
A3: c < f (black-box provenance)
[Figure: the input alignment of the TAP example (T1: 1.a with children 1.b, 1.c; T2: 2.d with children 2.e, 2.f)]
Hybrid Provenance
A3: c < f (black-box provenance)
White-box provenance:
– A1: a = d and r7: d = e ∪ f ⇒ a = e ∪ f
– a = e ∪ f, r3: a = b ∪ c, r4: b ∩ c = ∅, r8: e ∩ f = ∅, and A2: b < e ⇒ f < c
– A1 + A2 + … ⇒ f < c, which contradicts A3: c < f
[Figure: the input alignment of the TAP example]
The Hybrid Approach
Hybrid Approach
What articulations contribute to some inconsistency?
Good old black-box (HST)
Benchmark Results
• White-box < Hybrid < Black-box (runtimes) • Note: white-box does not give you a diagnosis • Potassco < DLV
Benchmark DLV
• White-box < Hybrid < Black-box (runtimes) • Potassco < DLV
Benchmark Clingo
• White-box < Hybrid < Black-box (runtimes) • Potassco < DLV
Summary: Hybrid Diagnosis • ASP rules can be used to efficiently solve real-world taxonomy reasoning problems
• Reiter’s diagnosis is useful for debugging inconsistent alignments
• Adding a “white-box” provenance approach speeds up the state-of-the-art HST algorithm by eliminating independent articulations
• Future work: – Further improvements, including parallelism:
• Trade-off with sharing inferences across parallel instances
Related Projects
The Data Life Cycle
Data Quality & Curation Workflows
• Collections & occurrence data is all over the map – … literally (off the map!)
• Issues: – Lat/Long transposition, coordinate & projection issues – Data entry/creation, “fuzzy” data, naming issues, bit rot, data conversions and transformations, schema mappings, … (you name it)
• Filtered-Push Collaboration
Filtered-Push: Kurator (Data Curation Workflows)
Tianhong Song
Lei Dou (former member)
Sven Köhler
From Tool Users to Tool Makers
Screen capture… back to the original definition
Theory meets Practice
Under the hood: Logic (ASP)
Summary & Invitation • Building open source tools for
– Euler: Reasoning about taxonomies (& data integration) – Kurator: Data Curation workflows
• … and other scientific workflows
• Topic not covered: – (Game) Theory of Provenance (DAIS talk @CS, 10/7/2014)
• Looking for: – new collaborators, students, ..
• Let’s meet! – [email protected]