trio: a system for data, uncertainty, and lineage jennifer widom et al stanford university
Post on 21-Dec-2015
![Page 1: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/1.jpg)
Trio: A System for Data, Uncertainty, and Lineage
Jennifer Widom et al
Stanford University
![Page 2: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/2.jpg)
2
Depiction of Trio Project
![Page 3: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/3.jpg)
3
Some Context
Trio Project: They’re building a new kind of DBMS in which
1. Data
2. Accuracy
3. Lineage
are all first-class interrelated concepts
![Page 4: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/4.jpg)
4
The “Trio” in Trio
1. Data
Student #123 is majoring in Econ: (123, Econ) ∈ Major
2. Uncertainty
Student #123 is majoring in Econ or CS: (123, Econ ∥ CS) ∈ Major
With confidence 60% student #456 is a CS major: (456, CS 0.6) ∈ Major
3. Lineage
456 ∈ HardWorker derived from:
(456, CS) ∈ Major
“CS is hard” ∈ some web page
![Page 5: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/5.jpg)
5
Depiction
Data
Uncertainty
Lineage(“sourcing”)
![Page 6: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/6.jpg)
6
Original Motivation for the Project
New Application Domains
• Many involve data that is uncertain (approximate, probabilistic, inexact, incomplete,
imprecise, fuzzy, inaccurate,...)
• Many of the same ones need to track the lineage (provenance) of their data
Neither uncertainty nor lineage is supported in current database systems
![Page 7: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/7.jpg)
7
Importance of Lineage
[technical] Lineage:
1. Enables simple and consistent representation of uncertain data
2. Correlates uncertainty in query results with uncertainty in the input data
3. Can make computation over uncertain data more efficient
[fluffy] Applications use lineage to reduce or resolve uncertainty
![Page 8: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/8.jpg)
8
Sample Applications
Information extraction
• Find & label entities in unstructured text
• Often probabilistic
Information integration
• Combine data from multiple sources
• Inconsistencies
Scientific experiments
• Inexact/incomplete data
• Many levels of “derived data products”
![Page 9: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/9.jpg)
9
Sample Applications
Sensor data management
• Approximate readings
• Missing readings
• Levels of data aggregation
Deduplication (“data cleaning”)
• Object linkage, entity resolution
• Often heuristic/probabilistic
Approximate query processing
• Fast but inexact answers
![Page 10: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/10.jpg)
10
The “Usual” DBMS Features
(From first lecture of any Intro to Databases class)
1. Efficient,
2. Convenient,
3. Safe,
4. Multi-User storage of and access to
5. Massive amounts of
6. Persistent data
![Page 11: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/11.jpg)
11
Completeness vs. Closure
[Figure: the region of representable sets-of-instances (blue) drawn inside the region of all sets-of-instances (yellow), with operations Op1 and Op2 shown as arrows]
Completeness: blue = yellow
Closure: arrow stays in blue
Proposition: An incomplete representation is still interesting if it’s expressive enough and closed under all required operations
![Page 12: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/12.jpg)
12
Operations: Semantics
Easy and natural (re)definition for any standard database operation (call it Op)
[Figure: commutative diagram. D is a representation of the possible instances I1, I2, …, In; applying Op on each instance gives J1, J2, …, Jm, which D’ represents; Op’ is the direct implementation taking D to D’]
Closure: up-arrow always exists
Note: Completeness ⇒ Closure
![Page 13: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/13.jpg)
13
Incompleteness
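Instance1

| person | day |
|---|---|
| Jennifer | Monday |
| Mike | Tuesday |

Instance2

| person | day |
|---|---|
| Jennifer | Monday |

Instance3

| person | day |
|---|---|
| Mike | Tuesday |

Attempted representation with ‘?’ annotations

| person | day | ? |
|---|---|---|
| Jennifer | Monday | ? |
| Mike | Tuesday | ? |

The two ‘?’ annotations generate a 4th instance: the empty relation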
![Page 14: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/14.jpg)
14
Non-Closure Under Join
| person | day |
|---|---|
| Mike | {Monday, Tuesday} |

⋈

| day | food |
|---|---|
| Monday | chicken |
| Tuesday | fish |

Result has two possible instances:

Instance1

| person | day | food |
|---|---|---|
| Mike | Monday | chicken |

Instance2

| person | day | food |
|---|---|---|
| Mike | Tuesday | fish |

Not representable with or-sets and ?
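
The non-closure argument can be checked by brute force. The sketch below is only an illustration, not Trio code: it joins each possible instance separately (the helper `join_on_day` and the variable names are made up for this example), then compares the outcome with a naive encoding that puts independent or-sets on the `day` and `food` columns.

```python
from itertools import product

def join_on_day(person_day, day_food):
    """Ordinary relational join of (person, day) with (day, food) on day."""
    return [(p, d, f) for (p, d) in person_day for (d2, f) in day_food if d == d2]

# Mike's day is an or-set: exactly one of the two values holds.
or_set = ["Monday", "Tuesday"]
day_food = [("Monday", "chicken"), ("Tuesday", "fish")]

# Correct semantics: run the join once per possible instance of the input.
correct_instances = [join_on_day([("Mike", day)], day_food) for day in or_set]

# Naive encoding (an assumption made for contrast): independent or-sets on
# both result columns. It admits combinations that never occur.
naive_instances = [[("Mike", d, f)] for d, f in product(or_set, ["chicken", "fish"])]

print(correct_instances)
print(len(naive_instances))
```

The naive encoding allows (Mike, Monday, fish), which is in neither correct instance, because or-sets cannot express that the choice of day and the choice of food are tied together.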
![Page 15: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/15.jpg)
15
Another “Trio” in Trio
1. Data Model Simplest extension to relational model that’s
sufficiently expressive
2. Query Language Simple extension to SQL with well-defined
semantics and intuitive behavior
3. System A complete open-source DBMS that people
want to use
![Page 16: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/16.jpg)
16
Another “Trio” in Trio
1. Data Model Uncertainty-Lineage Databases (ULDBs)
2. Query Language TriQL
3. System Trio-One — built on top of standard DBMS
![Page 17: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/17.jpg)
17
Remainder of Talk
1. Data Model Uncertainty-Lineage Databases (ULDBs)
2. Query Language TriQL
3. System Trio-One — built on top of standard DBMS
4. Demo
![Page 18: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/18.jpg)
18
Quote from Jennifer
We are not about machine learning or probabilistic reasoning!
We are about efficient and convenient storage, manipulation, and retrieval of large data sets (with uncertainty and lineage in them)
![Page 19: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/19.jpg)
19
Running Example: Crime-Solving
Saw (witness, color, car) // may be uncertain
Drives (person, color, car) // may be uncertain
Suspects (person) = πperson(Saw ⋈ Drives)
![Page 20: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/20.jpg)
20
In Standard Relational DBMS
Drives (person, color, car)

| person | color | car |
|---|---|---|
| Jimmy | red | Toyota |
| Billy | blue | Honda |
| Frank | red | Mazda |
| Frank | green | Mazda |

Saw (witness, color, car)

| witness | color | car |
|---|---|---|
| Amy | red | Mazda |
| Betty | blue | Honda |
| Carol | green | Toyota |

Suspects

| person |
|---|
| Frank |
| Billy |

Create Table Suspects as
Select person
From Saw, Drives
Where Saw.color = Drives.color And Saw.car = Drives.car
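
For readers who want to reproduce the certain-data baseline, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The function name `compute_suspects` is invented for this example; the table contents and the query are the ones on the slide.

```python
import sqlite3

def compute_suspects():
    """Load the slide's Saw and Drives tables and run the Suspects query."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE Saw (witness TEXT, color TEXT, car TEXT)")
    cur.execute("CREATE TABLE Drives (person TEXT, color TEXT, car TEXT)")
    cur.executemany("INSERT INTO Saw VALUES (?, ?, ?)", [
        ("Amy", "red", "Mazda"),
        ("Betty", "blue", "Honda"),
        ("Carol", "green", "Toyota"),
    ])
    cur.executemany("INSERT INTO Drives VALUES (?, ?, ?)", [
        ("Jimmy", "red", "Toyota"),
        ("Billy", "blue", "Honda"),
        ("Frank", "red", "Mazda"),
        ("Frank", "green", "Mazda"),
    ])
    cur.execute(
        "CREATE TABLE Suspects AS "
        "SELECT person FROM Saw, Drives "
        "WHERE Saw.color = Drives.color AND Saw.car = Drives.car"
    )
    rows = cur.execute("SELECT person FROM Suspects").fetchall()
    conn.close()
    return sorted(row[0] for row in rows)

print(compute_suspects())  # ['Billy', 'Frank']
```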
![Page 21: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/21.jpg)
21
Data Model: Uncertainty
An uncertain database represents a set of possible instances
• Amy saw either a Honda or a Toyota
• Jimmy drives a Toyota, a Mazda, or both
• Betty saw an Acura with confidence 0.5 or a Toyota with confidence 0.3
• Hank is a suspect with confidence 0.7
![Page 22: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/22.jpg)
22
Their Model for Uncertainty
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
![Page 23: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/23.jpg)
23
Our Model for Uncertainty
1. Alternatives: uncertainty about value
2. ‘?’ (Maybe) Annotations
3. Confidences

Saw (witness, color, car)

| witness | (color, car) |
|---|---|
| Amy | red, Honda ∥ red, Toyota ∥ orange, Mazda |

Three possible instances
![Page 24: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/24.jpg)
24
Our Model for Uncertainty
1. Alternatives
2. ‘?’ (Maybe): uncertainty about presence
3. Confidences

Saw (witness, color, car)

| witness | (color, car) | ? |
|---|---|---|
| Amy | red, Honda ∥ red, Toyota ∥ orange, Mazda | |
| Betty | blue, Acura | ? |

Six possible instances
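
The count of six follows from three choices for Amy's tuple times two choices (present or absent) for Betty's maybe-tuple. A minimal enumeration sketch, assuming each x-tuple is modeled as a pair of a list of alternatives and a maybe flag (this encoding and the name `possible_instances` are illustrative, not Trio's storage format):

```python
from itertools import product

def possible_instances(x_tuples):
    """Enumerate possible instances of a relation with alternatives and '?'.

    x_tuples is a list of (alternatives, maybe) pairs. Exactly one
    alternative is chosen per x-tuple; a maybe x-tuple may also be absent.
    """
    choices = []
    for alternatives, maybe in x_tuples:
        options = list(alternatives)
        if maybe:
            options.append(None)  # None stands for "tuple absent"
        choices.append(options)
    return [tuple(t for t in combo if t is not None) for combo in product(*choices)]

saw = [
    ([("Amy", "red", "Honda"), ("Amy", "red", "Toyota"), ("Amy", "orange", "Mazda")], False),
    ([("Betty", "blue", "Acura")], True),
]
instances = possible_instances(saw)
print(len(instances))  # 6
```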
![Page 25: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/25.jpg)
25
Our Model for Uncertainty
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences: weighted uncertainty
Saw (witness, color, car)

| witness | (color, car) with confidence | ? |
|---|---|---|
| Amy | red, Honda 0.5 ∥ red, Toyota 0.3 ∥ orange, Mazda 0.2 | |
| Betty | blue, Acura 0.6 | ? |

Six possible instances, each with a probability
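
As a rough illustration of how each instance gets a probability, the sketch below multiplies the confidences of the chosen alternatives. It assumes the x-tuples are independent of each other and that a maybe-tuple is absent with probability one minus the sum of its confidences; both are assumptions of this example, and the function name `instance_probabilities` is made up.

```python
from itertools import product

def instance_probabilities(x_tuples):
    """Return (instance, probability) pairs, assuming independent x-tuples."""
    per_tuple = []
    for alternatives, maybe in x_tuples:
        options = list(alternatives)
        if maybe:
            absent = 1.0 - sum(conf for _, conf in alternatives)
            options.append((None, absent))  # None stands for "tuple absent"
        per_tuple.append(options)
    result = []
    for combo in product(*per_tuple):
        prob = 1.0
        for _, conf in combo:
            prob *= conf
        instance = tuple(value for value, _ in combo if value is not None)
        result.append((instance, prob))
    return result

saw = [
    ([(("Amy", "red", "Honda"), 0.5),
      (("Amy", "red", "Toyota"), 0.3),
      (("Amy", "orange", "Mazda"), 0.2)], False),
    ([(("Betty", "blue", "Acura"), 0.6)], True),
]
probs = instance_probabilities(saw)
for instance, prob in probs:
    print(round(prob, 2), instance)
```

Under these assumptions the six probabilities sum to 1; for example, the instance containing Amy's Honda sighting and Betty's tuple has probability 0.5 × 0.6 = 0.3.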
![Page 26: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/26.jpg)
26
Models for Uncertainty
Our model (so far) is not especially new
We spent some time exploring the space of models for uncertainty
Tension between understandability and expressiveness
– Our model is understandable
– But it is not complete, or even closed under common operations
![Page 27: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/27.jpg)
27
Our Model is Not Complete or Closed
Saw (witness, car)

| witness | car |
|---|---|
| Cathy | Honda ∥ Mazda |

Drives (person, car)

| (person, car) |
|---|
| Jimmy, Toyota ∥ Jimmy, Mazda |
| Billy, Honda ∥ Frank, Honda |
| Hank, Honda |

Suspects = πperson(Saw ⋈ Drives)

| person | ? |
|---|---|
| Jimmy | ? |
| Billy ∥ Frank | ? |
| Hank | ? |

Does not (CANNOT) correctly capture possible instances in the result
![Page 28: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/28.jpg)
28
Lineage to the Rescue
Lineage (provenance): “where data came from”
• Internal lineage
• External lineage
In Trio: A function λ from data elements to other data elements (or external sources)
![Page 29: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/29.jpg)
29
Example with Lineage
Saw (witness, car)

| ID | witness | car |
|---|---|---|
| 11 | Cathy | Honda ∥ Mazda |

Drives (person, car)

| ID | (person, car) |
|---|---|
| 21 | Jimmy, Toyota ∥ Jimmy, Mazda |
| 22 | Billy, Honda ∥ Frank, Honda |
| 23 | Hank, Honda |

Suspects = πperson(Saw ⋈ Drives)

| ID | person | ? |
|---|---|---|
| 31 | Jimmy | ? |
| 32 | Billy ∥ Frank | ? |
| 33 | Hank | ? |

λ(31) = (11,2),(21,2)
λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)
λ(33) = (11,1), 23
Correctly captures possible instances in the result
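
To see why lineage fixes the problem, one can enumerate the choices of base alternatives and keep a Suspects alternative only when every element of its lineage was chosen. This is a hand-rolled sketch of that reading of λ, not Trio's algorithm; the dictionaries `base` and `lineage` are illustrative encodings, and tuple 23 (which has a single alternative) is written as (23, 1) for uniformity.

```python
from itertools import product

# Number of alternatives in each base x-tuple (IDs from the slide).
base = {11: 2, 21: 2, 22: 2, 23: 1}

# Lineage of each Suspects alternative, as sets of (x-tuple ID, alternative).
lineage = {
    "Jimmy": {(11, 2), (21, 2)},
    "Billy": {(11, 1), (22, 1)},
    "Frank": {(11, 1), (22, 2)},
    "Hank": {(11, 1), (23, 1)},
}

def suspects_instances():
    """Return the distinct Suspects relations over all base choices."""
    ids = sorted(base)
    seen = set()
    for picks in product(*(range(1, base[i] + 1) for i in ids)):
        chosen = set(zip(ids, picks))
        # An alternative is present exactly when its whole lineage was chosen.
        suspects = frozenset(p for p, lin in lineage.items() if lin <= chosen)
        seen.add(suspects)
    return seen

for inst in sorted(suspects_instances(), key=sorted):
    print(sorted(inst))
```

Only four Suspects instances arise: {Billy, Hank}, {Frank, Hank}, {Jimmy}, and the empty relation. Treating the three ‘?’ tuples as independent would also admit combinations such as {Jimmy, Hank}, which cannot happen because Jimmy requires Cathy to have seen a Mazda and Hank requires a Honda.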
![Page 30: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/30.jpg)
30
Example with Lineage
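Saw (witness, car)

| ID | witness | car |
|---|---|---|
| 11 | Cathy | Honda ∥ Mazda |

Drives (person, car)

| ID | (person, car) |
|---|---|
| 21 | Jimmy, Toyota ∥ Jimmy, Mazda |
| 22 | Billy, Honda ∥ Frank, Honda |
| 23 | Hank, Honda |

Suspects = πperson(Saw ⋈ Drives)

| ID | person | ? |
|---|---|---|
| 31 | Jimmy | ? |
| 32 | Billy ∥ Frank | ? |
| 33 | Hank | ? |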
![Page 31: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/31.jpg)
31
Trio Data Model
1. Alternatives
2. ‘?’ (Maybe) Annotations
3. Confidences
4. Lineage
ULDBs are closed and complete
Uncertainty-Lineage Databases (ULDBs)
![Page 32: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/32.jpg)
32
Formal Definition of ULDB
![Page 33: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/33.jpg)
33
Formal definition of Completeness
![Page 34: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/34.jpg)
34
Proof of Completeness
Proof: Construct R with x-relations S1, .., Sn, corresponding to R1, .. ,Rn
Construct an extra relation PW that encodes the possible instances. PW contains exactly one x-tuple: (1)|(2)|...|(m).
Each Si is constructed as follows,
For every Pj , each tuple t in Ri forms a maybe x-tuple with just one alternative with value t.
Duplicates within and across possible instances are preserved in Si.
We add (j) in PW to the lineage of alternatives in tuples copied from Pj .
This now exactly encodes the data in each of the possible instances.
![Page 35: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/35.jpg)
35
Proof, Continued …
The correct lineage is obtained as follows:
We look at the lineage λj in Pj and mimic it in the x-tuples it contributes in S1 through Sn.
For example, if λj(t1) = {t2} in Pj, where t1 is a tuple in R1 and t2 is a tuple in R2, then the x-tuple that t2 gave in S2 is added to the lineage of the x-tuple from t1 in S1.
As a final step, we remove the extra relation PW but retain its symbols as external lineage.
Therefore, each possible LDB of D now has the same schema as each Pj, and represents exactly the same data and internal lineage.
![Page 36: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/36.jpg)
36
ULDBs: Minimality
A ULDB relation R represents a set of possible instances
Data-minimality
• Does every tuple in R appear in some possible instance? (no extraneous tuples)
• Is every maybe-tuple in R absent from some possible instance? (no extraneous ‘?’s)
Also: Lineage-minimality
![Page 37: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/37.jpg)
37
Data Minimality Examples
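Extraneous ‘?’

| ID | alternatives |
|---|---|
| . . . | |
| 10 | Billy, Honda ∥ Frank, Mazda |
| . . . | |

| ID | alternatives | ? |
|---|---|---|
| . . . | | |
| 20 | Billy ∥ Frank | ? (extraneous) |
| . . . | | |

λ(20,1)=(10,1); λ(20,2)=(10,2)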
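
A brute-force way to see that the ‘?’ on tuple 20 is extraneous is to check whether some alternative of the derived tuple is present under every choice of base alternatives. The sketch below is only illustrative: the function `is_maybe_extraneous` is invented for this example, and it assumes the base x-tuples themselves carry no ‘?’.

```python
from itertools import product

def is_maybe_extraneous(base_alt_counts, derived_lineage):
    """True if the derived x-tuple is present in every possible instance.

    base_alt_counts maps base x-tuple IDs to their number of alternatives.
    derived_lineage lists, per derived alternative, its set of
    (base ID, alternative number) lineage elements.
    """
    ids = sorted(base_alt_counts)
    for picks in product(*(range(1, base_alt_counts[i] + 1) for i in ids)):
        chosen = set(zip(ids, picks))
        if not any(lin <= chosen for lin in derived_lineage):
            return False  # the tuple is absent here, so '?' is needed
    return True

# Slide example: tuple 20 derives from both alternatives of tuple 10.
print(is_maybe_extraneous({10: 2}, [{(10, 1)}, {(10, 2)}]))  # True
# Contrast: a tuple derived only from alternative 1 really is a maybe-tuple.
print(is_maybe_extraneous({10: 2}, [{(10, 1)}]))  # False
```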
![Page 38: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/38.jpg)
38
Data Minimality Examples
Extraneous tuple
Diane: Mazda ∥ Acura
Diane, Mazda ?
Diane, Acura ?
Diane ? (extraneous)
![Page 39: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/39.jpg)
39
Querying ULDBs
Simple extension to SQL
Formal semantics, intuitive meaning
Query uncertainty, confidences, and lineage
TriQL
![Page 40: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/40.jpg)
40
Simple TriQL Example
Saw (witness, car)

| ID | witness | car |
|---|---|---|
| 11 | Cathy | Honda ∥ Mazda |

Drives (person, car)

| ID | (person, car) |
|---|---|
| 21 | Jimmy, Toyota ∥ Jimmy, Mazda |
| 22 | Billy, Honda ∥ Frank, Honda |
| 23 | Hank, Honda |

Suspects

| ID | person | ? |
|---|---|---|
| 31 | Jimmy | ? |
| 32 | Billy ∥ Frank | ? |
| 33 | Hank | ? |

λ(31)=(11,2),(21,2)
λ(32,1)=(11,1),(22,1); λ(32,2)=(11,1),(22,2)
λ(33)=(11,1),23

Create Table Suspects as
Select person
From Saw, Drives
Where Saw.car = Drives.car
![Page 41: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/41.jpg)
41
Formal Semantics
Relational (SQL) query Q on ULDB D
[Figure: commutative diagram. D is a representation of the possible instances D1, D2, …, Dn; running Q on each instance gives Q(D1), Q(D2), …, Q(Dn), which D’ represents; the implementation of Q (operational semantics) takes D to D’ = D + Result]
![Page 42: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/42.jpg)
42
TriQL: Querying Confidences
Built-in function: Conf()
SELECT person
FROM Saw, Drives
WHERE Saw.car = Drives.car AND Conf(Saw) > 0.5 AND Conf(Drives) > 0.8
![Page 43: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/43.jpg)
43
TriQL: Querying Lineage
Built-in join predicate: Lineage()
SELECT Suspects.person
FROM Suspects, Saw
WHERE Lineage(Suspects,Saw) AND Saw.witness = 'Amy'
X ==> Y shorthand for Lineage(X,Y)
![Page 44: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/44.jpg)
44
Operational Semantics
Over standard relational database:
For each tuple in cross-product of X1, X2, ..., Xn
1. Evaluate the predicate
2. If true, project attr-list to create result tuple
SELECT attr-list
FROM X1, X2, ..., Xn
WHERE predicate
![Page 45: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/45.jpg)
45
Operational Semantics
Over ULDB:
For each tuple in cross-product of X1, X2, ..., Xn
1. Create “super tuple” T from all combinations of alternatives
2. Evaluate predicate on each alternative in T ; keep only the true ones
3. Project attr-list on each alternative to create result tuple
4. Details: ‘?’, lineage, confidences
SELECT attr-list
FROM X1, X2, ..., Xn
WHERE predicate
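
Steps 1 to 3 of the ULDB procedure can be sketched in a few lines. This is a simplified illustration rather than Trio's implementation: the function `uldb_select` is a made-up name, lineage and confidences (step 4) are left out, and the ‘?’ rule used here (mark the result as maybe whenever some combination was filtered out) assumes the input x-tuples have no ‘?’ of their own.

```python
from itertools import product

def uldb_select(x_relations, predicate, project):
    """Evaluate SELECT/FROM/WHERE over x-relations (lists of x-tuples).

    Each x-tuple is a list of alternatives. Returns a list of
    (result alternatives, maybe flag) pairs.
    """
    results = []
    for combo in product(*x_relations):      # one x-tuple from each relation
        super_tuple = list(product(*combo))  # step 1: all combinations of alternatives
        kept = [alts for alts in super_tuple if predicate(*alts)]  # step 2
        if not kept:
            continue
        alternatives = [project(*alts) for alts in kept]           # step 3
        maybe = len(kept) < len(super_tuple)  # simplified '?' rule (assumption)
        results.append((alternatives, maybe))
    return results

saw = [[("Cathy", "Honda"), ("Cathy", "Mazda")]]
drives = [[("Jim", "Mazda"), ("Bill", "Mazda")], [("Hank", "Honda")]]

suspects = uldb_select(
    [saw, drives],
    predicate=lambda s, d: s[1] == d[1],  # Saw.car = Drives.car
    project=lambda s, d: d[0],            # SELECT person
)
print(suspects)  # [(['Jim', 'Bill'], True), (['Hank'], True)]
```

The data is the Saw and Drives example used on the following slides: the first super tuple keeps its two Mazda combinations and yields Jim ∥ Bill, and the second keeps its Honda combination and yields Hank.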
![Page 46: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/46.jpg)
46
Operational Semantics: Example
SELECT person
FROM Saw, Drives
WHERE Saw.car = Drives.car
Saw (witness, car)
Cathy Honda ∥ Mazda
Drives (person, car)
Jim ∥ Bill Mazda
Hank Honda
![Page 47: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/47.jpg)
47
Operational Semantics: Example
SELECT person
FROM Saw, Drives
WHERE Saw.car = Drives.car
(Cathy,Honda,Jim,Mazda) ∥ (Cathy,Honda,Bill,Mazda) ∥ (Cathy,Mazda,Jim,Mazda) ∥ (Cathy,Mazda,Bill,Mazda)
Saw (witness, car)
Cathy Honda ∥ Mazda
Drives (person, car)
Jim ∥ Bill Mazda
Hank Honda
![Page 48: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/48.jpg)
48
Operational Semantics: Example
SELECT person
FROM Saw, Drives
WHERE Saw.car = Drives.car
(Cathy,Honda,Jim,Mazda) ∥ (Cathy,Honda,Bill,Mazda) ∥ (Cathy,Mazda,Jim,Mazda) ∥ (Cathy,Mazda,Bill,Mazda)
Saw (witness, car)
Cathy Honda ∥ Mazda
Drives (person, car)
Jim ∥ Bill Mazda
Hank Honda
![Page 49: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/49.jpg)
49
Operational Semantics: Example
SELECT person
FROM Saw, Drives
WHERE Saw.car = Drives.car
(Cathy,Honda,Jim,Mazda) ∥ (Cathy,Honda,Bill,Mazda) ∥ (Cathy,Mazda,Jim,Mazda) ∥ (Cathy,Mazda,Bill,Mazda)
Saw (witness, car)
Cathy Honda ∥ Mazda
Drives (person, car)
Jim ∥ Bill Mazda
Hank Honda
![Page 50: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/50.jpg)
50
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda) ∥ (Cathy,Honda,Bill,Mazda) ∥ (Cathy,Mazda,Jim,Mazda) ∥ (Cathy,Mazda,Bill,Mazda)
SELECT person
FROM Saw, Drives
WHERE Saw.car = Drives.car
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
Saw (witness, car)
Cathy Honda ∥ Mazda
Drives (person, car)
Jim ∥ Bill Mazda
Hank Honda
![Page 51: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/51.jpg)
51
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda) ∥ (Cathy,Honda,Bill,Mazda) ∥ (Cathy,Mazda,Jim,Mazda) ∥ (Cathy,Mazda,Bill,Mazda)
SELECT person
FROM Saw, Drives
WHERE Saw.car = Drives.car
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
Saw (witness, car)
Cathy Honda ∥ Mazda
Drives (person, car)
Jim ∥ Bill Mazda
Hank Honda
![Page 52: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/52.jpg)
52
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda) ∥ (Cathy,Honda,Bill,Mazda) ∥ (Cathy,Mazda,Jim,Mazda) ∥ (Cathy,Mazda,Bill,Mazda)
SELECT person
FROM Saw, Drives
WHERE Saw.car = Drives.car
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
Saw (witness, car)
Cathy Honda ∥ Mazda
Drives (person, car)
Jim ∥ Bill Mazda
Hank Honda
![Page 53: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/53.jpg)
53
Additional TriQL Constructs
• “Horizontal subqueries” Refer to tuple alternatives as a relation
• Aggregations: low, high, expected
• Unmerged (horizontal duplicates)
• Flatten, GroupAlts
• NoLineage, NoConf, NoMaybe
• Query-computed confidences
• Data modification statements
![Page 54: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/54.jpg)
54
Trio-Specific Additional Features
• Lineage tracing
• On-demand confidence computation
• Coexistence checks
• Extraneous data removal
Interrelated algorithms
![Page 55: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/55.jpg)
55
The Trio System
Version 1 (“Trio-One”)
On top of standard DBMS
Surprisingly easy and complete, reasonably efficient
![Page 56: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/56.jpg)
56
Trio-One Overview
[Figure: architecture diagram. Clients sit on top of the Trio API and translator, which issues standard SQL to a standard relational DBMS holding Trio's tables, metadata, and stored procedures]
Clients: command-line client, TrioExplorer (GUI client)
Trio API and translator (Python)
• DDL commands
• TriQL queries
• Schema browsing
• Table browsing
• Explore lineage
• On-demand confidence computation
Standard relational DBMS (accessed via standard SQL)
Encoded Data Tables
• Partition and “verticalize”
• Shared IDs for alternatives
• Columns for confidence, “?”
Lineage Tables
• One per result table
• Uses unique IDs
• Encodes formulas
Trio Metadata
• Table types
• Schema-level lineage structure
Trio Stored Procedures
• Conf()
• Lineage()
![Page 57: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/57.jpg)
57
Strengths
First DBMS with uncertainty and lineage
Has many applications, such as the sample applications shown earlier
Done by Stanford!
![Page 58: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/58.jpg)
58
Weaknesses
Paper has lots of definitions rather than explanations.
Proofs are written as running text rather than as multiple lines with math symbols.
![Page 59: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/59.jpg)
59
Future Work
More forms of uncertainty
• Continuous uncertainty (intervals, Gaussians)
• Correlated uncertainty
• Incomplete relations
More forms of lineage
• External lineage
• Update lineage
Confidence-based queries
• Threshold; “Top-K”
![Page 60: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/60.jpg)
60
Future Work
Conjunctive lineage sufficient for most operations
• Disjunctive lineage for duplicate-elimination
• Negative lineage for difference
General case after several queries:
• Boolean formula
![Page 61: Trio: A System for Data, Uncertainty, and Lineage Jennifer Widom et al Stanford University](https://reader031.vdocument.in/reader031/viewer/2022013011/56649d5d5503460f94a3be86/html5/thumbnails/61.jpg)
Search “stanford trio”