PQLite: An Overly Simplistic Query Language for Data Provenance

Michael {Leece, Sevilla}
CMPS203 Final Project
University of California, Santa Cruz
Jack Baskin School of Engineering




Page 2: Overview

• Introduction
• Current Work
• Design and Implementation
• Conclusions

Page 3: Introduction

• Provenance: history + ancestry of an object [1]
  – Processes
  – Data
• Provenance-Aware Storage System (PASS)
  – Transparent collection
• PQL: Path Query Language
  – Useful for provenance

[Figure: Ancestry Graph]
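
As a rough illustration of what "ancestry" means on this slide, the sketch below stores a provenance graph as a map from each object to its direct ancestors and walks it breadth-first. It is a Python sketch under assumptions, not the PASS data model: the dictionary layout, the `ancestors` helper, and every edge in `ANCESTRY` are hypothetical.

```python
from collections import deque

# Hypothetical ancestry graph: each key maps to its direct (one-step) ancestors.
# Both processes and data files appear as nodes, as on the slide.
ANCESTRY = {
    "/file.txt": ["/bin/cat"],                 # file written by a process
    "/bin/cat": ["/bin/bash", "/input.txt"],   # process forked by a shell, reading a file
}


def ancestors(graph, start):
    """Return every ancestor of `start`, nearest first (breadth-first order)."""
    seen = set()
    order = []
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # guard against revisiting nodes, including cycles
        seen.add(node)
        order.append(node)
        queue.extend(graph.get(node, ()))
    return order


print(ancestors(ANCESTRY, "/file.txt"))
```

The `seen` set keeps the walk finite even if the recorded graph contains a cycle.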

Page 4: Introduction

• Security
• File System Search
• The Cloud
• New Hierarchical File Systems
• Yan Li’s Photo Album




Page 6: PQL Broken

• Obtained PASSv2
• Ran PQL query on provenance database
  – Infinite loops
  – Empty result sets ({})
• “The problem with PQL and Sage is that the implementation… is really slow, and it’s perhaps too easy to generate PQL queries that do not return any data.”
  – PASS Team

Page 7: PQL Undocumented

Page 8: Current Work

[Figure: system diagram; labels: User Space, App1, App2, Kernel Space, VFS, Lasagna FS, PASSv2 Modules, Waldo Database]


Page 9: Use Case

• What we have

[ P ] 1.0 INODE 4 INODE 12
[ P ] 1.0 NAME 9 "/file.txt"
[ P ] 1.0 TYPE 4 "FILE"
[ P ] 1.0 FREEZETIME 8 TIME 1329510432.493134083
[ P ] 1.0 FREEZETIME 8 TIME 1329510618.420311721
[ P ] 1.0 FREEZETIME 8 TIME 1329510676.040716382
[AP ] 1.1 INPUT 12 --> 2.1
[AP ] 1.2 INPUT 12 --> 8.1
[AP ] 1.3 INPUT 12 --> 16.2
[ PT] 2.0 ARGV 4 [1]"cat"
[ PT] 2.0 ENV 64 [2]"SHELL=/bin/bash" [3]"TERM=xterm" [4]"XDG_SESSION_COOKIE=06c3f2775eb071081dfacb984bf6c364-1329508695.722050-291519720" [5]"USER=root" [6]"LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:" [7]"MAIL=/var/mail/root" [8]"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" [9]"PWD=/test" [10]"LANG=en_US.UTF-8" [11]"SHLVL=1" [12]"HOME=/root" [13]"LOGNAME=root" [14]"LESSOPEN=| /usr/bin/lesspipe %s" [15]"LESSCLOSE=/usr/bin/lesspipe %s %s" [16]"_=/bin/cat" [17]"OLDPWD=/"
[ ] 2.0 EXECTIME 8 TIME 1329510428.104272662
[ P ] 2.0 TYPE 4 "PROC"
[ ] 2.0 PID 4 INT 13739
[ P ] 2.0 NAME 8 "/bin/cat"
[A ] 2.0 FORKPARENT 12 --> 14762.0
[ P ] 2.0 FREEZETIME 8 TIME 1329510428.104272662

• What we want
  – A list of files or processes that are one-step ancestors of …
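
To make the gap between the raw dump and the wanted answer concrete, here is a minimal Python sketch that parses a few of the records above and lists one-step ancestors by name. It is not the authors' Dump Parser. The record layout `[FLAGS] PNODE.VERSION ATTRIBUTE SIZE VALUE` is inferred from the excerpt, treating `INPUT` and `FORKPARENT` cross-references as ancestry edges is an assumption, and the names `parse_record`, `build_graph`, and `one_step_ancestors` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout of one dump record: [FLAGS] PNODE.VERSION ATTRIBUTE SIZE VALUE
RECORD = re.compile(
    r"^\[(?P<flags>[^\]]*)\]\s+(?P<pnode>\d+)\.(?P<version>\d+)\s+"
    r"(?P<attr>[A-Z]+)\s+(?P<size>\d+)\s+(?P<value>.*)$"
)

# Assumption: these attributes point from an object to one of its ancestors.
EDGE_ATTRS = {"INPUT", "FORKPARENT"}


def parse_record(line):
    """Split one dump line into its fields, or return None if it does not match."""
    m = RECORD.match(line.strip())
    if m is None:
        return None
    return {
        "flags": m["flags"].strip(),
        "pnode": int(m["pnode"]),
        "version": int(m["version"]),
        "attr": m["attr"],
        "value": m["value"].strip(),
    }


def build_graph(lines):
    """Return (names, parents): pnode -> name, and pnode -> set of ancestor pnodes."""
    names = {}
    parents = defaultdict(set)
    for line in lines:
        rec = parse_record(line)
        if rec is None:
            continue
        if rec["attr"] == "NAME":
            names[rec["pnode"]] = rec["value"].strip('"')
        elif rec["attr"] in EDGE_ATTRS:
            target = rec["value"].replace("-->", "").strip()  # e.g. "2.1"
            parents[rec["pnode"]].add(int(target.split(".")[0]))
    return names, parents


def one_step_ancestors(names, parents, target):
    """Names of the direct ancestors of `target`; unnamed pnodes are shown by number."""
    result = []
    for pnode, name in names.items():
        if name == target:
            for parent in sorted(parents.get(pnode, ())):
                result.append(names.get(parent, f"pnode {parent}"))
    return result


# A few records copied from the dump above.
SAMPLE = [
    '[ P ] 1.0 NAME 9 "/file.txt"',
    "[AP ] 1.1 INPUT 12 --> 2.1",
    "[AP ] 1.2 INPUT 12 --> 8.1",
    '[ P ] 2.0 NAME 8 "/bin/cat"',
    "[A ] 2.0 FORKPARENT 12 --> 14762.0",
]

names, parents = build_graph(SAMPLE)
print(one_step_ancestors(names, parents, "/file.txt"))  # ['/bin/cat', 'pnode 8']
```

Versions are dropped when building edges, so the sketch answers questions about objects rather than about individual versions of them.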

Page 10: Use Case (cont.)

[Figure: pipeline diagram; component labels: Waldo Database, Dump Parser, Ancestry Graph, Query Parser]

Query:
SELECT r FROM Graph AS r WHERE r.child = "/file.txt"

Abstract Syntax Tree:
Select "r" [From [Alias "Graph" "r"]] [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]

Label Map:
1 -> file.txt
2 -> jazz.jpg
3 -> bacon.txt
…

[(MyNode "/usr/bin/pico" 1,1,[2]),(MyNode "/usr/bin/vi" 2,3,[17,16,15]),(MyNode "/bin/cat" 1,4,[0])]
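
The abstract syntax tree above can be read as a small data structure that an evaluator walks. The following Python sketch mirrors the constructor names shown on the slide (`Select`, `From`, `Alias`, `Duo`, `PathType`, `Str`) and evaluates the example query. Everything else is an assumption made for illustration: the field names, the `run_query` helper, the idea that `Graph` is a list of rows with `parent` and `child` fields, and the sample rows themselves. Only the `Eq` operator is handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alias:
    source: str   # table-like source, e.g. "Graph"
    name: str     # variable bound to each row, e.g. "r"


@dataclass
class From:
    aliases: List[Alias]


@dataclass
class PathType:
    var: str
    path: List[str]   # field accesses applied to the variable


@dataclass
class Str:
    value: str


@dataclass
class Duo:
    op: str           # only "Eq" is supported in this sketch
    left: object
    right: object


@dataclass
class Select:
    target: str
    sources: List[From]
    conditions: List[Duo]


def value_of(expr, env):
    """Evaluate a literal or a variable path against the current bindings."""
    if isinstance(expr, Str):
        return expr.value
    if isinstance(expr, PathType):
        value = env[expr.var]
        for field in expr.path:
            value = value[field]
        return value
    raise TypeError(f"unsupported expression: {expr!r}")


def run_query(query, tables):
    """Return the rows bound to query.target that satisfy every condition."""
    alias = query.sources[0].aliases[0]   # single-source queries only
    results = []
    for row in tables[alias.source]:
        env = {alias.name: row}
        ok = True
        for cond in query.conditions:
            if cond.op != "Eq":
                raise NotImplementedError(cond.op)
            if value_of(cond.left, env) != value_of(cond.right, env):
                ok = False
        if ok:
            results.append(env[query.target])
    return results


# The AST from the slide, rebuilt with the classes above.
QUERY = Select("r", [From([Alias("Graph", "r")])],
               [Duo("Eq", PathType("r", ["child"]), Str("/file.txt"))])

# Hypothetical edge rows; the real Graph contents are not shown in the slides.
TABLES = {"Graph": [
    {"parent": "/bin/cat", "child": "/file.txt"},
    {"parent": "/usr/bin/vi", "child": "jazz.jpg"},
]}

print(run_query(QUERY, TABLES))
```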


Page 13: Language Specification

Select Statement




Page 18: What We Did Well

• Functional
  – It works. (PQLite > PQL)
• Easy to use
  – Intuitive (SQL-like) way of querying a provenance graph
  – Getting stuff we care about

Page 19: Lessons Learned

• Infinite recursion in parsing
  – Left recursion in a recursive descent parser
  – Refined syntax
• Began coding too soon
• Monads are useful
  – IO(), Maybe, State, Parsec
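
The left-recursion lesson can be reproduced in a few lines. The slides do not give the PQLite grammar, so the rule below is a hypothetical one for dotted paths such as `r.child`, and this is a Python sketch rather than the Parsec code the authors presumably wrote. Written left-recursively as `path ::= path "." IDENT | IDENT`, a recursive descent parser calls itself before consuming any input. Rewritten as `path ::= IDENT ("." IDENT)*`, it consumes one identifier and then loops.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\.)")


def tokenize(text):
    """Split a dotted path into identifier and '.' tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_path_naive(tokens, pos=0):
    """path ::= path "." IDENT | IDENT, transcribed directly.

    The first step re-enters this function without consuming a token,
    so any call recurses until Python raises RecursionError.
    """
    head, pos = parse_path_naive(tokens, pos)
    if pos < len(tokens) and tokens[pos] == ".":
        return head + [tokens[pos + 1]], pos + 2
    return head, pos


def parse_path(tokens, pos=0):
    """path ::= IDENT ("." IDENT)*, the same language without left recursion."""
    if pos >= len(tokens) or tokens[pos] == ".":
        raise SyntaxError("expected identifier")
    var = tokens[pos]
    pos += 1
    fields = []
    while pos < len(tokens) and tokens[pos] == ".":
        if pos + 1 >= len(tokens) or tokens[pos + 1] == ".":
            raise SyntaxError("expected identifier after '.'")
        fields.append(tokens[pos + 1])
        pos += 2
    return ("PathType", var, fields), pos


print(parse_path(tokenize("r.child")))  # (('PathType', 'r', ['child']), 3)
```

In Parsec, combinators such as `chainl1` and `sepBy1` express the same repetition-based rewrite without manual loops.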

Page 20: References

[1] Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie. Provenance-Aware Storage Systems. Harvard University Computer Science Technical Report TR-18-05, July 2005.

[2] Stephanie Jones, Christina Strong, Darrell D. E. Long, and Ethan L. Miller. Tracking Emigrant Data via Transient Provenance. In Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP '11), June 2011.

[3] Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor. Layering in Provenance Systems. In Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June 2009.

[4] PQL Language Guide and Reference.

