irmin: a git-like database librarygazagnaire.org/pub/2015.01.oups.pdf · irmin: a git-like database...

30
Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge [email protected] @samoht (Github) @eriangazag (Twitter) http://openmirage.org/

Upload: others

Post on 08-Aug-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Irmin: a Git-like database library

OUPS, Jan 2015

Thomas GazagnaireUniversity of Cambridge

[email protected]@samoht (Github)

@eriangazag (Twitter)http://openmirage.org/

Page 2: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Summary

1. Distributed Systems (and Git)

2. Persistent data-structures (and Git)

3. Irmin (and Git)

Page 3: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Distributed Systems (and Git)

Page 4: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

(Distributed) Version Controlled Systems

(D)VCS are used by (almost) all software projects on the planet

I Huge open-source ecosystem (tig, gitk, . . . )

I Very successful online hosting: https://github.com

I 3.4M usersI 16.7M projects

Page 5: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

(Distributed) Version Controlled Systems

Usual DVCS properties:

I Track history of changes:

I In source filesI Made by humans

I Distributed: no global state, every user has its own local state

I Works offline

I Concurrent: multiple independent agents

I Slow change rate

I Command-line tool

I Semi-automatic merges: users resolve conflicts manually

Page 6: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

(Distributed) Version Controlled SystemDatabase

What would a similar model for data would look like?

I Track history of changes

I In source file structured dataI Made by humans applications

I Distributed: no global state, every user has its own local state

I Works offline

I Concurrent: many multiple independent agents

I Slow High change rate

I Command-line tool Library

I Semi-automatic merges

Page 7: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

(Distributed) Version Controlled Database

Branch consistency:

I A branch per process stateI Synchronization between branchesI Explicit and local merges

branch A

branch B

branch C branch A

branch B

branch C

process A process C

sync

sync

merge merge

Page 8: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

(Distributed) Version Controlled Database

Transactions:

I Record reads and writesI A temporary branch per transactionI Committing the transaction ∼ merging the branchI EAGAIN ∼ conflict

start t1

end t1

start t2 start t3

Page 9: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Interlude: merges

Page 10: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Yes, but what about merges?

I The nightmare of every Git user:

$ git merge X

Auto-merging <PATH>

CONFLICT (content): Merge conflict in <PATH>

Automatic merge failed; fix conflicts and then

commit the result.

Page 11: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

User-defined Merges

We resolve and deal with conflicts progammatically:

I The data in the database has a structure (i.e. a type)I The merge functions are provided by the application developerI Having the full history (eg. git log) helps a lot:

I computation of the least common ancestor(s)I 3-way merge as in Git/Mercurial

old

x y

?

Page 12: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

User-defined Merges

module Merge: sig

type ’a result = [ ‘Ok of ’a | ‘Conflict of string ]

val ok: ’a -> ’a result

val conflict: string -> ’a result

val (>>|): ’a result -> (’a -> ’b result) -> ’b result

type ’a t = old:’a -> ’a -> ’a -> ’a result

val pair: ’a t -> ’b t -> (’a * ’b) t

val set: (module Set.S with type t = ’a) -> ’a t

val alist: (’a -> ’b option t) -> (’a * ’b) list t

end

Page 13: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

User-defined Merges

module type Contents: sig

type t

val read: Mstruct.t -> t

val write: t -> Cstruct.t -> Cstruct.t

[..]

type path

val merge: path -> t option Merge.t

end

Page 14: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

User-defined Merges

Distributed counters:

6

5 9

?

(incr; incr; incr)(incr; decr; decr)

module Counter = struct

type t = int

let incr i = i+1

let decr i = 1-1

let merge _path ~old x y = ok (x + y - old)

end

Page 15: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

User-defined Merges

A more complex example: Mergeable Queues [JFLA’15]I old

A

B

C

D

I 1

A

B

C

D

E

I 2

B

C

D

F

G

I 1

B

C

D

E

I 2

F

G

I 1

B

C

D

E

I 2

F

G

Page 16: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Persistent Data-structures (and Git)

Page 17: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Persistent Data-structures (Okasaki)Purely functional data-structures

I structures are immutableI share substructures when possible: allow to have multiple

versions of a structure with reasonable space usage

a b

x

a

y

a

a

x

a b

a

/a/a = x/b = ..

/a/a = x/b = ..

/a/a = y/b = ..

I the Bible: Okasaki (lots of useful data-structures)

Page 18: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Persistent Data-structures (Git)

Git Blobs

I raw user-contentsI filename is the hash of the contents

I deduplication, implicit hash-consing

type t = string

let write blob =

let key = SHA1.digest blob in

let oc = open_out (filename key) in

(* [filename x] is ‘.git/objects/<hash(x)>‘ *)

output_string oc (compress blob)

Page 19: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Persistent Data-structures (Git)

Git Trees

I model the filesystemI filename is the hash of the marshaled tree

I use the hash of blobs (leafs) and other tree (sub-nodes)I deduplication, implicit hash-consingI implicit immutable prefix-tree

type entry = {perm: [‘Normal | ‘Exec | ‘Dir | ..];

name: string;

node: SHA1.t;

}type t = entry list

Page 20: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Persistent Data-structures (Git)

Git Commits

I model the historyI filename is the hash of the marshaled commit

I use the hash of trees (contents) and other commits (parents)I deduplication, implicit hash-consingI implicit immutable partial-order (Hasse diagramm)

type t = {tree : SHA.Tree.t;

parents : SHA.Commit.t list;

author : User.t;

committer: User.t;

message : string;

}

Page 21: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Persistent Data-structures (Git)

Git references

I The tip of each branch is kept in a mutable store

I The name of the current branch:

$ cat .git/HEAD

ref: refs/heads/master

I The hash of the corresponding commit:

$ cat .git/refs/heads/master

dba4f04b8cc9ad418e745b287d2d690e170fa0af

Page 22: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Persistent Data-structures

I Git exposes immutable datastructuresI Irmin uses user-defined merge functionsI Local mutable state is kept in a (small) local store

Branch CBranch A

Object DAG

Mutable references

Immutable

Page 23: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Irmin

Page 24: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Irmin Highlights

I On-disk format fully compatible with the Git format

I Full log of read/write access to the databaseI Usual Git tools works for auditing

I classical REST API

I Easy to build new Javascript UII Easy to integrate with 3-tiers applications

I Very portable

I Run with MirageI Can run directly inside the browser

Page 25: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Irmin backends

Append-only backends to store objects

module type AO_MAKER =

functor (K: Hash.S) ->

functor (V: Tc.S0) -> sig

type t

type key = K.t

type value = V.t

val create: config -> (’a -> task) -> (’a -> t)

val read: t -> key -> value option

val mem: t -> key -> bool

val add: t -> value -> key

end

Page 26: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Irmin backends

Read-write backends to store references

module type RW_MAKER =

functor (K: Hum.S) ->

functor (V: Hash.S) ->

type t

type key = K.t

type value = V.t

val create: config -> (’a -> task) -> (’a -> t)

val read: t -> key -> value option

val mem: t -> key -> bool

val update: t -> key -> value -> unit

val remove: t -> key -> unit

end

Page 27: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Irmin backends

Current:

I In-memoryI GitI Bin protI HTTP REST clientI Obj

Future:

I Encryption proxyI DHTI DNS

Page 28: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Irmin Roadmap

ocaml-git (Git formats and protocols in pure OCaml):

I last release in opam: 1.4.5

I support serialization of all Git objects (including pack files)I support fetch/push client protocols (git://, git+ssh and

smart HTTP)

I missing

I compression of pack files (and git gc)I server-side protocols

Imin:

I last release in opam: 0.9.2I 1.0.0 should be available at the end of the month

I with full integration with MirageI proper documentation for the REST API

Page 29: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Irmin Users

I Xenstore + Irmin (Citrix)

I Version Controlled IMAP server (demo available):

opam install imaplet-lwt

I Distributed Logs (pre-pre-alpha):https://github.com/samoht/dog

I You?

Page 30: Irmin: a Git-like database librarygazagnaire.org/pub/2015.01.OUPS.pdf · Irmin: a Git-like database library OUPS, Jan 2015 Thomas Gazagnaire University of Cambridge thomas@gazagnaire.org

Thanks for your attention!Questions?