git internals
DESCRIPTION
An explanation of how a Git repository is organized, the types of objects it contains and the relations between them.
TRANSCRIPT
GIT Internals
Pedro Melo <{mailto,xmpp}:[email protected]>
A short GIT History
• 2002 ⇒ Apr 2005: The BitKeeper Wars
• Apr 2005: Episode IV - A New Hope
• July 2005: Hamano is the new maintainer
• Late 2008: GitHub hits the spotlight
“I’m an egotistical bastard, and I name all my projects after myself. First Linux, now git.” – Linus
Personal take on GIT rules (without !!)
• Track content, not changes
• Simple repository
• Complex software
• It's easier to update the software than to update all the repositories created so far
Git Mantra: http://bit.ly/git-phylosophy
In other words, I'm right. I'm always right, but sometimes I'm more right than other times. And dammit, when I say "files don't matter", I'm really really Right(tm).
Linus
Strong Points
• Non-Linear development
• Distributed Development
• Centralized development is a subcase
• Efficiency
• Toolkit Design
Objects
• Git repositories store objects
• Stored in the Object Database
• Inside the Git directory
• .git at the root of your project
• Four major object types
• Objects are compressed for storage (zlib)
• SHA1 of header+content ⇒ ID
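The last two bullets can be sketched in a few lines of Python. This is a minimal illustration, not Git's own code: the function names are made up for this example, and it assumes the header is the object type, a space, the content size in decimal and a NUL byte, as the following slides show.

```python
import hashlib
import zlib
from typing import Tuple


def encode_object(obj_type: str, content: bytes) -> bytes:
    """Prefix the content with the '<type> <size>' header and a NUL byte."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return header + content


def object_id(obj_type: str, content: bytes) -> str:
    """The ID is the SHA-1 of header + content, as 40 hex characters."""
    return hashlib.sha1(encode_object(obj_type, content)).hexdigest()


def loose_object_path(oid: str) -> str:
    """Loose objects are fanned out by the first two hex digits of the ID."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def compress_object(obj_type: str, content: bytes) -> bytes:
    """What would be written to disk: zlib over header + content."""
    return zlib.compress(encode_object(obj_type, content))


def decompress_object(data: bytes) -> Tuple[str, bytes]:
    """Reverse of compress_object: returns (type, content)."""
    raw = zlib.decompress(data)
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


oid = object_id("blob", b"I like pizza with apples\n")
print(oid, "->", loose_object_path(oid))
```

Note that the ID is computed before compression, so the zlib settings do not affect it.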
The Blob
• Files are stored as blobs
• Only content, no metadata
blob [content_size]\0Your content goes here after the header
I like pizza with apples
Meet the blob
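A small sketch of what "only content, no metadata" implies: two files with different names but the same bytes end up as the same blob. The file names below are hypothetical and `blob_id` is a helper written for this example.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 over 'blob <size>', a NUL byte, then the raw content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working tree: the name is never part of the blob.
files = {
    "README": b"hello\n",
    "docs/copy-of-readme.txt": b"hello\n",
    "empty.txt": b"",
}

ids = {name: blob_id(data) for name, data in files.items()}
for name, oid in ids.items():
    print(oid, name)
```

The first two lines print the same ID. Running `git hash-object` on files with the same contents should report the same values.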
The tree
• Trees store directories
• Mode, type, pointer and name
• Recursive, trees can contain trees
• Stored as a simple list of entries (binary on disk, shown here as text)
tree [content_size]\0
100644 blob b5f21a README
100644 blob afe433 Makefile.PL
040000 tree a42cd0 lib
Meet the tree
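The listing on the slide is the readable form, similar to what `git ls-tree` prints. On disk, each entry is the mode, a space, the name, a NUL byte and the 20 raw bytes of the object ID, and directories use mode `40000` without the leading zero. A rough sketch of that encoding follows; the full-length IDs are the slide's abbreviations padded with zeros, purely for illustration.

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (mode, name, 40-char hex object ID)


def sort_key(entry: Entry) -> bytes:
    mode, name, _ = entry
    # Git orders entries by name, treating directories as if they ended in "/".
    suffix = "/" if mode == "40000" else ""
    return (name + suffix).encode("utf-8")


def tree_body(entries: List[Entry]) -> bytes:
    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\x00"
        body += bytes.fromhex(hex_id)  # 20 raw bytes, not hex text
    return body


def tree_id(entries: List[Entry]) -> str:
    body = tree_body(entries)
    header = b"tree " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Made-up IDs: the slide's short prefixes padded to 40 hex digits.
entries = [
    ("100644", "README", "b5f21a" + "0" * 34),
    ("100644", "Makefile.PL", "afe433" + "0" * 34),
    ("40000", "lib", "a42cd0" + "0" * 34),
]
print(tree_id(entries))
```

Because the tree hashes the IDs of its children, changing `lib/Cool.pm` changes the `lib` tree ID, which in turn changes the ID of the root tree.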
The commit
• The object that makes history
• Pointer to a tree and to the parent commit(s), if any
• Author, committer and commit message
commit [content_size]\0
tree 23edfc
author Pedro Melo <[email protected]> 1243036800
committer Pedro Melo <[email protected]> 1243036800

commit without a parent
usually called first commit
Meet the commit...
commit [content_size]\0
tree fde45c
parent 3454df
author Pedro Melo <[email protected]> 1243036932
committer Pedro Melo <[email protected]> 1243036932

and we fixed that nasty bug
after all, they do tend to crop up
...and its child the other commit
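To make the parent link concrete, here is a sketch that builds two commit objects in memory, the second pointing at the first. The identity string is hypothetical, and the `+0000` timezone offset is something real Git writes after the timestamp even though the slides leave it out. The tree ID is that of a tree with no entries, used only to keep the example short.

```python
import hashlib
from typing import List

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # tree with no entries
WHO = "A U Thor <author@example.com>"  # hypothetical identity


def commit_body(tree: str, parents: List[str], when: int, message: str) -> bytes:
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # zero, one or several parents
    lines.append(f"author {WHO} {when} +0000")
    lines.append(f"committer {WHO} {when} +0000")
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


def commit_id(body: bytes) -> str:
    header = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


first = commit_body(EMPTY_TREE, [], 1243036800, "first commit\n")
first_id = commit_id(first)

second = commit_body(EMPTY_TREE, [first_id], 1243036932,
                     "and we fixed that nasty bug\n")
second_id = commit_id(second)

print(first_id, "<-", second_id)
```

Since the parent ID is part of the hashed content, rewriting an old commit changes the ID of every commit that descends from it.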
The tag
• A name for a particular commit
• Can contain a message
• Optionally GPG signed
• Allows for cryptographically secure releases
tag [content_size]\0
object 123fec
type commit
tag v1
tagger Pedro Melo <[email protected]> 1243037423

made it to 1.0!
Meet the tag
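Commits and tags share the same shape: "key value" header lines, a blank line, then a free-form message. A simplified parser for that shape might look like the sketch below. It ignores continuation lines such as those used by embedded signatures, and the email address is a placeholder.

```python
from typing import List, Tuple


def parse_object_text(body: str) -> Tuple[List[Tuple[str, str]], str]:
    """Split a commit or tag body into (headers, message).

    Headers are returned as a list of pairs because a key such as
    'parent' may appear more than once.
    """
    head, _, message = body.partition("\n\n")
    headers = []
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        headers.append((key, value))
    return headers, message


# The tag from the slide, with its abbreviated object ID kept as-is.
tag_text = (
    "object 123fec\n"
    "type commit\n"
    "tag v1\n"
    "tagger Pedro Melo <pedro@example.com> 1243037423\n"
    "\n"
    "made it to 1.0!\n"
)

headers, message = parse_object_text(tag_text)
for key, value in headers:
    print(f"{key:8} {value}")
print(message, end="")
```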
Git Data Model Recap
• Immutable objects
• A file per object
• Repacked into object packs for efficiency
• Organized as a directed acyclic graph
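As a rough picture of the graph, each commit can be reduced to a name and its list of parents. The history below is invented for illustration, with one merge commit that has two parents. Walking the parent pointers from any commit yields everything reachable from it.

```python
from typing import Dict, List, Set

# Hypothetical history: commit name -> parent names.
parents: Dict[str, List[str]] = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],        # continues one line of development
    "t1": ["c2"],        # diverges from c2
    "m1": ["c3", "t1"],  # merge commit with two parents
}


def ancestors(start: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return the start commit plus every commit reachable through parents."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph[commit])
    return seen


print(sorted(ancestors("m1", parents)))
print(sorted(ancestors("t1", parents)))
```

Pointers only go from child to parent, and an object's ID depends on the IDs it points to, so a cycle cannot be constructed.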
[Diagram sequence: object graph for the sample project proj/ (Makefile.PL, lib/, Cool.pm)]
References
• “Names” for commits
• Mutable, they point to a specific commit and move to a new one after each commit
• A branch is a reference, a name to a commit
• Special HEAD reference: points to a reference
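A sketch of how such references could be resolved, assuming the common loose layout where `HEAD` holds `ref: refs/heads/<branch>` and the branch file holds a commit ID. It works on a throwaway directory rather than a real repository, the commit IDs are made up, and packed refs are ignored.

```python
import os
import tempfile


def resolve_ref(git_dir: str, name: str = "HEAD") -> str:
    """Follow symbolic refs ("ref: <name>") until a commit ID is found."""
    with open(os.path.join(git_dir, name)) as fh:
        value = fh.read().strip()
    if value.startswith("ref: "):
        return resolve_ref(git_dir, value[len("ref: "):])
    return value


def write_ref(git_dir: str, name: str, value: str) -> None:
    path = os.path.join(git_dir, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        fh.write(value + "\n")


OLD, NEW = "a" * 40, "b" * 40  # made-up commit IDs

with tempfile.TemporaryDirectory() as git_dir:
    write_ref(git_dir, "HEAD", "ref: refs/heads/master")
    write_ref(git_dir, "refs/heads/master", OLD)
    before = resolve_ref(git_dir)

    # A new commit on master only rewrites the branch file; HEAD is untouched.
    write_ref(git_dir, "refs/heads/master", NEW)
    after = resolve_ref(git_dir)

print(before[:7], "->", after[:7])
```

Creating a branch in this model is just writing one more small file, which is why branches are cheap.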
[Diagram: a reference ("Name") pointing at a commit in the proj/ object graph]
[Diagram sequence: the master and test references and HEAD as commits are added]
[Diagram sequence: Merge]
[Diagram sequence: Rebase]
[Diagram sequence: Rebase + Merge]
Non-SCM uses for Git
• Leverage strengths
• immutable
• over the network, pulls transfer only missing objects
• fast checkout (compared to a copy, there is less to read)
• easy rollback
Beware of weak points
• Always stores a full copy of each file version
• not good for backups of DB dumps
• Full history ⇒ more disk space
• this might change as “shallow clones” gain functionality...
Content distribution
• Updates done in a master, central repository
• Hierarchy of slave repositories
• Fast sync between repositories, fast checkout
• Can be automated with hooks
• Useful if you have lots of static files, faster than rsync
Read-only filesystem
• Design a web server that fetches objects directly from the object database
• Compact storage, efficient retrieval
• Packs of objects are also very VM friendly, mmap ready
• Some open source solutions are already available
Wiki/Ticketing backend
• Use git repository as storage for wiki or ticketing systems
• Good match for distributed development
• Several open source solutions are already available
• ... but similar to SCM usages
That’s all folks!
• I’ll be around #codebits, feel free to ask me stuff
• If you want a Git-as-SCM demo, let's get organized and I’ll do an impromptu presentation, or even private lapdan^H^H^H^H^Hdemos
• After #codebits: <{mailto,xmpp}:[email protected]>
About Me
http://simplicidade.org/notes/
@pedromelo
{mailto,xmpp}:[email protected]
skype:melopt
http://github.com/melo
http://www.slideshare.net/melopt
About Git
http://git-scm.com/
Git Internals: http://peepcode.com/products/git-internals-pdf
Git book: http://progit.org/