How to store large binary files in Git repositories

Post on 15-Apr-2017

Category: Software

Storing large binary files in Git repositories seems to be a bottleneck for many Git users.

Because of Git's decentralized nature, every clone holds the full history, so changes to large binary files can cause repositories to grow by the size of the file after every commit.
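As a rough back-of-the-envelope sketch of that growth, assume the worst case where each new version of the binary compresses and deltas poorly, so Git stores it almost in full. The function name and the numbers below are hypothetical.

```python
def estimated_repo_growth_mb(file_size_mb: float, commits_touching_file: int) -> float:
    """Worst-case estimate: every commit that changes the file stores a full new copy."""
    return file_size_mb * commits_touching_file


# Hypothetical example: a 50 MB asset that is changed in 20 commits.
growth = estimated_repo_growth_mb(50, 20)
print(f"History grows by roughly {growth:.0f} MB")
```

Real repositories may do somewhat better when the binary format happens to delta well, so treat this as an upper bound rather than a measurement.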

Luckily, there are multiple third-party workarounds that try to solve the problem.

Here are seven alternative approaches for handling large binary files in Git repositories.

1. Git Annex

Git-annex works by storing the contents of the files it tracks in a separate location. What is stored in the repository is a symlink to the key under that separate location.
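The symlink idea can be sketched in a few lines. This is a simplified illustration only: the `annex-store` directory and the plain SHA-256 key are assumptions made for the example, and real git-annex uses its own key format and directory layout.

```python
import hashlib
import os
import tempfile


def annex_add(repo_dir: str, filename: str) -> str:
    """Move a file's content into a key-addressed store and leave a symlink behind."""
    path = os.path.join(repo_dir, filename)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()

    store = os.path.join(repo_dir, "annex-store")  # hypothetical location
    os.makedirs(store, exist_ok=True)
    key_path = os.path.join(store, digest)

    os.replace(path, key_path)  # the content now lives under its key
    # The repository would track only this small relative symlink.
    os.symlink(os.path.relpath(key_path, os.path.dirname(path)), path)
    return digest


with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "video.bin"), "wb") as f:
        f.write(b"\x00" * 1024)
    annex_add(repo, "video.bin")
    link = os.path.join(repo, "video.bin")
    print(os.path.islink(link), os.readlink(link))
```

Reading `video.bin` still returns the original bytes as long as the store is present, which is why the content has to be copied to a shared backend before teammates can use it.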

To share the large binary files within a team, the tracked files need to be stored in a different backend.

Pros

• Supports multiple remotes in which you can store the binaries.

• Can be used without support from the hosting provider.

Cons

• Users need to learn separate commands for day-to-day work.

2. Git Large File Storage (Git LFS)

In Git LFS, instead of writing large blobs to a Git repository, only a pointer file is written. The blobs are written to a separate server using the Git LFS HTTP API. The API endpoint can be configured based on the remote, which allows multiple Git LFS servers to be used.
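To make the pointer file concrete, here is a minimal sketch that builds the pointer text for a blob: a version line, the SHA-256 object ID, and the size in bytes. The helper name `lfs_pointer` is made up for this example, and the sketch does not talk to any LFS server.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of a Git LFS style pointer for the given blob content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


blob = b"pretend this is a large binary file"
print(lfs_pointer(blob))
```

Only these few lines of text end up in the repository, so its size stays nearly constant no matter how large the tracked blob is.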

Git LFS requires a specific server implementation to communicate with, and uses filters, meaning that you only need to specify the tracked files with one command.

Pros

• Backed by GitHub.

• Ready-made binaries available for multiple operating systems.

• Easy to use.

• Transparent usage.

Cons

• Requires a custom server implementation to work.

• API not stable yet.

• Performance penalty.

3. Git-bigfiles

Git-bigfiles makes life bearable for people using Git on projects with very large files, merging back as many changes as possible into upstream Git.

Git-bigfiles is a fork of Git; however, the project seems to have been untouched for some time.

Pros

• If the changes were to be backported, they would be supported by native Git operations.

Cons

• The project is dead.

• A fork of Git might cause compatibility issues.

• Only allows configuring a file size threshold when tracking a large file.

4. Git-fat

Git-fat works in a similar manner to Git LFS. Large files can be tracked using filters in the `.gitattributes` file. Large files are stored on any remote that can be reached through rsync.
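Filter-based tools rely on Git's clean and smudge filters: a clean step runs when a file is staged and swaps the real content for a small placeholder, and a smudge step runs at checkout and swaps it back. The sketch below shows that round trip under stated assumptions: the in-memory `STORE` stands in for the rsync remote, and the placeholder format is invented for the example rather than taken from git-fat.

```python
import hashlib

# Hypothetical stand-in for the remote store (git-fat would use an rsync target).
STORE = {}

PREFIX = b"large-file-ref sha256:"  # invented placeholder format


def clean(content: bytes) -> bytes:
    """Run when a file is staged: keep the real content aside, give Git a placeholder."""
    digest = hashlib.sha256(content).hexdigest()
    STORE[digest] = content
    return PREFIX + digest.encode() + b"\n"


def smudge(placeholder: bytes) -> bytes:
    """Run at checkout: swap the placeholder back for the real content."""
    if not placeholder.startswith(PREFIX):
        return placeholder  # not a placeholder, pass through unchanged
    digest = placeholder[len(PREFIX):].strip().decode()
    return STORE[digest]


original = b"\x89PNG" + b"\x00" * 10_000
stored_in_git = clean(original)
assert smudge(stored_in_git) == original
print(len(original), "bytes replaced by", len(stored_in_git), "bytes in the repository")
```

In a real setup, the two steps are registered through Git's `filter.<name>.clean` and `filter.<name>.smudge` settings and attached to file patterns in `.gitattributes`, which is what makes the usage transparent.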

Pros

• Transparent usage.

Cons

• Supports only rsync as a backend.

5. Git-media

Git-media is probably the oldest of the available solutions. It also uses a filter approach, and supports Amazon S3, a local filesystem path, SCP, Atmos and WebDAV as backends for storing large files.

Pros

• Supports multiple backends.

• Transparent usage

Cons

• No longer developed.

• Ambiguous commands (e.g. git update-index --really-refresh).

• Not fully Windows compatible.

6. Git-bigstore

Git-bigstore was initially implemented as an alternative to git-media. It also works by storing a filter property in `.gitattributes` for certain file types.

Git-bigstore supports Amazon S3, Google Cloud Storage, or a Rackspace Cloud account as backends for storing binary files. Git-bigstore claims to improve stability when multiple people collaborate.

Pros

• Requires only Python 2.7+.

• Transparent usage.

Cons

• Only cloud-based storage is supported at the moment.

7. Git-sym

Git-sym is the newest player in the field, offering an alternative to how large files are stored and linked in git-lfs, git-annex, git-fat and git-media. Instead of calculating the checksums of the tracked large files, git-sym relies on URIs.

The benefits of git-sym are performance and the ability to symlink whole directories, though because of its nature, the main downfall is that it does not guarantee data integrity.

Git-sym is used through separate commands. It also requires Ruby, which makes it more tedious to install on Windows.

Pros

• Better performance compared to solutions based on filters.

• Support for multiple backends.

Cons

• Does not guarantee data integrity.

• Complex commands.

How have you solved the problem of storing large files in Git repositories?
