hg version control bioinformaticians

Giovanni Dall'Olio, IBE (UPF-CEXS) Introduction to version control and hg for our bioinformatics group

Upload: giovanni-dallolio

Post on 01-Dec-2014


DESCRIPTION

A short talk I gave to my group to explain the basics of hg and version control.

TRANSCRIPT

Page 1: Hg version control bioinformaticians

Giovanni Dall'Olio, IBE (UPF-CEXS)

Introduction to version control and hg for our bioinformatics group

Page 2: Hg version control bioinformaticians

What is hg?

● Programmers use software to keep track of all the versions of the code they write. These are called Version Control Systems (VCS)

● Many VCS programs exist; the best known are CVS, Subversion, Git, hg (Mercurial) and Bazaar

● Git, hg and Bazaar are newer and based on an improved paradigm called Distributed Version Control System (DVCS)

Page 3: Hg version control bioinformaticians

How will hg be useful for us?

● Keep versions of the scripts we create

  ● and also of the datasets, results, etc.

● Have a common and official version of the pipeline and the scripts, on bitbucket.org

● Everybody works on their own computer, on their own copy of the scripts; every once in a while, they merge it with the official version

Page 4: Hg version control bioinformaticians

Installing hg

● Hg runs on all major operating systems

● On Linux, install it through your software center or package manager, e.g. on Debian or Ubuntu:

  ● sudo apt-get install mercurial

● On other operating systems, go to http://mercurial.selenic.com/ and download the installer

Page 5: Hg version control bioinformaticians

Initial hg configuration

● Hg stores its configuration in a file called:

  ● ~/.hgrc on Unix

  ● C:\Documents and Settings\your_name\.hgrc on Windows

● Open it and write your username:

[ui]
username = Giovanni Dall'Olio <[email protected]>

Page 6: Hg version control bioinformaticians
Page 7: Hg version control bioinformaticians

The basic operations of a VCS

● Creating a repository

  ● Equivalent to saying 'start keeping track of the versions of the files in this project'

● Adding files to the repository

  ● Files are not tracked unless you say so

● Committing changes

  ● Saving a version of the current state of the files

● Pushing the changes and merging them with the standard version

Page 8: Hg version control bioinformaticians

Creating a repository

● Create a new directory, move into it, and create the repo with:

  ● hg init

Page 9: Hg version control bioinformaticians

Effect of creating a new repo

● A hidden directory (.hg) will be created

● From now on, it will be possible to run other hg commands in this directory

Page 10: Hg version control bioinformaticians

Adding files to the repo

● By default, no files are added to the repository

  ● This means that if you create a new file in the directory, hg will not track it

Page 11: Hg version control bioinformaticians

Creating a file

Page 12: Hg version control bioinformaticians

Files are not added automatically to the repo

● The command:

  ● hg log file.txt

● should return the history of changes to the file file.txt. Since the file is not in the repo yet, nothing is shown

Page 13: Hg version control bioinformaticians

hg add

● To add a file to the repository, use hg add

  ● This tells hg to record all the changes made to that file

Page 14: Hg version control bioinformaticians

Committing changes

● The most important operation in VCS is the commit

● This operation saves the state of the tracked files and associates it with a version

● One commit → one version

Page 15: Hg version control bioinformaticians

Committing a change

● We have added the file file.txt to the repo

  ● This is a change compared to the previous version (where this file was not present)

  ● So we have to record it with a commit

Page 16: Hg version control bioinformaticians

Our first commit

Page 17: Hg version control bioinformaticians

Effects of adding a file and committing

● From now on, all the changes made to the file will be tracked

Page 18: Hg version control bioinformaticians

What is being 'committed'?

● Every time you commit a new version, hg stores the set of changes since the previous version

● Some older VCS stored a full copy of all the files for each version

  ● => they take up a lot of disk space

● By storing only the changes, hg occupies less space and makes it easier to compare versions

Page 19: Hg version control bioinformaticians

Hg diff

● The hg diff command shows the differences between the current state of the files and their last committed version

Page 20: Hg version control bioinformaticians

Hg log

● Hg log will show the history of the changes in the repository

Page 21: Hg version control bioinformaticians

Hg log

Page 22: Hg version control bioinformaticians

The story continues..

● The basic operations in a VCS are adding files to be tracked and committing changes

● Next week we will see how to keep a copy of our repository on a remote server, and how to collaborate with other people

● Now I will show you some examples of using a version control system

Page 23: Hg version control bioinformaticians

Example: backup

● Imagine that, by mistake, you remove a file or a directory from your project

● With a VCS, you can revert to the previous version and get the files back

Page 24: Hg version control bioinformaticians

Example: tracking code

● VCS were developed to track changes in code

  ● Return to the point where you made a mistake or a typo

  ● Implement a parallel version of the code, like trying a different library or approach (branching)

  ● Remember what you were doing when you have to change code written months ago

Page 25: Hg version control bioinformaticians

Example: releasing software

● Mr. Werewolf publishes a program to predict when the moon will be full

● The code gets adopted by the werewolf community. Papers get published using it

● At a certain point, another werewolf discovers a bug in the code. With a VCS, it is possible to find the version where the bug was introduced and identify all the versions affected

Page 26: Hg version control bioinformaticians

Example: tracking data

● Version control can be applied to a dataset

  ● Example: Mr. Dracula wants to write a paper on the quality of the blood in his neighborhood. Every time he gets new data, he commits a change

Page 27: Hg version control bioinformaticians

Tracking everything else

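● VCS can be applied to many kinds of files

  ● Binary files can be stored, but usually cannot be compared or merged in a meaningful way

  ● OpenOffice documents can be tracked (they are XML inside a zip archive; the flat XML variants, such as .fodt, give more readable diffs)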

Page 28: Hg version control bioinformaticians

Tracking huge files

● Hg stores the differences between two versions

● Storing all the 1000 Genomes (1000g) data will take:

  ● Some gigabytes to store a compressed version of the files

  ● Less space to store the following commits (but these commits will take time)

● Maybe it is not worth putting gigabytes of data under version control

  ● No solution to date

  ● Some hg extensions exist for big files

Page 29: Hg version control bioinformaticians

How frequently should I commit?

● Everybody has their own philosophy

  ● Some people prefer to commit every small change

  ● Others prefer to make one big commit per day

● As a general rule:

  ● The bigger the commit, the harder it is to integrate if there are conflicts

  ● It's up to you to decide

Page 30: Hg version control bioinformaticians

How to write the perfect commit messages

● One or two sentences

● Avoid generic messages

  ● “new changes”, “fixed bugs”

● Use tags like 'Fix', 'Add', 'Config', etc.:

  ● “Fix: error when reading file”

  ● “Add: new function for plotting results”

● Cite the files changed if you think it may be useful:

  ● “Implemented new sorting algorithm for sorting.py”