21st September 2015

Using Git for version control – the benefits


Having been a regular user of CVS (Concurrent Versions System) for many years for source control, I found the transition to Git quite tricky at first, as the concept is quite different. But it is well worth making the change as the benefits, in my opinion, are manifold.

So in this article, I am going to outline some of the terminology of Git, how it differs from CVS and what the benefits are that I have discovered. I will attempt to do this using understandable terminology (if possible!).

A disclaimer: this is my understanding of Git which may differ from the ‘official’ descriptions of how it actually works.

Your own repository

The first difference that you notice when moving to Git from CVS is that in Git each user has their own source repository. In CVS, there will be one remote repository on a server. A user connects to this server and checks out a project which in effect copies the entire project to the user’s computer. This is known as centralised or Client-Server version control.

Git is what is known as a distributed revision control system. There is no central repository; each user commits changes to their own local repository, and changes can then be pushed to (or pulled from) other repositories.

This feels a bit alien at first to a user who is used to CVS and some programmers feel more comfortable with a central repository. This can still be done with Git – a remote repository can be set up on a server and users can then push to and pull from it into their own repository.

So what are the benefits of having your own local repository? Well, the main benefits I have seen are in performance. Interrogating local history is pretty much instantaneous and committing changes is also very quick as no server connection is required. The other big win is that a user can quite happily work offline, committing changes, creating branches etc. and then changes can be pushed to a remote repository at a later date when the user has a connection.

Where are the version numbers?

CVS gives each committed changed file a version number (e.g. 1.2, 1.3 etc.). Git has done away with version numbers for files; instead, each commit is given a unique id (generated using the SHA-1 hashing algorithm, if you are interested!) which is calculated from the contents of the commit.
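To give a feel for how an id can be derived from content, here is a minimal Python sketch of the hash Git calculates for the contents of a single stored file (which Git calls a "blob"). Commit ids are produced in a similar way from the commit's own data, but that is not shown here. The function name git_blob_id is my own invention for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git gives to a file with this exact content."""
    # Git hashes a short header ("blob <size>" plus a zero byte)
    # followed by the raw bytes of the file.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same content always gives the same id; any change gives a new one.
    print(git_blob_id(b"hello world\n"))
    print(git_blob_id(b"hello world!\n"))
```

The point to take away is that the id is not a counter: identical content always produces the identical id, on any machine.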

This means that instead of changes being individually registered at the file level, the changes in Git are always registered together. So (for example) if three files are changed and committed in Git then these three files show up in the history together as one change.

The full view

When you view the change history in Git, you are shown a list of the commits in date order (most recent first). As stated above, each commit can be a group of changes, so the history is the entire history of the project shown in one clear view. The alternative (in CVS) is to show the history of individual file changes, which doesn’t always give the full picture of what has changed. Git does supply filtering options so you can see individual file changes if required, but normally it is most useful to see everything in order. The history view also shows merges between different branches (remote and local) by using lines, which makes the view look a bit like a London Underground map!

Here is an example of a Git history view for a project (using the EGit plugin for Eclipse).

The history view shows all the commits that have happened to the project. HEAD refers to the version of the project that you are currently working on (in the aptly named working directory). Green boxes show local branches (master being the main local branch) and grey boxes show remote branches (more on branches in a second). Yellow boxes are labels. To make the history clearer you can assign any label you require to a commit.

In this example, you can also easily see where different branches have split from each other and been merged together.

Clicking on a commit shows the user what source files have been changed in this individual commit and further clicks can then show the exact changes made by comparing source with earlier versions.

All this information makes the history view in Git extremely powerful with all the information about every change available from the one place.

Easy branching

Branching refers to making a copy of the current project and then working in this copied version so that any changes done do not affect any other users (until the changes are merged back in). This is really useful when multiple programmers are working on a project and want to make changes to the source without breaking each other’s versions of the source code.

Branching is available in CVS but is quite a cumbersome process, as every file in the project has to be tagged with the new branch on the remote server.

Git handles branching somewhat differently and a lot more efficiently which makes the whole branching process a lot more usable.

Firstly, when switching to a new branch, Git only replaces source files that have actually changed. When you first create a branch, nothing has changed yet, so creating a new branch is instantaneous in Git. When switching between branches, if three files differ, then only these three files are updated in the working directory. This makes the process extremely fast compared to CVS, where switching branch requires all the source code of the project to be checked out again, which can take an eternity.
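The idea of only touching the files that differ can be sketched in a few lines of Python. This is a toy model under my own assumptions, not Git's actual code: each branch is represented as a dictionary mapping a file path to an id for that file's content, and files_to_update is a made-up helper name.

```python
from typing import Dict, Set


def files_to_update(current: Dict[str, str], target: Dict[str, str]) -> Set[str]:
    """Return the paths whose content differs between two branch snapshots."""
    # A path needs updating if it was added, removed or has different content.
    all_paths = set(current) | set(target)
    return {path for path in all_paths if current.get(path) != target.get(path)}


if __name__ == "__main__":
    master = {"Main.java": "id1", "Util.java": "id2", "README": "id3"}

    # A brand new branch points at the same snapshot, so nothing is rewritten.
    new_branch = dict(master)
    print(files_to_update(master, new_branch))

    # After one file changes on the branch, only that file is swapped.
    new_branch["Util.java"] = "id4"
    print(files_to_update(master, new_branch))
```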

There are two types of branches in Git: local branches and remote-tracking branches.

Local branches are stored only in your local repository and so can be made cheaply and deleted when finished with. Remote-tracking branches are your local repository's record of branches that live on a remote server, and they are updated when you fetch or pull. A branch shared through a remote should be used when multiple programmers need to work on a change and share the code independently of any master branch. For changes made by a single user, a local branch is sufficient.

Because creating branches in Git is a quick and easy process, the recommended way of working is to give every new change requirement its own local branch, which is then merged back into the main branch when the changes are done. This branch will then show up in the history and can be labelled with a change number so it is immediately clear what change the branch relates to.

It is also possible to create a branch from any commit in the history. This will create a snapshot of the project at the point where this commit was made. This is a really powerful tool which means you can store the point where a live delivery was made and then revert back to the previous version instantly if any problems are found.

Git terminology

Git comes with its own new terminology which can sound a bit strange until you get used to the terms. Here is my attempt to explain some of the most common terms.

Fast Forward

This is when a merge is made between two branches (for example merging a local branch back into the master branch) and the branch you are merging into has had no new commits since your branch was created from it. Git does not need to combine anything in this case: it simply moves the target branch forward to your latest commit. This is the ideal situation, which results in your changes being added to the end of the history with a straight line on the Underground map.
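Put in terms of the history graph, a fast forward is possible when the latest commit of the branch being merged into is an ancestor of the latest commit of the branch being merged. Here is a small Python sketch of that check. The history is a toy dictionary mapping each commit to its parent commits, and the function names are my own rather than anything from Git.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, commit: str) -> bool:
    """Return True if 'ancestor' can be reached by walking back from 'commit'."""
    to_visit = [commit]
    seen = set()
    while to_visit:
        current = to_visit.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        to_visit.extend(parents.get(current, []))
    return False


def can_fast_forward(parents: Dict[str, List[str]], target_tip: str, source_tip: str) -> bool:
    """A merge of source into target is a fast forward if target has not moved on."""
    return is_ancestor(parents, target_tip, source_tip)


if __name__ == "__main__":
    # A <- B <- C (feature branch), and A <- B <- D (someone else's commit)
    history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}
    print(can_fast_forward(history, target_tip="B", source_tip="C"))
    print(can_fast_forward(history, target_tip="D", source_tip="C"))
```

In the first call master is still at B, so it can simply be moved along to C. In the second call master has moved on to D, so a real merge is needed.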

Cherry Pick

This is a really useful feature of Git that has no real equivalent in CVS. This allows you to take just the changes from one commit in a branch (remember this can be multiple files) and add just these changes and nothing else to another branch.

Pull

The act of taking code from a remote repository and adding it to your local repository.

Push

The reverse of pull. This pushes your local changes to a remote repository.

Staging

It is possible to commit all currently changed files into a single commit (on the command line, git commit -a does this for files Git already tracks). This may not be what you want, as you may have changes from two different tasks in your working directory that you wish to commit as two separate changes. Git supplies a staging area where you can add just some of the changes you have made and then commit just these changes to the repository. So staging is an extra step in the commit process that gives you control over exactly what goes into each commit.
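The staging idea can be pictured with a toy Python model. This is only a sketch of the concept, with a made-up ToyRepo class that is not how Git stores anything: changes sit in an "unstaged" pile until they are picked out for the next commit.

```python
from typing import Dict, List


class ToyRepo:
    """A toy model of the staging area (an illustration, not real Git)."""

    def __init__(self) -> None:
        self.unstaged: Dict[str, str] = {}  # changed files not yet staged
        self.staged: Dict[str, str] = {}    # changes chosen for the next commit
        self.commits: List[dict] = []       # committed history, oldest first

    def edit(self, path: str, content: str) -> None:
        self.unstaged[path] = content

    def stage(self, path: str) -> None:
        # Move one changed file into the staging area.
        self.staged[path] = self.unstaged.pop(path)

    def commit(self, message: str) -> dict:
        # Only the staged changes go into the commit.
        commit = {"message": message, "files": dict(self.staged)}
        self.commits.append(commit)
        self.staged.clear()
        return commit


if __name__ == "__main__":
    repo = ToyRepo()
    repo.edit("Login.java", "fix for task one")
    repo.edit("Login.css", "fix for task one")
    repo.edit("Report.java", "work for task two")

    repo.stage("Login.java")
    repo.stage("Login.css")
    repo.commit("Task one: fix login")

    repo.stage("Report.java")
    repo.commit("Task two: new report")

    for c in repo.commits:
        print(c["message"], sorted(c["files"]))
```

The three edited files end up as two separate commits, one per task, which is exactly what staging is for.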

Stashing

Another really useful feature of Git. If you have made some changes which you are not ready to commit yet and you need to move to another branch, then this feature allows you to take all the current changes and stash them in a local Git folder with a useful label. These changes are then not lost when switching to the new branch. When switching back again you can un-stash the changes to bring them back into your working directory.
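A stash behaves much like a pile of labelled parcels, where the most recent one is on top. Here is a toy Python sketch of that behaviour. The ToyWorkspace class and its method names are assumptions of mine for illustration and say nothing about how Git stores a stash internally.

```python
from typing import Dict, List, Tuple


class ToyWorkspace:
    """A toy model of stashing (an illustration, not real Git)."""

    def __init__(self) -> None:
        self.changes: Dict[str, str] = {}  # uncommitted changes
        self.stashes: List[Tuple[str, Dict[str, str]]] = []  # newest last

    def stash(self, label: str) -> None:
        # Put the current changes aside and leave a clean working directory.
        self.stashes.append((label, self.changes))
        self.changes = {}

    def unstash(self) -> str:
        # Bring the most recently stashed changes back.
        label, saved = self.stashes.pop()
        self.changes.update(saved)
        return label


if __name__ == "__main__":
    ws = ToyWorkspace()
    ws.changes["Invoice.java"] = "half-finished work"
    ws.stash("invoice rounding, not ready yet")
    print(ws.changes)   # clean, so it is safe to switch branch
    print(ws.unstash())
    print(ws.changes)   # the half-finished work is back
```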

Rebase

This is an alternative to merging changes in Git. If you have made one or more commits in a branch and another user has made one or more commits in another branch, then when merging these changes the commits are shown in date order, which results in the London Underground map as seen above. This can get messy if lots of changes have been committed with lots of branches. Rebasing tidies up all this mess. When you rebase your changes onto another branch, Git rewinds all your changes so that you end up with the common ancestor of the changes in both branches. Then the changes in the other branch are all added first and finally your changes are added in afterwards. This means that all your commits are now seen at the top of the history and you get a nice clean history flow.

Note: rebasing is not recommended for commits that have already been pushed to a shared remote branch, as it rewrites those commits (giving them new ids), which can leave different users with histories that are out of sync with each other. However, when just using local branches, this is a nice clean option to use to keep the history tidy.
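The rewind-and-replay behaviour described above can be sketched in Python. This is a toy model that only handles simple straight-line histories, each written as a list of commit names from oldest to newest. The rebase function here is my own illustration, not Git's algorithm.

```python
from typing import List


def rebase(mine: List[str], theirs: List[str]) -> List[str]:
    """Replay my commits on top of their branch (toy, straight-line histories only)."""
    # Rewind: find how many commits the two branches share from the start.
    shared = 0
    while shared < min(len(mine), len(theirs)) and mine[shared] == theirs[shared]:
        shared += 1
    my_commits = mine[shared:]

    # Replay: their history comes first, then my commits are re-created on top.
    # The re-created commits are new commits, marked here with a trailing quote.
    return theirs + [name + "'" for name in my_commits]


if __name__ == "__main__":
    my_branch = ["A", "B", "X", "Y"]   # I added X and Y after B
    their_branch = ["A", "B", "C"]     # someone else added C after B
    print(rebase(my_branch, their_branch))
```

The trailing quote marks are a reminder that the replayed commits are re-created rather than moved, which is why a rebase is best kept to work that nobody else has yet.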

In summary

So hopefully I have convinced you that Git is a great option to use for version control, with its many benefits over CVS. If not, then try it. I am sure you will soon see all the benefits that Git has to offer with its multitude of features (of which I have only scratched the surface).

Posted by Paul on 21st September 2015.