Code Craft : Part II – Version Control is Seat Belts for Programmers

When I was starting out as a professional programmer, I took the basic precaution of occasionally backing up my code so that if I took a “wrong turn”, I could get back to something that kind of sort of worked. I used to do this the old-fashioned way of creating a daily folder and copying the code into it.

But, it turns out, a day is a lot of work to lose. When things did go wrong – which happened regularly – I’d only go back to the previous day’s code as a last resort. Usually, I’d try and fix the problem, which took up a lot of time and typically had disappointing results.

Also, my hard drive very quickly filled up with back-ups if I didn’t get into the habit of deleting older copies. Maybe I changed 5 lines of code that day; making an entire copy of 500,000 lines of code every time is pretty wasteful. And if I made back-ups more often, the drive would fill up faster. In the 1990s, disk space was still expensive.

The effect of making infrequent back-ups on the way I worked was quite profound. When you risk losing a day’s work every time you try something new, you take fewer risks. Fear tends to stifle creativity and innovation.

Really, I should have been making back-ups far more frequently – at least every hour or so – and the only way for that to be practical on a PC with a 100MB hard drive was to back up only the parts of the source code that had changed, rather than all of it every single time.

I was several days into attempting to write something that enabled this when a more experienced programmer told me that such tools already existed. (That happens a lot in computing.)

His team were using what he called a “version control system” or VCS – in this case a tool called CVS (Concurrent Versions System). CVS was relatively new at the time (it was first released in 1990), but I later learned that version control systems had been around since the early 1970s.

The team copied a code project to a central repository that everyone could access. They could then “check in” any changes they made to source code files, and CVS stored each change as a “delta”, keeping a history of all revisions to every file in the repository. Using the original source files and the deltas, CVS could recreate any version of the code from any point in its history.
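
To make the idea of storing deltas concrete, here’s a minimal Python sketch. It’s an illustration only, not CVS’s actual storage format: the names FileHistory, make_delta and apply_delta are made up for this example, and it leans on Python’s standard difflib module to work out which lines changed.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Record only what is needed to turn the lines of `old` into `new`."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged lines are referenced by position, not stored again.
            delta.append(("copy", i1, i2))
        else:
            # Replaced or inserted lines are stored; a deletion stores nothing.
            delta.append(("insert", new[j1:j2]))
    return delta


def apply_delta(old, delta):
    """Rebuild the newer version from the older lines plus a delta."""
    new = []
    for op in delta:
        if op[0] == "copy":
            new.extend(old[op[1]:op[2]])
        else:
            new.extend(op[1])
    return new


class FileHistory:
    """Hypothetical history of one file: the original plus a list of deltas."""

    def __init__(self, original_lines):
        self.original = list(original_lines)
        self.deltas = []

    def check_in(self, new_lines):
        latest = self.version(len(self.deltas))
        self.deltas.append(make_delta(latest, list(new_lines)))
        return len(self.deltas)  # the new revision number

    def version(self, revision):
        """Recreate any revision; revision 0 is the original file."""
        lines = list(self.original)
        for delta in self.deltas[:revision]:
            lines = apply_delta(lines, delta)
        return lines
```

With something like this, checking in a small change to a very large file stores the changed lines plus a few “copy” instructions, rather than another full copy of the file.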

I very quickly realised that this was super-useful. Not only could you get back to any version of the code with ease, without filling your hard drive up with copies, but you could also see the entire history of the code and analyse how it has evolved. Think of a version history as being a bit like a computer program’s own personal diary, logging every interesting change that’s been made – potentially going back years. Much can be learned by reading diaries.

I’ve been using version control systems ever since. And over the last 25 years, they have become very widespread. Most professional programmers use version control these days. So it’s curious – and a little alarming – that many schools and universities don’t teach students how to use them (or even tell students they exist).

The most popular VCS in use today is Git. Git is what we call a distributed version control system (DVCS). As well as a central repository of source code files, it also allows programmers to keep their own local repository, in which they can track changes they make on their own computer, before “pushing” those changes to the central repository to share with the other programmers on the team.

A simple workflow for version control with Git might go something like this (using the Git command line program in Bash):

  • Initialise a folder on your computer to be a local Git code repository

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths
$ git init
Initialized empty Git repository in C:/python_projects/maths/.git/

  • In my maths folder, I create a Python script called sqrt.py. 
  • If I want this file to be version-controlled, I need to add it to the Git repository.

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git add sqrt.py

  • sqrt.py is put into a “staging area” that contains all of the file changes (files added, files modified, files deleted) for my first commit to my local Git repository. Let’s commit this with a meaningful message that helps identify what version of the code this is.

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git commit -m "This is my first commit"
[master (root-commit) 3e39188] This is my first commit
1 file changed, 14 insertions(+)
create mode 100644 sqrt.py

  • If I make a change to sqrt.py
  • …and then commit that change…

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git commit -m 'Changed input to be square rooted' --all
[master 75b5aef] Changed input to be square rooted
1 file changed, 1 deletion(-)

  • …we add a new version of the source file to our local repository. We can see the version history of our repository using Git’s log command.

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git log

commit f113f51030eab07943b9e8f9493d17a2209544d2
Author: Jason Gorman <jason.gorman@codemanship.com>
Date: Wed Oct 2 08:39:12 2019 +0100

Changed input to be square rooted

commit 3e391889c26574357b35f687413f2eb5d9e4f2c1
Author: Jason Gorman <jason.gorman@codemanship.com>
Date: Wed Oct 2 08:17:51 2019 +0100

This is my first commit

  • If I then make a boo-boo in this code…
  • …I can get back to either of those versions by using Git’s reset command. (Be aware that the --hard option throws away any uncommitted changes to tracked files.)

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git reset --hard f113f51030eab07943b9e8f9493d17a2209544d2
HEAD is now at f113f51 Changed input to be square rooted

  • And I can go back to any version in the code’s history if I want. I just tell it which version – the long identifier Git assigns to each commit – I want to go back to. Ultimate undo-ability!

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git reset --hard 3e391889c26574357b35f687413f2eb5d9e4f2c1
HEAD is now at 3e39188 This is my first commit
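
If it helps to picture what those commands are doing, here’s a toy model in Python of a working folder, a staging area, commits, a log and a hard reset. It’s a sketch under big simplifying assumptions (whole-file snapshots held in memory, no deletions, no branches), and ToyRepo and its methods are invented for this example; it is not how Git is implemented.

```python
import hashlib


class ToyRepo:
    """A deliberately tiny, in-memory stand-in for a local repository."""

    def __init__(self):
        self.working = {}   # file name -> contents in the working folder
        self.staged = {}    # changes waiting for the next commit
        self.commits = {}   # commit id -> (parent id, message, snapshot)
        self.head = None    # id of the current commit

    def add(self, name):
        # Copy the file's current contents into the staging area.
        self.staged[name] = self.working[name]

    def commit(self, message):
        parent_snapshot = self.commits[self.head][2] if self.head else {}
        snapshot = {**parent_snapshot, **self.staged}
        # Derive an identifier from the commit's content (illustrative only).
        raw = repr((self.head, message, sorted(snapshot.items())))
        commit_id = hashlib.sha1(raw.encode("utf-8")).hexdigest()
        self.commits[commit_id] = (self.head, message, snapshot)
        self.head = commit_id
        self.staged = {}
        return commit_id

    def log(self):
        # Walk back from the current commit to the very first one.
        entries, current = [], self.head
        while current is not None:
            parent, message, _ = self.commits[current]
            entries.append((current, message))
            current = parent
        return entries

    def reset_hard(self, commit_id):
        # Move to the chosen commit and overwrite the working folder with it.
        self.head = commit_id
        self.working = dict(self.commits[commit_id][2])
        self.staged = {}
```

Replaying the walkthrough against it looks like this:

```python
repo = ToyRepo()
repo.working["sqrt.py"] = "first version"
repo.add("sqrt.py")
first = repo.commit("This is my first commit")

repo.working["sqrt.py"] = "second version"
repo.add("sqrt.py")
second = repo.commit("Changed input to be square rooted")

repo.working["sqrt.py"] = "a boo-boo"
repo.reset_hard(first)  # the working copy is back to the first version
```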

Remember that Git is what we call a distributed version control system (DVCS), so the version history of my sqrt.py file is stored in a local repository on my computer. I can also create a shared remote repository – for example, on github.com – so other programmers can access the files and their histories and contribute to my maths project.

  • First, I create a new repository using my GitHub account. I’ve called it pymaths.

[Screenshot: new_github_repo]

  • Then I copy the remote repository’s unique URL

[Screenshot: github_repo_url]

  • I can now add this remote repository for use with my local repository

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git remote add origin https://github.com/jasongorman/pymaths.git

  • Now I can push the commits I made to my local repository to the remote repository, where other programmers can access them.

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git push origin master
Enumerating objects: 3, done.
Counting objects: 100% (3/3), done.
Delta compression using up to 4 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 365 bytes | 365.00 KiB/s, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/jasongorman/pymaths.git
* [new branch] master -> master

Now we can see that our commits are showing in the pymaths GitHub repository. (Bear in mind that I reset the code back to the original commit, so that’s the one showing as current.)

[Screenshot: pymaths]

When multiple programmers are contributing to a repository, they need a way to get the changes other people have made and merge them into their own working directories.

Let’s say someone else on my team adds a function for calculating factorials.

They push their change to the pymaths repository. To merge their changes into my local copy, I can use the Git pull command.

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git pull origin master
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/jasongorman/pymaths
* branch master -> FETCH_HEAD
3e39188..23578b7 master -> origin/master
Updating 3e39188..23578b7
Fast-forward
sqrt.py | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

That updates my local copy of sqrt.py to the new version from pymaths, which is fine if I haven’t also changed that file. What if we’ve both changed it? That can lead to what’s called a merge conflict.
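
The “Fast-forward” in that output means my local history had nothing the remote didn’t already know about, so Git could simply move my branch forward. Here’s a rough Python sketch of that decision, assuming history is just a dictionary mapping each commit to its parents. The function names and the “base”, “mine” and “theirs” labels are made up for illustration; the two short hashes are the ones from the pull output above.

```python
def ancestors(parents, commit):
    """Every commit reachable from `commit`, including the commit itself."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def pull_kind(parents, local_head, remote_head):
    """Classify what pulling `remote_head` into `local_head` would involve."""
    if remote_head in ancestors(parents, local_head):
        return "already up to date"
    if local_head in ancestors(parents, remote_head):
        return "fast-forward"   # the remote simply extends my history
    return "merge needed"       # both sides have commits the other lacks


# The situation in the pull above: the remote is one commit ahead of me.
linear = {"3e39188": [], "23578b7": ["3e39188"]}

# A hypothetical situation where we have both committed since "base".
diverged = {"base": [], "mine": ["base"], "theirs": ["base"]}
```

Here pull_kind(linear, "3e39188", "23578b7") gives "fast-forward", while pull_kind(diverged, "mine", "theirs") gives "merge needed".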

Imagine my team mate adds a function for calculating the ceiling of a number, and pushes that change to pymaths.

And at the same time, I add a function to my local copy for calculating the floor of a number and commit that to my local repository.

When I pull the changes from the remote repository, Git will attempt to merge the two versions of the file automatically, but in this case it fails.

User@DESKTOP-KSHARRN MINGW64 /c/python_projects/maths (master)
$ git pull origin master
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 1), reused 3 (delta 1), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/jasongorman/pymaths
* branch master -> FETCH_HEAD
87973fb..421547e master -> origin/master
Auto-merging sqrt.py
CONFLICT (content): Merge conflict in sqrt.py
Automatic merge failed; fix conflicts and then commit the result.

To resolve the conflict, I just need to edit the auto-merged file, and then commit and push the finished version to pymaths.
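
When an automatic merge fails, Git leaves both versions in the file, separated by conflict markers (lines starting with <<<<<<<, ======= and >>>>>>>). As a rough sketch, assuming my floor function and my team mate’s ceiling function collided, the file might look like the CONFLICTED string below. The function bodies, the label after >>>>>>> and the helper names are illustrative assumptions, not output copied from the real repository.

```python
# Hypothetical contents of sqrt.py after the failed automatic merge.
CONFLICTED = """\
import math

<<<<<<< HEAD
def floor(x):
    return math.floor(x)
=======
def ceiling(x):
    return math.ceil(x)
>>>>>>> 421547e
"""

MARKERS = ("<<<<<<<", "=======", ">>>>>>>")


def find_conflicts(text):
    """Return a list of (my_lines, their_lines) pairs, one per conflict."""
    conflicts, current = [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            ours, theirs = [], []
            current = ours              # start collecting my side
        elif line.startswith("=======") and current is not None:
            current = theirs            # switch to the other side
        elif line.startswith(">>>>>>>") and current is not None:
            conflicts.append((ours, theirs))
            current = None              # conflict region closed
        elif current is not None:
            current.append(line)
    return conflicts


def keep_both(text):
    """Resolve by keeping both sides, which suits two independent functions."""
    kept = [line for line in text.splitlines() if not line.startswith(MARKERS)]
    return "\n".join(kept) + "\n"
```

In practice I’d usually make this edit by hand in an editor, deciding case by case which lines to keep, and then test the result before committing and pushing it.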

In a sense, version control is like seat belts for programmers. It gives us a level of safety when we’re creating and evolving our programs that means we can invent and try new things with much greater confidence that – whatever happens – there’s a way back if it goes wrong.

Version control systems like Git make it much easier for programmers to collaborate on the same code projects, even if they are on opposite sides of the world. They have built-in features that help us to manage conflicting changes, and are an essential ingredient in individual and team efforts of all sizes.

Here are some basic good habits for version control that I’ve been successfully applying for 25 years:

  1. Unless it’s something genuinely trivial that you’re going to throw away, always start by putting your code project under version control
  2. Check in your changes frequently – at least every hour (I do it many times an hour, usually). The less often you do it, the more work you might lose.
  3. When working with others, merge their changes frequently into your local copy so you can keep up to date with what’s in the repository, and spot conflicts early when they’re easier to fix.
  4. Use meaningful commit messages to help you and other programmers easily identify what’s changed in that version of the code.
  5. Very importantly, don’t check in code that doesn’t work. If you break the program, and check in your changes, and then your team mates merge those changes into their copies, then you’ve just done the programming equivalent of giving everyone in the team your cold.

But how do we know it works before we commit? Well, we test it. Yes. Every single time before we commit. If it fails our tests, then we don’t commit it.

“Gee. Testing the program every time we want to commit our changes? And you say we should be committing frequently? That sounds like we’ll be spending all our time just testing our code!”

Yup. You’ll be testing and re-testing your program many times an hour. And in the next blog post I’ll be showing you how that can be achieved using fast-running automated unit tests.

Author: codemanship

Founder of Codemanship Ltd and code craft coach and trainer
