Code Craft: Part III – Unit Tests are an Early Warning System for Programmers

Before I was introduced to code craft, my way of checking that the programs I wrote worked was to run them and use them and see if they did what I expected them to do.

Consider this command line program I wrote that does some simple maths:
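(The listing below is a minimal sketch of the sort of program I mean; the error handling and the exact wording of the messages are illustrative.)

# maths.py - a minimal sketch; the error handling and message wording are illustrative
import math
import sys

def sqrt(x):
    if x < 0:
        raise Exception("Cannot calculate square root of a negative number")
    return math.sqrt(x)

def factorial(n):
    if n < 0 or n != int(n):
        raise Exception("Factorial is only defined for non-negative whole numbers")
    return math.factorial(int(n))

def floor(x):
    return float(math.floor(x))

def ceiling(x):
    return float(math.ceil(x))

if __name__ == "__main__":
    command, value = sys.argv[1], float(sys.argv[2])
    if command == "sqrt":
        print("The square root of", value, "=", sqrt(value))
    elif command == "factorial":
        print(int(value), "factorial =", factorial(value))
    elif command == "floor":
        print("The floor of", value, "=", floor(value))
    elif command == "ceiling":
        print("The ceiling of", value, "=", ceiling(value))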

I can run this program with different inputs to check if the results of the calculations are correct.

C:\Users\User\PycharmProjects\pymaths>python maths.py sqrt 4.0
The square root of 4.0 = 2.0

C:\Users\User\PycharmProjects\pymaths>python maths.py factorial 5
5 factorial = 120

C:\Users\User\PycharmProjects\pymaths>python maths.py floor 4.7
The floor of 4.7 = 4.0

C:\Users\User\PycharmProjects\pymaths>python maths.py ceiling 2.3
The ceiling of 2.3 = 3.0

Testing my code by using the program is fine if I want to check that it works first time around.

These four test cases, though, don’t give me a lot of confidence that the code really works for all the inputs my program has to handle. I’d want to cover more examples, perhaps using a list to remind me what tests I should do.

  • sqrt 0.0 = 0.0
  • sqrt -1.0 -> should raise an exception
  • sqrt 1.0 = 1.0
  • sqrt 4.0 = 2.0
  • sqrt 6.25 = 2.5
  • factorial 0 = 1
  • factorial 1 = 1
  • factorial 5 = 120
  • factorial -1 -> should raise an exception
  • factorial 0.5 -> should raise an exception
  • floor 0.0 = 0.0
  • floor 4.7 = 4.0
  • floor -4.7 = -5.0
  • ceiling 0.0 = 0.0
  • ceiling 2.3 = 3.0
  • ceiling -2.3 = -2.0

Now, that’s a lot of test cases (and we haven’t even thought about how we handle incorrect command line arguments yet).

To run the program and try all of these test cases once seems like quite a bit of work, but if it’s got to be done, it’s got to be done. (The alternative is not doing all these tests, and then how do we know our program really works?)

But what if I need to change my maths code? (And if we know one thing about code, it’s that it changes). Then I’ll need to perform these tests again. And if I change the code again, I have to do the tests again. And again. And again. And again.

If we don’t re-test the code after we’ve changed it, we risk not knowing if we’ve broken it. I don’t know about you, but I’m not happy with the idea of my end users being lumbered with broken software. So I re-test the software every time it changes.

It took me about 5-6 minutes to perform all of these tests using the command line. That’s 5-6 minutes of testing every time I need to change my code. And maybe 5-6 minutes of testing doesn’t sound like a lot, but this program only has about 40 lines of code. Extrapolate that testing time to 1,000 lines of code. Or 10,000 lines. Or a million.

Testing programs by using them – what we call manual testing – simply doesn’t scale up to large amounts of code. The time it takes to re-test our program when we’ve changed the code becomes an obstacle to making those changes safely. If it takes hours or days or even weeks to re-test it, then change will be slow and difficult. It may even be impractical to change it at all, and far too many programs lots of people rely on end up in this situation. The time taken to test our code has a profound impact on the cost of making changes.

Studies have shown that the effort required to fix a bug rises dramatically the longer that bug goes undiscovered.

[Chart: cost of correcting defects rises with time to discovery (Boehm and Basili)]

If it takes a week to re-test our program, then the cost of fixing the bugs that testing discovers will be much higher than if we’d been alerted a minute after we made that error. The average programmer can introduce a lot of bugs in a week.

Creating good working software depends heavily on our ability to check that the code’s working very frequently – almost continuously, in fact. So we have to be able to perform our tests very, very quickly. And that’s not possible when we perform them manually.

So, how could we speed up testing to make changes quicker and easier? Well, we’re computer programmers – so how about we write a computer program to test our code?
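Here’s a sketch of what such a test script might look like; the helper functions and test names are illustrative, but they follow the notes below.

# A sketch of a hand-rolled test script; helper and test names are illustrative
from maths import sqrt, factorial, floor, ceiling

tests_run = 0
tests_passed = 0

def assert_equals(test_name, expected, actual):
    # Passes if the actual result matches the expected result
    global tests_run, tests_passed
    tests_run += 1
    if expected == actual:
        tests_passed += 1
    else:
        print(test_name + " failed - expected " + str(expected) + " , actual " + str(actual))

def assert_raises(test_name, exception_type, action):
    # Passes if invoking the action raises the expected kind of exception
    global tests_run, tests_passed
    tests_run += 1
    try:
        action()
        print(test_name + " failed - expected " + exception_type.__name__ + " to be raised")
    except exception_type:
        tests_passed += 1

print("Running math tests...")

assert_equals("sqrt of 0.0", 0.0, sqrt(0.0))
assert_raises("sqrt of -1.0", Exception, lambda: sqrt(-1.0))
assert_equals("sqrt of 1.0", 1.0, sqrt(1.0))
assert_equals("sqrt of 4.0", 2.0, sqrt(4.0))
assert_equals("sqrt of 6.25", 2.5, sqrt(6.25))
assert_equals("factorial of 0", 1, factorial(0))
assert_equals("factorial of 1", 1, factorial(1))
assert_equals("factorial of 5", 120, factorial(5))
assert_raises("factorial of -1", Exception, lambda: factorial(-1))
assert_raises("factorial of 0.5", Exception, lambda: factorial(0.5))
assert_equals("floor of 0.0", 0.0, floor(0.0))
assert_equals("floor of 4.7", 4.0, floor(4.7))
assert_equals("floor of -4.7", -5.0, floor(-4.7))
assert_equals("ceiling of 0.0", 0.0, ceiling(0.0))
assert_equals("ceiling of 2.3", 3.0, ceiling(2.3))
assert_equals("ceiling of -2.3", -2.0, ceiling(-2.3))

print("Tests run: " + str(tests_run))
print("Passed: " + str(tests_passed) + " , Failed: " + str(tests_run - tests_passed))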

A few things to note about my test code:

  • Each test case has a unique name to make it easy to identify which test failed
  • There are two helper functions that ask if the actual result matches the expected result – either an expected output, or an expected exception that should have been raised
  • The script counts the total number of tests run and the number of tests passed, so it can summarise the result of running this suite of tests
  • My test code isn’t testing the whole program from the outside, like I was doing at the command line. Some code just tests the sqrt function, some just tests the factorial function, and so on. Tests that only test parts of a program are often referred to as unit tests. A ‘unit’ could be an individual function or a method of a class, or a whole class or module, or a group of these things working together to do a specific job. Opinions vary, but what we mostly all agree on is that a unit is a discrete part of a program, and not the whole program.

The advantages of testing units instead of whole programs are important:

  1. When a test fails, it’s much easier to pinpoint the source of the problem
  2. Less code is executed in order to check a specific piece of logic works, so unit tests tend to run much faster
  3. By invoking functions directly, there’s usually less code involved in writing a unit test

When I run my test script, if all the tests pass, I get this output:

Running math tests...
Tests run: 16
Passed: 16 , Failed: 0

Phew! All my tests are passing.

This suite of tests ran in a fraction of a second, meaning I can run them as many times as I like, as often as I want. I can change a single line of code, then run my tests to check that change didn’t break anything. If I make a boo-boo, there’s a high chance my tests will alert me straight away. We say that these automated tests give me high assurance that – at any point in time – my code is working.

This ability to re-test our code after just a single change can make a huge difference to how we program. If I break the code, very little has changed since the code was last working, so it’s much easier to pinpoint what’s gone wrong and much easier to fix it. If I’ve made 100 changes before I re-test the code, it could be a lot of work to figure out which change(s) caused the problem. I have found, after 25 years of writing unit tests, that I need to spend very little time in my debugger.

If any tests fail, I get this kind of output:

Running math tests...
sqrt of 0.0 failed - expected 1.0 , actual 0
sqrt of -1.0 failed - expected Exception to be raised
Tests run: 16
Passed: 14 , Failed: 2

It helpfully tells me which tests failed, and what the expected and actual results were, to make it easier for me to pin down the cause of the problem. Since I only made a small change to the code since the tests last all passed, it’s easy for me to fix.

Notice that I’ve grouped my tests by the function that they’re testing. There’s a bunch of tests for the sqrt function, a bunch for factorial, and more for floor and for ceiling. As my maths program grows, I’ll add many more tests. Keeping them all in one big module will get unmanageable, so it makes sense to split them out into their own modules. That makes them easier to manage, and also allows us to run just the tests for, say, sqrt, or just the tests for factorial – if we only changed code in those parts of the program – if we want to.

Here I’ve split the tests for sqrt into their own test module, which we call a test fixture. It can be run by itself, or can be invoked as part of the main test suite along with the other test fixtures.
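A sketch of what that might look like, assuming the assert_equals and assert_raises helpers have been moved into a shared module (imagined here as test_tools.py):

# sqrt_test.py
# A sketch of the sqrt test fixture, assuming the assert_equals and
# assert_raises helpers now live in a shared module called test_tools
from maths import sqrt
from test_tools import assert_equals, assert_raises

def run_tests():
    assert_equals("sqrt of 0.0", 0.0, sqrt(0.0))
    assert_equals("sqrt of 1.0", 1.0, sqrt(1.0))
    assert_equals("sqrt of 4.0", 2.0, sqrt(4.0))
    assert_equals("sqrt of 6.25", 2.5, sqrt(6.25))
    assert_raises("sqrt of -1.0", Exception, lambda: sqrt(-1.0))

if __name__ == "__main__":
    # the fixture can be run on its own, or run_tests() can be called
    # from the main test suite alongside the other fixtures
    run_tests()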

The two helper functions I wrote that check and record the result of each test – assert_equals and assert_raises – could be reused in other suites of tests, since they’re quite generic. What I’ve created here could be the beginnings of a reusable library for writing test scripts in Python.

As my maths program grows, and I add more and more tests, there’ll likely be more helper functions I’ll find useful. But, in computing, before you set out to write a reusable library to help you with something, it’s usually a good idea to check if someone’s already written one.

For a problem as common as automating program tests, you won’t be surprised that such libraries already exist. Python has several, but the most commonly used test automation library actually comes as part of Python’s standard modules – unittest (formerly known as PyUnit).

Here are the sqrt tests I wrote, translated into unittest tests.
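(A sketch; I’m assuming sqrt raises an exception whose message is along the lines of "Cannot calculate square root of a negative number".)

import unittest
from maths import sqrt

class SqrtTest(unittest.TestCase):

    def test_sqrt_0(self):
        self.assertEqual(0.0, sqrt(0.0))

    def test_sqrt_1(self):
        self.assertEqual(1.0, sqrt(1.0))

    def test_sqrt_4(self):
        self.assertEqual(2.0, sqrt(4.0))

    def test_sqrt_6_25(self):
        self.assertEqual(2.5, sqrt(6.25))

    def test_sqrt_minus1(self):
        # check both the type of exception raised and its message
        # (the exact message is an assumption)
        self.assertRaisesRegex(Exception,
                               "Cannot calculate square root of a negative number",
                               lambda: sqrt(-1.0))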

There’s a lot to unittest, but this test fixture uses just some of its basic features.

To create a test fixture, you just need to declare a class that inherits from unittest.TestCase. Individual tests are methods of your fixture class whose names start with test_ (so that unittest knows they’re tests); they take no parameters other than self, and return no data.

The TestCase class defines many useful helper methods for making assertions about the result of a test. Here, I’ve used assertEqual and assertRaisesRegex.

assertEqual takes an expected result value as the first parameter, followed by the actual result, and compares the two. If they don’t match, the test fails.

assertRaisesRegex is like my own assert_raises, except that it also matches the error message the exception was raised with against a regular expression – so we can check that it was the exact exception we expected.

I don’t need to write a test suite that directly invokes this test fixture’s tests. The unittest test runner will examine the test code, find the test fixtures and test methods, and build the suite out of all the tests it finds. This saves me a fair amount of coding.

I can run the sqrt tests from the command line:

C:\Users\User\PycharmProjects\pymaths\test>python -m unittest sqrt_test.py
.....
----------------------------------------------------------------------
Ran 5 tests in 0.002s

OK

If any tests fail, unittest will tell me which tests failed and provide helpful diagnostic information.

C:\Users\User\PycharmProjects\pymaths\test>python -m unittest sqrt_test.py
F...F
======================================================================
FAIL: test_sqrt_0 (sqrt_test.SqrtTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pymaths\test\sqrt_test.py", line 8, in test_sqrt_0
    self.assertEqual(1.0, sqrt(0.0))
AssertionError: 1.0 != 0

======================================================================
FAIL: test_sqrt_minus1 (sqrt_test.SqrtTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\pymaths\test\sqrt_test.py", line 13, in test_sqrt_minus1
    lambda: sqrt(1))
AssertionError: Exception not raised by <lambda>

----------------------------------------------------------------------
Ran 5 tests in 0.002s

FAILED (failures=2)

I can run all of the tests in my project folder at the command line using unittest's test discovery feature.

C:\Users\User\PycharmProjects\pymaths\test>python -m unittest discover -p "*_test.py"
................
----------------------------------------------------------------------
Ran 16 tests in 0.004s

OK

The test runner finds all tests in files matching ‘*_test.py’ in the current folder and runs them for me. Easy as peas!

You may have noticed that my tests are in a subfolder C:\Users\User\PycharmProjects\pymaths\test, too. It’s a very good idea to keep your test code separate from the code it’s testing, so you can easily see which is which.

Note how each test method has a meaningful name that identifies the test case, just like the test names in my hand-rolled unit tests before.

Note also that each test only asks one question – Is the sqrt of four 2? Is the factorial of five 120? And so on. When a test fails, it can only really be for one reason, which makes debugging much, much easier.

When I’m programming, I put in significant effort to make sure that as much of my code is tested by automated unit tests as possible. And, yes, this means I may well end up writing as much unit test code as solution code – if not more.

A common objection inexperienced programmers have to unit testing is that they have to write twice as much code. Surely this takes twice as long? Surely we could add twice as many features if we didn’t waste time writing unit test code?

Well, here’s the funny thing: as our program grows, we tend to find – if we rely on slow manual testing to catch the bugs we’ve introduced – that the proportion of the time we spend fixing bugs grows too. Teams who do testing the hard way often end up spending most of their time bug fixing.

[Chart: proportion of time spent fixing bugs as the program grows]

Because bugs can cost exponentially more to fix the longer they go undiscovered, we find that the effort we put in up-front to write fast tests that will catch them more than pays for itself later on in time saved.

Sure, if the program you’re writing is only ever going to be 100 lines long, extensive unit tests might be a waste (although I would still write a few, as I’ve found that even on relatively simple programs some unit testing has saved me time). But most programs are much larger, and therefore unit tests are a good idea most of the time. You wouldn’t fit a smoke alarm in a tiny Lego house, but in a real house that people live in, you might be very grateful for one.

One final thought about unit tests. Consider this code that calculates rental prices of movies based on their IMDb ratings:
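Here’s a sketch of the kind of code I mean; the web service URL, the JSON field name, the base price and the rating thresholds are all illustrative:

import json
import urllib.request

class Pricer:

    def price(self, imdb_id):
        # Fetch the video's title and IMDb rating from an external web service
        # (the URL and the JSON field name here are placeholders)
        response = urllib.request.urlopen("https://example.com/videos/" + imdb_id)
        video_info = json.loads(response.read())
        rating = video_info["imdb_rating"]
        # Charge a premium for highly rated videos, knock £1 off poorly rated ones
        # (the base price and rating thresholds are illustrative)
        price = 2.00
        if rating > 7.0:
            price += 1.00
        elif rating < 4.0:
            price -= 1.00
        return price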

This code fetches information about a video, using its IMDb ID, from a web service. Using that information, it decides whether to charge a premium of £1 because the video has a high IMDb rating or knock off £1 because the video has a low IMDb rating.

If we wrote a unittest test for this, then every time it ran our code would connect to an external web service to fetch information about the video we’re pricing. Connecting to web services is slow in comparison to things that happen entirely in memory. But we want our unit tests to run as fast as possible.

How could we test that prices are calculated correctly without connecting to this external service?

Our pricing logic requires movie information that comes from someone else’s software. Could we fake that somehow, so a rating is available for us to test with?

What if, instead of the price method connecting directly to the web service itself, we were to provide it with an object that fetches video information for it? i.e., what if we made fetching video information somebody else’s problem? The object is passed in as a parameter of Pricer‘s constructor, like this:
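A sketch, keeping the same illustrative pricing rules as before:

class Pricer:

    def __init__(self, videoInfo):
        # The object that fetches video information is injected from the outside
        self.videoInfo = videoInfo

    def price(self, imdb_id):
        title, rating = self.videoInfo.fetch_video_info(imdb_id)
        # Same illustrative base price and rating thresholds as before
        price = 2.00
        if rating > 7.0:
            price += 1.00
        elif rating < 4.0:
            price -= 1.00
        return price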

Because videoInfo is passed as a constructor parameter, Pricer only knows what that object looks like from the outside. It knows it has to have a fetch_video_info method that accepts an IMDb ID as a parameter and returns the title and IMDb rating of that video.

Thanks to Python’s duck typing – if it walks like a duck and quacks like a duck etc – any object that has a matching method should work inside Pricer, including one that doesn’t actually connect to the web service.

We could write a class that provides whatever title and IMDb rating we tell it to, and use that in a unit test for Pricer.
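Here’s a sketch of such a test, using a hypothetical VideoInfoStub and the illustrative pricing rules from above:

import unittest
from pricer import Pricer  # assuming Pricer lives in a module called pricer

class VideoInfoStub:
    # Looks like the real VideoInfo from the outside, but returns canned data
    def __init__(self, title, rating):
        self.title = title
        self.rating = rating

    def fetch_video_info(self, imdb_id):
        return self.title, self.rating

class PricerTest(unittest.TestCase):

    def test_high_rated_video_is_charged_a_premium(self):
        pricer = Pricer(VideoInfoStub("Jaws", 8.1))
        self.assertEqual(3.00, pricer.price("tt0073195"))

    def test_low_rated_video_is_discounted(self):
        pricer = Pricer(VideoInfoStub("Jaws: The Revenge", 3.0))
        self.assertEqual(1.00, pricer.price("tt0093300"))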

When I run this test, it checks the pricing logic just as thoroughly as if we’d fetched the video information from the real web service. How video titles and ratings are obtained has nothing to do with how rental prices are calculated. We achieved flexibility in our design by cleanly separating those concerns. (Separation of Concerns is fancy software architecture-speak for “make it someone else’s problem”.)

The object that fetches video information is passed in to the Pricer. We call this dependency injection. Pricer depends on VideoInfo, but because the dependency is passed in as a parameter from the outside, the calling code can decide which implementation to use – the stub, or the real thing.

A stub is a kind of what we call a test double. It’s an object that looks like the real thing from the outside, but has a different implementation inside. The job of a stub is to provide test data that would normally come from some external source – like video titles and IMDb ratings.

Test doubles require us to introduce flexibility into our code, so that objects (or functions) can use each other without knowing exactly which implementation they’re using – just as long as they look the same as the real thing from the outside. This not only helps us to write fast-running unit tests, but is good design generally. What if we need to fetch video information from a different web service? Because we provide video information by dependency injection, we can easily swap in a different web service with no need to rewrite Pricer.

This is what we really mean by ‘separation of concerns’ – we can change one part of the program without having to change any of the other parts. This can make changing code much, much easier.

Let’s look at one final example that involves an external dependency. Consider this code that totals the number of copies of a song sold on a digital download service, then sends that total to a web service that compiles song charts at the end of each day.
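Here’s a sketch of the kind of code I mean; the class names, the way sales are represented and the charts web service URL are all illustrative:

import urllib.parse
import urllib.request

class Charts:
    # The real implementation reports a song's sales total to an external
    # charts web service (the URL here is a placeholder)
    def send(self, song, total):
        query = urllib.parse.urlencode({"song": song, "total": total})
        urllib.request.urlopen("https://example.com/charts?" + query)

class SalesTotaller:

    def __init__(self, sales, charts):
        # sales is a list of song titles sold today; charts reports totals
        # to the song charts at the end of the day
        self.sales = sales
        self.charts = charts

    def sales_of(self, song):
        # Total the copies sold of this song, then send that total to the charts
        total = self.sales.count(song)
        self.charts.send(song, total)
        return total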

How can we unit test that song sales are calculated correctly without connecting to the external web service? Again, the trick here is to separate those two concerns – to make sending sales information to the charts somebody else’s problem.

Before we write a unit test for this, notice how this situation is different to the video pricing example. Here, our charts object doesn’t return any data. So we can’t use a stub in this case.

When we want to swap in a test double for an object that’s going to be used, but doesn’t return any data that we need to worry about, we can choose from two other kinds of test double.

A dummy is an object that looks like the real thing from the outside, but does nothing inside.
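A sketch, reusing the illustrative SalesTotaller from above:

import unittest
from sales import SalesTotaller  # assuming the sketch above lives in sales.py

class DummyCharts:
    # Looks like the real Charts from the outside, but does nothing inside
    def send(self, song, total):
        pass

class SalesTotallerTest(unittest.TestCase):

    def test_totals_copies_sold_of_a_song(self):
        sales = ["Blue Monday", "True Faith", "Blue Monday"]
        totaller = SalesTotaller(sales, DummyCharts())
        self.assertEqual(2, totaller.sales_of("Blue Monday"))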

In this test, we don’t care if the sales total for the song is sent to the charts. It’s all about calculating that total.

But what if we do care if the total is sent to the charts once it’s been calculated? How could we write a test that will fail if charts.send isn’t invoked?

A mock object is a test double that remembers when its methods are called so we can test that call happened. Using the built-in features of the unittest.mock library, we can create a mock charts object and verify that send is invoked with the exact parameter values we want.
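A sketch, again using the illustrative classes from above:

import unittest
from unittest.mock import MagicMock
from sales import Charts, SalesTotaller  # assuming the sketch above lives in sales.py

class SalesTotallerChartsTest(unittest.TestCase):

    def test_sends_sales_total_to_the_charts(self):
        charts = Charts()
        # Replace the real send method with a MagicMock that records calls to it
        charts.send = MagicMock()
        totaller = SalesTotaller(["Blue Monday", "True Faith", "Blue Monday"], charts)
        totaller.sales_of("Blue Monday")
        # Check that send was invoked with the correct song and sales total
        charts.send.assert_called_with("Blue Monday", 2)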

In this test, we create an instance of the real Charts class that connects to the web service, but we replace its send method with a MagicMock that records when it’s invoked. We can then assert at the end that when sales_of is executed, charts.send is called with the correct song and sales total.

 

So there you have it. Unit tests – tests that test part of our program, and execute without connecting to any external resources like web services, file systems, databases and so on – are fast-running tests that allow us to test and re-test our program very frequently, ensuring as much as possible that our code’s always working.

As you’ll see in later posts, good, fast-running unit tests are an essential foundation of code craft, enabling many of the techniques we’ll be covering next.

 

 

Author: codemanship

Founder of Codemanship Ltd and code craft coach and trainer
