Do Your Unit Tests Know *Too* Much?

A common pitfall of extensive unit testing reported by many teams is that, as the number of tests builds up, changing the implementation under test forces them to rewrite many, many tests. In this scenario, the test code becomes a barrier to change instead of its enabler.

Having witnessed this quite a few times first-hand, I’ve got a few observations I want to share.

First of all, it’s usually unmanaged dependencies in our implementation code that cause changes to ripple out to large numbers of tests.

Imagine we wrote a unit test for every public method or function in our implementation, and we decide to change the way one of them works. If that breaks a hundred of our unit tests, my first thought might be “Why is this method/function referenced in a hundred tests?” Did we write a hundred distinct tests for that one thing, or is it used in the set-up of a hundred tests? That would imply that there’s no real separation of concerns in that part of the implementation. A lack of modularity creates all kinds of problems that show up in test code – usually in the set-ups.

The second thing I wanted to mention is duplication in test code. There’s a dangerous idea that’s been gaining in popularity in recent years that we shouldn’t refactor any duplication out of our tests. The thinking is that it can make our tests harder to understand, and there’s some merit to this when it’s done with little thought.

But there are ways to compose tests, reuse set-ups, assertions and whole tests that clearly communicate what’s going on (many well described in the xUnit Test Patterns book). Inexperienced programmers often struggle with code that’s composed out of small, simple parts; when they do, it almost always comes down to poor naming.

Composition – like separation of concerns – needs to be a black box affair. If you have to look inside the box to understand what’s going on, then you have a problem. I’m as guilty of sloppy naming as anyone, and it’s something I’m working to improve.

There’s one mechanism in particular for removing duplication from test code that I’ve been championing for 20 years – parameterised tests. When we have multiple examples of the same rule or behaviour covered by multiple tests, it’s a quick win to consolidate those examples into a single data-driven test that exercises all of those cases. This can help us in several ways:

  • Removes duplication
  • Offers an opportunity to document the rule instead of the examples (e.g., fourthFibonacciNumberIsTwo() and sixthFibonacciNumberIsFive() can become fibonacciNumberIsSumOfPreviousTwo())
  • Opens a door to much more exhaustive testing with surprisingly little extra code

Maybe those 100 tests could be a handful of parameterised tests?

The fourth thing I wanted to talk about is over-reliance on mock objects. Mocks can be a great tool for achieving cohesive, loosely-coupled modules in a Tell, Don’t Ask style of design – I’m under the impression that’s why they were originally invented. But as they give with one hand, they can take away with the other. The irony with mocks is that, while they can lead us to better encapsulation, they do so by exposing the internal interactions of our modules. A little mocking can be powerful design fuel, but pour too much of it on your tests and you’ll burn the house down.

And the final thing I wanted to highlight is the granularity of our tests. Do we write a test for every method of every class? Do we have a corresponding test fixture for every module in the implementation? My experience has been that it’s neither necessary nor desirable to have test code that sticks to your internal design like cheese melted into a radio.

At the other extreme, many teams have written tests that do everything from the outside – e.g., from a public API, or at the controller or service level. I call these “keyhole tests”, because when I’ve worked with them it can feel a little like keyhole surgery. My experience with this style of testing is that, for sure, it can decouple your test code from internal complexity, but at a sometimes heavy price when tests fail and we end up in the debugger trying to figure out where in the complex internal – and non-visible – call stack things went wrong. It’s like when the engineer from the gas company detects a leak somewhere in your house by checking for a drop in pressure at the meter outside. Pinpointing the source of the leak may involve ripping up the floors…

The truth, for me, lies somewhere in between melted cheese and keyhole surgery. What I strive for within a body of code is – how can I put this? – internal APIs. These are interfaces within the implementation that encapsulate the inner complexity of a particular behaviour (e.g., the cluster of classes used to calculate mortgage repayments). They decouple that little cluster from the rest of the implementation, just as their dependencies are decoupled from them in the same S.O.L.I.D. way. And their interfaces tend to be stable, because they’re not about the details. Tests written around those interfaces are less likely to need to change, but also more targeted to a part of the design instead of the whole call stack. So when they fail, it’s easier to pinpoint where things went wrong. (Going back to the gas leak example, imagine having multiple test points throughout the house, so we can at least determine which room the leak is in.)
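To illustrate – the mortgage code itself isn’t shown in this post, so the name and formula here are my own sketch – an internal API might be a single stable function that hides the repayment calculation behind it:

```javascript
// Hypothetical internal API for the mortgage example. Tests target this
// stable interface; the cluster of classes behind it can be refactored
// freely without touching them.
function monthlyRepayment(principal, annualRate, years) {
  const r = annualRate / 12; // monthly interest rate
  const n = years * 12;      // number of monthly payments
  if (r === 0) return principal / n;
  // Standard amortisation formula: P * r / (1 - (1 + r)^-n)
  return (principal * r) / (1 - Math.pow(1 + r, -n));
}
```

Tests written against monthlyRepayment() pinpoint failures to the mortgage cluster without knowing anything about how it’s structured inside.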

Even for a monolith, I find it helps to think of good internal architecture as a network of microservices, each with its own tests. By far the biggest cause of brittle tests I’ve seen is code that isn’t like that – code that’s a Big Ball of Mud. That can be exacerbated by leaving all the duplication in the test code, by over-reliance on mock objects, and by a tendency to try to write tests for every method or function of every class.

You want tests that run fast and pinpoint failures, but that also leave enough wiggle room to easily refactor what’s happening behind those internal APIs.

Just 3 1-day Client Site Workshops Left

A quick plug for the remaining 1-day client site workshops I’m running to celebrate 10 years of Codemanship.

These code craft training workshops are for groups of up to 20 developers, and can be delivered anywhere in the UK for the never-to-be-repeated price of £1,995. That’s as little as £99.75 per person.

Dates remaining are Thurs Aug 15th, Tues Aug 27th and Thurs Aug 29th. You can choose from an action-packed TDD workshop, refactoring workshop or unit testing workshop. All hands-on and jammed full of practical and useful ideas.

Booking’s easy: whip out that company credit card and visit our Eventbrite page.

Codemanship Exercise “Play-Through” – CD Warehouse Part #4

In yesterday’s post, I implemented the feature to allow record labels to send batches of CDs in to the warehouse. Our design is now starting to grow to the point where compromises have to be made – meaning that, at times, we have to choose between, say, making sure classes have a single responsibility and making sure classes fully encapsulate their data.

It’s very important to make our code as clean as we can, but it’s also important to remember that perfection on anything more complex than the most trivial problems isn’t possible. We might aim for perfection, but we have to accept we’ll never reach it.

This is a nuanced message. Many developers see it as justification for not aiming for perfection – indeed, for not even trying to keep their code clean in some cases. Other developers, myself included, hear “Aim for the stars and you’ll reach the Moon”.

Perfectionism has been given a bad rap in software development. Managers in particular often have an irrational fear of developers “taking quality too far”. Anyone who’s dedicated a few years to looking at people’s code will know that nobody out there is producing software that’s too good.

Keeping code clean is hard. And the more the code grows and ages, the harder it gets: like trying to stay in shape in your forties is harder than in your thirties. Developers usually greatly underestimate how much attention and effort needs to go into keeping code easily maintainable, and teams typically under-invest in it.

Having said all that, it is possible to over-invest: to devote too much thought and effort to studying and refactoring our designs. Like in life: we need to invest in our health and fitness if we want to live a long and full existence. But we also need to get on with life – living, loving, reading, traveling, dancing, painting, snowboarding, and whatever else is the thing that gets us up in the morning.

Software’s a bit like that. We need to invest in giving it a long useful life. But we also need to get shit done. Finding that balance continues to be a subject of great interest (and frustration) to me. How much refactoring is too much? And how little refactoring is too little? Ultimately, what matters is the end result – how much value did we deliver, and for how long?

Anyway, enough navel gazing. Let’s pick up where we left off yesterday and get this shit done.

We’re nearly at the end of our list of tests for the CD warehouse.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

Just two more tests to pass. Okay. Now that customers can buy CDs, we can move on to reviewing them. There are a couple of happy paths in our test list:

  • Review a CD
    • Just a rating
    • Rating and review

The simplest case is when they just leave a rating. Let’s write a failing test for that.

To pass this test, we simply push the new review on to the CD’s reviews list.

Now let’s add text to our review in our next failing test.

This is easy to pass.

So that’s us done, then. Isn’t it? We’ve crossed off the last two tests on our list.

But there may be one or two little wrinkles we need to think about. Firstly, the requirements said that customers could leave reviews for CDs they’ve bought. What if we try to leave a review for a CD we didn’t buy?

We currently don’t record who bought a CD, and our model has no concept of customer. So there’s a couple more tests we need to add to our list. First, when a CD is bought, we need to make a record of it for the customer who bought it.

Passing this test is simples.

We’re on a green light again. Let’s review the code. I’m not happy about CD accessing the customer’s CD collection directly (or the test, for that matter). Let’s extract some methods and move them to Customer to encapsulate this.

Now that we know what CDs a customer has bought from us, we can add an edge case for when a customer tries to leave a review for a CD that’s not in their collection.

And to pass this, we just need to check if the customer has that CD.

Now we’re back on a green light, let’s review this new code. I’m not happy about the message chain I’ve left: review.customer.has(this). Let’s encapsulate that in the Review class. (This refactoring is called Hide Delegate).
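Hide Delegate pushes knowledge of the chain into the intermediate object. A sketch of before and after, with guessed names:

```javascript
class Customer {
  constructor() { this.cds = []; }
  has(cd) { return this.cds.includes(cd); }
}

class Review {
  constructor(customer, rating) {
    this.customer = customer;
    this.rating = rating;
  }
  // Before: clients reached through Review to Customer with
  // review.customer.has(cd). After: they ask the Review, and the
  // delegation stays hidden inside it.
  isByOwnerOf(cd) {
    return this.customer.has(cd);
  }
}
```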

Okay. We’re nearly there. One last question for our real customer: what are these ratings for? As it stands, a CD can have a list of reviews with ratings. But do we expect our users to go through the list adding them up to calculate an average rating? I think the software should do that. Let’s add one more failing test.

To pass this test, we just sum the ratings and divide by their number.

Okay. Back on a green light. Let’s do a mini code review. Well, I’m not convinced my code for calculating the average rating’s entirely obvious. Let’s try extracting a function for calculating the sum.
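A sketch of that extraction (my reconstruction – the actual code isn’t shown here): the sum gets its own named function so the average reads as intent:

```javascript
// Extracted helper: summing is one idea, averaging is another.
function sum(numbers) {
  return numbers.reduce((total, n) => total + n, 0);
}

class CD {
  constructor() { this.reviews = []; }
  review(rating, text) { this.reviews.push({ rating, text }); }
  averageRating() {
    const ratings = this.reviews.map(r => r.rating);
    return sum(ratings) / ratings.length;
  }
}
```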

Next on our checklist is duplication. There is some duplicated set-up in the tests, but only a small amount, and we’re on the edge of the Rule of Three here: I could invest time in extracting set-up methods, but would that investment pay off considering that we’re actually done? I’m going to leave it in there, favouring readability.

All the methods and classes in our code are pretty simple – no loops inside loops inside IFs or anything like that.

But I do now think – yet again – that CD has more than one job. Let’s extract a class for handling reviews.

In my design now, CD does one job, and simply acts as a facade to the other responsibilities, shielding any clients from knowledge of the little subsystem it’s now composed of.

Now we should check through the methods in our classes for any examples of Feature Envy we may have added. Nothing to report there. (Remember, there are examples of Feature Envy that we knowingly introduced as side-effects of splitting apart cohesive responsibilities so that classes do just one job.)

Next, let’s check that the dependencies in our classes are swappable by dependency injection. The quickest way to do this in JavaScript is to look for imports. Aside from a factory class, none of the other classes in our model have direct imports. All of their dependencies are given to them from the outside. And we can see that all the imports are at the top of the call stack, in the tests.

Good modular designs wear their dependencies on the outside.

Finally, let’s check that none of our modules are referencing things they don’t use. In JavaScript, this again means checking the imports.

In this test module, WebStorm tells me that OutOfStockError is unused. Let’s organise our imports and get rid of it.

And that, folks, is my play-through of the CD warehouse exercise from the Codemanship TDD course. No doubt, you’ll have noticed some things while we did it. For example, in some places I chose a defensive style of programming, raising errors if pre-conditions for buying a CD and for leaving a review weren’t met. In other places, I assumed that the client code (e.g., the UI) wouldn’t invoke methods on the model if the precondition was broken (e.g., asking for the average of zero CD ratings).

This was based on which edge cases were – or weren’t – discussed with the customer. If we agreed that only people who’ve bought a CD can review it, there are multiple ways we can enforce that in the code. The way I prefer is to not give the end user that choice if it’s not valid. This can be done by good UI design, for example. Maybe the user is only presented with a “Review” button for CDs that are in their collection.

Anyhoo, that’s a discussion for another time.


If you’d like to see the complete JS code for my CD warehouse, it’s here.


Codemanship Exercise “Play-through” – CD Warehouse Part #3

In yesterday’s post, I continued with the CD warehouse exercise from the Codemanship TDD course, adding a feature to search the catalogue by artist and title.

In this post, I’ll be working on code to receive batches of CDs into the warehouse from record labels.

There are three examples for this use case in my test list:

  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles

In the first example, we’re receiving copies of a CD that’s already in our catalogue, so we would expect to add those copies to the existing stock. Let’s write a failing test for that.

To pass this test, we just need to search the catalogue for a matching CD and then add the copies to its stock.

Now our test is passing, it’s time for a mini code review. Catalogue now breaks our principle that classes should only do one job. I can easily imagine wanting to change how we receive CDs independently of how we search the catalogue, so this class needs to be broken up.

I’ve extracted a containing class that uses the catalogue, and updated the test to change the target of the receive() method call.

There’s another thing that’s bugging me about the receive() method – it accesses CD’s data directly to add copies to the stock. Although that isn’t Feature Envy, strictly speaking, it most certainly isn’t encapsulation. Let’s fix that.
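Putting those two steps together, receiving copies of an existing title might look something like this (a sketch with guessed names, not the post’s actual code):

```javascript
class CD {
  constructor(stock = 0) { this.stock = stock; }
  // Encapsulated: nothing outside CD touches the stock count directly.
  addToStock(copies) { this.stock += copies; }
}

class Warehouse {
  constructor(catalogue) { this.catalogue = catalogue; }
  receive(artist, title, copies) {
    const cd = this.catalogue.search(artist, title);
    cd.addToStock(copies);
  }
}
```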

With that fixed, we can move on to the next test – receiving copies of a CD that isn’t in the catalogue.

To pass this test, we need to check whether the CD is already in the catalogue (i.e., whether search() returns null) and, if it isn’t, create a new CD and add it to the catalogue before adding the received copies to its stock.

We’re on a green light again, which means it’s time for another mini code review. The first thing that jumps out at me is the distinctly unencapsulated way Warehouse adds the new CD to the catalogue. Let’s fix that.

The next thing that catches my eye is the hardwired dependencies on CD and Description. Now, objects have to be created somewhere, so some code has to use new. But if our goal is to have objects use each other polymorphically, then I have a simple rule: objects that create instances of other classes shouldn’t use them, and objects that use instances of other classes shouldn’t create them. You can create an instance or use an instance, but not both.

It’s also good practice to maintain object creation in one place (D.R.Y.), and by itself so changing – say – the constructor of CD doesn’t impact any other logic.

In fact, we already have a candidate for where that might be: the add() method I extracted and moved to Catalogue. We could move object creation to there and return the new CD to Warehouse.

This simplifies Warehouse and removes its direct dependencies on the implementations of CD and Description. But it just pushed our problem down the call stack to Catalogue, which now has those direct dependencies.

This breaks my swappability rule: Catalogue creates an instance of CD, and it uses CD (by calling cd.matches()). Let’s extract the creation into its own factory method.

And then let’s move that factory method into its own class that we can inject into Catalogue so it doesn’t need to have those hardwired implementation dependencies.
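Where that lands, roughly (my names – the post’s code isn’t reproduced here): creation lives in the injected factory, so Catalogue only ever uses CDs:

```javascript
class Description {
  constructor(title, artist) { this.title = title; this.artist = artist; }
}

class CD {
  constructor(description) { this.description = description; }
  matches(title, artist) {
    return this.description.title === title
        && this.description.artist === artist;
  }
}

// The factory owns creation...
class CdFactory {
  create(title, artist) { return new CD(new Description(title, artist)); }
}

// ...so Catalogue only uses CDs, and the dependency is swappable.
class Catalogue {
  constructor(factory) {
    this.factory = factory;
    this.contents = [];
  }
  add(title, artist) {
    const cd = this.factory.create(title, artist);
    this.contents.push(cd);
    return cd;
  }
  search(title, artist) {
    return this.contents.filter(cd => cd.matches(title, artist))[0];
  }
}
```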

(And, yes, I’m not happy with a class name ending in “Factory”, either. Suggestions for a better name on a postcard, please.)

Some of you may be thinking “Doesn’t Catalogue have two jobs now?” Arguably, yes: searching the contents for CDs and adding CDs to the contents. But both jobs revolve around the same data. If we split these responsibilities apart, we’d have to unencapsulate the contents. Eliminating one code smell would introduce another. It’s very much in the nature of software that we’ll have to make design compromises – we can’t tick all of the boxes all of the time. I’m going to leave this as it is for now, and we’ll see how the design evolves.

One thing that has occurred to me at this point is: what happened to the CD’s price – where is that set? There’s an extra test here we need to add.

To pass this test, we add price to our factory method.

And pass that data through from Warehouse.

Back on a green light, time for a mini code review. Not much has changed in our source code, but two things strike me. First, we have a Data Clump emerging, with the parameters title, artist and price now appearing in several method signatures. Also, three parameters is right on the edge of what I’d consider a Long Parameter List. Making that last test pass illustrated why this can be a problem: I found myself adding the same parameter multiple times to multiple methods. What happens the next time we have to add another piece of data to that Data Clump? Let’s simplify these signatures by preserving the whole object.
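Preserve Whole Object swaps the clump of parameters for the object they travel in. A before-and-after sketch (the parameter and method names are my guesses):

```javascript
// Before: the Data Clump travels through every signature.
function receiveBefore(catalogue, title, artist, price, copies) {
  let cd = catalogue.search(title, artist);
  if (cd == null) cd = catalogue.add(title, artist, price);
  cd.addToStock(copies);
}

// After: one description object carries title, artist and price, so
// adding a fourth field won't ripple through the signatures.
function receiveAfter(catalogue, description, copies) {
  let cd = catalogue.search(description);
  if (cd == null) cd = catalogue.add(description);
  cd.addToStock(copies);
}
```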

While we’re in refactoring mode, let’s simplify the test.

(Yes, I jumped the gun on the Rule of Three – but it was such obvious duplication, with no additional readability, that it was bugging me.)

We have one more test on our list for this feature. How does the warehouse receive batches of multiple CDs?

To pass this test, we can simply repeat the logic of receiving a single CD in a forEach loop.
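Which might look like this (a sketch; the shape of the batch data is my assumption):

```javascript
class Warehouse {
  constructor(catalogue) { this.catalogue = catalogue; }
  // Single-title receive, as built by the earlier tests.
  receive(description, copies) {
    let cd = this.catalogue.search(description);
    if (cd == null) cd = this.catalogue.add(description);
    cd.addToStock(copies);
  }
  // The batch case just repeats the single-title logic per entry.
  receiveBatch(batch) {
    batch.forEach(({ description, copies }) => this.receive(description, copies));
  }
}
```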

Time for one more mini code review before we take a break. It’s important to remember that on any non-trivial design, a perfect score is rarely possible. I’ve made compromises, based on my own judgement and 37 years of programming hard knocks, that I’m happy to live with. Importantly, I thought about them, and I know why I made those compromises (e.g., introducing Feature Envy because I wanted to split up a cohesive class that was doing two distinct jobs – like Catalogue and Warehouse, or the reverse decision to keep searching and adding in Catalogue to avoid un-encapsulating its contents).

As the code currently stands, I’m good to go. No doubt if I leave the design in to soak overnight, it may taste different in the morning.

Let’s cross off the test cases we passed today from our list, and take the night off.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

Coming next: Reviewing a CD, and a final code review.

Codemanship Exercise “Play-through” – CD Warehouse Part #2

In my previous post, I started tackling the CD warehouse exercise from the Codemanship TDD course in JavaScript.

The focus of the exercise is on applying basic software design principles while test-driving a design. I encourage developers on the course to stop at every passing test – every “green light” – and do a mini code review where they ask 7 questions:

  • Is the code easy to understand?
  • Is there duplication (I encourage them to apply the Rule of Three for this exercise)?
  • Is there unnecessary complexity?
  • Do methods/functions and classes/modules do more than one job?
  • Are classes/modules encapsulated?
  • Are dependencies between classes/modules easily swappable?
  • Do classes/modules depend on things they don’t use?

At each green light, we inspect the code we’ve added or changed for each of the principles, and if the code is found wanting in any of these aspects, we refactor before moving on.

I started my take on the exercise by identifying use cases in the problem description (looking for user actions like “Buy a CD” and “Review a CD”), then making a list of test cases for each use case.

I began working through this list one test case at a time, crossing off each one once I’ve:

a. Passed the test, and

b. Refactored if necessary

I often find making a list of tests before I start coding very helpful. It forces me to analyse the problem a bit, and also gives me a rough roadmap for tackling a problem in a test-driven way. (This is the Test List pattern described in Kent Beck’s book Test-Driven Development by Example.)

This is where we got to yesterday.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

The next test on our list is searching for a CD by artist and title where the warehouse contains a matching CD. Let’s write a failing test.

I’ve set up my catalogue to contain just one CD – the one we’re searching for – which is the simplest test case I could think of.

In yesterday’s tests, I found the implementations to be quite trivial, so I didn’t triangulate them through a series of examples, I just wrote the Obvious Implementation (another of the patterns described in Kent Beck’s book). My gut is telling me that for this behaviour, there’s a bit more to it, so I’ll pass this test by doing something very simple.

Now the test is passing, it’s time to do a mini code review. I go through my checklist and it looks to me like we’re good to move on to the next example. Remember that one of the goals of TDD is to get feedback as frequently as possible, making fewer design decisions in each feedback cycle. If you feel that you’re making too many design decisions before seeing the test pass, maybe take smaller steps. This is a skill that requires a lot of practice. Even after two decades of doing TDD, I still regularly find myself backing out of a test that’s taking too long (and frequent commits are essential for that ability) and trying to break the problem down a bit more.

To pass both of these tests, I’ll need to do something a bit more sophisticated in the implementation.

This is fine for now, and both tests pass. Time for another mini code review.

There’s obvious duplication in the tests, but as we’re strictly applying the Rule of Three, we won’t do anything about that just now. But what does jump out at me is the Feature Envy that the search() method has for the fields of CD. Let’s extract that code into its own method.

And then let’s move that new method to where it clearly belongs.

One point that I think a lot of developers miss is that refactorings can have side effects – that is, eliminating one code smell (like Feature Envy) can introduce new code smells. Moving the matches() method – which has nothing to do with buying a CD – to the CD class now means that CD has more than one distinct responsibility.

I can imagine wanting to change or vary how we match CDs (e.g., partial instead of exact matches) independently of how we buy them. Let’s split this class up to separate the responsibilities.

Now we have to look at the nature of the dependency between CD and Description. It’s not swappable by dependency injection, so CD depends on that specific implementation – it’s hardwired. Let’s introduce it as a constructor parameter.
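A sketch of that split and the injection (my reconstruction of the design the post describes): matching moves into a Description that CD receives from outside:

```javascript
// The matching responsibility, split out from CD.
class Description {
  constructor(title, artist) { this.title = title; this.artist = artist; }
  matches(title, artist) {
    return this.title === title && this.artist === artist;
  }
}

// Description arrives via the constructor, so it's swappable rather
// than hardwired with a `new` inside CD.
class CD {
  constructor(description) { this.description = description; }
  matches(title, artist) { return this.description.matches(title, artist); }
}
```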

Okay. All the tests are passing and my code so far is clean, so I can move on to the next failing test, which is for when there’s no matching CD in the catalogue.

When I run this test, it surprisingly passes. I would have expected an array index out of bounds error, coming as I am from stricter languages, but JavaScript is just fine referencing the first item in an empty list. The result is undefined (which compares loosely equal to null). A discussion about the merits or otherwise of this language feature is for another day. For practical purposes, it seems to work, and this test – although not failing – is still a valid and useful test of what the code does when no matching CDs are found. So I’ll keep it as part of my suite.

That crosses off two more test cases from my list, and we’ll park it there for today.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

One final thought: folk often observe when they watch me doing TDD that it can sometimes feel like more refactoring than writing and passing tests. This is normal and healthy, just as long as we’re making good progress. I’ve found that developers usually underestimate how much refactoring’s required to keep code clean. Don’t skimp!

Coming next, receiving batches of CDs…





Codemanship Exercise “Play-through” – CD Warehouse Part #1

It’s an extremely hot day in London, and I’m staying indoors until the temperature falls below 30 degrees C. Too hot for chores or putting up more of my flatpack backlog, I thought it might be fun to do a “play-through” blog post about one of the practical exercises from the Codemanship TDD workshop.

The exercise comes just after we discuss basic software design principles on the first day of the course. At this point, we’ve covered Red-Green-Refactor, and dwelled on the refactoring discipline in previous exercises. Now we’re scaling things up a little from FizzBuzz-scale problems to a more business-like set of requirements.

From the book (page 101):

Test-drive some code that manages the stock and orders of a CD warehouse. Customers can buy CDs, searching on the title and the artist. Record labels send batches of CDs to the warehouse. Keep a stock count of how many copies of each title are in the warehouse. Customers can only order titles that are in stock. Use dependency injection to fake credit card payment processing, so we can get on with our CD warehouse design without worrying about how that will be done.

Customers can leave reviews for CDs they’ve bought through the warehouse, which gives each title an integer rating from 1-10 and the text of their review if they want to say more.

I almost always start by listing use cases, and enumerating key scenarios for each one. (This is essentially the Test List pattern from Kent Beck’s Test-Driven Development by Example book.) Buried in this text is a handful of use cases we need to satisfy.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

In this exercise, we’re going to work our way through this list of test cases, doing the simplest thing possible to pass each test, and fleshing out a design that not only passes all of these tests, but also satisfies some basic design principles:

  1. The code must be easy to understand
  2. The code must be low in duplication (in this exercise, we’ll apply the Rule of Three – if we see three examples of duplicated code, we’ll refactor it to make it D.R.Y.)
  3. The code must be simple (long methods, deep-nested IFs and loops, etc., will be refactored to make them simpler)
  4. The modules should do one job only
  5. The modules should Tell, Don’t Ask (I instruct students to look for a code smell called Feature Envy)
  6. Module dependencies should be easily swappable by dependency injection
  7. Modules should only depend on things they actually use (in Java, this means presenting client-specific interfaces, in JavaScript it means not importing things a module isn’t using.)

The exercise is a TDD exercise, so we’ll start by writing a failing test, quickly get that test passing in the simplest way we can think of, and then – and this is the point of this exercise – stop and do a little code review.

Yes, you read that right. Not on a pull request. Not at the end of a sprint. A mini code review on every green light.

We’ll pause, go through all the code we’ve added or changed, and apply our checklist. Any code that doesn’t tick all the boxes should be refactored until it does.

I’m going to tackle this exercise in JavaScript, running on Node.js, because I could do with the practice.

Let’s start with “Buy a CD – payment accepted”. First, I’ll write an empty Mocha test for that scenario.

I’ve interpreted the requirement to mean that when you successfully buy a copy of a CD, the outcome is that the stock count of that CD is reduced by one. (Many pairs interpret managing stock count as a use case, but it really isn’t. A use case is triggered by a user action, like buying a CD. The stock changes as an outcome.)

Let’s fill in the blanks, starting with the assertion:

And then let’s work backwards to the set-up via the action:

Remember that the payment being accepted is part of the set-up for this scenario, so – as per the instructions – we do need to provide that information somehow. Since, in real operation, that data will come from an external source (the payments processor), we use a stub to provide it in the test.

This is our first failing test. Passing it is easy.

This brings us to our first green light, which means it’s time to do a mini code review.

  • Is my code easy to understand so far? Well, I think so. But what do you think?
  • Is there any duplication we need to worry about yet? Not that I can see.
  • Is the code simple? It’s early days, but so far, yes.
  • Does each part do one job? See above. For this single test, there’s only one job to do.
  • Can I see any Feature Envy? Nope.
  • Are the dependencies swappable? There’s only one so far – the payments object in the CD class – and that’s being passed in from outside. So, tick.
  • Does CD depend on the implementation of StubPayments? Nope. But, right now, I have got all my code in a single file with the test code. That needs fixing, so I’ll split them into their own files, and move the source code into the root folder away from the tests.

Let’s move on to the next test on our list, “Buy a CD – payment rejected”.

This is our next red light. To pass this, we now need to have the CD ask the payment processor to give us a response, as defined by the stub in our test.

The stub returns whatever response the test tells it to. Stubs should never get any smarter than that.

So, we’re on our second green light. Time for a mini code review. One thing that jumps out at me is this line here in CD’s buy() method:

payments.process doesn’t read as a question. We could make the IF statement clearer by introducing a variable:

The rest of the code ticks the other boxes, but we can see there’s some duplication emerging in our test code.

In this exercise, we’re applying the Rule of Three – if we see three examples of duplication, we refactor. So let’s hang fire on this for now.

Time for the next test.

Passing this test is easy.

Another green light means it’s time for another mini code review. The buy() method of CD has two branches, which is right on the edge of my personal tolerance for complexity. I think I can live with it, but another IF statement and I’d take action.

Now, about that duplicated test set-up code: we have three examples, which means it’s time to do something about it.

First, let’s tackle the most obvious duplication.

We must take great care, when removing duplicated set-up code from tests, not to end up in a situation where developers have to keep referring to code outside of each test to understand the scenario. In this case, the credit card details aren’t important. (If we added test cases for bogus cards etc., then it would be different.) So I think we can safely factor out the credit card declaration without impacting readability.

With the rest, it’s not so straightforward. We need to know what the initial stock of the CD is. We need to know what response the payments processor will give. If we wrote more and more tests with this set-up, I would argue for factoring out shared set-up code to limit how much of our test code is bound to the implementation. But, as this is the third and final of these tests, I don’t think it’s worth it. So I’m leaving the test code as it is and moving on to the next use case.

Coming in Part #2 – Searching for CDs…


Code Craft is Seat Belts for Programmers

Every so often we all get a good laugh when some unfortunate new hire or intern at a major tech company accidentally “deletes Google” on their first day. It’s easy to snigger (because, of course, none of us has ever messed up like that).

The fact is, though, that pointing and laughing when tech professionals make mistakes doesn’t stop mistakes getting made. It can also breed a toxic work culture, where people learn to avoid mistakes by not taking risks. Not taking risks is anathema to innovation, where – by definition – we’re trying stuff we’ve never done before. Want to stifle innovation where you work? Pointing and laughing is a great way to get there.

One of the things I like most about code craft is how it can promote a culture of safety to try new things and take risks.

A suite of good, fast-running unit tests, for example, makes it easier to spot our boo-boos sooner, so we can un-boo-boo them quickly and without attracting attention.

Continuous Integration offers a level of un-doability that makes it easier and safer to experiment, safe in the knowledge that if we mess it up, we can get back to the last version that worked with a simple hard reset.

The micro-cycles of refactoring mean we never stray far from the path of working code. Combine that with fast-running tests and frequent commits, and ambitious and improbable re-architecting of – say – legacy code becomes a sequence of mundane, undo-able and safe micro-rewrites.

And I can’t help feeling – when I see some poor sod getting Twitter heat for screwing up a system in production – that the real fault was the deficiency in their delivery pipeline that allowed it to happen. The organisation messed up.

Software development’s a learning process. Think about when young children – or people of any age – first learn to use a computer. The fear of “breaking it” often discourages them from trying new things, and this hampers their learning. Never underestimate just how much great innovation happens when someone says “I wonder what happens if I do this…” Remove that fear by fostering a culture of “what if…?” shielded by systems that forgive.

Code craft is seat belts for programmers.