Code Craft’s Value Proposition: More Throws Of The Dice

Evolutionary design is a term that’s used often, not just in software development. Evolution is a way of solving complex problems, typically with necessarily complex solutions (solutions that have many interconnected/interacting parts).

But that complexity doesn’t arise in a single step. Evolved designs start very simple, and then become complex over many, many iterations. Importantly, each iteration of the design is tested for it’s “fitness” – does it work in the environment in which it operates? Iterations that don’t work are rejected, iterations that work best are selected, and become the input to the next iteration.

We can think of evolution as being a search algorithm. It searches the space of all possible solutions for the one that is the best fit to the problem(s) the design has to solve.

It’s explained best perhaps in Richard Dawkins’ book The Blind Watchmaker. Dawkins wrote a computer simulation of a natural process of evolution, where 9 “genes” generated what he called “biomorphs”. The program would generate a family of biomorphs – 9 at a time – with a parent biomorph at the centre surrounded by 8 children whose “DNA” differed from the parent by a single gene. Selecting one of the children made it the parent of a new generation of biomorphs, with 8 children of their own.

Biomorphs generated by the evolutionary simulation at

You can find a recreation and more detailed explanation of the simulation here.

The 9 genes of the biomorphs define a universe of 118 billion possible unique designs. The evolutionary process is a walk through that universe, moving just one space in any direction – because just one gene is changing with each generation – with each iteration. From simple beginnings, complex forms can quickly arise.

A brute force search might enumerate all possible solutions, test each one for fitness, and select the best out of that entire universe of designs. With Dawkins’ biomorphs, this would mean testing 118 billion designs to find the best. And the odds of selecting the best design at random are 1:118,000,000,000. There may, of course, be many viable designs in the universe of all possible solutions. But the chances of finding one of them with a single random selection – a guess – are still very small.

For a living organism, that has many orders of magnitude more elements in their genetic code and therefore an effectively infinite solution space to search, brute force simply isn’t viable. And the chances of landing on a viable genetic code in a single step are effectively zero. Evolution solves problems not by brute force or by astronomically improbable chance, but by small, perfectly probable steps.

If we think of the genes as a language, then it’s not a huge leap conceptually to think of a programming language in the same way. A programming language defines the universe of all possible programs that could be written in that language. Again, the chances of landing on a viable working solution to a complex problem in a single step are effectively zero. This is why Big Design Up-Front doesn’t work very well – arguably at all – as a solution search algorithm. There is almost always a need to iterate the design.

Natural evolution has three key components that make it work as a search algorithm:

  • Reproduction – the creation of a new generation that has a virtually identical genetic code
  • Mutation – tiny variances in the genetic code with each new generation that make it different in some way to the parent (e.g., taller, faster, better vision)
  • Selection – a mechanism for selecting the best solutions based on some “fitness” function against which each new generation can be tested

The mutations from one generation to the next are necessarily small. A fitness function describes a fitness landscape that can be projected onto our theoretical solution space of all possible programs written in a language. Programs that differ in small ways are more likely to have very similar fitness than programs that are very different. Make one change to a working solution and, chances are, you’ve still got a working solution. Make 100 changes, and the risk of breaking things is much higher.

Evolutionary design works best when each iteration is almost identical to that last, with only one or two small changes. Teams practicing Continuous Delivery with a One-Feature-Per-Release policy, therefore, tend to arrive at better solutions than teams who schedule many changes in each release.

And within each release, there’s much more scope to test even smaller changes – micro-changes of the kind enacted in, say, refactoring, or in the micro-iterations of Test-Driven Development.

Which brings me neatly to the third component of evolutionary design: selection. In nature, the Big Bad World selects which genetic codes thrive and which are marked out for extinction. In software, we have other mechanisms.

Firstly, there’s our own version of the Big Bad World. This is the operating environment of the solution. A Point Of Sale system is ultimately selected or rejected through real use in real shops. An image manipulation program is selected or rejected by photographers and graphic designers (and computer programmers writing blog posts).

Real-world feedback from real-world use should never be underestimated as a form of testing. It’s the most valuable, most revealing, and most real form of testing.

Evolutionary design works better when we test our software in the real world more frequently. One production release a year is way too little feedback, way too late. One production release a week is far better.

Once we’ve established that the software is fit for purpose through customer testing – ideally in the real world – there are other kinds of testing we can do to help ensure the software stays working as we change it. A test suite can be thought of as a codified set of fitness functions for our solution.

One implication of the evolutionary design process is that, on average, more iterations will produce better solutions. And this means that faster iterations tend to arrive at a working solution sooner. Species with long life cycles – e.g., humans or elephants – evolve much slower than species with short life cycles like fruit flies and bacteria. (Indeed, they evolve so fast that it’s been observed happening in the lab.) This is why health organisations have to guard against new viruses every year, but nobody’s worried about new kinds of shark suddenly emerging.

For this reason, anything in our development process that slows down the iterations impedes our search for a working solution. One key factor in this is how long it takes to build and re-test the software as we make changes to it. Teams whose build + test process takes seconds tend to arrive at better solutions sooner than teams whose builds take hours.

More generally, the faster and more frictionless the delivery pipeline of a development team, the faster they can iterate and the sooner a viable solution evolves. Some teams invest heavily in Continuous Delivery, and get changes from a programmer’s mind into production in minutes. Many teams under-invest, and changes can take weeks or months to reach the real world where the most useful feedback is to be had.

Other factors that create delivery friction include the maintainability of the code itself. Although a system may be complex, it can still be built from simple, single-purpose, modular parts that can be changed much faster and more cheaply than complex spaghetti code.

And while many BDUF teams focus on “getting it right first time”, the reality we observe is that the odds of getting it right first time are vanishingly small, no matter how hard we try. I’ll take more iterations over a more detailed requirements specification any day.

When people exclaim of code craft “What’s the point of building it right if we’re building the wrong thing?”, they fail to grasp the real purpose of the technical practices that underpin Continuous Delivery like unit testing, TDD, refactoring and Continuous Integration. We do these things precisely because we want to increase the chances of building the right thing. The real requirements analysis happens when we observe how users get on with our solutions in the real world, and feed back those lessons into a new iteration. The sooner we get our code out there, the sooner can get that feedback. The faster we can iterate solutions, the sooner a viable solution can evolve. The longer we can sustain the iterations, the more throws of the dice we can give the customer.

That, ultimately, is the promise of good code craft: more throws of the dice.


Refactoring To Closures in Kotlin & IntelliJ

I spent yesterday morning practicing a refactoring in Kotlin I wanted to potentially demonstrate for a workshop, and after half a dozen unsuccessful attempts, I found a way that seems relatively safe. I thought it might be useful to document it here, both for myself for the future and anyone else who might be interested.

My goal here is to encapsulate the data used in this function for calculating quotes for fitted carpets. The solution I’m thinking of is closures.

How do I get from this to closures being injected into quote() safely? Here’s how I did it in IntelliJ.

  1. Use the Function to Scope… refactoring to extract the body of, say, the roomArea() function into an internal function.

2. As a single step, change the return type of roomArea() to a function signature that matches area(), return a reference to ::area instead of the return value from area(), and change quote() to invoke the returned area() function. (Phew!)

3. Rename roomArea() to room() so it makes more sense.

4. In quote(), highlight the expression room(width, length) and use the Extract Parameter refactoring to have that passed into quote() from the tests.

5. Now we’re going to do something similar for carpetPrice(), with one small difference. Next, as with roomArea(), use the Function to Scope refactoring to extract the body of carpetPrice() into an internal function.

6. Then swap the return value with a reference to the ::price function.

7. Now, this time we want the area to be passed in as a parameter to the price() function. Extract Parameter area from price(), change the signature of the returned function and update quote() to pass it in using the area() function. Again, this must be a single step.

8. Change the Signature of carpetPrice() to remove the redundant area parameter.

9. Rename carpetPrice() to carpet() so it makes more sense.

10. Extract Parameter for the expression carpet(pricePerSqrMtr, roundUp) in quote() called price()


If you want to have a crack at this yourself, the source code is at , along with two more examples (an OO/Java version of this, plus another example that breaks all the rules of both Simple Design and modular design in Kotlin.


Do Your Unit Tests Know *Too* Much?

A common pitfall of extensive unit testing reported by many teams is that, as the number of tests builds up, changing the implementation under test forces them to rewrite many, many tests. In this scenario, the test code becomes a barrier to change instead of its enabler.

Having witnessed this quite a few times first-hand, I’ve got a few observations I want to share.

First of all, it’s actually usually a problem caused by unmanaged dependencies in our implementation code that causes changes to ripple out to large numbers of tests.

Imagine we wrote a unit test for every public method or function in our implementation, and we decide to change the way one of them works. If that breaks a hundred of our unit tests, my first thought might be “Why is this method/function referenced in a hundred tests?” Did we write a hundred distinct tests for that one thing, or is it used in the set-up of a hundred tests? That would imply that there’s no real separation of concerns in that part of the implementation. A lack of modularity creates all kinds of problems that show up in test code – usually in the set-ups.

The second thing I wanted to mention is duplication in test code. There’s a dangerous idea that’s been gaining in popularity in recent years that we shouldn’t refactor any duplication out of our tests. The thinking is that it can make our tests harder to understand, and there’s some merit to this when it’s done with little thought.

But there are ways to compose tests, reuse set-ups, assertions and whole tests that clearly communicate what’s going on (many well described in the xUnit Test Patterns book). Inexperienced programmers often struggle with code that’s composed out of small, simple parts, and it almost always comes down to poor naming.

Composition – like separation of concerns – needs to be a black box affair. If you have to look inside the box to understand what’s going on, then you have a problem. I’m as guilty of sloppy naming as anyone, and that’s something I’m working on to improve at.

There’s one mechanism in particular for removing duplication from test code that I’ve been championing for 20 years – parameterised tests. When we have multiple examples of the same rule or behaviour covered by multiple tests, it’s a quick win to consolidate those examples into a single data-driven test that exercises all of those cases. This can help us in several ways:

  • Removes duplication
  • Offers an opportunity to document the rule instead of the examples (e.g., fourthFibonacciNumberIsTwo(), sixthFibonacciNumberIsFive() can become fibonacciNumberIsSumOfPreviousTwo() )
  • It opens a door to much more exhaustive testing with surprisingly little extra code

Maybe those 100 tests could be a handful of parameterised tests?

The fourth thing I wanted to talk about is over-reliance on mock objects. Mocks can be a great tool for achieving cohesive, loosely-coupled modules in a Tell, Don’t Ask style of design – I’m under the impression that’s why they were originally invented. But as they give with one hand, they can take away with the other. The irony with mocks is that, while they can lead us to better encapsulation, they do so by exposing the internal interactions of our modules. A little mocking can be powerful design fuel, but pour too much of it on your tests and you’ll burn the house down.

And the final thing I wanted to highlight is the granularity of our tests. Do we write a test for every method of every class? Do we have a corresponding test fixture for every module in the implementation? My experience has been that it’s neither necessary nor desirable to have test code that sticks to your internal design like cheese melted into a radio.

At the other extreme, many teams have written tests that do everything from the outside – e.g., from a public API, or at the controller or service level. I call these “keyhole tests”, because when I’ve worked with them it can feel a little like keyhole surgery. My experience with this style of testing is that, for sure, it can decouple your test code from internal complexity, but at a sometimes heavy price when tests fail and we end up in the debugger trying to figure out where in the complex internal – and non-visible – call stack things went wrong. It’s like when the engineer from the gas company detects a leak somewhere in your house by checking for a drop in pressure at the meter outside. Pinpointing the source of the leak may involve ripping up the floors…

The truth, for me, lies somewhere in between melted cheese and keyhole surgery. What I strive for within a body of code is – how can I put this? – internal APIs. These are interfaces within the implementation that encapsulate the inner complexity of a particular behaviour (e.g, the cluster of classes used to calculate mortgage repayments). They decouple that little cluster from the rest of the implementation, just as their dependencies are decoupled from them in the same S.O.L.I.D. way. And their interfaces tend to be stable, because they’re not about the details. Tests written around those interfaces are less likely to need to change, but also more targeted to a part of the design instead of the whole call stack. So when they fail, it’s easier to pinpoint where things went wrong. (Going back to the gas leak example, imagine having multiple test points throughout the house, so we can at least determine what room the leak is in.)

Even for a monolith, I aim to think of good internal architecture as a network of microservices, each with its own tests. By far the biggest cause of brittle tests that I’ve seen is that your code quite probably isn’t like that. It’s a Big Ball of Mud. That can be exacerbated by leaving all the duplication in the test code, and/or by over-reliance on mock objects, plus a tendency to try to write tests for every method or function of every class.

You want tests that run fast and pinpoint failures, but that also leave enough wiggle room to easily refactor what’s happening behind those internal APIs.

Just 3 1-day Client Site Workshops Left

A quick plug for the remaining 1-day client site workshops I’m running to celebrate 10 years of Codemanship.

These code craft training workshops are for groups of up to 20 developers, and can be delivered anywhere in the UK for the never-to-be-repeated price of £1,995. That’s as little as £99.75 per person.

Dates remaining are Thurs Aug 15th, Tues Aug 27th and Thurs Aug 29th. You can choose from an action-packed TDD workshop, refactoring workshop or unit testing workshop. All hands-on and jammed full of practical and useful ideas.

Booking’s easy: whip out that company credit card and visit our Eventbrite page.

Codemanship Exercise “Play-Through” – CD Warehouse Part #4

In yesterday’s post, I implemented the feature to allow record labels to send batches of CDs in to the warehouse. Our design now is starting to grow to a point where compromises have to be made in the design – meaning that, at times, we have to choose between, say, making sure classes have a single responsibility and making sure classes fully encapsulate their data.

It’s very important to make our code as clean as we can, but it’s also important to remember that perfection on anything more complex than the most trivial problems isn’t possible. We might aim for perfection, but we have to accept we’ll never reach it.

This is a nuanced message. Many developers see it as justification for not aiming for perfection – indeed, for not even trying to keep their code clean in some cases. Other developers, myself included, hear “Aim for the stars and you’ll reach the Moon”.

Perfectionism has been given a bad rap in software development. Managers in particular often have an irrational fear of developers “taking quality too far”. Anyone who’s dedicated a few years to looking at people’s code will know that nobody out there is producing software that’s too good.

Keeping code clean is hard. And the more the code grows and ages, the harder it gets: like trying to stay in shape in your forties is harder than in your thirties. Developers usually greatly underestimate how much attention and effort needs to go into keeping code easily maintainable, and teams typically under-invest in it.

Having said all that, it is possible to over-invest: to devote too much thought and effort to studying and refactoring our designs. Like in life: we need to invest in our health and fitness if we want to live a long and full existence. But we also need to get on with life – living, loving, reading, traveling, dancing, painting, snowboarding, and whatever else is the thing that gets us up in the morning.

Software’s a bit like that. We need to invest in giving it a long useful life. But we also need to get shit done. Finding that balance continues to be a subject of great interest (and frustration) to me. How much refactoring is too much? A how little refactoring is too little? Ultimately, what matters is the end result – how much value did we deliver, and for how long?

Anyway, enough navel gazing. Let’s pick up where we left off yesterday and get this shit done.

We’re nearly at the end of our list of tests for the CD warehouse.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

Just two more tests to pass. Okay. Now we know what CDs a customer has bought, we can move on to reviewing a CD. There are a couple of happy paths in our test list:

  • Review a CD
    • Just a rating
    • Rating and review

The simplest case is when they just leave a rating. Let’s write a failing test for that.

To pass this test, we simply push the new review on to the CDs reviews list.

Now let’s add text to our review in our next failing test.

This is easy to pass.

So that’s us done, then. Isn’t it? We’ve crossed off the last two tests on our list.

But there may be one or two little wrinkles we need to think about. Firstly, the requirements said that customers could leave reviews for CDs they’ve bought. What if we try to leave a review for a CD we didn’t buy?

We currently don’t record who bought a CD, and our model has no concept of customer. So there’s a couple more tests we need to add to our list. First, when a CD is bought, we need to make a record of it for the customer who bought it.

Passing this test is simples.

We’re on a green light again. Let’s review the code. I’m not happy about CD accessing the customer’s CD collection directly (or the test, for that matter). Let’s extract some methods and move them to Customer to encapsulate this.

Now that we know what CDs a customer has bought from us, we can add an edge case for when a customer tries to leave a review for a CD that’s not in their collection.

And to pass this, we just need to check if the customer has that CD.

Now we’re back on a green light, let’s review this new code. I’m not happy about the message chain I’ve left: review.customer.has(this). Let’s encapsulate that in the Review class. (This refactoring is called Hide Delegate).

Okay. We’re nearly there. One last question for our real customer: what are these ratings for? As it stands, a CD can have a list of reviews with ratings. But do we expect our users to go through the list adding them up to calculate an average rating? I think the software should do that. Let’s add one more failing test.

To pass this test, we just sum the ratings and divide by their number.

Okay. Back on a green light. Let’s do a mini code review. Well, I’m not convinced my code for calculating the average rating’s entirely obvious. Let’s try extracting a function for calculating the sum.

Next on our checklist is duplication. There is some duplicated set-up in the tests, but only a small amount, and we’re on the edge of the Rule of Three here, in that I could invest time in extracting set-up methods, but would that investment pay off considering that we’re actually done? I’m going to leave it in their, favouring readability.

All the methods and classes in our code are pretty simple – no loops inside loops inside IFs or anything like that.

But I do now think – yet again – that CD has more than one job now. Let’s extract a class for handling reviews.

In my design now, CD does one job, and simply acts as a facade to the other responsibilities, shielding any clients from knowledge of the little subsystem it’s now composed of.

Now we should check through the methods in our classes for any examples of Feature Envy we may have added. Nothing to report there. (Remember, there are examples of Feature Envy that we knowingly introduced as side-effects of splitting apart cohesive responsibilities so that classes do just one job.)

Next, let’s check that the dependencies in a classes are swappable by dependency injection. The quickest way to do this in JavaScript is to look for imports. Aside from a factory class, none of the other classes in our model have direct imports. All of their dependencies are given to them from the outside. And we can see that all the imports are at the top of call stack, in the tests. e.g.

Good modular designs wear their dependencies on the outside.

Finally, let’s check that non of our modules are referencing things they don’t use. In JavaScript, this again means checking the imports.

In this test module, WebStorm tells me that OutOfStockError is unused. Let’s organise our imports and get rid of it.

And that, folks, is my play-through of the CD warehouse exercise from the Codemanship TDD course. No doubt, you’ll have noticed some things while we did it. For example, in some places I chose a defensive style of programming, raising errors if pre-conditions for buying a CD and for leaving a review weren’t met. In other places, I assumed that the client code (e.g., the UI) wouldn’t invoke methods on the model if the precondition was broken (e.g., asking for the average of zero CD ratings).

This was based on edge cases that were either discussed with the customer or not discussed. If we agreed that only people who’ve bought a CD can review it, there’s multiple ways we can enforce that in the code. The way I prefer is to not give the end user that choice if it’s not valid. This can be done by good UI design, for example. Maybe the user is only presented with a “Review” button for CDs that are in their collection.

Anyhoo, that’s a discussion for another time.


If you’d like to see the complete JS code for my CD warehouse, it’s here.


Codemanship Exercise “Play-through” – CD Warehouse Part #3

In yesterday’s post, I continued with the CD warehouse exercise from the Codemanship TDD course, adding a feature to search the catalogue by artist and title.

In this post, I’ll be working on code to receive batches of CDs into the warehouse from record labels.

There are three examples for this use case in my test list:

  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles

In the first example, we’re receiving copies of a CD that’s already in our catalogue, so we would expect to add those copies to the existing stock. Let’s write a failing test for that.

To pass this test, we just need to search the catalogue for a matching CD and then add the copies to its stock.

Now our test is passing, it’s time for a mini code review. Catalogue now breaks our principle that classes should only do one job. I can easily imagine wanting to change how we receive CDs independently of how we search the catalogue, so this class needs to be broken up.

I’ve extracted a containing class that uses the catalogue, and updated the test to change the target of the receive() method call.

There’s another thing that’s bugging me about the receive() method – it accesses CD’s data directly to add copies to the stock. Although that isn’t Feature Envy, strictly speaking, it most certainly isn’t encapsulation. Let’s fix that.

With that fixed, we can move on to the next test – receiving copies of a CD that isn’t in the catalogue.

To pass this test, we need to check if the CD’s not in the catalogue (i.e., search() returns null), create a new CD if it isn’t and add it to the catalogue before adding the received copies to its stock.

We’re on a green light again, which means it’s time for another mini code review. The first thing that jumps out at me is the distinctly unencapsulated way Warehouse adds the new CD to the catalogue. Let’s fix that.

The next thing that catches my eye is the hardwired dependencies on CD and Description. Now, objects have to be created somewhere, so some code has to use new. But if our goal is to have objects use each other polymorphically, then I have a simple rule: objects that create instances of other classes shouldn’t use them, and objects that use instance of other classes shouldn’t create them. You can create an instance or use an instance, but not both.

It’s also good practice to maintain object creation in one place (D.R.Y.), and by itself so changing – say – the constructor of CD doesn’t impact any other logic.

In fact, we already have a candidate for where that might be: the add() method I extracted and moved to Catalogue. We could move object creation to there and return the new CD to Warehouse.

This simplifies Warehouse and removes its direct dependencies on the implementations of CD and Description. But it just pushed our problem down the call stack to Catalogue, which now has those direct dependencies.

This breaks my swappability rule: Catalogue creates an instance of CD, and it uses CD (by calling cd.matches()). Let’s extract the creation into its own factory method.

And then let’s move that factory method into its own class that we can inject into Catalogue so it doesn’t need to have those hardwired implementation dependencies.

(And, yes, I’m not happy with a class name ending in “Factory”, either. Suggestions for a better name on a postcard, please.)

Some of you may be thinking “Doesn’t Catalogue have two jobs now?” Arguably, yes: searching the contents for CDs and adding CDs to the contents. But both jobs revolve around the same data. If we split these responsibilities apart, we’d have to unencapsulate the contents. Eliminating one code smell will introduce another. It’s very much in the nature of software that we’ll have to make design compromises – we can’t tick all of the boxes all of the time. I’m going to leave this as it is fr now, and we’ll see how the design evolves.

One thing that has occurred to me at this point is: what happened to the CD’s price – where is that set? There’s an extra test here we need to add.

To pass this test, we add price to our factory method.

And pass that data through from Warehouse.

Back on a green light, time for a mini code review. Not much has changed in our source code. Two things strike me; first, we have a Data Clump emerging, with the parameters title, artist and price now appearing in several method signatures. Also, three parameters is right on the edge of what I’d consider to be a Long Parameter list. Making that last test pass illustrated why this can be a problem; I found myself adding the same parameter multiple time to multiple methods. What happens the next time we have to add another piece of data to that Data Clump? Let’s simplify these signatures by preserving the whole object.

While we’re in refactoring mode, let’s simplify the test.

(Yes, I jumped the gun on the Rule of Three – but it was such obvious duplication, with no additional readability, that it was bugging me.)

We have one more test on our list for this feature. How does the warehouse receive batches of multiple CDs?

To pass this test, we can simply repeat the logic of receiving a single CD in a forEach loop.

Time for one more mini code review before we take a break. It’s important to remember that on any non-trivial design, a perfect score is rarely possible. I’ve made compromises, based on my own judgement and 37 years of programming hard knocks, that I’m happy to live with. Importantly, I thought about them, and I know why I made those compromises (e.g., introducing Feature Envy because I wanted to split up a cohesive class that was doing two distinct jobs – like Catalogue and Warehouse, or the reverse decision to keep searching and adding in Catalogue to avoid un-encapsulating its contents).

As the code currently stands, I’m good to go. No doubt if I leave the design in to soak overnight, it may taste different in the morning.

Let’s cross off the test cases we passed today from our list, and take the night off.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

Coming next: Reviewing a CD, and a final code review.

Codemanship Exercise “Play-through” – CD Warehouse Part #2

In my previous post, I started tackling the CD warehouse exercise from the Codemanship TDD course in JavaScript.

The focus of the exercise is on applying basic software design principles while test-driving a design. I encourage developers on the course to stop at every passing test – every “green light” – and do a mini code review where they ask 7 questions:

  • Is the code easy to understand?
  • Is there duplication (I encourage them to apply the Rule of Three for this exercise)?
  • Is there unnecessary complexity?
  • Do methods/functions and classes/modules do more than one job?
  • Are classes/modules encapsulated?
  • Are dependencies between classes/modules easily swappable?
  • Do classes/modules depend on things they don’t use?

At each green light, we inspect the code we’ve added or changed for each of the principles, and if the code is found wanting in any of these aspects, we refactor before moving on.

I started my take on the exercise by identifying use cases in the problem description (looking for user actions like “Buy a CD” and “Review a CD”), then making a list of test cases for each use case.

I began working through this list one test case at a time, crossing off each one once I’ve:

a. Passed the test, and

b. Refactored if necessary

I often find making a list of tests before I start coding very helpful. It forces me to analyse the problem a bit, and also gives me a rough roadmap for tackling a problem in a test-driven way. (This is the Test List pattern described in Kent Beck’s book Test-Driven Development by Example.)

This is where we got to yesterday.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

The next test on our list is searching for a CD by artist and title where the warehouse contains a matching CD. Let’s write a failing test.

I’ve set up my catalogue to contain just one CD – the one we’re searching for – which is the simplest test case I could think of.

In yesterday’s tests, I found the implementations to be quite trivial, so I didn’t triangulate them through a series of examples, I just wrote the Obvious Implementation (another of the patterns described in Kent Beck’s book). My gut is telling me that for this behaviour, there’s a bit more to it, so I’ll pass this test by doing something very simple.

Now the test is passing, it’s time to do a mini code review. I go through my checklist and it looks to me like we’re good to move on to the next example. Remember that one of the goals of TDD is to get feedback as frequently as possible, making fewer design decisions in each feedback cycle. If you feel that you’re making too many design decisions before seeing the test pass, maybe take smaller steps. This is a skill that requires a lot of practice. Even after two decades of doing TDD, I still regularly find myself backing out of a test that’s taking too long (and frequent commits are essential for that ability) and trying to break the problem down a bit more.

To pass both of these tests, I’ll need to do something a bit more sophisticated in the implementation.

This is fine for now, and both tests pass. Time for another mini code review.

There’s obvious duplication in the tests, but as we’re strictly applying the Rule of Three, we won’t do anything about that just now. But what does jump out at me is the Feature Envy that the search() method has for the fields of CD. Let’s extract that code into its own method.

And then let’s move that new method to where it clearly belongs.

One point that I think a lot of developers miss is that refactorings can have side effects – that is, eliminating one code smell (like Feature Envy) can introduce new code smells. Moving the matches() method – which has nothing to do with buying a CD – to the CD class now means that CD has more than one distinct responsibility.

I can imagine wanting to change or vary how we match CDs (e.g., partial instead of exact matches) independently of how we buy them. Let’s split this class up to separate the responsibilities.

Now we have to look at the nature of the dependency between CD and Description. It’s not swappable by dependency injection, so CD depends on that specific implementation – it’s hardwired. Let’s introduce it as a constructor parameter.

Okay. All the tests are passing and my code so far is clean, so I can move on to the next failing test, which is for when there’s no matching CD in the catalogue.

When I run this test, it surprisingly passes. I would have expected an array index out of bounds error, coming as I am from stricter languages, but JavaScript seems just fine referencing the first item in an empty list. The result is null. A discussion about the merits or otherwise of this language feature is for another day. For practical purposes, it seems to work, and this test – although not failing – is still a valid and useful test – what does it do when no matching CDs are found? So I’ll keep it as part of my suite.

That crosses off two more test cases from my list, and we’ll park it there for today.

  • Buy a CD
    • Payment accepted
    • Payment rejected
    • Out of stock
  • Search for a CD (by title and artist)
    • Warehouse has matching title
    • No match found
  • Receive batch of CDs
    • Of existing title in catalogue
    • Of new title (add to catalogue)
    • Of multiple titles
  • Review a CD
    • Just a rating
    • Rating and review

One final thought; folk often observe when they watch me doing TDD that it can sometimes feel like more refactoring than writing and passing tests. This is normal and healthy, just as long as we’re making good progress. I’ve found that developers usually underestimate how much refactoring’s required to keep code clean. Don’t skimp!

Coming next, receiving batches of CDs…