The Test Pyramid – The Key To True Agility

On the Codemanship TDD course, before we discuss Continuous Delivery and how essential it is to achieving real agility, we talk about the Test Pyramid.

It has various interpretations, in terms of exactly how many layers and exactly what kinds of testing each layer is made of (unit, integration, service, controller, component, UI, etc.), but the overall sentiment is straightforward:

The longer tests take to run, the fewer of those kinds of tests you should aim to have

[Figure: the test pyramid]

The idea is that the tests we run most often need to be as fast as possible (otherwise we run them less often). These are typically described as “unit tests”, but that means different things to different people, so I’ll qualify: tests that do not involve any external dependencies. They don’t read from or write to databases, they don’t read or write files, they don’t connect with web services, and so on. Everything that happens in these tests happens inside the same memory address space. Call them In-Process Tests, if you like.
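
To make that concrete, here’s a minimal sketch of an In-Process Test – assuming pytest and a hypothetical pricing function, not any particular project’s code. Nothing here leaves the process, so thousands of tests like it can run in seconds:

def rental_price(days, daily_rate=2.50):
    # Hypothetical domain logic: pure in-memory calculation - no database,
    # files or network involved
    return round(days * daily_rate, 2)


def test_three_day_rental_costs_seven_pounds_fifty():
    # Runs entirely in the same memory address space, in microseconds
    assert rental_price(3) == 7.50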

Tests that necessarily check our code works with external dependencies have to cross process boundaries when they’re executed. As our In-Process tests have already checked the logic of our code, these Cross-Process Tests check that our code – the client – and the external code – the suppliers – obey the contracts of their interactions. I call these “integration tests”, but some folk have a different definition of integration test. So, again, I qualify it as: tests that involve external dependencies.
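
As a sketch of that – with a hypothetical CustomerDirectory, and a file-backed SQLite database standing in for whatever external database the real system talks to – a Cross-Process Test checks the contract: what our code asks the supplier to store, it can get back.

import sqlite3


class CustomerDirectory:
    # Hypothetical client-side gateway that hides the database from the rest of the code
    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT)")

    def add(self, name):
        cursor = self.connection.execute(
            "INSERT INTO customers (name) VALUES (?)", (name,))
        self.connection.commit()
        return cursor.lastrowid

    def find(self, customer_id):
        row = self.connection.execute(
            "SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()
        return row[0] if row else None


def test_added_customer_can_be_read_back(tmp_path):
    # pytest's tmp_path fixture gives us a real file on disk, so this test
    # touches a real external dependency and runs noticeably slower than
    # an in-process test
    directory = CustomerDirectory(sqlite3.connect(str(tmp_path / "test.db")))
    customer_id = directory.add("Alice")
    assert directory.find(customer_id) == "Alice"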

These typically take considerably longer to execute than “unit tests”, and we should aim to have proportionally fewer of them and to run them proportionally less often. We might have thousands of unit tests, and maybe hundreds of integration tests.

If the unit tests cover the majority of our code – say, 90% of it – and maybe 10% of our code has direct external dependencies that have to be tested, on average we’ll make about 9 changes that need unit testing compared to 1 change that needs integration testing. In other words, we’d need to run our unit tests 9x as often as our integration tests, which is a good thing if each integration test is about 9 times slower than a unit test.

At the top of our test pyramid are the slowest tests of all. Typically these are tests that exercise the entire system stack, through the user interface (or API) all the way down to the external dependencies. These tests check that it all works when we plug everything together and deploy it into a specific environment. If we’ve already tested the logic of our code with unit tests, and tested the interactions with external suppliers, what’s left to test?

Some developers mistakenly believe that these system-level tests are for checking the logic of the user experience – user “journeys”, if you like. This is a mistake. There are usually a lot of user journeys, so we’d end up with a lot of these very slow-running tests and an upside-down pyramid. The trick here is to make the logic of the user experience unit-testable. View models are a simple architectural pattern for logically representing what users see and what users do at that level. At the highest level they may be looking at an HTML table and clicking a button to submit a form, but at the logical level, maybe they’re looking at a movie and renting it.

A view model can help us encapsulate the logic of user experience in a way that can be tested quickly, pushing most of our UI/UX tests down to the base of the pyramid where they belong. What’s left – the code that must directly reference physical UI elements like HTML tables and buttons – can be wafer thin. At that level, all we’re testing is that views are rendered correctly and that user actions trigger the correct internal logic (which can easily be done using mock objects). These are integration tests, and belong in the middle layer of our pyramid, not the top.
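
As a sketch of that idea – hypothetical names, nothing from any real codebase – the logical level might look like this, and it tests as fast as any other unit test:

class MovieViewModel:
    # Logical representation of what the user sees and does - no HTML in sight
    def __init__(self, catalogue):
        self.catalogue = catalogue      # injected collaborator, easily stubbed
        self.confirmation = ""          # what the view will render after an action

    def rent(self, title):
        self.catalogue.rent(title)
        self.confirmation = "You have rented '" + title + "'"


class StubCatalogue:
    # Test double standing in for the real catalogue service
    def __init__(self):
        self.rented = []

    def rent(self, title):
        self.rented.append(title)


def test_renting_a_movie_confirms_the_rental():
    catalogue = StubCatalogue()
    view_model = MovieViewModel(catalogue)
    view_model.rent("Jaws")
    assert catalogue.rented == ["Jaws"]
    assert view_model.confirmation == "You have rented 'Jaws'"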

Another classic error is to check core logic through the GUI. For example, checking that insurance premiums are calculated correctly by looking at what number is rendered on that web page. Some module somewhere does that calculation. That should be unit-testable.

So, if they’re not testing user journeys, and they’re not testing core logic, what do our system tests test? What’s left?

Well, have you ever found yourself saying “It worked on my machine”? The saying goes “There’s many a slip ‘twixt cup and lip.” Just because all the pieces work, and just because they all play nicely together, there’s no guarantee that when we deploy the whole system into, say, our EC2 instances, nothing will be different from the environments we tested it in. I’ve seen roll-outs go wrong because the servers handled dates differently, or had the wrong locale, or a different file system, or security restrictions that weren’t in place on dev machines.

The last piece of the jigsaw is the system configuration, where our code meets the real production environment – or a simulation of it – and we find out if it really works where it’s intended to work as a whole.

We may need dozens of those kinds of tests, and perhaps only need to run them on, say, every CI build by deploying the outputs to a staging environment that mirrors the production environment (and only if all our unit and integration tests pass first, of course.) These are our “good to go?” tests.

The shape of our test pyramid is critical to achieving feedback loops that are fast enough to allow us to sustain the pace of development. Ideally, after we make any change, we should want to get feedback straight away about the impact of that change. If 90% of our code can be re-tested in under 30 seconds, we can re-test 90% of our changes many times an hour and be alerted within 30 seconds if we broke something. If it takes an hour to re-test our code, then we have a problem.

Continuous Delivery means that our code is always shippable. That means it must always be working, or as near to always as possible. If re-testing takes an hour, that means that we’re an hour away from finding out if changes we made broke the code. It means we’re an hour away from knowing if our code is shippable. And, after an hour’s worth of changes without re-testing, chances are high that it is broken and we just don’t know it yet.

An upside-down test pyramid puts Continuous Delivery out of your reach. Your confidence that the code’s shippable at any point in time will be low. And the odds that it’s not shippable will be high.

The impact of slow-running test suites on development is profound. I’ve found many times that when a team invested in speeding up their tests, many other problems magically disappeared. Slow tests – which mean slow builds, which mean slow release cycles – are like a slow metabolism for a development team. Many health problems can be caused by a slow metabolism. It really is that fundamental.

Slow tests are pennies to the pound of the wider feedback loops of release cycles. You’d be surprised how much of your release cycles are, at the lowest level, made up of re-testing cycles. The outer feedback loops of delivery are made of the inner feedback loops of testing. Fast-running automated tests – as an enabler of fast release cycles and sustained innovation – are therefore highly desirable.

A right-way-up test pyramid doesn’t happen by accident, and doesn’t come at no cost, though. Many organisations, sadly, aren’t prepared to make that investment, and limp on with upside-down pyramids and slow test feedback until the going gets too tough to continue.

As well as writing automated tests, there’s also an investment needed in your software’s architecture. In particular, the way teams apply basic design principles tends to determine the shape of their test pyramid.

I see a lot of duplicated code that contains duplicated external dependencies, for example. It’s not uncommon to find systems with multiple modules that connect to the same database, or that connect to the same web service. If those connections happened in one place only, that part of the code could be integration tested just once. D.R.Y. helps us achieve a right-way-up pyramid.

I see a lot of code where a module or function that does a business calculation also connects to an external dependency, or where a GUI module also contains business logic, so that the only way to test that core logic is with an integration test. Single Responsibility helps us achieve a right-way-up pyramid.

I see a lot of code where a module in one web service interacts with multiple features of another web service – Feature Envy, but on a larger scale – so there are multiple points of integration that require testing. Encapsulation helps us achieve a right-way-up pyramid.

I see a lot of code where a module containing core logic references an external dependency, like a database connection, directly by its implementation, instead of through an abstraction that could be easily swapped by dependency injection. Dependency Inversion helps us achieve a right-way-up pyramid.
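
To sketch what that looks like – with hypothetical names, and collapsing the other principles into the same tiny example – the core logic depends on an abstraction, and the database-backed implementation is injected from outside:

from abc import ABC, abstractmethod


class RiskProfiles(ABC):
    # Abstraction the core logic depends on - no mention of any database
    @abstractmethod
    def risk_factor(self, customer_id):
        ...


class PremiumCalculator:
    def __init__(self, profiles):
        self.profiles = profiles        # injected, so it can be swapped in tests

    def premium(self, customer_id, base_premium):
        return round(base_premium * self.profiles.risk_factor(customer_id), 2)


class StubRiskProfiles(RiskProfiles):
    # In-memory stand-in for the database-backed implementation
    def risk_factor(self, customer_id):
        return 1.5


def test_premium_is_base_premium_scaled_by_risk_factor():
    calculator = PremiumCalculator(StubRiskProfiles())
    assert calculator.premium(customer_id=1, base_premium=100.0) == 150.0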

Achieving a design with less duplication, where modules do one job, where components and services know as little as possible about each other, and where external dependencies can be easily stubbed or mocked by dependency injection, is essential if you want your test pyramid to be the right way up. But code doesn’t get that way by accident. There’s significant ongoing effort required to keep the code clean by refactoring. And that gets easier the faster your tests run. Chicken, meet egg.

If we’re lucky enough to be starting from scratch, the best way we know of to ensure a right-way-up test pyramid is to write the tests first. This compels us to design our code in such a way that it’s inherently unit-testable. I’ve yet to come across a team genuinely doing Continuous Delivery who wasn’t doing some kind of TDD.

If you’re working on legacy code, where maybe you’re relying on browser-based tests, or might have no automated tests at all, there’s usually a mountain to climb to get a test pyramid that’s the right way up. You need to write fast-running tests, but you will probably need to refactor the code to make that possible. Egg, meet chicken.

Like all mountains, though, it can be climbed. One small, careful step at a time. Michael Feathers’ book Working Effectively With Legacy Code describes a process for making changes safely to code that lacks fast-running automated tests. It goes something like this:

  • Identify what code you need to change
  • Identify where around that code you’d want unit tests to make the change safely
  • Break any dependencies in that code getting in the way of unit testing (see the sketch after this list)
  • Write the unit tests
  • Make the change
  • While you’re there, make other improvements that will help the next developer who needs to change that code (the “boy scout rule” – leave the camp site tidier than you found it)
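
To illustrate the dependency-breaking step, here’s a sketch – hypothetical names, and roughly the “Subclass and Override Method” technique from Feathers’ book – where a method hard-wired to the database is turned into a seam we can override in a test:

class InvoiceGenerator:
    def generate(self, customer_id):
        orders = self.fetch_orders(customer_id)     # the seam
        return sum(order["amount"] for order in orders)

    def fetch_orders(self, customer_id):
        # In the legacy code, this is where the real database access lives
        raise NotImplementedError("database access not available in unit tests")


class TestableInvoiceGenerator(InvoiceGenerator):
    # Overrides the seam so the calculation can be tested in-process
    def fetch_orders(self, customer_id):
        return [{"amount": 10.0}, {"amount": 15.5}]


def test_invoice_total_is_sum_of_order_amounts():
    generator = TestableInvoiceGenerator()
    assert generator.generate(customer_id=42) == 25.5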

Change after change, made safely in this way, will – over time – build up a suite of fast-running unit tests that will make future changes easier. I’ve worked on legacy code bases that went from upside-down test pyramids of mostly GUI-based system tests, that took hours or even days to run, to right-side-up pyramids where most of the code could be tested in under a minute. The impact on the cost and the speed of delivery is always staggering. It can be done.

But be patient. A code base might take a year or two to turn around, and at first the going will be tough. I find I have to be super-disciplined in those early stages. I manually re-test as I refactor, and resist the temptation to make a whole bunch of changes at a time before I re-test. Slow and steady, adding value and clearing paths for future changes at the same time.

Action(Object), Object.Action() and Encapsulation

Just a quick post to bookmark an interesting discussion happening on Twitter right now in response to a little tweet I sent out.

Lots of different takes on this, but they tend to fall into three rough camps:

  • Lots of developers prefer action(object) because it reads the way we understand it – buy(cd), kick(ball) etc. Although, of course, this would imply functional programming (or static methods of unnamed classes)
  • Some like a subject, too – customer.buy(cd), player.kick(ball)
  • Some prefer the classic OOP – ball.kick(), cd.buy()

More than a few invented new requirements, I noticed. A discussion about YAGNI is for another time, though, I think.

Now, the problem with attaching the behaviour to a subject (or a function or static method of a different module or class) is you can end up with Feature Envy.

Let’s just say, for the sake of argument, that kicking a ball changes its position along an X-Y vector:

class Player(object):
    @staticmethod
    def kick(ball, vector):
        ball.x = ball.x + vector.x
        ball.y = ball.y + vector.y


class Ball(object):
    def __init__(self):
        self.x = 0
        self.y = 0


class Vector(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


if __name__ == "__main__":
    ball = Ball()
    Player.kick(ball, Vector(5,5))
    print("Ball -> x =", ball.x, ", y =", ball.y)

Player.kick() has Feature Envy for the fields of Ball. Separating agency from data, I’ve observed, tends to lead to data classes – classes that are just made of fields (or getters and setters for fields, which is just as bad from a coupling point of view) – and lots of low-level coupling at the other end of the relationship.

If I eliminate the Feature Envy, I end up with:

class Player(object):
    @staticmethod
    def kick(ball, vector):
        ball.kick(vector)


class Ball(object):
    def __init__(self):
        self.x = 0
        self.y = 0

    def kick(self, vector):
        self.x = self.x + vector.x
        self.y = self.y + vector.y

And in this example – if we don’t invent any extra requirements – we don’t necessarily need Player at all. YAGNI.

class Ball(object):
    def __init__(self):
        self.x = 0
        self.y = 0

    def kick(self, vector):
        self.x = self.x + vector.x
        self.y = self.y + vector.y


class Vector(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y


if __name__ == "__main__":
    ball = Ball()
    ball.kick(Vector(5,5))
    print("Ball -> x =", ball.x, ", y =", ball.y)

So we reduce coupling and simplify the design – no need for a subject, just an object. The price we pay – the trade-off, if you like – is that some developers find ball.kick() counter-intuitive.

It’s a can of worms!

“Stateless” – You Keep Using That Word…

One of the requirements of pure functions is that they are stateless. To many developers, this means simply that the data upon which the function acts is immutable. When dealing with objects, we mean that the object of an action has immutable fields, set at instantiation and then never changing throughout the instance’s life cycle.

In actual fact, this is not what ‘stateless’ means. Stateless means that the result of an action – e.g. a method call or a function call – is always the same given the same inputs, no matter how many times it’s invoked.

The classic stateless function is one that calculates square roots. sqrt(4) is always 2. sqrt(6.25) is always 2.5, and so on.

The classic stateful function is a light switch. The result of flicking the switch depends on whether the light is on or off at the time. If it’s off, it’s switched on. If it’s on, it’s switched off.

function Light() {
    this.on = false;

    this.flickSwitch = function (){
        this.on = !this.on;
    }
}

let light = new Light();

light.flickSwitch();
console.log(light);

light.flickSwitch();
console.log(light);

light.flickSwitch();
console.log(light);

light.flickSwitch();
console.log(light);

This code produces the output:

{ on: true }
{ on: false }
{ on: true }
{ on: false }

Most domain concepts in the real world are stateful, like our light switch. That is to say, they have a life cycle during which their behaviour changes depending on what has happened to them previously.

This is why finite state machines form a theoretical foundation for all program behaviour. Or, more simply, all program behaviour can be modeled as a finite state machine – a logical map of an object’s life cycle.

[Figure: state machine of the light switch]

Now, a lot of developers would argue that flickSwitch() is stateful because it acts on an object with a mutable field. They would then reason that making on immutable, and producing a copy of the light with its state changed, would make it stateless.

const light = {
    on: false
}

function flickSwitch(light){
    return {...light, on: !light.on};
}

const copy1 = flickSwitch(light)
console.log(copy1);

const copy2 = flickSwitch(copy1);
console.log(copy2);

const copy3 = flickSwitch(copy2);
console.log(copy3);

const copy4 = flickSwitch(copy3);
console.log(copy4);

Technically, this is a pure functional implementation of our light switch. No state changes, and the result of each call to flickSwitch() is entirely determined by its input.

But, is it stateless? I mean, is it really? Technically, yes it is. But conceptually, no it certainly isn’t.

If this code was controlling a real light in the real world, then there’s only one light, its state changes, and the result of each invocation of flickSwitch() depends on the light’s history.

This is functional programming’s dirty little secret. In memory, it’s stateless and pure functional. Hooray for FP! But at the system level, it’s stateful.

While making it stateless can certainly help us to reason about the logic when considered in isolation – at the unit, or component or service level – when the identity of the object being acted upon is persistent, we lose those benefits at the system level.

Imagine we have two switches controlling a single light (e.g., one at the top of a flight of stairs and one at the bottom.)

[Figure: two switches controlling a single light]

In this situation, where a shared object is accessed in two different places, it’s harder to reason about the state of the light without knowing its history.

If I have to replace the bulb, I’d like to know if the light is on or off. With a single switch, I just need to look to see if it’s in the up (off) or down (on) position. With two switches, I need to understand the history. Was it last switched on, or switched off?

Copying immutable objects, when they have persistent identity – it’s the same light – does not make functions that act on those objects stateless. It makes them pure functional, sure. But we still need to consider their history. And in situations of multiple access (concurrency), it’s no less complicated than reasoning about mutable state, and just as prone to errors.

When I was knocking up my little code example, my first implementation of the FP version was:

const light = {
    on: false
}

function flickSwitch(light){
    return {...light, on: !light.on};
}

const copy1 = flickSwitch(light)
console.log(copy1);

const copy2 = flickSwitch(copy1);
console.log(copy2);

const copy3 = flickSwitch(copy2);
console.log(copy3);

const copy4 = flickSwitch(copy3);
console.log(copy3);

Do you see the error? When I ran it, it produced this output.

{ on: true }
{ on: false }
{ on: true }
{ on: true }

This is a class of bug I’ve seen many times in functional code. The last console.log uses the wrong copy.

The order – in this case, the order of copies – matters. And when the order matters, our logic isn’t stateless. It has history.

The most common manifestation of this class of bug I come across is in FP programs that have databases where object state is stored and shared across multiple client threads or processes.

Another workaround is to push the versioning model of our logical design into the database itself, in the form of event sourcing. This again, though, is far from history-agnostic and therefore far from stateless. Each object’s state – rather than being a single record in a single table that changes over time – is now the aggregate of the history of events that mutated it.

Going back to our finite state machine, each object is represented as the sequence of actions that brought it to its current state (e.g., flickSwitch() -> flickSwitch() -> flickSwitch() would produce a light that’s turned on.)
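
A minimal sketch of that replay – in Python this time, with hypothetical names – derives the light’s current state by folding over its event history rather than reading a stored field:

def apply(state, event):
    # Each event transforms one immutable state into the next
    if event == "flickSwitch":
        return {**state, "on": not state["on"]}
    return state


def replay(events, initial_state=None):
    # The object's current state is the aggregate of everything that happened to it
    state = dict(initial_state or {"on": False})
    for event in events:
        state = apply(state, event)
    return state


history = ["flickSwitch", "flickSwitch", "flickSwitch"]
print(replay(history))      # {'on': True} - three flicks from 'off' leaves the light on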

In reasoning about our logic, despite all the spiffy technological workarounds of FP, event sourcing and so on, if objects conceptually have history then they conceptually have state. And at the system level, we have to get that logic conceptually right.

Yet again, technology – including programming paradigm – is no substitute for thinking.

Overcoming Solution Bias

Just a short post this morning about a phenomenon I’ve seen many times in software development – which, for want of a better name, I’m calling solution bias.

It’s the tendency of developers, once they’ve settled on a solution to a problem, to refuse to let go of it – regardless of what facts may come to light that suggest it’s the wrong solution.

I’ve even watched teams argue with their customer to try to get them to change their requirements to fit a solution design the team have come up with. It seems once we have a solution in our heads (or in a Git repository) we can become so invested in it that – to borrow a metaphor – everything looks like a nail.

The damage this can do is obvious. Remember your backlog? That’s a solution design. And once a backlog’s been established, it has a kind of inertia that makes it unlikely to change much. We may fiddle at the edges, but once the blueprints have been drawn up, they don’t change significantly. It’s vanishingly rare to see teams throw their designs away and start afresh, even when it’s screamingly obvious that what they’re building isn’t going to work.

I think this is just human nature: when the facts don’t fit the theory, our inclination is to change the facts and not the theory. That’s why we have the scientific method: because humans are deeply flawed in this kind of way.

In software development, it’s important – if we want to avoid solution bias – to first accept that it exists, and that our approach must actively take steps to counteract it.

Here’s what I’ve seen work:

  • Testable Goals – sounds obvious, but it still amazes me how many teams have no goals they’re working towards other than “deliver on the plan”. A much more objective picture of whether the plan actually works can help enormously, especially when it’s put front-and-centre in all the team’s activities. Try something. Test it against the goal. See if it really works. Adapt if it doesn’t.
  • Multiple Designs – teams get especially invested in a solution design when it’s the only one they’ve got. Early development of candidate solutions should explore multiple design avenues, tested against the customer’s goals, and selected for extinction if they don’t measure up. Evolutionary design requires sufficiently diverse populations of possible solutions.
  • Small, Frequent Releases – a team that’s invested a year in a solution is going to resist that solution being rejected with far more energy than a team who invested a week in it. If we accept that an evolutionary design process is going to have failed experiments, we should seek to keep those experiments short and cheap.
  • Discourage Over-Specialisation – solution architectures can define professional territory. If the best solution is a browser-based application, that can be good news for JavaScript folks, but bad news for C++ developers. I often see teams try to steer the solution in a direction that favours their skill sets over others. This is understandable, of course. But when the solution to sorting a list of surnames is to write them into a database and use SQL because that’s what the developers know how to do, it can lead to some pretty inappropriate architectures. Much better, I’ve found, to invest in bringing teams up to speed on whatever technology will work best. If it needs to be done in JavaScript, give the Java folks a couple of weeks to learn enough JavaScript to make them productive. Don’t put developers in a position where the choice of solution architecture threatens their job.
  • Provide Safety – I can’t help feeling that a good deal of solution bias is the result of fear. Fear of failure.  Fear of blame. Fear of being sidelined. Fear of losing your job. If we accept that the design process is going to involve failed experiments, and engineer the process so that teams fail fast and fail cheaply – with no personal or professional ramifications when they do – then we can get on with the business of trying shit and seeing if it works. I’ve long felt that confidence isn’t being sure you’ll succeed, it’s not being afraid to fail. Reassure teams that failure is part of the process. We expect it. We know that – especially early on in the process of exploring the solution space – candidate solutions will get rejected. Importantly: the solutions get rejected, not the people who designed them.

As we learn from each experiment, we’ll hopefully converge on the likeliest candidate solution, and the whole team will be drawn in to building on that, picking up whatever technical skills are required as they do. At the end, we may deliver not just a good working solution, but also a stronger team of people who have grown through the process.


Action->Object vs. Object->Action

One of the factors that I see programmers new to objects struggling with is our natural tendency to separate agency from data. Things do things to other things. The VCR plays the video. The toaster heats the toast. The driver drives the taxi. Etc.

I think it’s possibly linguistic, too: in most natural languages, we put the object after the verb – play(video), toast(bread), drive(taxi).

Thing is, though – this isn’t how object oriented programming works. Objects encapsulate agency with the data they work on, producing video.play(), bread.toast() and taxi.drive().

In OOP, the cat kicks its own arse.

You’re absolutely correct if you’re thinking “That isn’t how we’d say or write it in real life”. It isn’t. I suspect this is one of the reasons some programmers find OOP counter-intuitive – it goes against the way we see the world.

Ironically, Object thinking – while not intuitive in that sense – makes discovery of actions much easier. What can I do with a video? What can I do with bread? And so forth. That’s why Object->Action still dominates UI design. Well, good UI design, anyway. Likewise, developers tend to find it easier to discover functions that can be applied to types when they start with the type.

When I wrote code to tell the story of what happens when a member donates a video to a community library, each line started with a function – well, in Java, a static method, which is effectively the same thing. This is not great OOP. Indeed, it’s not OOP. It’s FP.

And that’s fine. Functional Programming works more the way we say things in the real world. Clean the dishes. Set the timer. Kick the cat. I suspect this is one reason why more and more programmers are drawn to the functional paradigm – it works more the way we think, and reads more the way we talk. Or, at least, it can if we’re naming things well.

(There’s a separate discussion about encapsulation in FP. The tendency is for functional programmers not to bother with it, which leads to inevitable coupling problems. That’s not because you can’t encapsulate data in FP. It’s just that, as a concept, it’s not been paid much attention.)

If you’re doing OOP – and I still do much of the time, because it’s perfectly workable, thank you very much – then it goes Object->Action. Methods like play(video) and kick(cat) hint at responsibilities being in the wrong place, leading to the lack of encapsulation I witness in so much OO code.

It’s like they say; give a C programmer C++, and they’ll write you a C program with it.


Do Your Unit Tests Know *Too* Much?

A common pitfall of extensive unit testing reported by many teams is that, as the number of tests builds up, changing the implementation under test forces them to rewrite many, many tests. In this scenario, the test code becomes a barrier to change instead of its enabler.

Having witnessed this quite a few times first-hand, I’ve got a few observations I want to share.

First of all, it’s usually unmanaged dependencies in our implementation code that cause changes to ripple out to large numbers of tests.

Imagine we wrote a unit test for every public method or function in our implementation, and we decide to change the way one of them works. If that breaks a hundred of our unit tests, my first thought might be “Why is this method/function referenced in a hundred tests?” Did we write a hundred distinct tests for that one thing, or is it used in the set-up of a hundred tests? That would imply that there’s no real separation of concerns in that part of the implementation. A lack of modularity creates all kinds of problems that show up in test code – usually in the set-ups.

The second thing I wanted to mention is duplication in test code. There’s a dangerous idea that’s been gaining in popularity in recent years that we shouldn’t refactor any duplication out of our tests. The thinking is that it can make our tests harder to understand, and there’s some merit to this when it’s done with little thought.

But there are ways to compose tests, reuse set-ups, assertions and whole tests that clearly communicate what’s going on (many well described in the xUnit Test Patterns book). Inexperienced programmers often struggle with code that’s composed out of small, simple parts, and it almost always comes down to poor naming.

Composition – like separation of concerns – needs to be a black box affair. If you have to look inside the box to understand what’s going on, then you have a problem. I’m as guilty of sloppy naming as anyone, and that’s something I’m working to improve.

There’s one mechanism in particular for removing duplication from test code that I’ve been championing for 20 years – parameterised tests. When we have multiple examples of the same rule or behaviour covered by multiple tests, it’s a quick win to consolidate those examples into a single data-driven test that exercises all of those cases. This can help us in several ways:

  • Removes duplication
  • Offers an opportunity to document the rule instead of the examples (e.g., fourthFibonacciNumberIsTwo(), sixthFibonacciNumberIsFive() can become fibonacciNumberIsSumOfPreviousTwo() )
  • Opens a door to much more exhaustive testing with surprisingly little extra code

Maybe those 100 tests could be a handful of parameterised tests?
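
As a sketch – assuming pytest and a hypothetical fibonacci() function – a parameterised test documents the rule once, and the rows document the examples:

import pytest


def fibonacci(n):
    # Hypothetical implementation under test: 0, 1, 1, 2, 3, 5, ...
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


@pytest.mark.parametrize("n, expected", [
    (3, 2),     # the fourth Fibonacci number is two
    (5, 5),     # the sixth Fibonacci number is five
    (9, 34),    # adding rows costs almost nothing
])
def test_fibonacci_number_is_sum_of_previous_two(n, expected):
    assert fibonacci(n) == expected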

The fourth thing I wanted to talk about is over-reliance on mock objects. Mocks can be a great tool for achieving cohesive, loosely-coupled modules in a Tell, Don’t Ask style of design – I’m under the impression that’s why they were originally invented. But as they give with one hand, they can take away with the other. The irony with mocks is that, while they can lead us to better encapsulation, they do so by exposing the internal interactions of our modules. A little mocking can be powerful design fuel, but pour too much of it on your tests and you’ll burn the house down.
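
For example – using Python’s built-in unittest.mock, with hypothetical names – a mock lets the test specify an interaction rather than a state change, which is great design fuel but also makes the test aware of that internal collaboration:

from unittest.mock import Mock


class Library:
    def __init__(self, payments):
        self.payments = payments

    def rent(self, member, movie, price):
        # Tell, Don't Ask: the library tells its collaborator what to do
        self.payments.charge(member, price)


def test_renting_a_movie_charges_the_member():
    payments = Mock()
    library = Library(payments)
    library.rent("Jane", "Jaws", 3.99)
    # The test knows about the internal interaction - that's the trade-off
    payments.charge.assert_called_once_with("Jane", 3.99)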

And the final thing I wanted to highlight is the granularity of our tests. Do we write a test for every method of every class? Do we have a corresponding test fixture for every module in the implementation? My experience has been that it’s neither necessary nor desirable to have test code that sticks to your internal design like cheese melted into a radio.

At the other extreme, many teams have written tests that do everything from the outside – e.g., from a public API, or at the controller or service level. I call these “keyhole tests”, because when I’ve worked with them it can feel a little like keyhole surgery. My experience with this style of testing is that, for sure, it can decouple your test code from internal complexity, but at a sometimes heavy price when tests fail and we end up in the debugger trying to figure out where in the complex internal – and non-visible – call stack things went wrong. It’s like when the engineer from the gas company detects a leak somewhere in your house by checking for a drop in pressure at the meter outside. Pinpointing the source of the leak may involve ripping up the floors…

The truth, for me, lies somewhere in between melted cheese and keyhole surgery. What I strive for within a body of code is – how can I put this? – internal APIs. These are interfaces within the implementation that encapsulate the inner complexity of a particular behaviour (e.g., the cluster of classes used to calculate mortgage repayments). They decouple that little cluster from the rest of the implementation, just as their dependencies are decoupled from them in the same S.O.L.I.D. way. And their interfaces tend to be stable, because they’re not about the details. Tests written around those interfaces are less likely to need to change, but also more targeted to a part of the design instead of the whole call stack. So when they fail, it’s easier to pinpoint where things went wrong. (Going back to the gas leak example, imagine having multiple test points throughout the house, so we can at least determine what room the leak is in.)
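
Sketching that – with hypothetical names, and the whole cluster collapsed into one class for brevity – the tests target the stable interface of the mortgage repayment behaviour, not every class behind it:

class MortgageRepayments:
    # Internal API: the only surface the rest of the code (and the tests) see
    def monthly_repayment(self, principal, annual_rate, years):
        monthly_rate = annual_rate / 12
        months = years * 12
        if monthly_rate == 0:
            return round(principal / months, 2)
        factor = (1 + monthly_rate) ** months
        return round(principal * monthly_rate * factor / (factor - 1), 2)


def test_interest_free_repayment_spreads_principal_over_the_term():
    repayments = MortgageRepayments()
    assert repayments.monthly_repayment(principal=120000, annual_rate=0.0, years=10) == 1000.0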

Even for a monolith, I aim to think of good internal architecture as a network of microservices, each with its own tests. By far the biggest cause of brittle tests that I’ve seen is that your code quite probably isn’t like that. It’s a Big Ball of Mud. That can be exacerbated by leaving all the duplication in the test code, and/or by over-reliance on mock objects, plus a tendency to try to write tests for every method or function of every class.

You want tests that run fast and pinpoint failures, but that also leave enough wiggle room to easily refactor what’s happening behind those internal APIs.

Code Craft is More Throws Of The Dice

On the occasions founders ask me about the business case for code craft practices like unit testing, Continuous Integration and refactoring, we get to a crunch moment: will this guarantee success for my business?

Honestly? No. Nobody can guarantee that.

Unit testing can’t guarantee that. Test-Driven Development can’t guarantee that. Refactoring can’t guarantee it. Automated builds can’t guarantee it. Microservices can’t. The Cloud can’t. Event sourcing can’t. NoSQL can’t. Lean can’t. Scrum can’t. Kanban can’t. Agile can’t. Nothing can.

And that is the whole point of code craft. In the crap game of technology, every release of our product or system is almost certainly not a winning throw of the dice. You’re going to need to throw again. And again. And again. And again.

What code craft offers is more throws of the dice. It’s a very simple value proposition. Releasing working software sooner, more often and for longer improves your chances of hitting the jackpot. More so than any other discipline in software development.