A common pitfall of extensive unit testing reported by many teams is that, as the number of tests builds up, changing the implementation under test forces them to rewrite many, many tests. In this scenario, the test code becomes a barrier to change instead of its enabler.
Having witnessed this quite a few times first-hand, I’ve got a few observations I want to share.
First of all, it’s usually unmanaged dependencies in our implementation code that cause changes to ripple out to large numbers of tests.
Imagine we wrote a unit test for every public method or function in our implementation, and we decide to change the way one of them works. If that breaks a hundred of our unit tests, my first thought would be: why is this method or function referenced in a hundred tests? Did we write a hundred distinct tests for that one thing, or is it used in the set-up of a hundred tests? Either way, it suggests there’s no real separation of concerns in that part of the implementation. A lack of modularity creates all kinds of problems that show up in test code – usually in the set-ups.
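To make that concrete, here’s a minimal sketch – class and test names are hypothetical, using JUnit 5 – of the kind of shared set-up that couples a whole fixture to one implementation detail:

```java
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical sketch: every test in this fixture builds its fixture through the
// same set-up, so a change to ShoppingBasket's constructor or to addItem()
// ripples through the whole class at once.
class ShoppingBasketTests {

    static class ShoppingBasket {                 // stand-in implementation
        private int total = 0;
        void addItem(int price) { total += price; }
        int total() { return total; }
    }

    private ShoppingBasket basket;

    @BeforeEach
    void setUp() {
        basket = new ShoppingBasket();            // referenced by every test below
        basket.addItem(100);                      // so is this implementation detail
    }

    @Test
    void totalReflectsItemAddedInSetUp() {
        assertEquals(100, basket.total());
    }

    @Test
    void addingAnotherItemIncreasesTotal() {
        basket.addItem(50);
        assertEquals(150, basket.total());
    }

    // ...imagine ninety-eight more tests built on the same set-up
}
```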
The second thing I wanted to mention is duplication in test code. There’s a dangerous idea that’s been gaining popularity in recent years: that we shouldn’t refactor any duplication out of our tests. The thinking is that refactoring can make our tests harder to understand, and there’s some merit to that when it’s done with little thought.
But there are ways to compose tests, reuse set-ups, assertions and whole tests that clearly communicate what’s going on (many of them well described in the xUnit Test Patterns book). Inexperienced programmers often struggle with code that’s composed out of small, simple parts, and when they do, it almost always comes down to poor naming.
Composition – like separation of concerns – needs to be a black box affair. If you have to look inside the box to understand what’s going on, then you have a problem. I’m as guilty of sloppy naming as anyone, and that’s something I’m working to improve.
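Here’s a small sketch of what I mean, again with hypothetical names: the duplication is factored out into one intention-revealing helper, and the tests still read as a black box:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical sketch: the shared set-up is named for what it provides,
// not how it does it, so readers shouldn't need to look inside the box.
class DiscountTests {

    static class Checkout {                        // stand-in implementation
        private int total = 0;
        void scan(int price) { total += price; }
        int totalWithDiscount(int percent) { return total - (total * percent / 100); }
    }

    @Test
    void tenPercentDiscountAppliedToTotal() {
        Checkout checkout = checkoutWithItemsTotalling(200);
        assertEquals(180, checkout.totalWithDiscount(10));
    }

    @Test
    void zeroPercentDiscountLeavesTotalUnchanged() {
        Checkout checkout = checkoutWithItemsTotalling(200);
        assertEquals(200, checkout.totalWithDiscount(0));
    }

    // The name says everything the reader needs to know about this set-up.
    private Checkout checkoutWithItemsTotalling(int total) {
        Checkout checkout = new Checkout();
        checkout.scan(total);
        return checkout;
    }
}
```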
There’s one mechanism in particular for removing duplication from test code that I’ve been championing for 20 years – parameterised tests. When we have multiple examples of the same rule or behaviour covered by multiple tests, it’s a quick win to consolidate those examples into a single data-driven test that exercises all of those cases. This can help us in several ways:
- Removes duplication
- Offers an opportunity to document the rule instead of the examples (e.g., fourthFibonacciNumberIsTwo() and sixthFibonacciNumberIsFive() can become fibonacciNumberIsSumOfPreviousTwo())
- Opens the door to much more exhaustive testing with surprisingly little extra code
Maybe those 100 tests could be a handful of parameterised tests?
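As a rough sketch of what that looks like in practice, here’s the Fibonacci example as a single data-driven test using JUnit 5’s parameterised tests (the Fibonacci class and its indexing are assumptions for illustration):

```java
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical sketch: one parameterised test replaces a family of
// near-identical example tests, and adding a case is one line of data.
class FibonacciTests {

    static class Fibonacci {                       // stand-in implementation
        int number(int position) {
            int previous = 1, current = 0;         // positions 0 and 1
            for (int i = 1; i < position; i++) {
                int next = previous + current;
                previous = current;
                current = next;
            }
            return current;
        }
    }

    @ParameterizedTest(name = "Fibonacci number {0} is {1}")
    @CsvSource({
        "1, 0",
        "2, 1",
        "3, 1",
        "4, 2",    // was fourthFibonacciNumberIsTwo()
        "6, 5",    // was sixthFibonacciNumberIsFive()
        "10, 34"
    })
    void fibonacciNumberIsSumOfPreviousTwo(int position, int expected) {
        assertEquals(expected, new Fibonacci().number(position));
    }
}
```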
The fourth thing I wanted to talk about is over-reliance on mock objects. Mocks can be a great tool for achieving cohesive, loosely-coupled modules in a Tell, Don’t Ask style of design – I’m under the impression that’s why they were originally invented. But as they give with one hand, they can take away with the other. The irony with mocks is that, while they can lead us to better encapsulation, they do so by exposing the internal interactions of our modules. A little mocking can be powerful design fuel, but pour too much of it on your tests and you’ll burn the house down.
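Here’s a minimal sketch of that trade-off, with hypothetical names and Mockito standing in for whichever mocking library you use:

```java
import org.junit.jupiter.api.Test;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.verify;

// Hypothetical sketch: the test specifies *how* the order talks to its
// collaborator, in a Tell, Don't Ask style. That nudges the design towards
// encapsulation, but it also pins down the internal interaction - rename or
// re-route that call and this test breaks, even if behaviour is unchanged.
class OrderNotificationTests {

    interface Notifier {                          // collaborator we tell, not ask
        void orderConfirmed(String orderId);
    }

    static class Order {                          // stand-in implementation
        private final String id;
        Order(String id) { this.id = id; }
        void confirm(Notifier notifier) { notifier.orderConfirmed(id); }
    }

    @Test
    void confirmingAnOrderNotifiesTheCustomer() {
        Notifier notifier = mock(Notifier.class);

        new Order("ORD-1").confirm(notifier);

        verify(notifier).orderConfirmed("ORD-1"); // asserts the interaction itself
    }
}
```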
And the final thing I wanted to highlight is the granularity of our tests. Do we write a test for every method of every class? Do we have a corresponding test fixture for every module in the implementation? My experience has been that it’s neither necessary nor desirable to have test code that sticks to your internal design like cheese melted into a radio.
At the other extreme, many teams have written tests that do everything from the outside – e.g., from a public API, or at the controller or service level. I call these “keyhole tests”, because working with them can feel a little like keyhole surgery. My experience with this style of testing is that it can certainly decouple your test code from internal complexity, but sometimes at a heavy price: when tests fail, we end up in the debugger trying to figure out where in the complex internal – and non-visible – call stack things went wrong. It’s like when the engineer from the gas company detects a leak somewhere in your house by checking for a drop in pressure at the meter outside. Pinpointing the source of the leak may involve ripping up the floors…
The truth, for me, lies somewhere in between melted cheese and keyhole surgery. What I strive for within a body of code is – how can I put this? – internal APIs. These are interfaces within the implementation that encapsulate the inner complexity of a particular behaviour (e.g., the cluster of classes used to calculate mortgage repayments). They decouple that little cluster from the rest of the implementation, just as their own dependencies are decoupled from them in the same S.O.L.I.D. way. And their interfaces tend to be stable, because they’re not about the details. Tests written around those interfaces are less likely to need to change, and they’re also more targeted at a part of the design instead of the whole call stack. So when they fail, it’s easier to pinpoint where things went wrong. (Going back to the gas leak example, imagine having multiple test points throughout the house, so we can at least determine which room the leak is in.)
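A minimal sketch of the idea, with hypothetical names and an assumed repayment formula:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Hypothetical sketch: MortgageRepayments is an internal API - a stable
// interface hiding the cluster of classes that do the actual calculation.
// The test targets that seam, not every class behind it.
class MortgageRepaymentTests {

    interface MortgageRepayments {                // the internal API
        double monthlyRepayment(double principal, double annualRate, int years);
    }

    // One possible implementation behind the interface; it can be refactored
    // into (or out of) a cluster of collaborating classes without touching the test.
    static class StandardRepaymentCalculator implements MortgageRepayments {
        public double monthlyRepayment(double principal, double annualRate, int years) {
            double monthlyRate = annualRate / 12;
            int payments = years * 12;
            // Standard annuity formula (an assumption for this illustration)
            return principal * monthlyRate / (1 - Math.pow(1 + monthlyRate, -payments));
        }
    }

    @Test
    void repaysPrincipalPlusInterestOverTheTerm() {
        MortgageRepayments repayments = new StandardRepaymentCalculator();
        // 100,000 at 6% over 30 years is roughly 599.55 a month
        assertEquals(599.55, repayments.monthlyRepayment(100_000, 0.06, 30), 0.01);
    }
}
```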
Even for a monolith, I aim to think of good internal architecture as a network of microservices, each with its own tests. By far the biggest cause of brittle tests that I’ve seen is code that isn’t like that: it’s a Big Ball of Mud. That’s exacerbated by leaving all the duplication in the test code, by over-reliance on mock objects, and by a tendency to try to write tests for every method or function of every class.
You want tests that run fast and pinpoint failures, but that also leave enough wiggle room to easily refactor what’s happening behind those internal APIs.