The Gaps Between The Gaps – The Future of Software Testing

If you recall your high school maths (yes, with an “s”!), think back to calculus. This hugely important idea is built on something surprisingly simple: smaller and smaller slices.

If we want to roughly estimate the area under a curve, we can add up the areas of rectangular slices underneath it. To improve the estimate, we make the slices thinner. Make them thinner still, and the estimate gets better again. Make them infinitely thin, and we get a completely accurate result: with an infinite number of samples, we’ve effectively proved what the area under the curve is.

In computing, I’ve lived through several revolutions where increasing computing power has meant more and more samples can be taken, until the gaps between them are so small that – to all intents and purposes – the end result is analog. Digital Signal Processing, for example, has reached a level of maturity where digital guitar amplifiers, synthesizers and tape recorders are indistinguishable from the real thing to the human ear. As sample rates and bit depths increased, and number-crunching power skyrocketed while the cost per FLOP plummeted, we eventually arrived at a point where the question of, say, whether to buy a real tube amplifier or use a digitally modeled one is largely a matter of personal preference rather than practical difference.

Software testing’s been quietly undergoing the same revolution. When I started out, automated test suites ran overnight on machines that were thousands of times less powerful than my laptop. Today, I see large unit test suites running in minutes or fractions of minutes on hardware that’s way faster and often cheaper.

Factor in the Cloud, and teams can now chuck what would until relatively recently have been classed as “supercomputing” power at their test suites for a few extra dollars each time. While Moore’s Law seems to have stalled at the CPU level, the scaling out of computing power shows no signs of slowing down – more and more cores in more and more nodes for less and less money.

I have a client I worked with to re-engineer a portion of their JUnit test suite for a mission-critical application, adding a staggering 2.5 billion additional property-based test cases (with only an additional 1,000 lines of code, I might add). This extended suite – which reuses, but doesn’t replace, their day-to-day suite of tests – runs overnight in about 5 1/2 hours on Cloud-based hardware. (They call it “draining the swamp”.)

I can easily imagine that suite running in 5 1/2 minutes in a decade’s time. Or running 250 billion tests overnight.
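
To give a sense of the shape of this kind of testing (a purely illustrative sketch, not the client’s code), a property-based test written with a JVM library such as jqwik might look something like this. One property method, as many generated cases as you care to ask for:

```kotlin
import net.jqwik.api.ForAll
import net.jqwik.api.Property

class SortingProperties {

    // One property, many cases: jqwik generates random input lists and checks
    // the property holds for every one of them. Turning up the number of tries
    // is how a few lines of test code become millions of executed test cases.
    @Property(tries = 100_000)
    fun sortingOrdersTheElementsAndLosesNone(@ForAll input: List<Int>): Boolean {
        val sorted = input.sorted()
        val isOrdered = sorted.zipWithNext().all { (a, b) -> a <= b }
        val sameElements =
            sorted.groupingBy { it }.eachCount() == input.groupingBy { it }.eachCount()
        return isOrdered && sameElements
    }
}
```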

And it occurred to me that, as the gaps between tests get smaller and smaller, we’re tending towards what is – to all intents and purposes – a kind of proof of correctness for that code. Imagine writing software to guide a probe to the moons of Jupiter. A margin of error of 0.001% in the calculations could throw it hundreds of thousands of kilometres off course. How small would the gaps need to be to ensure an accuracy of, say, 1km, or 100m, or 10m? (And yes, I know they can course-correct as they get closer, but hopefully you catch my drift.)

When the gaps between the tests are significantly smaller than the allowable margin for error, I think that would constitute an effective proof of correctness – in the same way that when the gaps between audio samples fall way beyond what human hearing can resolve, you have effectively analog audio, at least in the perceived quality of the end result.

And the good news is that this testing revolution is already well underway. I’ve been working with clients for quite some time, achieving very high-integrity software using little more than the same testing tools we’re almost all using, and off-the-shelf hardware available to almost everyone.


Overcoming Solution Bias

Just a short post this morning about a phenomenon I’ve seen many times in software development – which, for want of a better name, I’m calling solution bias.

It’s the tendency of developers, once they’ve settled on a solution to a problem, to refuse to let go of it – regardless of what facts may come to light that suggest it’s the wrong solution.

I’ve even watched teams argue with their customer to try to get them to change their requirements to fit a solution design the team have come up with. It seems once we have a solution in our heads (or in a Git repository) we can become so invested in it that – to borrow a metaphor – everything looks like a nail.

The damage this can do is obvious. Remember your backlog? That’s a solution design. And once a backlog’s been established, it has a kind of inertia that makes it unlikely to change much. We may fiddle at the edges, but once the blueprints have been drawn up, they don’t change significantly. It’s vanishingly rare to see teams throw their designs away and start afresh, even when it’s screamingly obvious that what they’re building isn’t going to work.

I think this is just human nature: when the facts don’t fit the theory, our inclination is to change the facts and not the theory. That’s why we have the scientific method: because humans are deeply flawed in this kind of way.

In software development, it’s important – if we want to avoid solution bias – to first accept that it exists, and then to make sure our approach actively takes steps to counteract it.

Here’s what I’ve seen work:

  • Testable Goals – sounds obvious, but it still amazes me how many teams have no goals they’re working towards other than “deliver on the plan”. A much more objective picture of whether the plan actually works can help enormously, especially when it’s put front-and-centre in all the team’s activities. Try something. Test it against the goal. See if it really works. Adapt if it doesn’t.
  • Multiple Designs – teams get especially invested in a solution design when it’s the only one they’ve got. Early development of candidate solutions should explore multiple design avenues, tested against the customer’s goals, and selected for extinction if they don’t measure up. Evolutionary design requires sufficiently diverse populations of possible solutions.
  • Small, Frequent Releases – a team that’s invested a year in a solution is going to resist that solution being rejected with far more energy than a team who invested a week in it. If we accept that an evolutionary design process is going to have failed experiments, we should seek to keep those experiments short and cheap.
  • Discourage Over-Specialisation – solution architectures can define professional territory. If the best solution is a browser-based application, that can be good news for JavaScript folks, but bad news for C++ developers. I often see teams try to steer the solution in a direction that favours their skill sets over others. This is understandable, of course. But when the solution to sorting a list of surnames is to write them into a database and use SQL because that’s what the developers know how to do, it can lead to some pretty inappropriate architectures. Much better, I’ve found, to invest in bringing teams up to speed on whatever technology will work best. If it needs to be done in JavaScript, give the C++ folks a couple of weeks to learn enough JavaScript to make them productive. Don’t put developers in a position where the choice of solution architecture threatens their job.
  • Provide Safety – I can’t help feeling that a good deal of solution bias is the result of fear. Fear of failure.  Fear of blame. Fear of being sidelined. Fear of losing your job. If we accept that the design process is going to involve failed experiments, and engineer the process so that teams fail fast and fail cheaply – with no personal or professional ramifications when they do – then we can get on with the business of trying shit and seeing if it works. I’ve long felt that confidence isn’t being sure you’ll succeed, it’s not being afraid to fail. Reassure teams that failure is part of the process. We expect it. We know that – especially early on in the process of exploring the solution space – candidate solutions will get rejected. Importantly: the solutions get rejected, not the people who designed them.

As we learn from each experiment, we’ll hopefully converge on the likeliest candidate solution, and the whole team will be drawn in to building on that, picking up whatever technical skills are required as they do. At the end, we may deliver not just a good working solution, but also a stronger team of people who have grown through the process.

 

What’s The Point of Code Craft?

A conversation I seem to have over and over again – my own personal Groundhog Day – is “What’s the point in code craft if we’re building the wrong thing?”

The implication is that there’s a zero-sum trade-off between customer collaboration – the means by which we figure out what’s needed – and technical discipline. Time spent writing unit tests or refactoring duplication or automating builds is time not spent talking with our customers.

This is predicated on two falsehoods:

  • It takes longer to deliver working code when we apply more technical discipline
  • The fastest way to solve a problem is to talk about it more

All the evidence we have strongly suggests that, in the majority of cases, better quality working software doesn’t take significantly longer to deliver. In fact, studies using large amounts of industry data repeatedly show the inverse. It – on average – takes longer to deliver software when we apply less technical discipline.

Teams who code and fix their software tend to end up having less time for their customers because they’re too busy fixing bugs and because their code is very expensive to change.

Then there’s the question of how to solve our customers’ problems. We can spend endless hours in meetings discussing it, or we could spend a bit of time coming up with a simple idea for a solution, build it quickly and release it straight away for end users to try for real. The feedback we get from people using our software tends to tell us much more, much sooner about what’s really needed.

I’ll take a series of rapid software releases over the equivalent series of requirements meetings any day of the week. I’ve seen this many times in the last three decades. Evolution vs Big Design Up-Front. Rapid iteration vs. Analysis Paralysis.

The real customer collaboration happens out in the field (or in the staging environment), where developers and end users learn from each small, frequent release and feed those lessons back into the next iteration. The map is not the terrain.

Code craft enables high-value customer collaboration by enabling rapid, working releases and by delivering code that’s much easier to change. Far from getting in the way of building the right thing, it is the way.

But…

That’s only if your design process is truly iterative. Teams that are just working through a backlog may see things differently, because they’re not setting out to solve the customer’s problem. They’re setting out to deliver a list of features that some people sitting in a room – these days very probably not the end users and not the developers themselves – guessed might solve the problem (if indeed, the problem was ever discussed).

In that situation, technical discipline won’t help you deliver the right thing. But it could help you deliver the wrong thing sooner for less expense. #FailFast


Refactoring To Closures in Kotlin & IntelliJ

I spent yesterday morning practicing a refactoring in Kotlin that I wanted to potentially demonstrate for a workshop, and after half a dozen unsuccessful attempts, I found a way that seems relatively safe. I thought it might be useful to document it here, both for my future self and for anyone else who might be interested.

My goal here is to encapsulate the data used in this function for calculating quotes for fitted carpets. The solution I’m thinking of is closures.
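
The function in question isn’t reproduced here, so for the purposes of these notes imagine something along these lines (a simplified sketch of my own, not the actual code from the repository linked at the end):

```kotlin
import kotlin.math.ceil

// Simplified sketch of a starting point: quote() works directly with the room
// dimensions and the carpet pricing data.
fun quote(width: Double, length: Double, pricePerSqrMtr: Double, roundUp: Boolean): Double {
    val area = roomArea(width, length)
    return carpetPrice(area, pricePerSqrMtr, roundUp)
}

fun roomArea(width: Double, length: Double): Double = width * length

fun carpetPrice(area: Double, pricePerSqrMtr: Double, roundUp: Boolean): Double {
    val price = area * pricePerSqrMtr
    return if (roundUp) ceil(price) else price
}
```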

How do I get from this to closures being injected into quote() safely? Here’s how I did it in IntelliJ.

1. Use the Function to Scope… refactoring to extract the body of, say, the roomArea() function into an internal function.

2. As a single step, change the return type of roomArea() to a function signature that matches area(), return a reference to ::area instead of the return value from area(), and change quote() to invoke the returned area() function. (Phew!)

3. Rename roomArea() to room() so it makes more sense.

4. In quote(), highlight the expression room(width, length) and use the Extract Parameter refactoring to have that passed into quote() from the tests.
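
At this point, carrying on with the simplified sketch above (which may differ from the real code), things look roughly like this, with the tests now responsible for building the area closure:

```kotlin
import kotlin.math.ceil

// Roughly where the sketch is after step 4: room() returns a closure over
// width and length, and quote() is handed that closure by the tests.
fun quote(pricePerSqrMtr: Double, roundUp: Boolean, area: () -> Double): Double =
    carpetPrice(area(), pricePerSqrMtr, roundUp)

fun room(width: Double, length: Double): () -> Double {
    fun area() = width * length
    return ::area
}

// carpetPrice() is unchanged at this stage
fun carpetPrice(area: Double, pricePerSqrMtr: Double, roundUp: Boolean): Double {
    val price = area * pricePerSqrMtr
    return if (roundUp) ceil(price) else price
}

// e.g. called from a test: quote(12.99, true, room(5.0, 4.0))
```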

5. Now we’re going to do something similar for carpetPrice(), with one small difference. First, as with roomArea(), use the Function to Scope refactoring to extract the body of carpetPrice() into an internal function.

6. Then swap the return value with a reference to the ::price function.

7. Now, this time we want the area to be passed in as a parameter to the price() function. Extract Parameter area from price(), change the signature of the returned function and update quote() to pass it in using the area() function. Again, this must be a single step.

8. Change the Signature of carpetPrice() to remove the redundant area parameter.

9. Rename carpetPrice() to carpet() so it makes more sense.

10. Finally, use Extract Parameter on the expression carpet(pricePerSqrMtr, roundUp) in quote(), naming the new parameter price.
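
Carrying the simplified sketch through to the end, the destination looks roughly like this: quote() knows nothing about rooms or carpets, just the two closures injected into it.

```kotlin
import kotlin.math.ceil

// Rough end state of the sketch: the data is encapsulated in closures built by
// room() and carpet(), and quote() simply composes them.
fun quote(area: () -> Double, price: (Double) -> Double): Double = price(area())

fun room(width: Double, length: Double): () -> Double {
    fun area() = width * length
    return ::area
}

fun carpet(pricePerSqrMtr: Double, roundUp: Boolean): (Double) -> Double {
    fun price(area: Double): Double {
        val total = area * pricePerSqrMtr
        return if (roundUp) ceil(total) else total
    }
    return ::price
}

// e.g. called from a test: quote(room(5.0, 4.0), carpet(12.99, roundUp = true))
```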

 

If you want to have a crack at this yourself, the source code is at https://github.com/jasongorman/kotlin_simple_design/tree/master/fp_modular_design, along with two more examples (an OO/Java version of this, plus another example that breaks all the rules of both Simple Design and modular design in Kotlin).

 

Action->Object vs. Object->Action

One of the things I see programmers new to objects struggling with is our natural tendency to separate agency from data. Things do things to other things. The VCR plays the video. The toaster heats the toast. The driver drives the taxi. Etc.

I think it’s possibly linguistic, too, that we – in most natural languages – put the object after the verb: play(video), toast(bread), drive(taxi).

Thing is, though – this isn’t how object oriented programming works. Objects encapsulate agency with the data they work on, producing video.play(), bread.toast() and taxi.drive().
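
A trivial, made-up example of the difference:

```kotlin
// Action->Object: something else does the playing to the video, and needs the
// video's data exposed to it in order to do so.
fun play(video: Video) {
    println("Playing ${video.title}")
}

// Object->Action: the video plays itself.
class Video(val title: String) {
    fun play() = println("Playing $title")
}

fun main() {
    val video = Video("Jaws")
    play(video)   // action -> object
    video.play()  // object -> action
}
```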

In OOP, the cat kicks its own arse.

You’re absolutely correct if you’re thinking “That isn’t how we’d say or write it in real life”. It isn’t. I suspect this is one of the reasons some programmers find OOP counter-intuitive – it goes against the way we see the world.

Ironically, Object thinking – while not intuitive in that sense – makes discovery of actions much easier. What can I do with a video? What can I do with bread? And so forth. That’s why Object->Action still dominates UI design. Well, good UI design, anyway. Likewise, developers tend to find it easier to discover functions that can be applied to types when they start with the type.

When I wrote code to tell the story of what happens when a member donates a video to a community library, each line started with a function – well, in Java, a static method, which is effectively the same thing. This is not great OOP. Indeed, it’s not OOP. It’s FP.

And that’s fine. Functional Programming works more the way we say things in the real world. Clean the dishes. Set the timer. Kick the cat. I suspect this is one reason why more and more programmers are drawn to the functional paradigm – it works more the way we think, and reads more the way we talk. Or, at least, it can if we’re naming things well.

(There’s a separate discussion about encapsulation in FP. The tendency is for functional programmers not to bother with it, which leads to inevitable coupling problems. That’s not because you can’t encapsulate data in FP. It’s just that, as a concept, it hasn’t been paid much attention.)

If you’re doing OOP – and I still do much of the time, because it’s perfectly workable, thank you very much – then it goes Object->Action. Methods like play(video) and kick(cat) hint at responsibilities being in the wrong place, leading to the lack of encapsulation I witness in so much OO code.

It’s like they say; give a C programmer C++, and they’ll write you a C program with it.


Do Your Unit Tests Know *Too* Much?

A common pitfall of extensive unit testing reported by many teams is that, as the number of tests builds up, changing the implementation under test forces them to rewrite many, many tests. In this scenario, the test code becomes a barrier to change instead of its enabler.

Having witnessed this quite a few times first-hand, I’ve got a few observations I want to share.

First of all, it’s usually unmanaged dependencies in our implementation code that cause changes to ripple out to large numbers of tests.

Imagine we wrote a unit test for every public method or function in our implementation, and we decide to change the way one of them works. If that breaks a hundred of our unit tests, my first thought might be “Why is this method/function referenced in a hundred tests?” Did we write a hundred distinct tests for that one thing, or is it used in the set-up of a hundred tests? That would imply that there’s no real separation of concerns in that part of the implementation. A lack of modularity creates all kinds of problems that show up in test code – usually in the set-ups.

The second thing I wanted to mention is duplication in test code. There’s a dangerous idea that’s been gaining in popularity in recent years that we shouldn’t refactor any duplication out of our tests. The thinking is that it can make our tests harder to understand, and there’s some merit to this when it’s done with little thought.

But there are ways to compose tests, reuse set-ups, assertions and whole tests that clearly communicate what’s going on (many well described in the xUnit Test Patterns book). Inexperienced programmers often struggle with code that’s composed out of small, simple parts, and it almost always comes down to poor naming.

Composition – like separation of concerns – needs to be a black box affair. If you have to look inside the box to understand what’s going on, then you have a problem. I’m as guilty of sloppy naming as anyone, and it’s something I’m working to improve.

There’s one mechanism in particular for removing duplication from test code that I’ve been championing for 20 years – parameterised tests. When we have multiple examples of the same rule or behaviour covered by multiple tests, it’s a quick win to consolidate those examples into a single data-driven test that exercises all of those cases. This can help us in several ways:

  • Removes duplication
  • Offers an opportunity to document the rule instead of the examples (e.g., fourthFibonacciNumberIsTwo() and sixthFibonacciNumberIsFive() can become fibonacciNumberIsSumOfPreviousTwo())
  • Opens the door to much more exhaustive testing with surprisingly little extra code

Maybe those 100 tests could be a handful of parameterised tests?
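
For example, a handful of example-based Fibonacci tests might collapse into one parameterised test along these lines (a sketch using JUnit 5, assuming a 1-based sequence that starts 0, 1, 1, 2, 3, 5…):

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.params.ParameterizedTest
import org.junit.jupiter.params.provider.CsvSource

class FibonacciTest {

    // One test, many examples: each row is a (position, expected value) pair,
    // and adding more cases is just adding more rows.
    @ParameterizedTest
    @CsvSource("4,2", "6,5", "7,8", "10,34")
    fun fibonacciNumberIsSumOfPreviousTwo(position: Int, expected: Int) {
        assertEquals(expected, fibonacci(position))
    }

    // Hypothetical implementation under test (1-based: 0, 1, 1, 2, 3, 5, 8...)
    private fun fibonacci(n: Int): Int = when {
        n <= 1 -> 0
        n == 2 -> 1
        else -> fibonacci(n - 1) + fibonacci(n - 2)
    }
}
```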

The fourth thing I wanted to talk about is over-reliance on mock objects. Mocks can be a great tool for achieving cohesive, loosely-coupled modules in a Tell, Don’t Ask style of design – I’m under the impression that’s why they were originally invented. But as they give with one hand, they can take away with the other. The irony with mocks is that, while they can lead us to better encapsulation, they do so by exposing the internal interactions of our modules. A little mocking can be powerful design fuel, but pour too much of it on your tests and you’ll burn the house down.

And the final thing I wanted to highlight is the granularity of our tests. Do we write a test for every method of every class? Do we have a corresponding test fixture for every module in the implementation? My experience has been that it’s neither necessary nor desirable to have test code that sticks to your internal design like cheese melted into a radio.

At the other extreme, many teams have written tests that do everything from the outside – e.g., from a public API, or at the controller or service level. I call these “keyhole tests”, because when I’ve worked with them it can feel a little like keyhole surgery. My experience with this style of testing is that, for sure, it can decouple your test code from internal complexity, but at a sometimes heavy price when tests fail and we end up in the debugger trying to figure out where in the complex internal – and non-visible – call stack things went wrong. It’s like when the engineer from the gas company detects a leak somewhere in your house by checking for a drop in pressure at the meter outside. Pinpointing the source of the leak may involve ripping up the floors…

The truth, for me, lies somewhere in between melted cheese and keyhole surgery. What I strive for within a body of code is – how can I put this? – internal APIs. These are interfaces within the implementation that encapsulate the inner complexity of a particular behaviour (e.g., the cluster of classes used to calculate mortgage repayments). They decouple that little cluster from the rest of the implementation, just as their dependencies are decoupled from them in the same S.O.L.I.D. way. And their interfaces tend to be stable, because they’re not about the details. Tests written around those interfaces are less likely to need to change, but they’re also more targeted at one part of the design instead of the whole call stack. So when they fail, it’s easier to pinpoint where things went wrong. (Going back to the gas leak example, imagine having multiple test points throughout the house, so we can at least determine which room the leak is in.)
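
As a purely hypothetical illustration of what I mean by an internal API: the interface is stable, whatever cluster of classes sits behind it can churn as much as it likes, and the tests target the behaviour at that boundary.

```kotlin
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import kotlin.math.pow

// The internal API: a stable interface that hides the inner complexity of the
// mortgage calculation behind it.
interface MortgageRepayments {
    fun monthlyRepayment(principal: Double, annualRatePercent: Double, termInYears: Int): Double
}

// Stand-in implementation. In real code this might be a whole cluster of
// collaborating classes; the tests neither know nor care.
class StandardRepaymentCalculator : MortgageRepayments {
    override fun monthlyRepayment(principal: Double, annualRatePercent: Double, termInYears: Int): Double {
        val monthlyRate = annualRatePercent / 100 / 12
        val months = termInYears * 12
        return principal * monthlyRate / (1 - (1 + monthlyRate).pow(-months))
    }
}

// Tests written against the interface survive refactoring of everything behind it.
class MortgageRepaymentsTest {
    private val repayments: MortgageRepayments = StandardRepaymentCalculator()

    @Test
    fun zeroPrincipalMeansNothingToRepay() {
        assertEquals(0.0, repayments.monthlyRepayment(0.0, 5.0, 25), 0.0)
    }
}
```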

Even for a monolith, I aim to think of good internal architecture as a network of microservices, each with its own tests. By far the biggest cause of brittle tests that I’ve seen is code that isn’t like that: it’s a Big Ball of Mud. That can be exacerbated by leaving all the duplication in the test code, and/or by over-reliance on mock objects, plus a tendency to try to write tests for every method or function of every class.

You want tests that run fast and pinpoint failures, but that also leave enough wiggle room to easily refactor what’s happening behind those internal APIs.

Old Dogs, Old Tricks

Given what I do for a living, I tend to get pretty much daily practice at the fundamentals of code craft like TDD and refactoring. And, after 27 years of professional programming, I would probably still make time every day or once a week to mindfully practice this stuff.

Why? Because – from experience – I believe that these aren’t skills you learn once and then retain indefinitely. On occasions when I’ve been away from code for an extended period, coming back to it I found myself doing sloppy work. I’d forget to do things. I’d fallen out of certain habits.

And I also don’t believe these are binary skills. There’s unit testing, and then there’s unit testing. Not all unit tests are created equal. Some are easier to understand than others. Some are less coupled to the implementation than others. Some run faster than others. Some cover more examples with less code than others. Some teams say “We do unit testing” and their code is still riddled with bugs. Some say “We do unit testing” and their code hasn’t had a bug reported since it went live.

No matter how good we think we are at writing tests or making design decisions or refactoring our code and so on, there’s almost always room for improvement. And as we gain experience in these skills, improving takes more and more work.

As a guitar player of some 30 years, I know that the hour or two of practice I get maybe twice a week these days barely keeps me at the level of technical ability I reached 20 years ago. Most of the visible progress I made happened in the first 5 years of playing. But there’s no top to that hill. If I stop practicing for a while – which I have for the last year or so for various reasons – the ball rolls back down. It takes some ongoing effort just to keep it where it is.

It feels the same with code craft: so many managers who came from programming backgrounds tell me that they lost their skills after a few years away from the code. Indeed, many put themselves on my courses just to try and keep their hand in. Use it or lose it.

An hour or two of mindful practice keeps me where I am as a programmer. It takes an hour or two more to move the ball uphill a bit. This is why I’m such a strong advocate of 20% time for developers. At the very least, half a day a week needs to be set aside to learn, to practice, and to try new things. Or an hour a day. Or however you want to slice it.

Putting aside no time – and I’ve seen this so many times – doesn’t just lead to dev teams who don’t improve, it leads to dev teams who progressively get worse. The code craft muscles atrophy, habits get lost, teams get sloppy, code gets crappy, and delivery slows down to a crawl.

If you prefer a mechanical analogy, think of development as a steamboat. Your goal is to deliver goods from London to New York, sailing every two weeks back and forth across the Atlantic. How much time do you set aside to maintain the engine? None? That’s just not sustainable. How many trips could you expect to make before the engine failed?

The developers on your team are the delivery engine of your software and systems. Their skills – not just technical – need regular maintenance, or the engine breaks down. Set aside some time and put aside some budget to help keep the team in good delivery shape. Not just the occasional training day once or twice a year; a regular investment in building and maintaining their skills. Once a week, or an hour a day – whatever works in your organisation.

(You may be thinking “Hey, we pay them to work. They can learn in their own time.” And that might explain why your efforts to increase diversity on your teams haven’t worked.)

My closing message here is that it’s not just junior programmers who need to work on the fundamentals. We all do.

Here’s a tip from an old dog: a great way to reinforce knowledge and skills is to teach them. My advice for managers is to combine regular practice with mentoring. Your more experienced developers mentoring new hires will find – as I have – that the mentoring process forces us to really think about not just what we’re doing, but why we’re doing it. And it encourages us to be perhaps that little bit more exacting about how we do do it, because we’re trying to set a good example. It’s us as programmers, but in our Sunday best.