The Gaps Between The Gaps – The Future of Software Testing

If you recall your high school maths (yes, with an “s”!), think back to calculus. This hugely important idea is built on something surprisingly simple: smaller and smaller slices.

If we want to roughly estimate the area under a curve, we can add up the areas of rectangular slices underneath it. If we want to improve the estimate, we make the slices thinner. Make them thinner still, and the estimate gets better again. Make them infinitely thin, and we get a completely accurate result: in the limit, taking an infinite number of samples gives us the exact area under the curve.
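To see that in code rather than algebra, here’s a quick sketch of my own (nothing to do with any real test suite): it approximates the area under f(x) = x² between 0 and 1 using the midpoint of each rectangular slice. The true answer is 1/3, and the estimate closes in on it as the slices get thinner.

```java
import java.util.function.DoubleUnaryOperator;

public class RiemannSum {

    // Approximate the area under f between a and b using `slices` rectangles.
    static double areaUnder(DoubleUnaryOperator f, double a, double b, int slices) {
        double width = (b - a) / slices;
        double sum = 0.0;
        for (int i = 0; i < slices; i++) {
            double x = a + (i + 0.5) * width;   // midpoint of this slice
            sum += f.applyAsDouble(x) * width;  // area of one rectangle
        }
        return sum;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = x -> x * x;     // true area over [0, 1] is 1/3
        for (int slices = 10; slices <= 1_000_000; slices *= 100) {
            System.out.printf("%,10d slices -> %.9f%n", slices, areaUnder(f, 0, 1, slices));
        }
        // 10 slices      -> 0.332500000
        // 1,000 slices   -> 0.333333250
        // 100,000 slices -> 0.333333333
    }
}
```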

In computing, I’ve lived through several revolutions where increasing computing power has meant more and more samples can be taken, until the gaps between them are so small that – to all intents and purposes – the end result is analog. Digital Signal Processing, for example, has reached a level of maturity where digital guitar amplifiers, synthesizers and tape recorders are indistinguishable from the real thing to the human ear. As sample rates and bit depths increased, and number-crunching power skyrocketed while the cost per FLOP plummeted, we eventually arrived at a point where the choice between, say, a real tube amplifier and a digitally modelled one is largely a matter of personal preference rather than practical difference.

Software testing’s been quietly undergoing the same revolution. When I started out, automated test suites ran overnight on machines that were thousands of times less powerful than my laptop. Today, I see large unit test suites running in minutes or fractions of minutes on hardware that’s way faster and often cheaper.

Factor in the Cloud, and teams can now chuck what would, until relatively recently, have been classed as “supercomputing” power at their test suites for a few extra dollars each time. While Moore’s Law seems to have stalled at the CPU level, the scaling out of computing power shows no signs of slowing down – more and more cores in more and more nodes for less and less money.

I have a client I worked with to re-engineer a portion of their JUnit test suite for a mission-critical application, adding a staggering 2.5 billion property-based test cases (with only an additional 1,000 lines of code, I might add). This extended suite – which reuses, but doesn’t replace, their day-to-day suite of tests – runs overnight in about 5 1/2 hours on Cloud-based hardware. (They call it “draining the swamp”.)
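I can’t show the client’s code, and I won’t name the tooling they used, but to give a flavour of how a modest number of lines can fan out into billions of generated cases, here’s a hypothetical sketch using jqwik, a property-based testing engine that runs on the JUnit 5 platform. The class, the applyDiscount method and the chosen ranges are all invented purely for illustration.

```java
import net.jqwik.api.ForAll;
import net.jqwik.api.Property;
import net.jqwik.api.constraints.IntRange;

class PricingProperties {

    // Stand-in for the system under test (invented for this example).
    int applyDiscount(int priceInPence, int discountPercent) {
        return priceInPence - (priceInPence * discountPercent / 100);
    }

    // Each property generates a million randomised cases per run.
    @Property(tries = 1_000_000)
    boolean discountNeverIncreasesPrice(
            @ForAll @IntRange(min = 0, max = 10_000_000) int priceInPence,
            @ForAll @IntRange(min = 0, max = 100) int discountPercent) {
        return applyDiscount(priceInPence, discountPercent) <= priceInPence;
    }

    @Property(tries = 1_000_000)
    boolean discountedPriceIsNeverNegative(
            @ForAll @IntRange(min = 0, max = 10_000_000) int priceInPence,
            @ForAll @IntRange(min = 0, max = 100) int discountPercent) {
        return applyDiscount(priceInPence, discountPercent) >= 0;
    }
}
```

Scale up the number of properties and the tries counts and the case count climbs into the billions, while the test code itself barely grows.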

I can easily imagine that suite running in 5 1/2 minutes in a decade’s time. Or running 250 billion tests overnight.

And it occurred to me that, as the gaps between tests get smaller and smaller, we’re tending towards what is – to all intents and purposes – a kind of proof of correctness for that code. Imagine writing software to guide a probe to the moons of Jupiter. A margin of error of 0.001% in calculations could throw it hundreds of thousands of kilometres off course. How small would the gaps need to be to ensure an accuracy of, say, 1 km, or 100 m, or 10 m? (And yes, I know they can course-correct as they get closer, but hopefully you catch my drift.)

When the gaps between the tests are significantly smaller than the allowable margin for error, I think that would constitute an effective proof of correctness – in the same way that, when the sample rate sits far beyond the range of human hearing, you have effectively analog audio, at least in the perceived quality of the end result.
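Here’s a toy sketch of that idea – all names and numbers invented, and nothing like real flight software. A fast table-lookup “course calculation” is swept against a trusted reference at a step size far finer than the allowable error, so that, provided the output can’t swing wildly between neighbouring samples, passing at every sample point means the result stays within tolerance across the whole range.

```java
public class DenseSweepCheck {

    static final int TABLE_SIZE = 10_000;
    static final double[] SIN_TABLE = new double[TABLE_SIZE + 1];

    static {
        for (int i = 0; i <= TABLE_SIZE; i++) {
            SIN_TABLE[i] = Math.sin(2 * Math.PI * i / TABLE_SIZE);
        }
    }

    // "Production" code: sine via table lookup with linear interpolation, scaled to km.
    static double courseCorrectionKm(double angleRadians) {
        double position = angleRadians / (2 * Math.PI) * TABLE_SIZE;
        int index = Math.min((int) position, TABLE_SIZE - 1);
        double fraction = position - index;
        double sine = SIN_TABLE[index] * (1 - fraction) + SIN_TABLE[index + 1] * fraction;
        return sine * 1_000.0;
    }

    // Trusted reference model for the same calculation.
    static double referenceKm(double angleRadians) {
        return Math.sin(angleRadians) * 1_000.0;
    }

    public static void main(String[] args) {
        double toleranceKm = 0.1;   // the allowable margin for error
        double step = 0.000_001;    // the gap between samples: far smaller than the tolerance
        long samples = 0;

        for (double angle = 0.0; angle < 2 * Math.PI; angle += step) {
            double errorKm = Math.abs(courseCorrectionKm(angle) - referenceKm(angle));
            if (errorKm > toleranceKm) {
                throw new AssertionError("Out of tolerance at angle " + angle + ": " + errorKm + " km");
            }
            samples++;
        }
        System.out.printf("Checked %,d samples, all within %.3f km%n", samples, toleranceKm);
    }
}
```

The particular step size is arbitrary; the point is simply that once the sampling is dense enough relative to the tolerance, the sweep behaves, for practical purposes, like a proof.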

And the good news is that this testing revolution is already well underway. I’ve been working with clients for quite some time, achieving very high-integrity software using little more than the same testing tools almost all of us are already using, and off-the-shelf hardware available to almost everyone.