In-Process, Cross-Process & Full-Stack Tests

Time for a quick clarification. (If you’ve been on a Codemanship course, you may have already heard this.)

Ask twelve developers for their definitions of “unit test”, “integration test” and “system test” and you’ll likely get twelve different answers. I feel – especially for training purposes – that I need to clarify what I mean by them.

Unit Test – when I say “unit test”, what I mean is a test that executes without any external dependencies. I can go further to qualify what I mean by an “external dependency”: it’s when code is executed in a separate memory address space – a separate process – from the test code. This is typically for speed, so we can test our logic quickly without hitting databases or file systems or web services and so on. It also helps separate concerns more cleanly, as “unit testable” code usually has to be designed in such a way that external dependencies are easily swappable (e.g., by dependency injection).
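To make that concrete, here’s a minimal sketch (my own illustration, not taken from any client’s code) of an in-process unit test: the external dependency sits behind an interface, and a stub is injected, so nothing leaves the test’s own process. All the names here (CustomerStore, DiscountCalculator and so on) are hypothetical.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class DiscountCalculatorTest {

    // The "port" our logic depends on; in production this would be backed by a database.
    interface CustomerStore {
        int loyaltyPoints(String customerId);
    }

    // The logic under test, with the dependency injected through the constructor.
    static class DiscountCalculator {
        private final CustomerStore store;

        DiscountCalculator(CustomerStore store) {
            this.store = store;
        }

        double discountFor(String customerId) {
            return store.loyaltyPoints(customerId) >= 100 ? 0.1 : 0.0;
        }
    }

    @Test
    void loyalCustomersGetTenPercentOff() {
        // An in-process stub stands in for the real database-backed implementation.
        CustomerStore stub = customerId -> 150;

        assertEquals(0.1, new DiscountCalculator(stub).discountFor("alice"), 0.0001);
    }
}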

Integration Test – a test that executes code running in separate memory address spaces (e.g., separate Windows services, or SQL running on a DBMS). It’s increasingly common to find developers reusing their unit tests with different set-ups (replacing a database stub with a real database connection, for example). The logic of the test is the same, but the set-up involves external dependencies. This allows us to test that our core logic still works when it’s interacting with external processes (i.e., it tests the contract on both sides of the integration).
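One way to reuse the same test logic like this – purely a sketch, not necessarily how any particular suite is organised – is an abstract test class, reusing the hypothetical CustomerStore and DiscountCalculator from the sketch above (imagine them promoted to top-level classes). The test case is written once against the interface, and subclasses decide whether a stub or a real database-backed implementation sits behind it; the JdbcCustomerStore below is imaginary.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

abstract class DiscountCalculatorContract {

    // Each subclass decides what sits behind the CustomerStore interface.
    protected abstract CustomerStore createStore();

    @Test
    void loyalCustomersGetTenPercentOff() {
        DiscountCalculator calculator = new DiscountCalculator(createStore());

        assertEquals(0.1, calculator.discountFor("alice"), 0.0001);
    }
}

// In-process (unit) version: same test logic, stubbed dependency.
class InProcessDiscountTest extends DiscountCalculatorContract {
    @Override
    protected CustomerStore createStore() {
        return customerId -> 150;
    }
}

// Cross-process (integration) version: same test logic, real database behind the interface.
// Assumes the test database has been seeded with a customer "alice" holding 150 points.
class CrossProcessDiscountTest extends DiscountCalculatorContract {
    @Override
    protected CustomerStore createStore() {
        return new JdbcCustomerStore("jdbc:postgresql://localhost/crm"); // hypothetical adapter
    }
}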

System Test – executes code end-to-end, across the entire tech stack, including all external dependencies like databases, files, web services, the OS and even the hardware. (I’ve seen more than one C++ app blow a fuse because it was deployed on hardware that the code wasn’t compiled to run on, for example.) This allows us to test our system’s configuration, and ideally should be done in an environment as close to the real thing as possible.

It might be clearer if I called them In-Process, Cross-Process and Full-Stack tests.

 

The Gaps Between The Gaps – The Future of Software Testing

If you recall your high school maths (yes, with an “s”!), think back to calculus. This hugely important idea is built on something surprisingly simple: smaller and smaller slices.

If we want to roughly estimate the area under a curve, we can add up the areas of rectangular slices underneath it. If we want to improve the estimate, we make the slices thinner. Make them thinner still, and the estimate gets even better. Make them infinitely thin, and we get a completely accurate result: with an infinite number of samples, the estimate becomes the exact area under the curve – a proof, not an approximation.
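Here’s a quick back-of-the-envelope illustration in code (mine, not part of the original argument): approximating the area under f(x) = x² between 0 and 1, whose exact value is 1/3, with ever thinner rectangular slices.

public class RiemannSlices {

    // Midpoint-rule approximation of the area under f(x) = x^2 on [0, 1].
    static double areaUnderXSquared(int slices) {
        double width = 1.0 / slices;
        double area = 0.0;
        for (int i = 0; i < slices; i++) {
            double x = (i + 0.5) * width; // midpoint of this slice
            area += x * x * width;        // rectangle: f(x) * width
        }
        return area;
    }

    public static void main(String[] args) {
        for (int slices : new int[] { 10, 100, 10_000, 1_000_000 }) {
            System.out.printf("%,10d slices -> %.9f%n", slices, areaUnderXSquared(slices));
        }
        // 10 slices gives roughly 0.3325; a million slices agrees with 1/3
        // to all nine decimal places shown. Thinner slices, better estimate.
    }
}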

In computing, I’ve lived through several revolutions where increasing computing power has meant more and more samples can be taken, until the gaps between them are so small that – to all intents and purposes – the end result is analog. Digital Signal Processing, for example, has reached a level of maturity where digital guitar amplifiers and digital synthesizers and digital tape recorders are indistinguishable from the real thing to the human ear. As sample rates and bit depths increased, and number-crunching power skyrocketed while the cost per FLOP plummeted, we eventually arrived at a point where the question of, say, whether to buy a real tube amplifier or use a digitally modeled tube amplifier is largely a matter of personal preference rather than practical difference.

Software testing’s been quietly undergoing the same revolution. When I started out, automated test suites ran overnight on machines that were thousands of times less powerful than my laptop. Today, I see large unit test suites running in minutes or fractions of minutes on hardware that’s way faster and often cheaper.

Factor in the Cloud, and teams can now chuck what would relatively recently have been classed as “supercomputing” power at their test suites for a few extra dollars each time. While Moore’s Law seems to have stalled at the CPU level, the scaling out of computing power shows no signs of slowing down – more and more cores in more and more nodes for less and less money.

I have a client I worked with to re-engineer a portion of their JUnit test suite for a mission-critical application, adding a staggering 2.5 billion extra property-based test cases (with only an additional 1,000 lines of code, I might add). This extended suite – which reuses, but doesn’t replace, their day-to-day suite of tests – runs overnight in about 5 1/2 hours on Cloud-based hardware. (They call it “draining the swamp”.)
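For a flavour of what a property-based test looks like, here’s an illustrative sketch using jqwik, one of several property-based testing libraries that run on the JUnit 5 platform – I’m not saying it’s the tool this client used, and the pricing example is made up.

import net.jqwik.api.ForAll;
import net.jqwik.api.Property;
import net.jqwik.api.constraints.DoubleRange;

class PricingProperties {

    // Hypothetical function under test, inlined to keep the sketch self-contained.
    static double discountedPrice(double amount, double discount) {
        return amount * (1.0 - discount);
    }

    // jqwik generates random (amount, discount) pairs; the number of tries can be
    // cranked up as far as the hardware budget allows.
    @Property(tries = 100_000)
    boolean discountNeverIncreasesThePrice(
            @ForAll @DoubleRange(min = 0.0, max = 1_000_000.0) double amount,
            @ForAll @DoubleRange(min = 0.0, max = 1.0) double discount) {
        return discountedPrice(amount, discount) <= amount;
    }
}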

I can easily imagine that suite running in 5 1/2 minutes in a decade’s time. Or running 250 billion tests overnight.

And it occurred to me that, as the gaps between tests get smaller and smaller, we’re tending towards what is – to all intents and purposes – a kind of proof of correctness for that code. Imagine writing software to guide a probe to the moons of Jupiter. A margin of error of 0.001% in the calculations could throw it hundreds of thousands of kilometres off course. How small would the gaps need to be to ensure an accuracy of, say, 1km, or 100m, or 10m? (And yes, I know they can course-correct as they get closer, but hopefully you catch my drift.)

When the gaps between the tests are significantly smaller than the allowable margin for error, I think that would constitute an effective proof of correctness. In the same way, when the gaps between audio samples are far too fine for human hearing to resolve, you have effectively analog audio – at least in the perceived quality of the end result.
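To put some purely illustrative numbers on that (mine, not the client’s): suppose a trajectory calculation’s input domain spans 1,000,000 km, and we cover it with 2.5 billion evenly spaced test cases.

public class TestGaps {
    public static void main(String[] args) {
        double domainInKm = 1_000_000.0;    // illustrative input domain
        double testCases = 2_500_000_000.0; // the size of suite mentioned above
        double gapInMetres = (domainInKm / testCases) * 1_000.0;

        System.out.printf("Gap between adjacent tested inputs: %.1f m%n", gapInMetres);
        // Prints 0.4 m -- comfortably inside a 10 m margin of error.
    }
}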

And the good news is that this testing revolution is already well underway. I’ve been working with clients for quite some time, achieving very high-integrity software using little more than the testing tools almost all of us are already using, and off-the-shelf hardware available to almost everyone.