My advice to developers has always been to avoid writing multi-threaded code wherever possible. Concurrency is notoriously difficult to get right, and the safest multi-threaded code is single-threaded.
I’ve been eating my own dog food on that, and it occurred to me a couple of weeks back that I’ve written very little multi-threaded code myself in recent years.
But there is still some multi-threaded code being written in languages like Java, C# and Python for high-performance solutions that are targeted at multi-CPU platforms. And over the last few months I’ve been helping a client with just such a solution for scaling up property-based tests to run on multi-core Cloud platforms.
One of the issues we faced is how do we test our multi-threaded code?
There’s a practical issue of executing multiple threads in a single-threaded unit test – particularly synchronizing so that we can assert an outcome after all threads have completed their work.
And also, thread scheduling is out of our control and – on Windows and similar platforms – unpredictable and non-repeatable. A race condition or a deadlock might not show up every time we run a test.
Over the last couple of weeks, I’ve been playing with a rough prototype to try and answer these questions. It uses a simple producer-consumer example – loading parcels into a loading bay and then taking them off the loading bay and loading them into a truck – to illustrate the challenges of both safe multi-threading and multi-threaded testing.
When I test multi-threaded code, I’m interested in two properties:
- Safety – what should always be true while the code is executing?
- Liveness – what should eventually be achieved?
To test safety, an assertion needs to be checked throughout execution. To test liveness, an assertion needs to be checked after execution.
After writing code to do this, I refactored the useful parts into custom assertion methods, always() and eventually().
always() takes a list of Runnables (Java’s equivalent of functions that accept no parameters and have no return value) that will concurrently perform the work we want to test. It will submit each Runnable to a fixed thread pool a specified number of times (thread count) and then wait for all the threads in the pool to terminate.
On a single separate thread, a boolean function (in Java, Supplier<Boolean>) is evaluated multiple times throughout execution of the threads under test. This terminates after the worker threads have terminated or timed out. If, at any point in execution, the assertion evaluates to false, the test will fail.
In use, it looks like this:
bayLoader and truckLoader are objects that implement the Runnable interface. They will be submitted to the thread pool 2x each (because we’ve specified a thread count of 2 as our third parameter), so there will be 4 worker threads in total, accessing the same data defined in our set-up.
The bayLoader threads will load parcels on to the loading bay, which holds a maximum of 50 parcels, until all the parcels have been loaded.
The truckLoader threads will unload parcels from the loading bay and load them on to the truck, until the entire manifest of parcels has been loaded.
A safety property of this concurrent logic is that there should never be more than 50 parcels in the loading bay at any time, and that’s what our always assertion checks multiple times during execution:
() -> bay.getParcelCount() <= 50
When I run this test once, it passes. Running it multiple times, it still passes. But just because a test’s passing, that doesn’t mean our code really works. Let’s deliberately introduce an error into our test assertion to make sure it fails.
() -> bay.getParcelCount() <= 49
The first time I run this, the test fails. And the second and third times. But on the fourth run, the test passes. This is the thread determinism problem; we have no control over when our assertion is checked during execution. Sometimes it catches a safety error. Sometimes the error slips through the gaps and the test misses it.
The good news is that if it catches an error just once, that proves we have an error in our concurrent logic. Of course, if we catch no errors, that doesn’t prove they’re not there. (Absence of evidence isn’t evidence of absence.)
What if we run the test 100 times? Rather than sit there clicking the “run” button over and over, I can rig this test up as a JUnitParams parameterised test and feed it 100 test cases. (If you don’t have a parameterised testing feature, you can just loop 100 times).
When I run this, it fails 91/100 times. Changing the assertion back, it passes 100/100. So I can have 100% confidence the code satisfies this safety property? Not so fast. 100 test runs leaves plenty of gaps. Maybe I can be 99% confident with 100 test runs. How about we do 1000 test runs? Again, they all pass. So that gives me maybe 99.9% confidence. 10,000 could give me 99.99% confidence. And so on.
Thankfully, after a little performance engineering, 10,000 tests run in less than 30 seconds. All green.
The eventually() assertion method works along similar lines, except that it only evaluates its assertion once at the end (and therefore runs significantly faster):
If my code encounters a deadlock, the worker threads will time out after 1000 milliseconds. If a race condition occurs and our data becomes corrupted, the assertion will fail. Running this 10,000 times shows all the tests are green. I’m 99.99% confident my concurrent logic works.
Finally, speaking of deadlocks and race conditions, how might we avoid those?
A race condition can occur when two or more threads attempt to access the same data at the same time. In particular, we run the risk of a pre-condition paradox when bay loaders attempt to load parcels on to the loading bay, and truck loaders attempt to unload parcels from the bay.
The bay loader can only load a parcel if the bay is not full. A truck loader can only unload a parcel if the bay is not empty.
When I run my tests with this implementation of LoadingBay, 12% of them fail their liveness and safety checks because there’s a non-zero possibility of, say, a bay loader attempting to load a parcel after we’ve checked the bay isn’t full and another bay loader loading the 50th parcel in between that check and loading. Similarly, a truck loader might check that the bay isn’t empty, but before they unload the last parcel another truck loader thread takes it.
To avoid this situation, we need to ensure that pre-condition checks and actions are executed in a single, atomic sequence with no chance of other threads interfering.
When I test this implementation, tests still fail. The problem is that some parcels aren’t getting loaded on to the bay (though the bay loader thinks they have been), and some parcels aren’t getting unloaded, either. Our truck loader may be putting null parcels on the truck.
When loading, the bay must not be full. When unloading, it must not be empty. So our worker threads need to wait until their pre-conditions are satisfied. Now, Java threading gives us wait() methods, but they only wait for a specified amount of time. We need to wait until a condition becomes true.
This passes all 10,000 safety and liveness test runs, so I have 99.99% confidence we don’t have a race condition. But…
What happens when all the parcels have been loaded on to the truck? There’s a risk of deadlock if the bay remains permanently empty.
So we also need a way to stop the loading and unloading process once all the manifest has been loaded.
I’ve dealt with this in a similar way to waiting for pre-conditions to be satisfied, except this time we repeat loading and unloading until the parcels are all on the truck.
You may have already spotted the patterns in these two forms of loops:
- Execute this action when this condition is true
- Execute this action until this condition is true
Let’s refactor to encapsulate those nasty while loops.
There. That looks a lot better, doesn’t it? All nice and functional.
I tend to find conditional synchronisation easier to wrap my head around than all the wait() and notify() and callbacks malarky, and experiences so far with this approach suggest I tend to produce more reliable multi-threaded code.
My explorations continue, but I thought there might be folk out there who’d find it useful to see where I’ve got so far with this.
You can see the current source code at https://github.com/jasongorman/syncloop (it’s just a proof of concept, so provided with no warranty or support, of course.)