Pete Goodliffe: Unit testing threads is hard (part 2)

In a previous post, I started a constructive moan about how difficult it is to write unit tests for threaded code. This is a large and complex topic, and I'm going to tackle it here in a few separate postings. All going well, some answers should emerge by the end of the process.

Why is it hard to test threaded code?

Let's start by working out why testing threaded code is hard. In the previous post, I stated that it is hard to unit test code that specifically does any of these:

Spawns a new thread
Waits for a thread to finish
Synchronises with another thread

I'll add one more item to that list:

Performs events after a period of time

It's hard to test threaded code in general, but these are specifically complex points in even the simplest threaded code. (No doubt there are other particular pain areas; I'm sure that this list will grow as my pain threshold grows.)

Aside: What is a unit test? It's important to understand what a unit test is; some developers get this subtly wrong. There are many forms of tests, most of which can be automated and run during the build processes as an instant validation of the code under construction. However, not all of them are unit tests.

Unit tests exercise individual sections (or units) of code. To do this the code's connections with the outside world are replaced with stub or mock components that represent "real" components, but that are drivable from within the test harness. The unit test therefore only tests the small section of code in a controlled environment. It does not test the code's integration into the entire software system. Clearly, unit tests cannot therefore interface over network connections, or with databases (those connections would be components themselves with stub-implementations for testing purposes).

We use stub- and/or mock implementations of external interfaces in our unit tests to ensure that the unit operates in a deterministic environment. And we avoid access to potentially non-deterministic entities (databases, networks, filesystems) to ensure that our tests are simple, valid, reliable, and repeatable.
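As a minimal sketch of what that looks like in practice, assume the unit under test talks to some external user database. Every name here (UserStore, StubUserStore, greeting) is hypothetical and invented purely for illustration. The unit depends only on an interface, and the test swaps in a stub that it can drive directly:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical interface to an external component (say, a database).
class UserStore {
public:
    virtual ~UserStore() = default;
    // Returns true and fills name_out if the user exists.
    virtual bool lookup(int id, std::string& name_out) const = 0;
};

// Stub used only by the tests: no network, no disk, fully deterministic.
class StubUserStore : public UserStore {
public:
    void add(int id, const std::string& name) { users_[id] = name; }

    bool lookup(int id, std::string& name_out) const override {
        auto it = users_.find(id);
        if (it == users_.end()) return false;
        name_out = it->second;
        return true;
    }

private:
    std::map<int, std::string> users_;
};

// The unit under test: it sees only the interface, never the real store.
std::string greeting(const UserStore& store, int id) {
    std::string name;
    if (store.lookup(id, name)) return "Hello, " + name;
    return "Hello, stranger";
}

// A unit test that drives the stub from within the test harness.
void test_greeting() {
    StubUserStore store;
    store.add(7, "Ada");
    assert(greeting(store, 7) == "Hello, Ada");
    assert(greeting(store, 8) == "Hello, stranger");
}
```

Every run of test_greeting() sees exactly the same environment, which is the property that threads are about to take away.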

Then along come threads and rain on that little reliable, repeatable parade.

We most often use threads to split up tasks that can be run concurrently, in order to increase program performance. This is great, as long as the threads do not need to interact. Very scalable systems can be built this way. But if the threads must interact, very non-scalable systems can be the result. Unfortunately, good thread practice is way outside the scope of this article.

When you have multiple threads of control running in parallel, it becomes harder to reason about the correctness of your application. This code:

void a() { a1; a2; a3; a4; }
void b() { b1; b2; b3; b4; }
a();
b();

clearly runs operations in the order a1, a2, a3, a4, b1, b2, b3, b4. If a() and b() were launched concurrently in separate threads, then you might see them run in the order a1, b1, a2, b2, a3, b3, a4, b4, or perhaps a1, a2, b1, b2, a3, a4, b3, b4, or any other interleaving. In fact, the only thing you can (probably) guarantee is that a1 will happen before a2, a2 before a3, and so on (and that will only hold if your optimising compiler hasn't taken it upon itself to reorder your code statements to generate "faster" code).
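To make that concrete, here is a rough sketch that runs a() and b() on two threads and logs the order in which the steps actually happen. The StepLog class and the helper functions are assumptions made for this illustration, and each step is reduced to a log entry. The mutex protects the log itself; it does not impose any ordering between the two threads.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: records steps in the order they actually ran.
class StepLog {
public:
    void record(const std::string& step) {
        std::lock_guard<std::mutex> lock(mutex_);
        steps_.push_back(step);
    }

    std::vector<std::string> steps() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return steps_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> steps_;
};

void a(StepLog& log) { log.record("a1"); log.record("a2"); log.record("a3"); log.record("a4"); }
void b(StepLog& log) { log.record("b1"); log.record("b2"); log.record("b3"); log.record("b4"); }

// Launches a() and b() concurrently and returns the observed order.
std::vector<std::string> run_concurrently() {
    StepLog log;
    std::thread thread_a(a, std::ref(log));
    std::thread thread_b(b, std::ref(log));
    thread_a.join();
    thread_b.join();
    return log.steps();
}

// Picks out the steps of one thread ('a' or 'b'), keeping the logged order.
std::vector<std::string> steps_of(const std::vector<std::string>& steps, char thread_name) {
    std::vector<std::string> result;
    for (const auto& step : steps) {
        if (!step.empty() && step[0] == thread_name) result.push_back(step);
    }
    return result;
}
```

A test can safely assert that steps_of(steps, 'a') is a1, a2, a3, a4 and that steps_of(steps, 'b') is b1, b2, b3, b4. It cannot assert anything about the combined order, which may differ from one run to the next.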

By spawning a thread, we specifically release some control over the execution of our program, which inevitably makes it much, much harder to unit test. We can no longer run the "unit" in a carefully controlled environment.

Interacting threads intertwine in a way that is largely random; one run of a unit test for a threaded component may be very different from the next. This is because thread behaviour changes considerably with:

the physical attributes of the machine (e.g. real parallelism from multiple CPUs or multiple cores in one CPU vs simulated parallelism from OS-level threading)
the load of the machine (when the code has less time to run in because other applications/processes are hogging the CPU the thread behaviour can become very lumpy and unpredictable - sometimes it is an interesting test for your threaded app to run it on a loaded machine)
the speed of the CPU(s) in the machine, and of the memory bus/network/disks
the nature of the operation running on a background thread (is it CPU intensive, heavily contending with the "main" thread of control for CPU cycles, or is it an IO-bound batch-process, mostly blocking on data throughput)
the way the wind is blowing (who knows which thread unblocks and gets a chance to run this time the test executes?)

Thread bugs are hard to find

Apparently, breaking up is hard to do. I'd agree: breaking up thread behaviour so it's testable is practically impossible. Threads interact in a very un-repeatable way, and problems stemming from bad thread interactions are remarkably hard to find.

Most of the time your code would operate perfectly, but once in a blue moon you get a data-race, a deadlock, or a timing error. In fact, it's practically impossible to write a unit test that proves that none of those conditions can occur.
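A small sketch shows how slippery this is. The RacyCounter class below is a deliberately broken, hypothetical example: its read and its write are each atomic, but the read-modify-write as a whole is not, so an increment can be lost when two threads overlap.

```cpp
#include <atomic>
#include <thread>

// Deliberately broken counter (hypothetical, for illustration only).
// Another thread may update value_ between the load and the store,
// and that update is then overwritten.
class RacyCounter {
public:
    void increment() { value_.store(value_.load() + 1); }
    int value() const { return value_.load(); }

private:
    std::atomic<int> value_{0};
};

// Increments one shared counter from two threads and returns the final count.
int count_with_two_threads(int per_thread) {
    RacyCounter counter;
    auto work = [&counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) counter.increment();
    };
    std::thread first(work);
    std::thread second(work);
    first.join();
    second.join();
    return counter.value();
}
```

A test asserting that count_with_two_threads(1000) returns 2000 may pass on some runs and fail on others, depending on the machine and its load. All a test can rely on is that the result never exceeds 2000, and a passing run proves nothing about the next one.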

When testing single-threaded code we must consider the tests' code coverage: whether every line of code and every condition has been covered. In a multi-threaded environment the problem explodes. We must consider coverage in terms of every possible interaction of every line of code. The threaded environment is akin to shuffling a deck of cards before running the threads: each time you deal out the program instructions you'll get a different sequence of instructions. How can you be sure that each of those sequences produces the same, or at least a correct, result?
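To put a number on that explosion, treat each thread as a fixed sequence of indivisible steps (a simplifying assumption; real instructions interleave at a much finer grain). Two threads of m and n steps can then be interleaved in C(m+n, m) ways, the binomial coefficient. The function name below is made up for this sketch, and it is only meant for small inputs that do not overflow.

```cpp
#include <cstdint>

// Number of ways to interleave two sequences of m and n steps while
// keeping each sequence's own order: the binomial coefficient C(m+n, m).
std::uint64_t interleavings(unsigned m, unsigned n) {
    std::uint64_t result = 1;
    for (unsigned i = 1; i <= m; ++i) {
        // After this line result holds C(n + i, i), so the division is exact.
        result = result * (n + i) / i;
    }
    return result;
}
```

For the a() and b() example earlier, interleavings(4, 4) is 70. Two threads of ten steps each already give interleavings(10, 10) = 184756 possible orderings, and a unit test gets to observe one of them per run.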

What have we learnt so far?

In later postings we'll look at how to write tests for threaded code, but until then, here are three very helpful tips for writing and testing threaded code:

Avoid writing code that spawns another thread unless you absolutely have to

Avoid threaded code that has to do something externally visible to other threads other than at the beginning/end of the thread's execution
Do not tie the thread-spawning aspect of the code to the code that runs on that thread. For example, ensure that the algorithm is neatly encapsulated and testable in isolation in a single-threaded environment. Then, if necessary, write a threaded component that employs that algorithm on its thread.
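The last tip might look something like the following sketch. The algorithm (sum_of_squares) and the wrapper (BackgroundSum) are both hypothetical names chosen for illustration. The algorithm is a plain function that knows nothing about threads; the threaded component is a thin wrapper whose only externally visible effects happen when the thread starts and when it is joined.

```cpp
#include <cassert>
#include <thread>
#include <utility>
#include <vector>

// The algorithm: a plain function, testable in a single-threaded test.
long sum_of_squares(const std::vector<int>& values) {
    long total = 0;
    for (int v : values) total += static_cast<long>(v) * v;
    return total;
}

// The threaded component: runs the algorithm on its own thread.
// The result is only read after join(), so no other synchronisation is needed.
class BackgroundSum {
public:
    explicit BackgroundSum(std::vector<int> values)
        : values_(std::move(values)),
          thread_([this] { result_ = sum_of_squares(values_); }) {}

    ~BackgroundSum() {
        if (thread_.joinable()) thread_.join();
    }

    // Blocks until the worker has finished, then returns its result.
    long wait() {
        if (thread_.joinable()) thread_.join();
        return result_;
    }

private:
    std::vector<int> values_;
    long result_ = 0;
    std::thread thread_;  // declared last so it starts after the data exists
};

// The algorithm's logic is covered without spawning a single thread.
void test_sum_of_squares() {
    assert(sum_of_squares({}) == 0);
    assert(sum_of_squares({1, 2, 3}) == 14);
}
```

With this split, the interesting logic is exercised by test_sum_of_squares() deterministically, and the wrapper is small enough that there is very little left in it to go wrong.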

More will follow...

Tuesday, 3 June 2008
