Unit tests are good. Threads are good. At least, these days many people are telling us that they are good. Threads are the inevitable future of your programming career, if you still want to eat lunch.
But unit tests and threads are not a good combination. Not even slightly. In fact, they're a downright pain in the rear end. I've been bitten by this so many times recently that my rear end is raw.
If you haven't had the misfortune to encounter this particular brand of coding horror, consider the following simple C++ class. It creates a background thread which runs an arbitrary functor ("what") every "period" milliseconds:
class PeriodicCaller
{
public:
PeriodicCaller(const some_functor &what, unsigned period);
~PeriodicCaller();
private:
... whatever ...
};
It's got a nice, clean interface. It seems simple enough. It'll be really easy to use in your codebase. But how are we going to test it? Any ideas?
First problem: there are lifetime issues in that interface. "what" has to remain valid for as long as the background thread runs. That's hard to do, as you don't know exactly how long the thread runs for. Fortunately, you can solve this problem by insisting classes that spawn a background thread (or threads) guarantee the thread has stopped by the time the destructor completes. This moves the issue up a level: "what" has to remain valid for as long as the PeriodicCaller object exists. This is far, far easier to reason about (and not an unusual problem for C++ object construction).
OK, one down. What's next?
Unit tests stub out all peripheral code sections (using mocks or stubs) to provide a precisely controlled interface to test your unit of code. In this respect, unit tests rock - to write them you must ensure that your class connects only with a finite set of precisely defined interfaces. That's good design. Unit tests help to ensure your class design is sound. Cool.
In the tests, we use these interfaces to isolate the code under test from unexpected change, and to arrange sets of specific operational conditions for it the unit to run in.
Great stuff.
Threads add a new level of unexpected interactions and interactions. What tests should we write for the PeriodicCaller? Some of them might include:
- It calls the functor after "period" has elapsed
- It calls it N times after N "period"s have passed
- Once the PeriodicCaller is deleted, the functor is never called again (just how long should you wait to be sure it is never called again?)
These tests will be slow (they have to wait for specific "period"s - or arbitrary lengths of time - to elapse). It requires the test machine to not be too loaded (or the background thread might not get a chance to execute often enough, causing the test 2 to fail). You could only write these tests reliably with some control over the background thread.
In a unit test for PeriodicCaller, you can't control the background thread that is spawned. Even if there was a "for testing" accessor to the background thread object (returning a boost::thread, or whatever) how would you use that object to drive the thread to make unit testing predictable?
By definition, threads interweave in arbitrary ways. You simply can't guarantee you're covering all possible interactions of those threads in a unit test. Subtle thread interaction problems are a surefire recipe for unit tests that work most of the time, and collapse every so often. Hard to find, hard to fix.
You could try to inject sanity with an API over the background thread that allowed you to flatten the thready behaviour out and call it sequentially on the test thread. That might help avoid thread disasters in the unit test, but the test would not be reflecting the reality of threaded operation. All the problems would be masked, not removed.
It is specifically hard to unit test code that does any of these:
- Spawns a new thread
- Waits for a thread to finish
- Synchronises with another thread
So what does this teach us? Should we avoid using threads? Well, no, clearly we can't do that - threads are useful. But since it's very, very hard to prove that our threaded code is correct, we should avoid writing any more threaded code than strictly necessary. And then we need to create interfaces to threaded components that are testable.
How can we craft testable thread interfaces? There are a few potential solutions. Before I consider writing about them, what do you think? How have you solved these problems?