Tuesday 27 May 2008

Unit testing threads is hard

Unsurprising fact that people don't talk about that much #10045: Unit testing threads is hard.

Unit tests are good. Threads are good. At least, these days many people are telling us that they are good. Threads are the inevitable future of your programming career, if you still want to eat lunch.

But unit tests and threads are not a good combination. Not even slightly. In fact, they're a downright pain in the rear end. I've been bitten by this so many times recently that my rear end is raw.

If you haven't had the misfortune to encounter this particular brand of coding horror, consider the following simple C++ class. It creates a background thread which runs an arbitrary functor ("what") every "period" milliseconds:


    class PeriodicCaller
    {
    public:
        PeriodicCaller(const some_functor &what, unsigned period);
        ~PeriodicCaller();

    private:
        // ... whatever ...
    };



It's got a nice, clean interface. It seems simple enough. It'll be really easy to use in your codebase. But how are we going to test it? Any ideas?

First problem: there are lifetime issues in that interface. "what" has to remain valid for as long as the background thread runs. That's hard to do, as you don't know exactly how long the thread runs for. Fortunately, you can solve this problem by insisting that any class which spawns a background thread (or threads) guarantees the thread has stopped by the time its destructor completes. This moves the issue up a level: "what" has to remain valid for as long as the PeriodicCaller object exists. This is far, far easier to reason about (and not an unusual problem for C++ object construction).
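To make that destructor guarantee concrete, here is a minimal sketch of one way it could look. This is an assumption-laden illustration, not the real class: it uses the C++11 standard library (std::thread, std::condition_variable), it swaps std::function<void()> in for some_functor, and calls_while_alive is a made-up helper that just exercises the class.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical implementation; the post only shows the public interface.
class PeriodicCaller
{
public:
    // Assumption: std::function<void()> stands in for some_functor.
    PeriodicCaller(const std::function<void()> &what, unsigned period_ms)
        : what_(what), period_(period_ms), stop_(false),
          thread_([this] { run(); })
    {
    }

    ~PeriodicCaller()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        wake_.notify_one();
        thread_.join(); // the thread has finished before the destructor returns
    }

    PeriodicCaller(const PeriodicCaller &) = delete;
    PeriodicCaller &operator=(const PeriodicCaller &) = delete;

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for returns true only once stop_ is set; a timeout with
        // stop_ still false means one period has elapsed.
        while (!wake_.wait_for(lock, period_, [this] { return stop_; }))
        {
            lock.unlock();
            what_(); // call the functor without holding the lock
            lock.lock();
        }
    }

    std::function<void()> what_;
    std::chrono::milliseconds period_;
    bool stop_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::thread thread_; // declared last so the other members exist first
};

// Made-up helper: runs a PeriodicCaller for lifetime_ms and returns how many
// calls were seen, checking that none arrive after destruction.
int calls_while_alive(unsigned period_ms, unsigned lifetime_ms)
{
    std::atomic<int> calls(0);
    {
        PeriodicCaller caller([&calls] { ++calls; }, period_ms);
        std::this_thread::sleep_for(std::chrono::milliseconds(lifetime_ms));
    } // destructor joins the background thread here
    const int seen = calls.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(2 * period_ms));
    assert(calls.load() == seen); // no calls once the object is gone
    return seen;
}
```

Because the destructor joins, anything the functor refers to only has to outlive the PeriodicCaller object itself. Note that this sketch copies the functor, so the lifetime worry shifts to whatever the functor captures by reference.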

OK, one down. What's next?

Unit tests stub out all peripheral code sections (using mocks or stubs) to provide a precisely controlled interface to test your unit of code. In this respect, unit tests rock - to write them you must ensure that your class connects only with a finite set of precisely defined interfaces. That's good design. Unit tests help to ensure your class design is sound. Cool.

In the tests, we use these interfaces to isolate the code under test from unexpected change, and to arrange sets of specific operational conditions for the unit to run in.

Great stuff.

Threads add a new level of unexpected interactions. What tests should we write for the PeriodicCaller? Some of them might include:
  1. It calls the functor after "period" has elapsed
  2. It calls it N times after N "period"s have passed
  3. Once the PeriodicCaller is deleted, the functor is never called again (just how long should you wait to be sure it is never called again?)

These tests will be slow (they have to wait for specific "period"s - or arbitrary lengths of time - to elapse). They also require the test machine to not be too loaded (or the background thread might not get a chance to execute often enough, causing test 2 to fail). You could only write these tests reliably with some control over the background thread.
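To show where the pain comes from, here is roughly what test 2 looks like when written the naive way. Treat it as a sketch under assumptions: the PeriodicCaller below is a simplified stand-in of my own (std::function<void()> instead of some_functor, a plain sleep loop inside), and calls_seen_after is a hypothetical helper name.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Simplified stand-in for PeriodicCaller (hypothetical; the post does not
// show the internals). The destructor may block for up to one period.
class PeriodicCaller
{
public:
    PeriodicCaller(const std::function<void()> &what, unsigned period_ms)
        : what_(what), period_(period_ms), stop_(false),
          thread_([this] { run(); })
    {
    }

    ~PeriodicCaller()
    {
        stop_ = true;
        thread_.join();
    }

private:
    void run()
    {
        for (;;)
        {
            std::this_thread::sleep_for(period_);
            if (stop_)
                return;
            what_();
        }
    }

    std::function<void()> what_;
    std::chrono::milliseconds period_;
    std::atomic<bool> stop_;
    std::thread thread_; // declared last so the other members exist first
};

// Test 2, the naive way: wait for n periods (plus half a period of slack)
// and report how many calls were observed.
int calls_seen_after(unsigned period_ms, unsigned n)
{
    std::atomic<int> calls(0);
    PeriodicCaller caller([&calls] { ++calls; }, period_ms);
    std::this_thread::sleep_for(
        std::chrono::milliseconds(period_ms * n + period_ms / 2));
    // The assertion you would like to write is "calls == n", but nothing
    // guarantees it: a loaded machine can make this thread oversleep, or
    // starve the background thread, and the count comes out high or low.
    return calls.load();
}
```

The test has to burn real wall-clock time, and the only assertion that is safe on any machine is a vague one (at least one call happened, say), which is not what we set out to check.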

In a unit test for PeriodicCaller, you can't control the background thread that is spawned. Even if there were a "for testing" accessor to the background thread object (returning a boost::thread, or whatever), how would you use that object to drive the thread to make unit testing predictable?

By definition, threads interweave in arbitrary ways. You simply can't guarantee you're covering all possible interactions of those threads in a unit test. Subtle thread interaction problems are a surefire recipe for unit tests that work most of the time, and collapse every so often. Hard to find, hard to fix.

You could try to inject sanity with an API over the background thread that allowed you to flatten the thready behaviour out and call it sequentially on the test thread. That might help avoid thread disasters in the unit test, but the test would not be reflecting the reality of threaded operation. All the problems would be masked, not removed.
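One possible shape for that kind of flattening, sketched under assumptions: pull the "when do we call?" arithmetic out into a class with no thread in it, and let the test feed it elapsed time directly. PeriodicLogic, advance and calls_after are hypothetical names of my own, and std::function<void()> again stands in for some_functor.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: the periodic-call logic with no thread in sight.
// A real PeriodicCaller could own one of these and drive it from its
// background thread; a test drives it directly on the test thread.
class PeriodicLogic
{
public:
    PeriodicLogic(std::function<void()> what, unsigned period_ms)
        : what_(std::move(what)), period_ms_(period_ms), pending_ms_(0)
    {
    }

    // Report that elapsed_ms have passed. Calls what() once for every
    // whole period that has accumulated since the last call.
    void advance(unsigned elapsed_ms)
    {
        pending_ms_ += elapsed_ms;
        while (period_ms_ != 0 && pending_ms_ >= period_ms_)
        {
            pending_ms_ -= period_ms_;
            what_();
        }
    }

private:
    std::function<void()> what_;
    unsigned period_ms_;
    unsigned pending_ms_;
};

// Made-up helper: how many calls happen after elapsed_ms of pretend time?
int calls_after(unsigned period_ms, unsigned elapsed_ms)
{
    int calls = 0;
    PeriodicLogic logic([&calls] { ++calls; }, period_ms);
    logic.advance(elapsed_ms);
    return calls;
}
```

Tests against this are instant and deterministic, and they can assert exact call counts. But they say nothing about the thread that would drive advance() in production, which is exactly the masking problem described above.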

It is particularly hard to unit test code that does any of these:
  • Spawns a new thread
  • Waits for a thread to finish
  • Synchronises with another thread

So what does this teach us? Should we avoid using threads? Well, no, clearly we can't do that - threads are useful. But since it's very, very hard to prove that our threaded code is correct, we should avoid writing any more threaded code than strictly necessary. And then we need to create interfaces to threaded components that are testable.

How can we craft testable thread interfaces? There are a few potential solutions. Before I consider writing about them, what do you think? How have you solved these problems?

1 comment:

Graeme said...

Well said Pete!!

I am currently trying to implement a unit test framework for an embedded system running multiple threads and have run into the issue of being able to test a thread which responds to events.

The thread is run using for(;;) and as such I have no way of exiting the function once I have called it. I am struggling to figure out a way of doing this without introducing threading to the framework, as I was trying to keep it as simple as possible. Any ideas gladly received.

G