Showing posts with label threads. Show all posts

Friday, 20 June 2008

Unit testing threads is hard (part 3)

In this next part of my occasional series on threading, we'll look at some of the classic problems that I've encountered when testing threaded code.

First, here are some facts:
  • Concurrent threads of code will execute differently on multi-CPU systems than on single CPUs with OS-simulated multi-threading. Inter-thread synchronisation primitives will behave subtly differently, causing the code to behave in different ways. This is a great way to expose subtle bugs when you least expect it.
  • CPU load (i.e. the number of processes running at one time, and the amount of work they are doing) can dramatically affect the performance of your process. What usually takes 10 microseconds could, sometimes, take 10 seconds, or even 100 seconds.
  • Without a rich set of thread primitives, and a good understanding of how to use them correctly, you will never be able to craft good unit tests for threaded code.
So, based on these, here are some classic mistakes in threaded unit tests:
  • Problem: Testing regularly occurring events that happen every N seconds by waiting Nx10 seconds and checking 10 events were fired. Result: A test that often gets the wrong number of events. Why: Asking a thread to wait for a period of time does not guarantee that it will wake up after that time exactly. You're at the mercy of the OS as to when your thread wakes up. If the computer is heavily loaded it might be a long time after you expected. More events may have fired by then.
  • Problem: Testing for a result that should be returned asynchronously by waiting for the event to occur, or for 'N' seconds to expire, because the result will "never" take N seconds to come back, right? Result: A test that fails whenever the test computer is heavily loaded. Why: If the computer is heavily loaded then it's entirely possible that the background thread will take a long time to calculate the answer. Longer than N. The timeout will one day time out. I've seen this happen frequently on a machine with two or more builds running simultaneously.
  • Problem: Thread creation is not adequately managed in the code. There are races in construction/destruction of threaded objects. Result: Unit tests occasionally crash. Why: You can't safely construct test scaffolding around the threaded object unless you can control when and how the thread runs.
  • Problem: Tests that don't honour thread requirements. Calls that should only be made on a certain background thread are made on the foreground thread in the test because it should work OK. Result: Occasional crashes. Why: You did the Wrong Thing. If you shoot yourself in the foot, you should expect to hobble.
  • Problem: When objects are deleted in the unit test, background threads still exist that reference these objects. Result: Bang. Why: If the background thread fires the event after callback objects are destructed then you're effectively dereferencing a dangling pointer. Things go horribly wrong. This often is a subtle problem: frequently the unit tests run successfully, and the cleanup is fine. But if your test fails, the unit test framework signals this by throwing an exception. The exception causes objects to be destroyed as the stack unwinds. If the background thread does not get stopped as the stack unwinds, one unit test failure may manifest as a bizarre runtime crash. Ouch.
  • Problem: Testing threaded components as if they were not threaded. We abstract the threadedness out to make the tests easy and deterministic. It's what I'm preaching about in these posts! Result: The tests are easy to write, and never go wrong. Hurrah! However, the software component can still be buggy, as you have not tested it in the way it is normally run. Why: Go figure.
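The first mistake in that list can be seen directly by measuring how long a sleep really takes. This is only a sketch: it assumes the C++11 standard thread library (which postdates these posts), and the helper names `measure_sleep` and `check_lower_bound_only` are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical helper: sleep for `requested` and report how long the
// sleep actually took, measured on a monotonic clock.
std::chrono::steady_clock::duration
measure_sleep(std::chrono::milliseconds requested)
{
    const auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(requested);
    return std::chrono::steady_clock::now() - start;
}

// The only assertion a test can safely make is a lower bound.
void check_lower_bound_only()
{
    const auto requested = std::chrono::milliseconds(5);
    assert(measure_sleep(requested) >= requested);
    // No upper bound is asserted here: on a loaded machine the overshoot
    // depends entirely on when the OS decides to wake the thread.
}
```

A test that waits N x 10 seconds and then counts exactly 10 events is quietly asserting an upper bound on that overshoot, which is why it fails under load.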
So we can see that when unit tests for threaded code are written badly, the tests can:
  • Behave erratically, sometimes working, sometimes not working.
  • Occasionally crash (segfault on Linux, for example)
  • Not actually test the threaded code in the way it is used in Real Life
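The deleted-objects problem is the easiest of these to show in code. With C++11's `std::thread`, destroying a thread object that is still joinable calls `std::terminate`, so a failing test that unwinds the stack past a running thread takes the whole test run down with it. One possible guard is sketched below; `JoiningThread` and `worker_stopped_after_failure` are hypothetical names, and the stop flag is an assumption about how the worker can be told to finish.

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

// Hypothetical RAII guard: asks the worker to stop, then joins it, so the
// thread cannot outlive objects declared before the guard.
class JoiningThread
{
public:
    template <typename Fn>
    explicit JoiningThread(Fn fn)
        : stop_(false), thread_([this, fn]() mutable { fn(stop_); }) {}

    ~JoiningThread()
    {
        stop_ = true;     // request a stop...
        thread_.join();   // ...and wait until the thread has really gone
    }

    JoiningThread(const JoiningThread &) = delete;
    JoiningThread &operator=(const JoiningThread &) = delete;

private:
    std::atomic<bool> stop_;   // declared before thread_, so it is ready first
    std::thread thread_;
};

// Simulates a test body that fails (throws) while a worker is running.
// Returns true if the worker had finished by the time unwinding completed.
bool worker_stopped_after_failure()
{
    std::atomic<bool> finished{false};   // stands in for a callback object
    try {
        JoiningThread worker([&finished](std::atomic<bool> &stop) {
            while (!stop) std::this_thread::yield();
            finished = true;
        });
        throw std::runtime_error("simulated test failure");
    } catch (const std::runtime_error &) {
        // The guard's destructor has already joined the worker here.
    }
    return finished;
}
```

Because the guard is declared after `finished`, it is destroyed first during unwinding, and the worker is gone before the object it references.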

Tuesday, 3 June 2008

Unit testing threads is hard (part 2)

In a previous post, I started a constructive moan about how difficult it is to write unit tests for threaded code. This is a large and complex topic, and I'm going to tackle it here in a few separate postings. All going well, some answers should emerge by the end of the process.

Why is it hard to test threaded code?

Let's start by working out why testing threaded code is hard. In the previous post, I stated that it is hard to unit test code that specifically does any of these:
  • Spawns a new thread
  • Waits for a thread to finish
  • Synchronises with another thread
I'll add one more item to that list:
  • Performs events after a period of time
It's hard to test threaded code in general, but these are specifically complex points in even the simplest threaded code. (No doubt there are other particular pain areas; I'm sure that this list will grow as my pain threshold grows.)

Aside: What is a unit test? It's important to understand what a unit test is; some developers get this subtly wrong. There are many forms of tests, most of which can be automated and run during the build processes as an instant validation of the code under construction. However, not all of them are unit tests.

Unit tests exercise individual sections (or units) of code. To do this the code's connections with the outside world are replaced with stub or mock components that represent "real" components, but that are drivable from within the test harness. The unit test therefore only tests the small section of code in a controlled environment. It does not test the code's integration into the entire software system. Clearly, unit tests cannot therefore interface over network connections, or with databases (those connections would be components themselves with stub-implementations for testing purposes).
We use stub- and/or mock implementations of external interfaces in our unit tests to ensure that the unit operates in a deterministic environment. And we avoid access to potentially non-deterministic entities (databases, networks, filesystems) to ensure that our tests are simple, valid, reliable, and repeatable.

Then along come threads and rain on that little reliable, repeatable parade.

We most often use threads to split up tasks that can be run concurrently, in order to increase program performance. This is great, as long as the threads do not need to interact. Very scalable systems can be built this way. But if the threads must interact, very non-scalable systems can be the result. Unfortunately, good thread practice is way outside the scope of this article.

When you have multiple threads of control running in parallel, it becomes harder to reason about the correctness of your application. This code:

void a() { a1; a2; a3; a4; }
void b() { b1; b2; b3; b4; }
clearly runs operations in the order a1, a2, a3, a4, b1, b2, b3, b4 when a() and then b() are called on a single thread. If a() and b() were launched concurrently in separate threads, then you might see them run in the order a1, b1, a2, b2, a3, b3, and so on, or perhaps a1, a2, b1, b2, a3, a4, b3, b4, or any other interleaving. In fact, the only thing you can (probably) guarantee is that a1 will happen before a2, a2 before a3, and so on (and that will only hold if your optimising compiler hasn't taken it upon itself to reorder your code statements to generate "faster" code).
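To watch this happen, here is a rough sketch that runs the two functions on separate threads and logs each step. It assumes C++11 `std::thread`; the function names are invented for the example, and the mutex only protects the log, it does not impose any order between the threads.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Run the four steps of "a" and of "b" on two threads, logging each step.
std::vector<std::string> run_concurrently()
{
    std::vector<std::string> log;
    std::mutex log_mutex;

    auto run_steps = [&](const std::string &name) {
        for (int step = 1; step <= 4; ++step) {
            std::lock_guard<std::mutex> lock(log_mutex);
            log.push_back(name + std::to_string(step));
        }
    };

    std::thread a(run_steps, "a");
    std::thread b(run_steps, "b");
    a.join();
    b.join();
    return log;
}

// True if the steps of thread `name` appear in the order 1, 2, 3, 4.
bool in_program_order(const std::vector<std::string> &log, char name)
{
    int expected = 1;
    for (const std::string &entry : log) {
        if (entry[0] != name) continue;   // a step from the other thread
        if (entry != std::string(1, name) + std::to_string(expected))
            return false;
        ++expected;
    }
    return expected == 5;
}
```

The only properties a test can check on every run are that all eight steps were logged and that each thread's own steps are in order. How the two sequences interleave can differ from one run to the next.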

By spawning a thread, we specifically release some control over the execution of our program, which inevitably makes it much, much harder to unit test. We can no longer run the "unit" in a carefully controlled environment.

Interacting threads intertwine in a way that is largely random; one run of a unit test for a threaded component may be very different from the next. This is because thread behaviour changes considerably with:
  • the physical attributes of the machine (e.g. real parallelism from multiple CPUs or multiple cores in one CPU vs simulated parallelism from OS-level threading)
  • the load of the machine (when the code has less time to run in because other applications/processes are hogging the CPU the thread behaviour can become very lumpy and unpredictable - sometimes it is an interesting test for your threaded app to run it on a loaded machine)
  • the speed of the CPU(s) in the machine, and of the memory bus/network/disks
  • the nature of the operation running on a background thread (is it CPU intensive, heavily contending with the "main" thread of control for CPU cycles, or is it an IO-bound batch-process, mostly blocking on data throughput)
  • the way the wind is blowing (who knows which thread unblocks and gets a chance to run this time the test executes?)

Thread bugs are hard to find

Apparently, breaking up is hard to do. I'd agree: breaking up thread behaviour so it's testable is practically impossible. Threads interact in a very un-repeatable way, and problems stemming from bad thread interactions are remarkably hard to find.

Most of the time your code would operate perfectly, but once in a blue moon you get a data-race, a deadlock, or a timing error. In fact, it's practically impossible to write a unit test that proves that none of those conditions can occur.

When testing single-threaded code we must consider the tests' code coverage; whether every line of code and every condition has been covered. In a multi-threaded environment the problem explodes. We must consider coverage in terms of every possible interaction of every line of code. The threaded environment is akin to shuffling a deck of cards before running the threads - each time you deal out the program instructions you'll get them in a different order. How can you be sure that each of those orderings produces the same - or at least a correct - result?
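To put a number on that explosion: two threads that run m and n steps can interleave in (m+n)! / (m! n!) ways, which is standard combinatorics rather than anything specific to these posts. The sketch below computes it, under the generous assumptions that there are only two threads and that every step is atomic; `count_interleavings` is a name made up for the example.

```cpp
#include <cassert>

// Number of distinct interleavings of two threads that execute m and n
// atomic steps respectively: the binomial coefficient C(m + n, n).
// Overflows for large inputs; this is an illustration, not a library.
unsigned long long count_interleavings(unsigned m, unsigned n)
{
    unsigned long long result = 1;
    for (unsigned i = 1; i <= n; ++i) {
        // After this step result == C(m + i, i), so the division is exact.
        result = result * (m + i) / i;
    }
    return result;
}
```

For two threads of four steps each, like a() and b() earlier, `count_interleavings(4, 4)` is 70. At ten steps each it is already 184756, and real code has far more than ten steps per thread.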

What have we learnt so far?

In later postings we'll look at how to write tests for threaded code, but until then, here are three very helpful tips for writing and testing threaded code:
  • Avoid writing code that spawns another thread unless you absolutely have to
  • Avoid threaded code that has to do something externally visible to other threads other than at the beginning/end of the thread's execution
  • Do not tie the thread-spawning aspect of the code to the code that runs on that thread. For example, ensure that the algorithm is neatly encapsulated and testable in isolation in a single-threaded environment. Then, if necessary, write a threaded component that employs that algorithm on its thread.
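The last tip might look something like this in practice. It is a minimal sketch assuming C++11 `std::async`; the `sum_of_squares` algorithm and both function names are placeholders chosen for the example.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// The algorithm: a plain function with no knowledge of threads, so it
// can be unit tested deterministically on the test thread.
long sum_of_squares(const std::vector<int> &values)
{
    return std::accumulate(values.begin(), values.end(), 0L,
        [](long total, int v) { return total + static_cast<long>(v) * v; });
}

// The threaded component: a thin wrapper that only decides *where* the
// algorithm runs. It takes its input by value so the background task
// owns its data and shares nothing with the caller.
std::future<long> sum_of_squares_async(std::vector<int> values)
{
    return std::async(std::launch::async, &sum_of_squares, std::move(values));
}
```

Most of the tests then target `sum_of_squares` directly. The wrapper needs only a handful of tests, and it touches other threads only at the start (its input) and the end (the future's result), which is the second tip as well.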

More will follow...

Tuesday, 27 May 2008

Unit testing threads is hard

Unsurprising fact that people don't talk about that much #10045: Unit testing threads is hard.

Unit tests are good. Threads are good. At least, these days many people are telling us that they are good. Threads are the inevitable future of your programming career, if you still want to eat lunch.

But unit tests and threads are not a good combination. Not even slightly. In fact, they're a downright pain in the rear end. I've been bitten by this so many times recently that my rear end is raw.

If you haven't had the misfortune to encounter this particular brand of coding horror, consider the following simple C++ class. It creates a background thread which runs an arbitrary functor ("what") every "period" milliseconds:

class PeriodicCaller
{
public:
    PeriodicCaller(const some_functor &what, unsigned period);

    ... whatever ...
};

It's got a nice, clean interface. It seems simple enough. It'll be really easy to use in your codebase. But how are we going to test it? Any ideas?

First problem: there are lifetime issues in that interface. "what" has to remain valid for as long as the background thread runs. That's hard to do, as you don't know exactly how long the thread runs for. Fortunately, you can solve this problem by insisting classes that spawn a background thread (or threads) guarantee the thread has stopped by the time the destructor completes. This moves the issue up a level: "what" has to remain valid for as long as the PeriodicCaller object exists. This is far, far easier to reason about (and not an unusual problem for C++ object construction).
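Here is one way that destructor guarantee could be met. It is a sketch, not the implementation behind the interface shown earlier: it assumes C++11 threads and a condition variable for the stop signal, and it copies the functor into a `std::function` rather than holding a reference to `some_functor`. The `calls_after_destruction` helper is also invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

class PeriodicCaller
{
public:
    PeriodicCaller(std::function<void()> what, unsigned period_ms)
        : what_(std::move(what)), period_(period_ms), stopping_(false),
          thread_([this] { run(); }) {}

    // The guarantee: the thread has stopped by the time this returns.
    ~PeriodicCaller()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();   // cut the current wait short
        thread_.join();
    }

    PeriodicCaller(const PeriodicCaller &) = delete;
    PeriodicCaller &operator=(const PeriodicCaller &) = delete;

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait_for returns true once stopping_ is set, false on timeout.
        while (!wake_.wait_for(lock, period_, [this] { return stopping_; })) {
            lock.unlock();
            what_();          // call the functor without holding the lock
            lock.lock();
        }
    }

    std::function<void()> what_;
    std::chrono::milliseconds period_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopping_;
    std::thread thread_;      // declared last: all other members exist first
};

// Returns how many calls arrive after the PeriodicCaller has been destroyed.
int calls_after_destruction()
{
    std::atomic<int> calls{0};
    {
        PeriodicCaller caller([&calls] { ++calls; }, 1);
        while (calls == 0) std::this_thread::yield();   // wait for a first call
    }   // destructor joins the background thread here
    const int at_destruction = calls;
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return calls - at_destruction;
}
```

Since the destructor joins the thread, `calls_after_destruction()` returns 0 however long the final sleep is. The counter captured by the functor only has to outlive the `PeriodicCaller` object, which is the easier lifetime rule described above.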

OK, one down. What's next?

Unit tests stub out all peripheral code sections (using mocks or stubs) to provide a precisely controlled interface to test your unit of code. In this respect, unit tests rock - to write them you must ensure that your class connects only with a finite set of precisely defined interfaces. That's good design. Unit tests help to ensure your class design is sound. Cool.

In the tests, we use these interfaces to isolate the code under test from unexpected change, and to arrange sets of specific operational conditions for the unit to run in.

Great stuff.

Threads add a new level of unexpected interactions. What tests should we write for the PeriodicCaller? Some of them might include:
  1. It calls the functor after "period" has elapsed
  2. It calls it N times after N "period"s have passed
  3. Once the PeriodicCaller is deleted, the functor is never called again (just how long should you wait to be sure it is never called again?)

These tests will be slow (they have to wait for specific "period"s - or arbitrary lengths of time - to elapse). They require the test machine to not be too loaded (or the background thread might not get a chance to execute often enough, causing test 2 to fail). You could only write these tests reliably with some control over the background thread.

In a unit test for PeriodicCaller, you can't control the background thread that is spawned. Even if there was a "for testing" accessor to the background thread object (returning a boost::thread, or whatever) how would you use that object to drive the thread to make unit testing predictable?

By definition, threads interweave in arbitrary ways. You simply can't guarantee you're covering all possible interactions of those threads in a unit test. Subtle thread interaction problems are a surefire recipe for unit tests that work most of the time, and collapse every so often. Hard to find, hard to fix.

You could try to inject sanity with an API over the background thread that allowed you to flatten the thready behaviour out and call it sequentially on the test thread. That might help avoid thread disasters in the unit test, but the test would not be reflecting the reality of threaded operation. All the problems would be masked, not removed.
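As a hedged illustration of what such a flattened API might look like, the sketch below pulls the "when is a call due?" logic out into a class with no thread and no clock. `PeriodicSchedule` and `advance` are hypothetical names, not part of the PeriodicCaller interface shown earlier.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical "flattened" core of a periodic caller: whoever owns it
// reports how much time has passed, and it makes the calls that are due.
class PeriodicSchedule
{
public:
    PeriodicSchedule(std::function<void()> what, unsigned period_ms)
        : what_(std::move(what)), period_ms_(period_ms), accumulated_ms_(0)
    {
        assert(period_ms_ > 0);
    }

    // Called by a background thread in production, or directly by a test.
    void advance(unsigned elapsed_ms)
    {
        accumulated_ms_ += elapsed_ms;
        while (accumulated_ms_ >= period_ms_) {
            accumulated_ms_ -= period_ms_;
            what_();
        }
    }

private:
    std::function<void()> what_;
    unsigned period_ms_;
    unsigned accumulated_ms_;
};
```

A test can now call `advance(99)`, check nothing fired, call `advance(1)` and check for exactly one call, all on the test thread and without sleeping. What it cannot show is whether the real thread that drives `advance` starts, stops, and synchronises correctly.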

It is specifically hard to unit test code that does any of these:
  • Spawns a new thread
  • Waits for a thread to finish
  • Synchronises with another thread

So what does this teach us? Should we avoid using threads? Well, no, clearly we can't do that - threads are useful. But since it's very, very hard to prove that our threaded code is correct, we should avoid writing any more threaded code than strictly necessary. And then we need to create interfaces to threaded components that are testable.

How can we craft testable thread interfaces? There are a few potential solutions. Before I consider writing about them, what do you think? How have you solved these problems?