Friday, 20 June 2008

Unit testing threads is hard (part 4)

In this next part of my occasional series on threading, we'll look at some of the classic problems that I've encountered when testing threaded code.

First, here are some facts:
  • Concurrent threads of code will execute differently on multi-CPU systems than on single-CPU systems with OS-simulated multi-threading. Inter-thread synchronisation primitives will behave subtly differently, causing the code to behave in different ways. This is a great way to expose subtle bugs when you least expect it.
  • CPU load (i.e. the number of processes running at one time, and the amount of work they are doing) can dramatically affect the performance of your process. What usually takes 10 microseconds could, sometimes, take 10 seconds, or even 100 seconds.
  • Without a rich set of thread primitives, and a good understanding of how to use them correctly, you will never be able to craft good unit tests for threaded code.
So, based on these, here are some classic mistakes in threaded unit tests:
  • Problem: Testing regularly occurring events that happen every N seconds by waiting N×10 seconds and checking that 10 events were fired. Result: A test that often gets the wrong number of events. Why: Asking a thread to wait for a period of time does not guarantee that it will wake up after exactly that time. You're at the mercy of the OS as to when your thread wakes up. If the computer is heavily loaded it might be a long time after you expected. More events may have fired by then.
  • Problem: Testing for a result that should be returned asynchronously by waiting for the event to occur, or for 'N' seconds to expire, because the result will "never" take N seconds to come back, right? Result: A test that fails whenever the test computer is heavily loaded. Why: If the computer is heavily loaded then it's entirely possible that the background thread will take a long time to calculate the answer. Longer than N. The timeout will one day time out. I've seen this happen frequently on a machine with two or more builds running simultaneously.
  • Problem: Thread creation is not adequately managed in the code. There are races in construction/destruction of threaded objects. Result: Unit tests occasionally crash. Why: You can't safely construct test scaffolding around the threaded object unless you can control when and how the thread runs.
  • Problem: Tests that don't honour thread requirements. Calls that should only be made on a certain background thread are made on the foreground thread in the test because it should work OK. Result: Occasional crashes. Why: You did the Wrong Thing. If you shoot yourself in the foot, you should expect to hobble.
  • Problem: When objects are deleted in the unit test, background threads still exist that reference these objects. Result: Bang. Why: If the background thread fires the event after callback objects are destructed then you're effectively dereferencing a dangling pointer. Things go horribly wrong. This is often a subtle problem: frequently the unit tests run successfully, and the cleanup is fine. But if your test fails, the unit test framework signals this by throwing an exception. The exception causes objects to be destroyed as the stack unwinds. If the background thread does not get stopped as the stack unwinds, one unit test failure may manifest as a bizarre runtime crash. Ouch.
  • Problem: Testing threaded components as if they were not threaded. We abstract the threadedness out to make the tests easy and deterministic. It's what I'm preaching about in these posts! Result: The tests are easy to write, and never go wrong. Hurrah! However, the software component is still buggy as you have not tested it in the way it is normally run. Why: Go figure.
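For the first two mistakes, one alternative to sleeping and then counting is to have the test block until the events it expects have actually arrived. What follows is only a minimal sketch, assuming a C++11 standard library rather than whatever threading library the code under test really uses. EventCounter and fires_expected_events are hypothetical names invented for illustration, and the producer thread is a stand-in for the real component. The timeout is a safety net against a hung test, not part of the assertion, so it can be far longer than the events should ever take.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical test helper: counts events and lets a test block until a
// given number of them have arrived.
class EventCounter {
public:
    // Called by the code under test, on its background thread.
    void notify() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        changed_.notify_all();
    }

    // Blocks until at least `expected` events have arrived, or until the
    // safety net expires. Returns true if the events arrived.
    bool wait_for(int expected, std::chrono::milliseconds safety_net) {
        std::unique_lock<std::mutex> lock(mutex_);
        return changed_.wait_for(lock, safety_net,
                                 [&] { return count_ >= expected; });
    }

    int count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable changed_;
    int count_ = 0;
};

// Stand-in for a real test: a background thread fires `events` events, one
// every `period`, and the test waits for all of them.
bool fires_expected_events(int events, std::chrono::milliseconds period) {
    EventCounter counter;
    std::thread producer([&] {
        for (int i = 0; i < events; ++i) {
            std::this_thread::sleep_for(period);
            counter.notify();
        }
    });
    const bool arrived = counter.wait_for(events, std::chrono::seconds(30));
    producer.join();  // join before `counter` goes out of scope
    return arrived && counter.count() == events;
}
```

On a heavily loaded machine this test gets slower, but it does not miscount: the assertion depends on how many events arrived, not on when the test thread happened to wake up.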
So we can see that when unit tests for threaded code are written badly, the tests can:
  • Behave erratically, sometimes working, sometimes not working.
  • Occasionally crash (segfault on Linux, for example)
  • Not actually test the threaded code in the way it is used in Real Life
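
The crash-on-failure case (a test failure throws, the stack unwinds, and the background thread outlives the objects it references) can be guarded against by tying the thread's lifetime to an object whose destructor stops and joins it. This is a sketch of that idea only, assuming C++11 standard threads; Worker and run_failing_test are hypothetical names, and a real component would need its own way of being told to stop.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>
#include <utility>

// Hypothetical background worker: calls `callback` repeatedly until stopped.
class Worker {
public:
    explicit Worker(std::function<void()> callback)
        : callback_(std::move(callback)),
          thread_([this] { run(); }) {}

    // Stop and join in the destructor, so the thread cannot outlive
    // anything declared before this Worker, even during stack unwinding.
    ~Worker() {
        stop_.store(true);
        thread_.join();
    }

    Worker(const Worker&) = delete;
    Worker& operator=(const Worker&) = delete;

private:
    void run() {
        while (!stop_.load()) {
            callback_();
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    std::function<void()> callback_;
    std::atomic<bool> stop_{false};
    std::thread thread_;  // declared last: starts after the other members exist
};

// Simulates a failing test: the exception unwinds the stack while the worker
// is still running. Returns the number of callbacks seen, or -1 if the
// simulated failure was somehow not thrown.
int run_failing_test() {
    std::atomic<int> calls{0};  // callback state: declared first, destroyed last
    try {
        Worker worker([&calls] { ++calls; });
        while (calls.load() == 0) {
            std::this_thread::yield();  // wait for the first callback
        }
        throw std::runtime_error("simulated test failure");
    } catch (const std::runtime_error&) {
        // Worker's destructor has already joined the thread by this point,
        // so nothing can touch `calls` behind our back.
        return calls.load();
    }
    return -1;
}
```

The ordering is what matters: the callback's state is declared before the worker, so it is destroyed after the worker's thread has been joined, whichever way the scope is left.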


Tim said...

Loving the series, but what happened to part 3?

Anonymous said...

Hi Pete,

These posts on testing threaded code have been nice (though I too am wondering where part 3 is!). Concrete advice on this stuff is hard to come by.

I saw elsewhere on your blog that you use boost. You might be interested to know I've reimplemented the boost threads API in terms of "user-space threads", meaning Fibers on Windows and the functions found in ucontext.h on UNIX. The point being that you can swap in the alternative library/headers to get extra guarantees about when context switches will happen, making it easier to test logic and algorithmic correctness in isolation from synchronization issues.

I was wondering what someone with your experience would make of it.

Currently the code only mirrors the boost 1.34 library, but I hope to update to match the changes made in boost 1.35 soon.