Selective Unit Testing – Costs and Benefits

Published Nov 4, 2009

I’ve been writing unit tests regularly for 2-3 years now, and doing full-blown test-driven development (TDD) full time for about the last year. Throughout this whole time, I keep noticing the same thing over and over:

For certain types of code, unit testing works brilliantly, flows naturally, and significantly enhances the quality of the resulting code.
But for other types of code, writing unit tests consumes a huge amount of effort, doesn’t meaningfully aid design or reduce defects at all, and makes the codebase harder to work with by being a barrier to refactoring or enhancement.

I guess that shouldn’t surprise anyone, because well-respected techniques in all fields – e.g., techniques for winning debates, for dieting, for making money – tend to be strong in some scenarios but weaker in others. At one point I thought this observation was somehow controversial, but every other developer with whom I’ve discussed it already considered it self-evident that unit testing is sometimes very effective and sometimes just isn’t.

So why am I writing this? Two reasons:

Because I think we can go further and understand the underlying forces that make unit testing worthwhile (or not) for any given unit of code.
Because a minority of developers still believes that they should aim for 100% unit test coverage, and that if they don’t follow the TDD test-first process, then they’ve failed as professionals. I’m not satisfied with that view.

How much does that code benefit from unit testing?

I could list a dozen great benefits that come from unit testing, but the list really boils down to two things: Unit tests help you to design some code while you’re writing it, and also help to *verify* that your implementation actually does what you intended it to do.

That sounds great – and often is – but it’s still legal to question the whole idea. Consider: Why do you actually want a secondary system to help design or verify your code? Doesn’t your source code itself express the design and behaviour of your solution? If unit tests are a repetition of the same design, in what sense do they demonstrate the correctness of that design? What about DRY?

In my experience, if your code is not obvious at a single glance – so working out its exact behaviour would take time and careful thought – then additional design and verification assistance (e.g., through unit testing) is essential to be sure that all your cases are handled properly. For example, if you’re coding a system of business rules or parsing a complex hierarchical string format, there will be too many possible code paths to check at a glance. In scenarios like these, unit tests are extremely helpful and valuable.
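To make this concrete, here is a minimal sketch in Python of the kind of code I mean. The `parse_duration` function and its string format are hypothetical, invented purely for illustration. Even for a format this small, it is hard to be sure at a glance that gaps, unknown units, and trailing junk are all rejected, so a short table of examples earns its keep.

```python
import re

# Hypothetical mini-format: one or more <number><unit> tokens, e.g. "1h30m".
_UNITS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Convert a string such as '1h30m' into a number of seconds."""
    text = text.strip()
    if not text:
        raise ValueError("empty duration")
    total, pos = 0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # something unparseable sits between tokens
            raise ValueError(f"unexpected text at position {pos}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos != len(text):  # trailing junk, or no valid token at all
        raise ValueError(f"unexpected text at position {pos}")
    return total


# The unit tests are little more than a table of inputs and expected outputs.
EXAMPLES = [("45s", 45), ("2m", 120), ("1h30m", 5400), (" 1h2m3s ", 3723)]
for text, expected in EXAMPLES:
    assert parse_duration(text) == expected
```

Each row in the table pins down a path through the code that would otherwise need careful thought to check by eye.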

Conversely, if your code is basically obvious – so at a glance you can see exactly what it does – then additional design and verification (e.g., through unit testing) yields extremely minimal benefit, if any. For example, if you’re writing a method that gets the current date and the amount of free disk space, and then passes them both to a logging service, the source code listing says everything you need to say about that design. What would a unit test add here, given that you’d be mocking out the clock and disk space provider anyway?
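As a rough sketch of that example in Python: the names `log_disk_status`, `clock.now`, `disk.free_bytes`, and `logger.log` are all made up for illustration, and the test uses the standard library's `unittest.mock`.

```python
from datetime import datetime
from unittest.mock import Mock


def log_disk_status(clock, disk, logger):
    """Read the time and the free disk space, then hand both to the logger."""
    logger.log(clock.now(), disk.free_bytes())


def test_log_disk_status():
    # Every collaborator has to be mocked before anything can be asserted.
    clock, disk, logger = Mock(), Mock(), Mock()
    clock.now.return_value = datetime(2009, 11, 4, 12, 0)
    disk.free_bytes.return_value = 1024

    log_disk_status(clock, disk, logger)

    # The assertion restates the single line of the implementation.
    logger.log.assert_called_once_with(datetime(2009, 11, 4, 12, 0), 1024)


test_log_disk_status()
```

The test is several times longer than the code it covers, and it breaks whenever any of the three collaborators changes its API, yet it tells you nothing that the one-line method body didn't already say.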

In summary, I’m arguing that the benefit of unit testing is correlated with the non-obviousness of the code under test.

How much does it cost to unit test that code?

A few obvious costs spring to mind:

The time spent actually writing unit tests in the first place
The time spent fixing and updating unit tests, either because you’ve deliberately refactored interfaces between code units or the responsibilities distributed among them, or because tests broke unexpectedly when you made other changes
The tendency – either by you or your colleagues – to avoid improving and refactoring application code out of fear that it may break a load of unit tests and hence incur extra work

As many have written, you can reduce the cost of maintaining unit tests by following certain best practices. After doing that, the remaining cost may be tiny or it may still be significant.

In my experience, the remaining total cost of unit testing a certain code unit is very closely correlated with its number of dependencies on other code units. Why might that be?

Writing tests in the first place: If a method has no dependencies and merely acts as a simple function of a single parameter, unit tests are just a list of examples of input points mapping to output points. But if it takes four parameters and reads or writes five other services (abstract or otherwise) through class properties, you’ve got a lot of mocking to do and API usages to figure out. But this is a trivial cost compared to…

Maintenance: It’s well established that the more direct dependencies a code unit has, the more frequently it gets forced to change. (In fact, this is basically how “instability” is defined by standard code metrics tools.) You can easily see why: On any given day, each of those dependencies has some probability of changing its API or behaviour, forcing you to update your code and its unit tests.

Note that these issues apply equally even if you’re using an IoC container and coding purely to interfaces.
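The maintenance argument can be sketched with a little arithmetic. The first function below is Robert C. Martin's instability metric, I = Ce / (Ca + Ce), where Ce counts outgoing (efferent) dependencies and Ca counts incoming (afferent) ones; it is usually computed per package, and applying it to a single code unit is my simplification. The second function is my own illustrative assumption, not a standard metric: it supposes each dependency changes independently with the same probability `p` in a given period.

```python
def instability(efferent: int, afferent: int) -> float:
    """Martin's instability I = Ce / (Ca + Ce): 0 is stable, 1 is unstable."""
    total = efferent + afferent
    if total == 0:
        return 0.0  # assumption for this sketch: an isolated unit counts as stable
    return efferent / total


def chance_of_forced_change(p: float, dependencies: int) -> float:
    """Probability that at least one of the dependencies changes.

    Assumes (hypothetically) independent changes, each with probability p.
    """
    return 1 - (1 - p) ** dependencies


# A unit that depends on many others is more unstable than one that is
# mostly depended upon.
print(instability(efferent=9, afferent=1), instability(efferent=1, afferent=9))

# With a 1% chance per dependency, the risk grows with each dependency added.
for n in (1, 5, 20):
    print(n, round(chance_of_forced_change(0.01, n), 3))
```

Under those assumptions the chance of being forced to revisit a unit, and its tests, rises with every dependency it takes on, whether that dependency is a concrete class or an interface.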

In summary, I’m arguing that the cost of unit testing is correlated with the number of dependencies (concrete or interface) that a code unit has.

Visualising the Costs and Benefits

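OK, let’s put those two ideas on a single diagram:

[Diagram: a two-by-two grid. The vertical axis is the benefit of unit testing (how non-obvious the code is); the horizontal axis is the cost of unit testing (how many dependencies the code has).]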

This deliberately simplistic diagram illustrates four broad categories of code:

Complex code with few dependencies (top left). Typically this means self-contained algorithms for business rules or for things like sorting or parsing data. This cost-benefit argument goes strongly in favour of unit testing this code, because it’s cheap to do and highly beneficial.
Trivial code with many dependencies (bottom right). I’ve labelled this quadrant “coordinators”, because these code units tend to glue together and orchestrate interactions between other code units. This cost-benefit argument is in favour of not unit testing this code: it’s expensive to do and yields little practical benefit. Your time is finite; spend it more effectively elsewhere.
Complex code with many dependencies (top right). This code is very expensive to write with unit tests, but too risky to write without. Usually you can sidestep this dilemma by decomposing the code into two parts: the complex logic (algorithm) and the bit that interacts with many dependencies (coordinator).
Trivial code with few dependencies (bottom left). We needn’t worry about this code. In cost-benefit terms, it doesn’t matter whether you unit test it or not.
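The decomposition suggested for the top-right quadrant might look something like the following Python sketch. Everything in it is hypothetical: the `Order` type, the discount rules, and the `orders`, `payments`, and `mailer` services exist only to show the shape of the split.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    subtotal_cents: int
    loyalty_years: int
    coupon: Optional[str] = None


def discount_percent(order: Order) -> int:
    """Algorithm: pure business rules with no dependencies; cheap to unit test."""
    percent = min(order.loyalty_years, 5) * 2  # 2% per loyalty year, capped at 10%
    if order.coupon == "WELCOME10":
        percent += 10
    if order.subtotal_cents >= 100_00:
        percent += 5
    return min(percent, 20)  # never more than 20% in total


def amount_due_cents(order: Order) -> int:
    """Algorithm: apply the discount, rounding down to a whole cent."""
    return order.subtotal_cents * (100 - discount_percent(order)) // 100


class CheckoutCoordinator:
    """Coordinator: many dependencies, no decisions of its own."""

    def __init__(self, orders, payments, mailer):
        self.orders, self.payments, self.mailer = orders, payments, mailer

    def checkout(self, order_id):
        order = self.orders.get(order_id)
        amount = amount_due_cents(order)
        self.payments.charge(order_id, amount)
        self.mailer.send_receipt(order_id, amount)


# Unit tests for the algorithm are plain input/output examples, with no mocks.
assert discount_percent(Order(50_00, 0)) == 0
assert discount_percent(Order(120_00, 3)) == 11
assert discount_percent(Order(200_00, 9, "WELCOME10")) == 20
assert amount_due_cents(Order(200_00, 9, "WELCOME10")) == 160_00
```

All the branching lives in the two pure functions, which land in the top-left quadrant. What remains in `CheckoutCoordinator` reads top to bottom at a glance and lands in the bottom-right quadrant.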

Let’s get practical. What about my ASP.NET MVC web application?

In ASP.NET MVC, the most general-purpose place to put your application logic is in your controllers. Unfortunately, if you keep dumping stuff there, they’ll become unwieldy – amassing complex logic but being very expensive to unit test because of all the dependencies and overlapping concerns. This is known as the fat controller anti-pattern.

To avoid this, you can factor out independent bits of application logic into service classes and business logic into domain model classes. You can also split out cross-cutting concerns into ASP.NET MVC filters, custom model binders, and custom action results.

The more you do this, the better, clearer, and simpler your controllers become. Ultimately, the better you structure your controllers, the more they end up being trivial coordinators that manage interactions between other code units while having very little or no logic of their own. In other words, the better you structure your controllers, the more they move down towards the bottom-right corner of the preceding diagram, and the less it makes sense to unit test them.

My controllers aim to be just a meeting place for all the different APIs of my many services. This controller code is trivially readable, and links together multiple dependencies. In cost-benefit terms, I find I’m more productive not unit testing these and instead using the time saved to keep refactoring and writing integration tests.

But, um, surely we don’t want less automated testing?

In case anyone is misinterpreting me, I’m not saying you shouldn’t do unit testing or TDD. What I am saying is:

I personally find I can deliver more business value per hour worked over the long term by using TDD only on the kinds of code for which it’s strong. This means nontrivial code with few dependencies (algorithms or self-contained business logic).
I sometimes deliberately decompose code into algorithms and coordinators, so that the former can be most clearly unit tested, and the latter most clearly expressed as C# without unit tests. The most obvious example is factoring application logic out of ASP.NET MVC controllers.
I’m increasingly becoming aware of the practical business value achieved through integration testing. For a web application, that usually means using some kind of browser automation tool such as Selenium RC or WatiN. This doesn’t replace unit testing, but I’d rather spend an hour writing integration tests to prove the whole system works together in some scenario, than spend that hour writing unit tests associated with trivial code whose behaviour I can know at a glance and which is likely to change each time some underlying API changes anyway.

This is just a description of my experience so far. It’s OK if yours is different.

Footnote: Source Code as Design

To expand on the question, “Doesn’t your source code itself already express the design and behaviour of the solution?”, consider the point made by Jack W. Reeves in his now-much-cited 1992 article for C++ Journal entitled What is Software Design?

The final goal of any engineering activity is some type of [design] documentation … After reviewing the software development life cycle as I understood it, I concluded that the only software documentation that actually seems to satisfy the criteria of an engineering design is the source code listings.

His argument is that our source code is not the software itself (for the actual software is an executable binary file of some sort); our source code is the design for that software. A programming language succeeds or fails to the extent that it lets us succinctly and accurately describe our intended software design to the compiler. So, reader, can and should unit tests replace source code as the truest expression of our designs?