Documented unittests & code coverage

Atila Neves via Digitalmars-d <digitalmars-d at puremagic.com>
Thu Aug 4 12:04:19 PDT 2016


On Thursday, 4 August 2016 at 10:24:39 UTC, Walter Bright wrote:
> On 8/4/2016 1:13 AM, Atila Neves wrote:
>> On Thursday, 28 July 2016 at 23:14:42 UTC, Walter Bright wrote:
>>> On 7/28/2016 3:15 AM, Johannes Pfau wrote:
>>>> And as a philosophical question: Is code coverage in 
>>>> unittests even a
>>>> meaningful measurement?
>>>
>>> Yes. I've read all the arguments against code coverage 
>>> testing. But in my
>>> usage of it for 30 years, it has been a dramatic and 
>>> unqualified success in
>>> improving the reliability of shipping code.
>>
>> Have you read this?
>>
>> http://www.linozemtseva.com/research/2014/icse/coverage/
>
> I've seen the reddit discussion of it. I don't really 
> understand from reading the paper how they arrived at their 
> test suites, but I suspect that may have a lot to do with the 
> poor correlations they produced.

I think I read the paper around a year ago, so my memory is 
fuzzy. From what I remember, they analysed existing test suites. 
What I do remember is having the impression that it was done well.

> Unittests have uncovered lots of bugs for me, and code that was 
> unittested had far, far fewer bugs showing up after release. 
> <snip>

No argument there; as far as I'm concerned, unit tests = good 
thing (TM).

I think measuring unit test code coverage is a good idea, but 
only so that it can be looked at to find lines that really should 
have been covered but weren't (I sketch an example after the list 
below). What I take issue with is two things:

1. Code coverage metric targets (especially if the target is 
100%).  This leads to inane behaviours such as "testing" a print 
function (which itself was only used in testing) to meet the 
target. It's busywork that accomplishes nothing.

2. Using the code coverage numbers as a measure of unit test 
quality. This was always obviously wrong to me; I was glad that 
the research I linked to confirmed my opinion, and as far as I 
know (I'd be glad to be proven wrong), nobody else has published 
anything to convince me otherwise.
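
As a concrete (if contrived) sketch of the one use I do find 
worthwhile: the function and test below are made up, but compiling 
them with dmd -unittest -cov -main and running the resulting binary 
should write a .lst report that flags the unexecuted return, i.e. a 
line that really should have been covered but isn't.

int sign(int x)
{
    if (x < 0)
        return -1;
    return 1;  // reported with a count of 0: should be covered, isn't
}

unittest
{
    assert(sign(-5) == -1);  // only ever exercises the negative branch
}

Seeing that zero count is a prompt to add the missing 
assert(sign(5) == 1); that's all I want the number for.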

Code coverage, as a measure of test quality, is fundamentally 
broken. It measures coupling between the production code and the 
tests, which is never a good idea. Consider:

int div(int i, int j) { return i + j; }
unittest { div(3, 2); }

100% coverage, utterly wrong. Fine, no asserts is "cheating":

int div(int i, int j) { return i / j; }
unittest { assert(div(4, 2) == 2); }

100% coverage. No check for division by 0. Oops.

This is obviously a silly example, but the main idea is that 
coverage doesn't measure the quality of the checks (the sentinel 
values the tests assert against). Bad tests serve only as sanity 
tests, and the only way I've seen so far to make sure the tests 
themselves are good is mutation testing.
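
To make that concrete, here's a minimal hand-rolled sketch of the 
idea; a real mutation-testing tool generates and runs the mutants 
automatically, so the separately named mutant functions below are 
just illustrative stand-ins:

int div(int i, int j) { return i / j; }

// Mutants a tool might produce by tweaking one operator in div:
int divMutantSub(int i, int j) { return i - j; }  // '/' -> '-'
int divMutantMul(int i, int j) { return i * j; }  // '/' -> '*'

unittest
{
    // The weak test from above only checks one input pair.
    assert(div(4, 2) == 2);

    // That single check kills the '*' mutant (4 * 2 == 8, not 2)...
    assert(divMutantMul(4, 2) != 2);

    // ...but the '-' mutant survives, because 4 - 2 == 2 as well.
    // A surviving mutant is the signal that the test is too weak;
    // adding, say, assert(div(9, 3) == 3) would kill it.
    assert(divMutantSub(4, 2) == 2);
}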



Atila


