[phobos] enforce() vs. assert() for range primitives

Sat Aug 21 20:54:52 PDT 2010

  God damn it, for some strange reason whenever I reply to Andrei's 
posts (and only to Andrei's and not anyone else's) using Thunderbird 
(but not the gmail web interface) the results only go to Andrei.  Here 
's my reply.  Andrei, I apologize for yet another double send.

  On 8/21/2010 11:18 PM, Andrei Alexandrescu wrote:
> On 08/18/2010 05:49 PM, David Simcha wrote:
>> While I was debugging std.range and std.algorithm I noticed that there's
>> tons of inconsistency about when enforce() vs. assert() is used for
>> checking input to range primitives. I think this needs to be made
>> consistent, and there are good arguments for both.
>
> As mentioned, I've studied this exchange, thought a lot about it, and 
> also discussed it with Walter.
>
> Clearly right now Phobos is inconsistent. Walter and other used 
> asserts and contracts and other means etc. My approach since I started 
> has been to sprinkle enforce() everywhere and worry about it later, 
> and that later has come.
>
> One problem with that approach is that, as Lars mentioned, there are 
> 3-4 checks on a simple operation due to the use of higher-order ranges 
> that wrap other ranges, like Stride!Chain!Whatever. This issue is not 
> difficult to tackle: we can introduce in Phobos the notion that a 
> range wrapping another range defers the checking to the wrapped range. 
> That way the wrapper only uses assert() and essentially trusts that 
> the other range protects its own integrity. Consider as an example 
> Take's implementation of popFront:
>
>     void popFront()
>     {
>         enforce(_maxAvailable > 0,
>             "Attempting to popFront() past the end of a "
>             ~ Take.stringof);
>         original.popFront();
>         --_maxAvailable;
>     }
>
> The enforce is redundant - we could decide that Take defers checking 
> to original. So this code should be allowed:
>
>     void popFront()
>     {
>         assert(_maxAvailable > 0, ...);
>         original.popFront();
>         --_maxAvailable;
>     }
>
> But (very careful) Take should still protect its _own_ integrity, 
> assuming original also does. So the following code would be incorrect:
>
>     void popFront()
>     {
>         assert(_maxAvailable > 0, ...);
>         --_maxAvailable;
>         original.popFront();
>     }
>
> If original.popFront throws, _maxAvailable has already been 
> decremented so Take is in an inconsistent state. So rule #1 could be 
> something like:
>
> Rule #1: A higher-order entity in Phobos can generally assume that the 
> lower-order entities it builds on protect their own integrity, and 
> protect its own under that premise.
>
> I am sure there would be fuzzy corners to this rule, but by and large 
> it shold cut down on a lot of enforce calls. (We could and should 
> replace them with asserts).

Sounds mostly good.  One thing that still bugs me, though, is the idea 
of using enforce() in ranges like Iota that are supposed to be super 
cheap and don't risk memory corruption even if they're in some crazy 
invalid state.  Here's a benchmark:

import std.range, std.perf, std.stdio;

void main() {
     auto pc = new PerformanceCounter;
     pc.start;

     uint num;

     foreach(elem; iota(100_000_000)) {
         num += elem;
     }

     pc.stop;
     writeln(pc.milliseconds, " milliseconds");

     // Produce an externally visible effect to keep everything from being
     // optimized away.
     writeln("Ignore this:  ", num);
}

Iota currently doesn't do enforce(!empty) in front and popFront.  The 
benchmark takes ~264 milliseconds on my computer.  If I add 
enforce(!empty) in front and popFront, this goes up to 977 
milliseconds.  If instead of enforce() I use if(empty) throw new 
Exception("") it still goes up to 683 milliseconds.  If I replace Iota 
with the following (virtual function using) range, I get 681 milliseconds.

class Foo {
     uint num;

     @property bool empty() {
         return num >= 100_000_000;
     }

     @property uint front() {
         return num;
     }

     void popFront() {
         num++;
     }
}

All numbers are for -O -inline -release builds.

>
> The second rule concerns safety. To my dismay, this program still 
> compiles and runs "successfully" with -release:
>
> import std.stdio;
> void main(string[] args)
> {
>     writeln(args[10]);
> }
>
> It should throw, and it should only run with -noboundscheck enabled. 
> That flag was defined by Walter under my pressure, and I think it's 
> misnamed - it should really be -safe.
>
> Safety is one matter I am extremely rigid about. There is a point 
> where we must draw a line in the sand regarding checks, and safety is 
> it. Since there is a simple and clear definition of it, we can't 
> redefine it as we'd prefer. So I'd say:
>
> Rule #2: All checks protecting memory safety must be done with 
> enforce() or rely on built-in array indexing (which in turn obeys 
> -noboundscheck).

I tend to agree.  Memory corruption issues are exceptionally hard to 
debug and most things will use array indexing in their implementation 
rather than raw pointers anyhow, so the checks can be disabled if 
necessary.  Does dereferencing a null pointer count as a memory safety 
issue?

>
> Finally, we have several places where we call APIs such as the I/O 
> code etc. For those I think it's fine to mandate use of enforce() for 
> several reasons, a simple one being that such calls would not be 
> slowed down considerably by the extra test. So:
>
> Rule #3: All external API calls must be protected by enforce().
>
> Of course there are many case-by-case judgments, but I think having 
> such guidelines in mind should be helpful. Thoughts?

Agreed.  Once you're bringing in some expensive external API, small 
efficiencies aren't very important.