assert(0) behavior

Mon Aug 3 16:57:36 PDT 2015

On 8/3/15 6:54 PM, Walter Bright wrote:
> On 8/3/2015 8:50 AM, Steven Schveighoffer wrote:
>> 1. They aren't removed, they are replaced with a nearly useless segfault.
>
> Not useless at all:
>
> 1. The program does not continue running after it has failed. (Please,
> let's not restart that debate.

You can run some code that is fresh, i.e. don't *continue* running 
failing code, but it's ok, for example, to run a signal handler, or call 
a pure function. My thought is that you should print the file location 
and exit (perhaps with segfault).

> 2. Running it under a debugger, the location of the fault will be
> identified.

Yeah, the real problem is when you are not running under a debugger. If 
you have an intermittent bug, or one that is in the field, a stack trace 
of the problem is worth 100x more than a segfault and trying to talk a 
customer through setting up a debugger, or even setting up their system 
to do a core dump for you to debug. They aren't always interested in 
that enough to care.

But also, a segfault has a specific meaning. It means, you tried to 
access memory you don't have access to. This clearly is not that, and it 
sends the wrong message -- memory corruption. This is not memory 
corruption, it's a program error.

>> 2. If we are going to put something in there instead of "assert", why
>> not just
>> throw an error?
>
> Because they are expensive.

They are expensive when? 2 microseconds before the program ends? At that 
point, I'd rather you spend as much resources telling me what went wrong 
as possible. More info the better.

Now, if it's expensive to include the throw vs. not including it (I 
don't know), then that's a fair point. If it means an extra few 
instructions to set up the throw (which isn't run at all when the 
assert(0) line isn't run), that's not convincing.

> To keep asserts in all their glory in released code, do not use the
> -release switch.

OK, this brings up another debate. The thing that triggered all this is 
an issue with core.time, see issue 
https://issues.dlang.org/show_bug.cgi?id=14863

Essentially, we wrote code to get all the clock information at startup 
on a posix system that supports clock_gettime, which is immutable and 
can be read easily at startup. However, how we handled the case where a 
clock could not be fetched is to assert(0, "don't support clock " ~ 
clockname).

The clear expectation was that the message will be printed (along with 
file/line number), and we can go fix the issue.

On Linux 2.6.32 - a supported LTS release I guess - one of these clocks 
is not supported. This means running a simple "hello world" program 
crashes at startup in a segfault, not a nice informative message.

So what is the right answer here? We want an assert to trigger for this, 
but the only assert that stays in is assert(0). However, that's useless 
if all someone sees is "segfuault". "some message" never gets printed, 
because druntime is compiled in release mode. I'm actually quite 
thrilled that someone figured this all out -- one person analyzed his 
core dump to narrow down the function, and happened to include his linux 
kernel version, and another person had the arcane knowledge that some 
clock wasn't supported in that version.

Is there a better mechanism we should be using in druntime that is 
clearly going to be compiled in release mode? Should we just throw an 
assert error directly? Clearly the code is not "corrupted" at this 
point, it's just an environmental issue. But we don't want to continue 
execution. What is the right call?

At the very least, assert(0, "message") should be a compiler error, the 
message is unused information.

-Steve