Movement against float.init being nan

Steven Schveighoffer schveiguy at gmail.com
Tue Aug 23 01:16:41 UTC 2022


On 8/22/22 8:46 PM, Walter Bright wrote:
> On 8/22/2022 7:06 AM, Steven Schveighoffer wrote:
>>> It's not as convenient as a segfault but at some point, the error 
>>> becomes obvious.
>> Does it? And "at some point" it becomes obvious no matter what the 
>> error you made.
> 
> The point is, since 0.0 is a common value for a floating point value to 
> be, just when does it become obvious that it is wrong? Are you really 
> going to notice if your computation is 5% off? Isn't it a *lot* more 
> obvious that it is wrong if it is NaN?

This is highly situation-dependent. The result could be 0, which is 100% 
off. It could be 5% off. It could be 0.0001% off, which might never 
actually be noticed as a problem.

So I have an actual true story. One of the calculation spreadsheets we 
use had a fudge factor that someone inserted. Essentially, they added a 
value of 0.35 to a cost field (which is in the tens-of-thousands-of-dollars 
range). Given this is Excel, we have no way of knowing who did it 
or when (probably to make it match some utility-provided tool value). 
But we didn't catch it for months. We only caught it when we had a job 
where the cost was 100% covered by the utility and the remaining cost 
came out to $0.35.

This happened because it added 0.35 to 0 (the default value of an empty 
cell). If it had instead printed NaN, I would have ignored that price and 
just put 0 in *at a later calculation* to keep errors from showing up in 
the final proposal. Then I would have missed the fudge factor someone 
sneaked in.
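
To sketch just the arithmetic in D (not Excel, and the dollar figures 
are made up):

import std.stdio : writeln;

void main()
{
    double emptyCellAsZero = 0.0;        // how the spreadsheet treats an empty cell
    double emptyCellAsNaN  = double.nan; // what a NaN default would behave like
    enum fudge = 0.35;                   // the sneaked-in fudge factor

    // With a 0 default, the fudge hides inside a plausible-looking total...
    writeln(25_000.0 + emptyCellAsZero + fudge); // looks like a normal price
    // ...and only becomes visible once everything else is zero:
    writeln(0.0 + emptyCellAsZero + fudge);      // 0.35 -- finally suspicious

    // With a NaN default, the whole total is NaN from the start:
    writeln(25_000.0 + emptyCellAsNaN + fudge);  // nan
}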

How this plays out depends entirely on the situation, for *finding a 
problem*, for *diagnosing a problem*, and for *fixing the problem*. It's 
impossible to predict how people will behave or how they will write code 
to cope with the situation they have.

I think it's a wash between using 0 or NaN as a default value when that 
value is incorrect. But in terms of *frequency*, a default value of 0 for 
a float that isn't explicitly initialized is correct 99% of the time, 
which means you will have *fewer of these problems to find*.
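
For reference, current D default-initializes floating point values to 
NaN but integers to 0; a minimal check:

import std.math : isNaN;

void main()
{
    float f;    // default-initialized: float.init is float.nan in D
    double d;   // likewise double.nan
    int i;      // integers default-initialize to 0

    assert(isNaN(f));
    assert(isNaN(d));
    assert(i == 0);
}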

> 
> 
>>> I would start there to inspect variables, identify the NaNs.
>> What if you can't? What if it only happens randomly, and you don't 
>> just happen to have a debugger attached?
>> I'm not saying it's easier with 0, but just not any different.
> 
> 0.0 is hardly a rare value, even in correct calculations. NaN is always 
> wrong.

It's not rare precisely because it's a very, very common initial value.

> 
> 
>>> Then I would trace them in a debugger and go up the call chain until 
>>> I find the location where it became NaN. Then I would identify the 
>>> source which introduced the NaN and trace that back until I found its 
>>> origin.
>> If you have a chance to use a debugger and it happens at that time.
> 
> 0 initialization wouldn't make it better.

I will concede that if you have a debugger attached and can watch values 
change in real time, seeing NaN show up can give you a better clue as to 
where the problem came from.

> 
> 
>>> The advantage I see in NaN is that it's always (instead of only 
>>> almost always) immediately obvious that it's wrong whereas 0.0 can be 
>>> valid or invalid so you need to figure out which one it is which 
>>> requires an extra step.
>> It might be noticed that it's NaN. It also might not. It depends on 
>> how it's used.
> 
> NaN propagates. 0.0 does not.

Someone still has to actually look at it to see that it's "obviously" wrong.
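
A small sketch of what I mean, in D with made-up numbers: the NaN flows 
through the arithmetic just fine, but whether anyone "sees" it depends 
on the code actually inspecting the value:

import std.math : isNaN;
import std.stdio : writeln;

void main()
{
    double price;                        // defaults to double.nan in D

    // NaN propagates through the arithmetic without any error or signal:
    double total = price * 1.07 + 12.50;

    // It only becomes "obvious" when someone actually looks at the value:
    writeln(total);                      // prints "nan"

    // Code that doesn't look can quietly swallow it, because every
    // ordered comparison involving NaN is false:
    if (total > 1000.0)
        writeln("big job");
    else
        writeln("small job");            // this branch runs; nobody notices

    assert(isNaN(total));                // an explicit check is what it takes
}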

> 
> 
>> Either way, you need to find the source, and NaN doesn't help unless 
>> you want to start either instrumenting *all* code (possibly including 
>> code you don't control), or use a debugger (which isn't always possible).
> 
> Such a situation is objectively worse with 0.0. Is instrumenting all the 
> code to detect 0.0 going to work? Nope, too many false positives, as 0.0 
> is a common value for floating point numbers.

Either way, it's a mess. Better to just logically trace it back from 
where it's assigned, instead of instrumenting the code and trying to 
find NaNs in random places.

> 
> 
>> Can we have some kind of linting system that identifies NaNs that are 
>> used?
> 
> I have no objection to a linter that flags default initializations of 
> floating point values. It shouldn't be part of the D compiler, though.
> 

Something with semantic analysis capabilities has to be used to prove a 
value isn't set before being used. Is there anything besides the 
compiler front end that can do this?
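
To illustrate the kind of analysis required (a hypothetical example, 
names made up):

import std.stdio : writeln;

// Whether `result` is used while still default-initialized depends on
// control flow, which only a flow-sensitive semantic pass can reason about.
double lookup(bool cached)
{
    double result;      // default-initialized (NaN today)
    if (cached)
        result = 42.0;  // set on this path only
    return result;
}

void main()
{
    writeln(lookup(true));   // 42
    writeln(lookup(false));  // nan -- the case a linter would want to flag
}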

-Steve

