Time to move std.experimental.checkedint to std.checkedint ?

H. S. Teoh hsteoh at quickfur.ath.cx
Tue Mar 30 00:02:54 UTC 2021


On Mon, Mar 29, 2021 at 10:47:49PM +0000, tsbockman via Digitalmars-d wrote:
> On Monday, 29 March 2021 at 20:00:03 UTC, Walter Bright wrote:
> > It isn't even clear what the behavior on overflows should be. Error?
> > Wraparound? Saturation?
> 
> It only seems unclear because you have accepted the idea that computer
> code "integer" operations may differ from mathematical integer
> operations in arbitrary ways.

The only thing at fault here is the name "integer". `int` in D is
defined to be a 32-bit machine word. The very specification of "32-bit"
already implies modulo 2^32: this is arithmetic modulo 2^32, NOT
arithmetic on mathematical infinite-capacity integers. Ditto for the
other built-in integral types. When you typed `int` you already signed
up for all of the "unintuitive" behaviour that has been the standard
behaviour of built-in machine words since the 70's and 80's.  They
*approximate* mathematical integers, but they are certainly NOT the same
thing as mathematical integers, and this is *by definition*.
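
As a small illustration of the modulo 2^32 behaviour described above, here
is a sketch in D. The helper names `addWrap` and `subWrap` are made up for
this example; the wrapping itself is what the language defines for its
built-in integral types.

```d
// Hypothetical helpers: plain built-in arithmetic, no checks of any kind.
int addWrap(int a, int b) { return a + b; }
uint subWrap(uint a, uint b) { return a - b; }

unittest
{
    // One past the largest int wraps around to the smallest int.
    assert(addWrap(int.max, 1) == int.min);
    // Unsigned subtraction below zero wraps to the largest uint.
    assert(subWrap(0u, 1u) == uint.max);
}
```

Compiling with `-unittest` runs the assertions; both hold because the
results are reduced modulo 2^32.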

If you want mathematical integers, you should be using std.bigint or
something similar instead.
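
For comparison, a minimal sketch of the same kind of addition done with
`std.bigint.BigInt`, which grows as needed instead of wrapping. The helper
name `addExact` is an assumption made for this example, not something from
Phobos.

```d
import std.bigint : BigInt;

// Hypothetical helper: widen to BigInt first, so the sum cannot overflow.
BigInt addExact(int a, int b)
{
    return BigInt(a) + b;
}

unittest
{
    // int.max + 1 is 2147483648, which does not fit in an int.
    assert(addExact(int.max, 1) == BigInt("2147483648"));
}
```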


> Otherwise, the algorithm is simple:
> 
>     if(floor(mathResult) <= codeResult && codeResult <= ceil(mathResult))
>         return codeResult;
>     else
>         signalErrorSomehow();

Implementing such a scheme would introduce so much overhead that it
would render the `int` type essentially useless for systems programming.
Or for any application where performance is important, for that matter.
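
To make concrete what a per-operation check involves, here is a sketch
using `core.checkedint.adds` from druntime, which reports overflow through
a flag. The wrapper `tryAdd` is a hypothetical name for this example. The
sketch only shows the shape of the extra work (a flag and a branch for
each operation); it makes no measurement of the actual cost.

```d
import core.checkedint : adds;

// Hypothetical wrapper: returns true and stores a + b in `result`
// only when the sum fits in an int.
bool tryAdd(int a, int b, out int result)
{
    bool overflow = false;
    // adds() sets `overflow` to true when the signed addition overflows.
    result = adds(a, b, overflow);
    return !overflow;
}

unittest
{
    int r;
    assert(tryAdd(1, 2, r) && r == 3);
    assert(!tryAdd(int.max, 1, r)); // overflow is reported, not hidden
}
```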


> Standard mathematical integer addition does not wrap around or
> saturate.  When someone really wants an operation that wraps around or
> saturates (not just for speed's sake), then that is a different
> operation and should use a different name and/or type(s), to avoid
> sowing confusion and ambiguity throughout the codebase for readers and
> compilers.

The meaning of +, -, *, /, % for built-in machine words has been that of
modulo 2^n arithmetic since the early days of computing.  This isn't
going to change anytime soon in a systems
language.  It doesn't matter what you call them; if you don't like the
use of the symbols +, -, *, / for anything other than "standard
mathematical integers", make your own language and call them something
else. But they are the foundational hardware-supported operations upon
which more complex abstractions are built; without them, you wouldn't
even be capable of arithmetic in the first place.

It's unrealistic to impose pure mathematical definitions on
limited-precision hardware numbers.  Sooner or later, any programmer
must come to grips with what's actually implemented in hardware, not
what he imagines some ideal utopian hardware would implement.  It's like
people complaining that IEEE floats are "buggy" or otherwise behave in
strange ways.  That's because they're NOT mathematical real numbers.
But they *are* a useful approximation of mathematical real numbers -- if
used correctly.  That requires learning to work with what's implemented
in the hardware rather than imposing mathematical ideals on an
abstraction that requires laborious (i.e., inefficient) translations to
fit the ugly hardware reality.
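
One small, precision-independent illustration of how IEEE floats differ
from mathematical reals, sketched in D. The helper name `equalsItself` is
made up for this example.

```d
// Hypothetical helper: for a mathematical real this would always be true.
bool equalsItself(double x) { return x == x; }

unittest
{
    assert(equalsItself(1.5));
    // NaN compares unequal to everything, including itself.
    assert(!equalsItself(double.nan));
    // Infinity absorbs finite additions.
    assert(double.infinity + 1.0 == double.infinity);
}
```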

If you don't like the "oddness" of hardware-implemented types, there's
always the option of using std.bigint, or software like Mathematica or
similar that frees you from needing to worry about the ugly realities of
the hardware. Just don't expect the same kind of performance you will
get by using the hardware types directly.


> All of the integer behavior that people complain about violates this
> in some way: wrapping overflow, incorrect signed-unsigned comparisons,
> confusing/inconsistent implicit conversion rules, undefined behavior
> of various more obscure operations for certain inputs, etc.
> 
> Mathematical integers are a more familiar, simpler, easier to reason
> about abstraction. When we use this abstraction, we can draw upon our
> understanding and intuition from our school days, use common
> mathematical laws and formulas with confidence, etc. Of course the
> behavior of the computer cannot fully match this infinite abstraction,
> but it could at least tell us when it is unable to do what was asked
> of it, instead of just silently doing something else.

It's easy to invent idealized abstractions that are easy to reason
about, but which require unnatural contortions to implement efficiently
in hardware.  A programming language like D that claims to be a systems
programming language needs to be able to program the hardware directly,
not to impose some ideal abstractions that do not translate nicely to
hardware, that therefore require a lot of complexity on the part of
the compiler to implement, and that on top of that incur poor runtime
performance.

To quote Knuth:

	People who are more than casually interested in computers should
	have at least some idea of what the underlying hardware is like.
	Otherwise the programs they write will be pretty weird. -- D.
	Knuth

Again, if you expect mathematical integers, use std.bigint. Or MathCAD
or similar. The integral types defined in D are raw hardware types of
fixed bit length -- which by definition operate according to modulo 2^n
arithmetic. The "peculiarities" of the hardware types are inevitable,
and I seriously doubt this is going to change anytime in the foreseeable
future.  By using `int` instead of `BigInt`, the programmer has already
implicitly accepted the "weird" hardware behaviour, and must be prepared
to deal with the consequences.  Just as when you use `float` or `double`
you already signed up for IEEE semantics, like it or not. (I don't, but
I also recognize that it's unrealistic to expect the hardware type to
match up 100% with the mathematical ideal.) If you don't like that, use
one of the real arithmetic libraries out there that let you work with
"true" mathematical reals that aren't subject to the quirks of IEEE
floating-point numbers. Just don't expect anything that will be
competitive performance-wise.

Like I said, the only real flaw here is the choice of the name `int` for
a hardware type that's clearly NOT an unbounded mathematical integer.
It's too late to rename it now, but basically it should be thought of as
`intMod32bit` rather than `integerInTheMathematicalSense`. Once you
mentally translate `int` into "32-bit 2's-complement binary word in a
hardware register", everything else naturally follows.
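
A short sketch of that mental translation in D: reinterpreting an `int` as
its 32-bit pattern makes the 2's-complement layout visible. The helper
names `bits` and `negate` are assumptions made for this example.

```d
// Hypothetical helpers exposing the underlying 32-bit word.
uint bits(int x) { return cast(uint) x; }
int negate(int x) { return -x; }

unittest
{
    assert(int.sizeof == 4);               // 32 bits
    assert(bits(-1) == 0xFFFF_FFFF);       // all bits set
    assert(bits(int.min) == 0x8000_0000);  // only the top bit set
    // int.min has no positive counterpart, so negating it wraps to itself.
    assert(negate(int.min) == int.min);
}
```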


T

-- 
They pretend to pay us, and we pretend to work. -- Russian saying

