Signed word lengths and indexes

bearophile bearophileHUGS at lycos.com
Tue Jun 15 02:38:42 PDT 2010


Walter Bright:

>D's safe mode, integer overflow *cannot* lead to memory corruption. So when you say something is "unsafe", I think it's reasonable to ask what you mean by it.<

I meant "more numerically safe". That is, it helps avoid some of the integer-related bugs.


>We've tried very hard to not have such things in D. The idea is that code that looks the same either behaves the same or issues an error. There's no way to make your proposal pass this requirement.<

I see. We can drop this, then.


>We can argue forever with how significant it is, I don't assign nearly as much to it as you do.<

I see. If you try solving many Project Euler problems you can see how common those bugs are :-) For other kinds of code they are probably less common.


>If you use the D loop abstractions, you should never have these issues with it.<

In D I probably use higher-level loop abstractions than the ones you normally use, but now and then I hit those bugs anyway. Taking the length of an array is necessary now and then even if you use loop abstractions (and higher-order functions such as map, filter, etc.).
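
To show concretely the kind of bug I keep hitting, here is a minimal sketch (a made-up example, not taken from real code):

    import std.stdio;

    void main() {
        int[] data;                   // empty array
        // data.length has type size_t (unsigned), so with an empty array
        // "data.length - 1" wraps around to size_t.max instead of -1.
        if (data.length - 1 < 5)
            writeln("short array");   // never printed: the test is false
        foreach_reverse (i; 0 .. data.length)
            writeln(data[i]);         // a safe way to walk the array backwards
    }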


>Here's what the wikipedia said about it.
>>"In Python, a number that becomes too large for an integer seamlessly becomes a long.[1] And in Python 3.0, integers and arbitrary sized longs are unified."<<

This is exactly the same thing I have said :-)


>(Just switching to long isn't good enough - what happens when long overflows?<

Maybe this is where the misunderstanding is: in Python 2.x "long" means a multi-precision integer, so when an int operation overflows the result is seamlessly promoted to a long. In my example the number was 1001 decimal digits long.


>I generally don't like solution like this because it makes tripping the bug so rare that it can lurk for years. I prefer to flush bugs out in the open early.)<

In Python 2.x this causes zero bugs because those "longs" are multi-precision.


>3x is a BIG deal. If you're running a major site, this means you only need 1/3 of the hardware, and 1/3 of the electric bill. If you're running a program that takes all day, now you can run it 3 times that day.<

This point of the discussion is probably too vague to say something useful about it. I can answer that in the critical spots of a program it is probably easy enough to replace multi-precision ints with fixnums, and this can make the whole program not significantly slower than C code. And in some places the compiler can infer where fixnums are enough and use them automatically.

In the end, on this point mine is mostly a gut feeling derived from many years of usage of multi-precision numbers: I think that in a nearly-system language like D, well implemented multi-precision numbers (with the option to use fixnums in critical spots) can lead to efficient enough programs. I have programmed a bit in a compiled Common Lisp, and its integer performance is not so bad. I can of course be wrong, but only an actual test can show it :-) Maybe someday I will try it and do some benchmarks.

D's current BigInt needs the small-number optimization before such a test can be tried (that is, avoiding heap allocation when the big number fits in 32 or 64 bits), and the compiler is not smart enough to replace bigints with ints where bigints are not necessary.

In the meantime I have done several benchmarks in C# with runtime integral overflow checks enabled and disabled, and I have seen that the performance with the checks enabled is only a bit lower, not significantly so (I saw the same thing in Delphi years ago).
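
To make this gut feeling a bit more concrete, here is a tiny sketch using D's std.bigint (just an illustration of the bug class that a multi-precision type removes, not a benchmark):

    import std.bigint;
    import std.stdio;

    void main() {
        // 25! does not fit even in a 64-bit ulong, but a BigInt just grows.
        ulong u = 1;
        BigInt b = 1;
        foreach (n; 1 .. 26) {
            u *= n;     // silently wraps around somewhere past 20!
            b *= n;     // multi-precision: always the exact value
        }
        writeln(u);     // a wrapped, wrong value
        writeln(b);     // 15511210043330985984000000
    }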


>That idea has a lot of merit for 64 bit systems. But there are two problems with it: 1. D source code is supposed to be portable between 32 and 64 bit systems. This would fail miserably if the sign of things silently change in the process.<

Then we can use a signed word on 32-bit systems too.
Or, if you don't like that, we can use 64-bit signed values to represent lengths/indexes on 32-bit systems as well.
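
Something like this hypothetical helper (just a sketch of the idea, the name is invented) shows what a signed view of lengths buys:

    import std.stdio;

    // Hypothetical helper: a signed view of .length, so "slength(a) - 1"
    // is -1 for an empty array instead of wrapping around.
    ptrdiff_t slength(T)(T[] a) { return cast(ptrdiff_t) a.length; }

    void main() {
        int[] a;
        writeln(a.length - 1);    // huge unsigned value (wrap-around)
        writeln(slength(a) - 1);  // -1
    }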


>2. For an operating system kernel's memory management logic, it still would make sense to represent the address space as a flat range from 0..n, not one that's split in the middle, half of which is accessed with negative offsets. D is supposed to support OS development.<

I am not expert enough in this area to understand well the downsides of using signed numbers there. But I can say that D is already not the best language for developing non-toy operating systems.
And even if someone writes a serious operating system in D, that is an uncommon application of the language; probably 95% of the other people write other kinds of programs, where unsigned integers everywhere are not the best choice.
And the uncommon people who want to write an OS or device driver in D can use signed words. Such uncommon people can even design and use their own arrays with unsigned-word lengths/indexes :-)
Designing D to appeal to the very uncommon kind of power users who need to write an operating system with D doesn't look like a good design choice.

If this whole thread goes nowhere then later I can even close bug 3843, because there's little point in keeping it open.

Bye,
bearophile

