Signed word lengths and indexes

Walter Bright newshound1 at digitalmars.com
Mon Jun 14 18:49:27 PDT 2010


bearophile wrote:
> Walter Bright:
>> As for "unsafe", I think you need to clarify this, as D is not memory
>> unsafe despite the existence of integer over/under flows.<
> 
> Modern languages must understand that there are other forms of safety beside
> memory safety. Integer overflows and signed-unsigned conversion-derived bugs
> can cause disasters as well.
> 
> In current D language the usage of unsigned numbers is a safety hazard. So
> far nothing I have seen written by you or other people has shown that this is
> false.

In D's safe mode, integer overflow *cannot* lead to memory corruption. So when 
you say something is "unsafe", I think it's reasonable to ask what you mean by it.
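
For illustration, a minimal sketch (assuming the current @safe attribute; the
variable names are mine): D integer arithmetic wraps in two's complement, so an
overflow yields a wrong number, never scribbled-over memory.

import std.stdio : writeln;

@safe void main()
{
    int x = int.max;
    x += 1;          // defined behavior in D: wraps around
    writeln(x);      // prints -2147483648
}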

For example, if you define "safe" as "guaranteed to not have bugs", then you're 
requiring that there be a proof of correctness for all programs in D.


>> Actually, I think they make a lot of sense, and D's improvement on them
>> that only disallows conversions that lose bits based on range propagation
>> is far more sensible than C#'s overzealous restrictions.<
> 
> 1) I'd like D to use signed words to represent lengths and array indexes.
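
For reference, the range propagation mentioned above admits a narrowing
conversion only when the compiler can prove no bits are lost. A minimal sketch
of the rule (the example is mine):

void main()
{
    ubyte b = 200;
    ubyte lo = b & 0x0F;   // ok: the result provably fits in 0..15
    // ubyte c = b + 1;    // error: range 1..256 doesn't fit in a ubyte
}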

Using signed words for lengths and indexes would lead to silent breakage of 
code transferred from C and C++. We've tried very hard to not have such things 
in D. The idea is that code that looks the same either behaves the same or 
issues an error. There's no way to make your proposal pass this requirement.
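
A minimal sketch of the breakage, assuming a 64-bit size_t (the names are
mine):

import std.stdio : writeln;

void main()
{
    int[] a = [];                        // empty array
    auto u = a.length - 1;               // unsigned today: wraps to size_t.max
    auto s = cast(long) a.length - 1;    // a signed length would yield -1
    writeln(u);   // 18446744073709551615 -- matches C's size_t behavior
    writeln(s);   // -1 -- same-looking code, silently different result
}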


> We are going to 64 bit systems where 63 bits can be enough for lengths. If
> arrays of 4 billion items are seen as important on 32 bit systems too, then
> use a long :-)
> 
> 2) I don't like D to silently gulp down expressions that mix signed and
> unsigned integers and spit out wrong results when the integers were negative.

That idea has a lot of merit for 64 bit systems. But there are two problems with it:

1. D source code is supposed to be portable between 32 and 64 bit systems. This 
would fail miserably if the sign of things silently changed in the process.

2. For an operating system kernel's memory management logic, it still would make 
sense to represent the address space as a flat range from 0..n, not one that's 
split in the middle, half of which is accessed with negative offsets. D is 
supposed to support OS development.
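
A small sketch of that problem, assuming 32-bit addresses (the values are
illustrative):

void main()
{
    // With an unsigned type, kernel-space addresses are just large values
    // in one flat 0 .. 0xFFFF_FFFF range:
    uint addr = 0xC000_0000;

    // Reinterpreted as a signed 32-bit value, the same bit pattern goes
    // negative, splitting the address space in the middle:
    int signedAddr = cast(int) addr;   // -1073741824
    assert(signedAddr < 0);
}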



>> I have a hard time believing that Python and Ruby are more productive
>> primarily because they do not have an unsigned type.<
> 
> Python is very productive (for small or medium sized programs! On large
> programs Python is less good) because of a quite long list of factors. My
> experience with D and Python (and several other languages) has shown me that
> Python not using fixnums is one of the factors that help productivity. It's
> surely not the only factor, and I agree with you that it's not the most
> important, but it's surely one of the significant factors and it can't be
> ignored.

We can argue forever about how significant it is; I don't assign nearly as much 
weight to it as you do.


> Python integers don't overflow. This at the same time allows you to save
> brain time and brain power thinking about possible overflows and the code to
> avoid their risk, and makes coding more relaxed. And if you try to write 50
> Project Euler programs in Python and D you will surely see how many bugs the
> Python code has spared you compared to D. Finding and fixing such bugs in D
> code requires a lot of time that you save in Python.

This is where we differ. I very rarely have a bug due to overflow or 
signed/unsigned differences. If you use the D loop abstractions, you should 
never have these issues with them.
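
A short sketch of what I mean by loop abstractions (hypothetical example):

void main()
{
    int[] a = [10, 20, 30];

    foreach (i, x; a)          // i is a size_t index managed for you
    {
        // no hand-written index arithmetic to get wrong
    }

    foreach_reverse (x; a)     // reverse iteration without the classic
    {                          // `for (i = len - 1; i >= 0; --i)` trap
        // ...
    }
}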


>> Python did not add overflow protection until 3.0, so it's very hard to say 
>> this crippled productivity in early versions.
>> http://www.python.org/dev/peps/pep-0237/
> 
> You are wrong. Python 2.x dynamically switches to larger integer types when
> overflow happens. This is done transparently and avoids bugs and keeps
> programs more efficient. This is on Python V.2.6.5 but similar things happen
> in much older versions of Python:
> 
> >>> a = 2
> >>> type(a)
> <type 'int'>
> >>> a += 10 ** 1000
> >>> len(str(a))
> 1001
> >>> type(a)
> <type 'long'>

Here's what Wikipedia says about it.

"In Python, a number that becomes too large for an integer seamlessly becomes a 
long.[1] And in Python 3.0, integers and arbitrary sized longs are unified."

-- http://en.wikipedia.org/wiki/Integer_overflow

(Just switching to long isn't good enough - what happens when long overflows? I 
generally don't like solutions like this because they make tripping the bug so 
rare that it can lurk for years. I prefer to flush bugs out in the open early.)
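
To sketch what flushing the bug out early looks like (a hand-rolled example;
checkedAdd is my name for it, not a library function):

// Detect the overflow where it happens instead of silently widening
// the result and hiding the bug.
int checkedAdd(int a, int b)
{
    immutable long r = cast(long) a + b;   // compute in a wider type
    assert(r >= int.min && r <= int.max, "integer overflow");
    return cast(int) r;
}

void main()
{
    auto x = checkedAdd(1, 2);      // fine: 3
    // checkedAdd(int.max, 1);      // trips the assert immediately
}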


>> This is completely impractical in a systems language, and is one reason why
>> Ruby & Python are execrably slow compared to C-style languages.
> 
> Lisp languages can be only 1.0-3.0 times slower than C despite using mostly
> multi-precision numbers. So I don't think well-implemented multi-precision
> numbers are so bad in a very fast language.

3x is a BIG deal. If you're running a major site, this means you only need 1/3 
of the hardware, and 1/3 of the electric bill. If you're running a program that 
takes all day, now you can run it 3 times that day.

