Signed word lengths and indexes

bearophile bearophileHUGS at lycos.com
Mon Jun 14 16:40:01 PDT 2010


Walter Bright:

>D provides powerful abstractions for iteration; it is becoming less and less desirable to hand-build loops with for-statements.<

I agree.


>As for "unsafe", I think you need to clarify this, as D is not memory unsafe despite the existence of integer over/under flows.<

Modern languages must recognize that there are other forms of safety besides memory safety. Integer overflows and bugs derived from signed-unsigned conversions can cause disasters as well.

In the current D language the use of unsigned numbers is a safety hazard. So far nothing I have seen written by you or by other people has shown that this is false.


>Actually, I think they make a lot of sense, and D's improvement on them that only disallows conversions that lose bits based on range propagation is far more sensible than C#'s overzealous restrictions.<

1) I'd like D to use signed words to represent lengths and array indexes. We are moving to 64-bit systems, where 63 bits are enough for lengths. If arrays of more than 4 billion items are considered important on 32-bit systems too, then use a long :-)
2) I don't want D to silently gulp down expressions that mix signed and unsigned integers and spit out wrong results when the integers are negative (a small sketch follows).
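
Here is a minimal sketch of the kind of silent mixing I mean; the exact results assume the usual C-style implicit conversions that D applies to mixed signed/unsigned expressions on 32-bit ints:

import std.stdio;

void main() {
    int  i = -1;
    uint u = 2;

    // In the comparison the signed operand is implicitly converted to
    // uint, so -1 becomes 4294967295 and the test goes the wrong way.
    writeln(i < u);   // false, even though mathematically -1 < 2

    // Mixed arithmetic is performed as unsigned too.
    writeln(i + u);   // 1          -- happens to look right
    writeln(i * u);   // 4294967294 -- clearly wrong
}

The compiler gulps all of this down silently, which is exactly the problem.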


>I have a hard time believing that Python and Ruby are more productive primarily because they do not have an unsigned type.<

Python is very productive (for small or medium-sized programs; on large programs Python is less good) because of a rather long list of factors. My experience with D and Python (and several other languages) has shown me that Python not using fixnums is one of the factors that helps productivity. It's surely not the only factor, and I agree with you that it's not the most important one, but it is one of the significant factors and it can't be ignored.

Python integers don't overflow; this saves the brain time and brain power spent thinking about possible overflows and the code needed to avoid them, and it makes coding more relaxed. If you try to write 50 Project Euler programs in both Python and D you will surely see how many bugs the Python code spares you compared to D. Finding and fixing such bugs in D code takes a lot of time that you save in Python.
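
A tiny sketch of the silent fixnum overflow that Python avoids; the wraparound value assumes D's 32-bit two's complement int:

import std.stdio;

void main() {
    int x = int.max;   // 2147483647
    x += 1;            // overflows silently, no compile-time or run-time error
    writeln(x);        // prints -2147483648
}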

In D there are other bugs derived from mixing signed and unsigned numbers (and you can't avoid them just by avoiding unsigned numbers in your own code, because lengths, indexes and other things use them).
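
A typical case, sketched in D: array lengths are size_t, which is unsigned, so the subtraction below wraps around instead of going negative.

import std.stdio;

void main() {
    int[] a;   // empty array

    // a.length - 1 wraps around to size_t.max when a is empty, so the
    // condition is true and a[0] is indexed out of bounds (a range
    // error at run time, or worse with bounds checking disabled).
    for (size_t i = 0; i < a.length - 1; i++)
        writeln(a[i]);
}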


> Python did not add overflow protection until 3.0, so it's very hard to say
> this crippled productivity in early versions. http://www.python.org/dev/peps/pep-0237/ 

You are wrong: Python 2.x dynamically switches to a larger integer type when an overflow happens. This is done transparently, avoids bugs, and keeps programs efficient. This is Python V.2.6.5, but similar things happen in much older versions of Python:

>>> a = 2
>>> type(a)
<type 'int'>
>>> a += 10 ** 1000
>>> len(str(a))
1001
>>> type(a)
<type 'long'>


> Ruby & Python 3.0 dynamically switch to larger integer types when overflow
> happens.

This is wrong. Python 3.0 has just one multi-precision integer type, called "int".

For small values it can, and probably will, use under the covers a user-invisible optimization that is essentially the same thing Python 2.x does. At the moment Python 3 integers are a bit slower than Python 2.x ones because this optimization is not done yet; one of the main design goals of Python is to keep the C interpreter of Python itself really simple, so that even C programmers who are not experts can hack on it and help in the development of Python.

PEP 237 and its unification of the integer types was done because:
1) There's no need to keep two integer types in the language; you can keep just one and let the language use invisible optimizations where possible. Python is designed to be simple, so removing one type is good.
2) In some very uncommon situations the automatic switch to multi-precision integers can't happen. Such situations are very hard to hit: they do not come up in normal numerical code, but only when you use C extensions (or Python standard library code that is written in C). You can program every day for four years in Python 2.x and never find such a case.


>This is completely impractical in a systems language, and is one reason why Ruby & Python are execrably slow compared to C-style languages.<

Lisp languages can be only 1.0-3.0 times slower than C despite using mostly multi-precision numbers, so I don't think well-implemented multi-precision numbers are so bad in a very fast language. And where performance really matters, fixnums can still be used. In the last years I have started to think that using fixnums everywhere is a premature optimization. But anyway, the purpose of my original post was not to advocate replacing D's fixnums with multi-precision numbers; it was about changing array indexes and lengths from unsigned to signed.

Python is slow compared to D, and surely its multi-precision numbers don't help its performance, but Python's "lack" of performance has many causes, and the main ones are not the multi-precision numbers.

The main cause is that Python is designed to have a simple interpreter that can be modified by C programmers who are not very expert. This allows a lot of people to write and work on it, and it was one of the causes of Python's success. The Unladen Swallow project has shown that you can make Python 2-4 times faster just by "improving" its interpreter (messing it up and adding some hairy hacks to it), etc.

One of the main causes of Python's low performance is that it's dynamically typed and at the same time it lacks a just-in-time compiler. The Psyco JIT compiler lets me write Python code that is usually no more than 10 times slower than D. The wonderful JIT compiler of Lua (which lacks multi-precision numbers but has dynamic typing) usually lets it run 0.9-2.5 times slower than D compiled with DMD (0.9 means it's faster on some FP-heavy code).

Another cause of Python's low performance is simply that Python code is often not written with performance in mind. I am often able to write Python programs that are 2-3 times faster than the Python programs I find around.

Bye,
bearophile

