Why Strings as Classes?

Wed Aug 27 05:31:06 PDT 2008

On 2008-08-27 13:21:10 +0200, bearophile <bearophileHUGS at lycos.com> said:

> Fawzi Mohamed:
>> You know I have got the impression that you have a naive view of 
>> datastructures,<
> 
> I think you are wrong, see below.

good :)

>> One cannot expect to have a single data structure accomodate all uses,<
> 
> My view on the topic are:
> - Each data structure (DS) is a compromise, it allows you to do some 
> operations with a certain performance, while giving you a different 
> performance on onther operations, so it gives you a performance 
> profile. Generally you can't have a DS with the best performance for 
> every operation.
> - Sometimes you may want to choose a DS with a worse performance just 
> because its implementation is simpler, to reduce development time, bug 
> count, etc.
> - The standard library of a modern language has to contain most of the 
> most commmon DSs, to avoid lot of problems and speed up programming, 
> etc.
> - If the standard library doesn't contain a certain DS or if the DS 
> required is very uncommon, or the performance profile you need for a 
> time-critical part of your code is very sharp, then your language is 
> supposed able to allow you to write your own DS (some scripting 
> languages may need you to drop do a lower level language to do this).
> - A modern language is supposed to have some built-in DSs. What ones to 
> choose? This is a tricky question, but the answer I give is that a 
> built-in data structure has to be very flexible, so it has to be 
> efficient enough in a large variety of situations, without being 
> optimal for any situations. This allows the programmers to use it in 
> most situations, where max performance isn't required, so the 
> programmer has to use DSs of the standard library (or even his/her own 
> ones) only once in a while. Creating a very flexible DS is not easy, it 
> requires lot of tuning, tons of benchmarks done on real code, your DS 
> often needs to use some extra memory to be flexible (you can even add 
> subsytems to such DS that collect its usage statistics during the 
> runtime to adapt itself to the specific usage in the code).
> 
> 
>> Don't get me wrong, it is useful to know which usage patterns give 
>> performance problems with the default data structures,<
> 
> If you want to write efficient programs such knowledge is very 
> important, even in scripting languages.

on this we agree

>> and if associative arrays would use a tree or some sorted structure for 
>> small sizes (avoiding the cost of hashing) I would not complain,<
> 
> Python hashes are optimized for being little too, and they don't use a tree.
> 
> 
>> but I do not think (for example) that arrays should necessarily be very 
>> optimized for appending...<
> 
> From the long thread it seems that allowing both fast slices, 
> mutability and fast append isn't easy. I think all three features are 
> important, so some compromise has to be found, because appending is a 
> common enough operation and at the moment is slow or relly slow, too 
> much slow.
> 
> Now you say that the built-in arrays don't need to be very optimized 
> for appending. In the Python world they solve this problem mining real 
> programs to collect real usage statistics. So they try to know if the 
> append is actually used in real programs, and how often, where, etc.

you know as you say it depends on the programs, but it also depend on 
the language, if in a language it is clear that using the default 
structure you are not supposed to append often, and if you really have 
to do it you use a special method if you do, then the programs written 
in that language will use that.

On the other hand when you translate code from a language to another 
you might encounter problems.

The standard array as I see it embodies the philosophy of the C array:
- minimal memory overhead (it is ok to have lots of them, D does have 
some overhead vs C)
- normal memory layout (usable with low level routines that pass memory 
around and with C)
- you can check bounds (C can't)
- you can slice it
- appending to it is difficult

D tries to do different things to mitigate the fact that appending is 
difficult, maybe it could do more, but it will *never* be as efficient 
as a structure that gives up that fact that an array has to be just a 
chunk of contiguous memory.

Now I find the choice of having the basic array being just a chunk of 
contiguous memory, and that the overhead of the structure should be 
minimal very reasonable for a system programming language that has to 
interact with C, so I also find ok that appending is not as fast as 
with other structures.

Clearly someone coming from lisp, any functional language or even 
python, might disagree about what is requested from the "default" array 
container.
It isn't that one is right and the other wrong, it is just a question 
of priorities of the language, the feel of it, its style...

This does not mean that if improvements can be done without 
compromising too much it shouldn't be done, just that failing some 
benchmarks might be ok :)

Fawzi

> 
> Bye,
> bearophile