Why Strings as Classes?

Wed Aug 27 04:21:10 PDT 2008

Fawzi Mohamed:
>You know I have got the impression that you have a naive view of datastructures,<

I think you are wrong, see below.

>One cannot expect to have a single data structure accomodate all uses,<

My view on the topic are:
- Each data structure (DS) is a compromise, it allows you to do some operations with a certain performance, while giving you a different performance on onther operations, so it gives you a performance profile. Generally you can't have a DS with the best performance for every operation.
- Sometimes you may want to choose a DS with a worse performance just because its implementation is simpler, to reduce development time, bug count, etc.
- The standard library of a modern language has to contain most of the most commmon DSs, to avoid lot of problems and speed up programming, etc.
- If the standard library doesn't contain a certain DS or if the DS required is very uncommon, or the performance profile you need for a time-critical part of your code is very sharp, then your language is supposed able to allow you to write your own DS (some scripting languages may need you to drop do a lower level language to do this).
- A modern language is supposed to have some built-in DSs. What ones to choose? This is a tricky question, but the answer I give is that a built-in data structure has to be very flexible, so it has to be efficient enough in a large variety of situations, without being optimal for any situations. This allows the programmers to use it in most situations, where max performance isn't required, so the programmer has to use DSs of the standard library (or even his/her own ones) only once in a while. Creating a very flexible DS is not easy, it requires lot of tuning, tons of benchmarks done on real code, your DS often needs to use some extra memory to be flexible (you can even add subsytems to such DS that collect its usage statistics during the runtime to adapt itself to the specific usage in the code).

>Don't get me wrong, it is useful to know which usage patterns give performance problems with the default data structures,<

If you want to write efficient programs such knowledge is very important, even in scripting languages.

>and if associative arrays would use a tree or some sorted structure for small sizes (avoiding the cost of hashing) I would not complain,<

Python hashes are optimized for being little too, and they don't use a tree.

>but I do not think (for example) that arrays should necessarily be very optimized for appending...<

>From the long thread it seems that allowing both fast slices, mutability and fast append isn't easy. I think all three features are important, so some compromise has to be found, because appending is a common enough operation and at the moment is slow or relly slow, too much slow.

Now you say that the built-in arrays don't need to be very optimized for appending. In the Python world they solve this problem mining real programs to collect real usage statistics. So they try to know if the append is actually used in real programs, and how often, where, etc.

Bye,
bearophile