Higher level built-in strings

bearophile bearophileHUGS at lycos.com
Mon Jul 19 13:40:05 PDT 2010


Walter Bright:
> 1. most string operations, such as copying and searching, even regular 
> expressions, work just fine using regular indices.
> 
> 2. doing the operations in (1) using code points and having to continually 
> decode the strings would result in disastrously slow code.

In my original post I forgot one more difference from arrays:
5b) a method like ".unit()" that indexes code units.
So "foo".unit(1) is always O(1). Lower-level code can use this method the same way [] is used for arrays.
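
For comparison, here is a minimal sketch of how the built-in string type behaves in D2 today, where [] and .length work on UTF-8 code units and code points need decoding. The string literal is only an illustration:

```d
import std.range : walkLength;
import std.utf : decode;

void main()
{
    // "a", U+00F1 (two UTF-8 code units: 0xC3 0xB1), "b"
    string s = "a\u00F1b";

    assert(s.length == 4);      // .length counts code units
    assert(s[1] == 0xC3);       // [] returns a code unit in O(1)
    assert(s.walkLength == 3);  // counting code points is O(n)

    // Decoding the code point that starts at code unit index 1
    size_t i = 1;
    assert(decode(s, i) == '\u00F1');
    assert(i == 3);             // decode advanced past both code units
}
```

The proposed unit() would play the role that [] plays in this sketch.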

Copying is done on the bytes themselves, with a memcpy, so no decoding is necessary. If point (9) (automatic LZO compression) is used, then copying can be 2-3 times faster for long strings, because there is less data and you don't need to uncompress it to copy it. (If such compression is added, strings may need a third accessor method that exposes the true bytes.)


> 3. the user can always layer a code point interface over the strings, but going
> the other way is not so practical.

This is true, but it makes string usage unnecessarily low-level and hard.
A better design for a smart systems language like D is to give strings a default high-level interface that treats them as what they are at a high level, and to add a second, lower-level interface for when you need faster low-level fiddling (so they have [] that returns code points and unit() that returns code units).
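
A rough sketch of what such a two-level interface could look like as a library type. The name HLString and the linear-scan opIndex are assumptions made for illustration only; a real design would probably want something smarter than an O(n) scan:

```d
import std.utf : decode, stride;

// Hypothetical wrapper, not an existing Phobos type
struct HLString
{
    string data;

    // Low-level access: the i-th UTF-8 code unit, always O(1)
    char unit(size_t i)
    {
        return data[i];
    }

    // High-level access: the n-th code point.
    // This naive version walks the string, so it is O(n).
    dchar opIndex(size_t n)
    {
        size_t pos = 0;
        foreach (_; 0 .. n)
            pos += stride(data, pos); // skip one whole code point
        return decode(data, pos);
    }
}

void main()
{
    auto s = HLString("a\u00F1b");
    assert(s[1] == '\u00F1');   // code point view
    assert(s[2] == 'b');
    assert(s.unit(1) == 0xC3);  // code unit view
}
```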

Bye,
bearophile

