Higher level built-in strings

Walter Bright newshound2 at digitalmars.com
Mon Jul 19 21:29:53 PDT 2010


bearophile wrote:
> Walter Bright:
>> 1. most string operations, such as copying and searching, even regular 
>> expressions, work just fine using regular indices.
>> 
>> 2. doing the operations in (1) using code points and having to continually
>>  decode the strings would result in disastrously slow code.
> 
> In my original post I have forgotten another difference over arrays: 5b) a
> method like ".unit()" that allows indexing code units. So "foo".unit(1) is
> always O(1). Lower level code can use this method as [] is used for arrays.

This is backwards. The [i] should behave as expected for arrays. As it turns 
out, indexing by byte is *far* more common than indexing by code point; in fact, 
I've never ever needed to index by code point.

(Though it is sometimes necessary to step through by code point, that's different 
from indexing by code point.)


>> 3. the user can always layer a code point interface over the strings, but
>> going the other way is not so practical.
> 
> This is true. But it makes the string usage unnecessarily low-level and
> hard...

I don't believe that manipulating strings in D is hard, even if you do have to 
work with multibyte characters. You do have to be aware they are multibyte, but 
I think that just comes with being a programmer.
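Point 3 above, layering a code point interface over byte-indexed strings, could look roughly like the following. Again this uses Go as an analogue for D, and `CodePoints` with its methods is a hypothetical wrapper invented for illustration. The underlying string stays byte-indexed. Code point operations decode on demand, and callers who need repeated random access can pay for one O(n) conversion up front.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// CodePoints is a hypothetical code point view layered over a UTF-8 string.
// The underlying bytes are untouched; decoding happens only when asked.
type CodePoints struct {
	s string
}

// Len counts code points; it is O(n) because the string must be decoded.
func (c CodePoints) Len() int { return utf8.RuneCountInString(c.s) }

// Each calls f for every code point together with its byte offset.
func (c CodePoints) Each(f func(off int, r rune)) {
	for off, r := range c.s {
		f(off, r)
	}
}

// Runes materializes the view for repeated random access:
// one O(n) pass, then O(1) indexing into the resulting slice.
func (c CodePoints) Runes() []rune { return []rune(c.s) }

func main() {
	s := "größe"
	cp := CodePoints{s}
	fmt.Println(len(s), cp.Len())      // 7 5 (bytes, code points)
	fmt.Println(string(cp.Runes()[2])) // ö
	cp.Each(func(off int, r rune) {
		fmt.Printf("%d:%c ", off, r)
	})
	fmt.Println()
}
```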


> A better design in a smart system language as D is to give strings a
> default high level "interface" that sees strings as what they are at high
> level, and add a second lower level interface when you need faster
> lower-level fiddling (so they have [] that returns code points and unit()
> that returns code units).

I have some moderate experience with using UTF. First there's the D JavaScript 
engine, which is fully UTF-aware. The D string design fits in with it perfectly. 
Then there are chunks of ASCII-only C++ code I've translated to D, and it then 
worked with UTF-8 without further modification.
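Byte-oriented code written with only ASCII in mind tends to keep working on UTF-8 because every byte of a multibyte UTF-8 sequence has its high bit set (is 0x80 or above), so it can never match an ASCII delimiter. Here is a minimal Go sketch, used as an analogue for D. `splitOn` is a hypothetical stand-in for such translated code.

```go
package main

import "fmt"

// splitOn is a hypothetical byte-oriented routine written with only ASCII in
// mind: it scans for a delimiter byte and slices the string at each match.
func splitOn(s string, delim byte) []string {
	var parts []string
	start := 0
	for i := 0; i < len(s); i++ {
		if s[i] == delim {
			parts = append(parts, s[start:i])
			start = i + 1
		}
	}
	return append(parts, s[start:])
}

func main() {
	// Bytes of multibyte characters are all >= 0x80, so they never equal ','
	// and the non-ASCII text passes through intact.
	fmt.Printf("%q\n", splitOn("Zürich,Kraków,東京", ',')) // ["Zürich" "Kraków" "東京"]
}
```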

Based on that, I believe the D string design hits the sweet spot between 
efficiency and utility.



More information about the Digitalmars-d mailing list