Higher level built-in strings

Steven Schveighoffer schveiguy at yahoo.com
Mon Jul 19 13:25:26 PDT 2010


On Mon, 19 Jul 2010 16:04:21 -0400, Walter Bright  
<newshound2 at digitalmars.com> wrote:

> bearophile wrote:
>> This odd post comes from reading the nice part about strings of chapter 4 of
>> TDPL. In the last few years I have seen changes in how D strings are meant
>> and managed, changes that make them less and less like arrays (random-access
>> sequences of mutable code units) and more and more what they are at high
>> level (immutable bidirectional sequences of code points).
>
> Strings in D are deliberately meant to be arrays, not special things.  
> Other languages make them special because they have insufficiently  
> powerful arrays.

Andrei is changing that.  Already, isRandomAccessRange!(string) == false.   
I kind of don't like this direction, even though it's clever.  What you end  
up with is phobos refusing to believe that a string or char[] is an array,  
but the compiler saying it is.
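
To make the mismatch concrete, here is a minimal sketch of the two views of the same string. It assumes a Phobos where narrow strings are decoded by the range primitives, as described above; the sample text is made up for illustration.

```d
import std.range;

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    // Phobos view: a bidirectional range of dchar, not random access.
    static assert(!isRandomAccessRange!string);
    static assert(isBidirectionalRange!string);
    static assert(is(typeof(s.front) == dchar));

    // Compiler view: an ordinary array of immutable(char).
    static assert(is(typeof(s[0]) == immutable(char)));
    assert(s.length == 6);     // counts code units
    assert(s.walkLength == 5); // counts code points by decoding
}
```

So s[0] and s.front can disagree about both type and value for the very same variable.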

What I'd prefer is something where the compiler types string literals as  
string, a type defined by phobos which contains as its first member an  
immutable(char)[] (where the compiler puts the literal).  Then we can  
properly limit the other operations.
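
A rough sketch of what such a library type might look like. The name LibString and its member set are hypothetical, invented only for illustration; nothing like this exists in Phobos, and the compiler does not type literals this way today.

```d
import std.utf : decode, stride;

// Hypothetical library string type; name and members are illustrative only.
struct LibString
{
    immutable(char)[] data; // first member: where the compiler would put the literal

    @property bool empty() const { return data.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(data, i); // decode the first code point
    }

    void popFront()
    {
        data = data[stride(data, 0) .. $]; // skip one whole code point
    }
}

void main()
{
    auto s = LibString("h\u00E9llo");

    // No opIndex is defined, so code-unit indexing cannot happen by accident.
    static assert(!__traits(compiles, s[0]));

    size_t n;
    foreach (c; s) // range interface: c is inferred as dchar
    {
        static assert(is(typeof(c) == dchar));
        ++n;
    }
    assert(n == 5);
    assert(s.data.length == 6); // raw code units stay reachable, but explicitly
}
```

Because the type belongs to the library, it decides which operations exist, and the raw array is still there as .data for code that really wants code units.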

> As for indexing by code point, I also believe this is a mistake. It is  
> proposed often, but overlooks:
>
> 1. most string operations, such as copying and searching, even regular  
> expressions, work just fine using regular indices.
>
> 2. doing the operations in (1) using code points and having to  
> continually decode the strings would result in disastrously slow code.
>
> 3. the user can always layer a code point interface over the strings,  
> but going the other way is not so practical.

I agree here.  Anything that uses indexing to perform a linear operation  
is bound for the scrap heap.  But what about this:

foreach(c; str)

which types c as char (or immutable char), not dchar.  These are the  
subtle problems that we have with the dichotomy of phobos refusing to  
believe a string is an array, but the compiler believing it is.

I think the default inference for this should be dchar, and phobos can  
make that true as long as it controls the string type.
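
For reference, a small sketch of the current built-in behavior. The helper names countUnits and countPoints are made up for this example.

```d
// Hypothetical helper: iterate with the element type the compiler infers.
size_t countUnits(string str)
{
    size_t n;
    foreach (c; str) // c is a single UTF-8 code unit
    {
        static assert(typeof(c).sizeof == 1);
        ++n;
    }
    return n;
}

// Hypothetical helper: asking for dchar makes the compiler decode.
size_t countPoints(string str)
{
    size_t n;
    foreach (dchar c; str)
        ++n;
    return n;
}

void main()
{
    string str = "h\u00E9llo"; // U+00E9 is two code units in UTF-8
    assert(countUnits(str) == 6);
    assert(countPoints(str) == 5);
}
```

The only difference between the two loops is the declared type of c, which is easy to leave out.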

There are other points to consider:

1) a string *could be* indexed by code unit (char) offset and return the  
code point that contains that offset.
2) even slicing could be valid as long as the slice operator jumps back to  
the start of the dchar being encoded.  This might make for very tricky  
code, but then again, such is the cost of trying to slice something like a  
utf-8 string :)
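
Both points can be sketched with free functions over today's string. The names snapToStart, codePointAt and snappedSlice are hypothetical, and the code assumes valid UTF-8 and, for codePointAt, an index less than s.length.

```d
import std.utf : decode;

// Hypothetical helper: move i back to the first code unit of its code point.
size_t snapToStart(string s, size_t i)
{
    // UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    while (i > 0 && i < s.length && (s[i] & 0xC0) == 0x80)
        --i;
    return i;
}

// Point 1: index by code unit offset, get the code point containing it.
dchar codePointAt(string s, size_t i)
{
    size_t start = snapToStart(s, i);
    return decode(s, start);
}

// Point 2: slice by code unit offsets, snapping both ends back.
string snappedSlice(string s, size_t lo, size_t hi)
{
    return s[snapToStart(s, lo) .. snapToStart(s, hi)];
}

void main()
{
    string s = "h\u00E9llo"; // code units: h, C3, A9, l, l, o

    assert(codePointAt(s, 1) == '\u00E9'); // lead byte
    assert(codePointAt(s, 2) == '\u00E9'); // continuation byte, snapped back
    assert(codePointAt(s, 3) == 'l');

    // lo = 2 lands inside U+00E9, so the slice starts at offset 1 instead.
    assert(snappedSlice(s, 2, 5) == "\u00E9ll");
}
```

The snap costs at most three steps back per index in valid UTF-8, so indexing stays constant time, but two different indices can now name the same element.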

But having the compiler force the string type to be an array, when it  
clearly isn't, doesn't help.  Give the runtime the choice, as is done  
for AAs, and I think we may have something that is workable and doesn't  
suck performance-wise.

-Steve

