[review] new string type

Steven Schveighoffer schveiguy at yahoo.com
Wed Dec 1 14:12:57 PST 2010


On Wed, 01 Dec 2010 03:30:07 -0500, foobar <foo at bar.com> wrote:

> Steven Schveighoffer Wrote:
> [snipped]
>> > 3. You have no access to the underlying array unless you're dealing
>> > with an actual array of dchar.
>>
>> I thought of adding some kind of access.  I wasn't sure the best way.
>>
>> I was thinking of allowing direct access via opCast, because I think
>> casting might be a sufficient red flag to let you know you are crossing
>> into dangerous waters.
>>
>> But it could just be as easy as making the array itself public.
>>
>
>> -Steve
>
> A string type should always maintain the invariant that it is a valid  
> unicode string. Therefore I don't like having an unsafe opCast or  
> providing direct access to the underlying array. I feel that there  
> should be a read-only property for that. Algorithms that manipulate  
> char[]'s should construct a new string instance which will validate the  
> char[] it is being built from is a valid utf string.

Copying is not a good idea, nor is runtime validation.  We can only  
protect the programmer so much.
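
To make the trade-off concrete, here is a rough sketch of the two construction paths being argued about. ValidatedString, raw and assumeValid are hypothetical names made up for illustration, not the type under review; std.utf.validate is the only library call used. The checked constructor walks every code unit once, which is the runtime cost in question, while the unchecked path simply trusts the caller.

```d
import std.utf : validate, UTFException;

// Hypothetical wrapper for illustration only; not the type under review.
struct ValidatedString
{
    private immutable(char)[] data;

    // Checked path: scans the whole array once and throws
    // UTFException if it is not well-formed UTF-8.
    this(immutable(char)[] s)
    {
        validate(s);
        data = s;
    }

    // Unchecked path (assumed name): no scan, the caller vouches
    // for the contents.
    static ValidatedString assumeValid(immutable(char)[] s)
    {
        ValidatedString r;
        r.data = s;
        return r;
    }

    // Read-only view of the code units; no copy is made.
    @property immutable(char)[] raw() const
    {
        return data;
    }
}

// Returns true if the checked constructor refuses the input.
bool rejects(immutable(char)[] s)
{
    try
    {
        auto v = ValidatedString(s);
        return false;
    }
    catch (UTFException)
    {
        return true;
    }
}

unittest
{
    // A lone 0xFF byte is never valid UTF-8.
    immutable ubyte[] bytes = [0xFF];
    auto bad = cast(string) bytes;

    assert(rejects(bad));
    assert(!rejects("h\u00E9llo"));
    assert(ValidatedString("abc").raw == "abc");

    // The unchecked path accepts the same bad data without complaint.
    assert(ValidatedString.assumeValid(bad).raw.length == 1);
}
```

Neither path copies an immutable array, so the cost of the checked constructor is the scan alone; copying only enters the picture when the source is a mutable char[].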

The good news is that the vast majority of strings are literals, which  
should be properly constructed by the compiler, and immutable.

> This looks like a great start for a proper string type. There's still  
> the issue of literals that would require compiler/language changes.

That is essential: the compiler has to defer the type of string literals  
to the library somehow.

> There's one other issue that should be considered at some stage:  
> normalization and the fact that a single "character" can be constructed  
> from several code points. (acutes and such)

This is more solvable with a struct, but at this point, I'm not sure if  
it's worth worrying about.  How common is that need?
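
For reference, a small sketch of what the issue looks like in practice. It assumes a current Phobos, where std.uni provides normalize and byGrapheme; nothing here is part of the proposed type. The same visible character can be one code point or two, and the two spellings only compare equal after normalization.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

unittest
{
    string precomposed = "\u00E9";   // e-acute as a single code point
    string decomposed  = "e\u0301";  // 'e' followed by a combining acute

    // Same visible character, different code unit sequences.
    assert(precomposed != decomposed);

    // Counting code points gives different answers...
    assert(precomposed.walkLength == 1);
    assert(decomposed.walkLength == 2);

    // ...while counting graphemes gives one "character" for both.
    assert(precomposed.byGrapheme.walkLength == 1);
    assert(decomposed.byGrapheme.walkLength == 1);

    // Normalizing to NFC composes the pair into the single code point.
    assert(normalize!NFC(decomposed) == precomposed);
}
```

A struct could hide this behind its comparison and indexing operations, at the cost of a normalization or grapheme pass whenever they are used.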

-Steve

