[review] new string type

spir denis.spir at gmail.com
Wed Dec 1 05:09:23 PST 2010


On Wed, 01 Dec 2010 03:30:07 -0500
foobar <foo at bar.com> wrote:

> Steven Schveighoffer Wrote:
> [snipped]
> > > 3. You have no access to the underlying array unless you're dealing
> > > with an actual array of dchar.
> > 
> > I thought of adding some kind of access.  I wasn't sure the best way.
> > 
> > I was thinking of allowing direct access via opCast, because I think  
> > casting might be a sufficient red flag to let you know you are crossing  
> > into dangerous waters.
> > 
> > But it could just be as easy as making the array itself public.
> > 
> 
> > -Steve
> 
> A string type should always maintain the invariant that it is a valid Unicode string. Therefore I don't like having an unsafe opCast or providing direct access to the underlying array. I feel that there should be a read-only property for that. Algorithms that manipulate char[]s should construct a new string instance, which will validate that the char[] it is being built from is a valid UTF string.

But then, why not systematically store a dchar[] array? Validation and decoding are the same job. Once decoded, all methods work as expected (e.g. s[3] returns the 4th code point) and are blazingly fast.
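
To make that concrete, here is a minimal sketch, assuming Phobos' std.utf.decode (which throws UTFException on malformed input unless told otherwise). The helper name decodeAll is mine and purely illustrative; it is not part of Steve's type. The point is only that a single pass both validates and decodes, after which indexing is by code point:

```d
import std.exception : assertThrown;
import std.utf : decode, UTFException;

// Hypothetical helper: decode UTF-8 into an array of code points.
// decode() advances the index past each sequence and throws
// UTFException on malformed input, so validation comes for free.
dchar[] decodeAll(const(char)[] s)
{
    dchar[] result;
    size_t i = 0;
    while (i < s.length)
        result ~= decode(s, i);
    return result;
}

void main()
{
    string utf8 = "año ☣";            // 'ñ' takes 2 code units, '☣' takes 3
    dchar[] decoded = decodeAll(utf8);

    assert(utf8.length == 8);         // length in UTF-8 code units
    assert(decoded.length == 5);      // length in code points
    assert(decoded[1] == 'ñ');        // plain O(1) indexing by code point
    assert(decoded[4] == '☣');

    // 0xC3 announces a 2-byte sequence, but 0x28 is not a continuation byte.
    const(char)[] bad = [cast(char) 0xC3, cast(char) 0x28];
    assertThrown!UTFException(decodeAll(bad));
}
```

The cost is memory: every code point takes 4 bytes, whatever the source text.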

> This looks like a great start for a proper string type. There's still the issue of literals that would require compiler/language changes.

Yep...

> There's one other issue that should be considered at some stage: normalization and the fact that a single "character" can be constructed from several code points (acute accents and such).

This is my next little project. It may build on Steve's work. (But that's not necessary; dchar is enough as a base, I guess.)
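
A small sketch of the problem, for illustration only: the same visible "é" can be one code point (U+00E9) or two ('e' followed by the combining acute accent U+0301), so even a dchar array does not compare or index by "character". The last assertion assumes a Phobos that provides std.uni.normalize, which is an assumption about the library rather than anything proposed in this thread:

```d
import std.uni : normalize, NFC;

void main()
{
    dstring precomposed = "\u00E9"d;    // é as a single code point
    dstring decomposed  = "e\u0301"d;   // 'e' + combining acute accent

    assert(precomposed.length == 1);
    assert(decomposed.length == 2);     // two code points, one "character"
    assert(precomposed != decomposed);  // naive comparison sees them as different

    // Normalizing to NFC composes the pair into the single code point.
    assert(normalize!NFC(decomposed) == precomposed);
}
```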


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com
