[review] new string type

spir denis.spir at gmail.com
Wed Dec 1 04:07:38 PST 2010


On Tue, 30 Nov 2010 23:34:11 +0000 (UTC)
"Lars T. Kyllingstad" <public at kyllingen.NOSPAMnet> wrote:

> On Tue, 30 Nov 2010 13:52:20 -0500, Steven Schveighoffer wrote:
> 
> > On Tue, 30 Nov 2010 13:34:50 -0500, Jonathan M Davis
> > <jmdavisProg at gmx.com> wrote:
> > 
> > [...]
> > 
> >> 4. Indexing is no longer O(1), which violates the guarantees of the
> >> index operator.
> > 
> > Indexing is still O(1).
> > 
> > >> 5. Slicing (other than a full slice) is no longer O(1), which violates
> > >> the guarantees of the slicing operator.
> > 
> > Slicing is still O(1).
> > 
> > [...]
> 
> It feels extremely weird that the indices refer to code units and not 
> code points.  If I write
> 
>   auto str = mystring("hæ?");
>   writeln(str[1], " ", str[2]);
> 
> I expect it to print "æ ?", not "æ æ" like it does now.

If I understand correctly how _charStart works in combination with indexing and slicing, then there is something wrong with the type's interface.
After
	auto str = mystring("hæ?");
Either one provides a code unit index and gets a code unit:
	writeln(str[1], " ", str[2]); // "� �" (lone UTF-8 code units, not valid characters)
Or one provides a code point index and gets a code point:
	writeln(str[1], " ", str[2]); // "æ ?"
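To make the two models concrete, here is a minimal sketch using only a plain D `string` and Phobos (it does not use the proposed `mystring` type, whose code is not reproduced here). Indexing a `string` yields UTF-8 code units; decoding to a `dstring` first yields code points.

```d
import std.conv : to;
import std.stdio : writeln;

void main()
{
    string s = "hæ";        // UTF-8: 'h' is 1 code unit, 'æ' is 2 (0xC3 0xA6)
    assert(s.length == 3);  // .length counts code units, not characters

    // Code unit indexing: s[1] is only the lead byte of 'æ'.
    char cu = s[1];
    assert(cu == 0xC3);

    // Code point indexing: decode to UTF-32 first, then index.
    dstring d = to!dstring(s);
    assert(d.length == 2);
    assert(d[1] == 'æ');
    writeln(d[1]);          // prints æ
}
```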

But for string manipulation, wouldn't it be better if your string type always wrapped a dchar[] array, whatever the original encoding? That way indexing, slicing, finding, counting, etc. would all be fast, since decoding would happen only once, at string creation time.
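A minimal sketch of the idea, assuming a hypothetical `DString` struct (the name and its members are illustrative only, not part of the proposal or of Phobos). The input is decoded once with `std.conv.to!dstring`; after that, indexing and slicing are plain O(1) array operations on code points.

```d
import std.conv : to;

/// Hypothetical wrapper, for illustration only: decodes to UTF-32 once,
/// at construction, so indexing and slicing operate on code points.
struct DString
{
    private dstring data;

    this(string s)
    {
        data = to!dstring(s); // the only decoding step
    }

    // Number of code points (not UTF-8 code units).
    @property size_t length() const { return data.length; }

    // O(1): a single array access, no decoding.
    dchar opIndex(size_t i) const { return data[i]; }

    // O(1): shares the underlying array, no copy and no re-decoding.
    DString opSlice(size_t lo, size_t hi) const
    {
        DString r;
        r.data = data[lo .. hi];
        return r;
    }

    // Linear scan over already-decoded code points.
    size_t countOf(dchar c) const
    {
        size_t n = 0;
        foreach (ch; data)
            if (ch == c)
                ++n;
        return n;
    }
}

void main()
{
    auto s = DString("hællo");
    assert(s.length == 5);        // 5 code points; the UTF-8 literal is 6 code units
    assert(s[1] == 'æ');
    assert(s[1 .. 3][0] == 'æ');
    assert(s[1 .. 3].length == 2);
    assert(s.countOf('l') == 2);
}
```

The trade-off is memory: a `dchar` takes 4 bytes per code point, whereas UTF-8 needs 1 byte for an ASCII character and 2 for a character like 'æ'.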

> On a side note:  It seems to me that the only reason to have char, wchar, 
> and dchar as separate types in the language is that arrays of said types 
> are UTF-encoded strings.  If a type such as the proposed one were to 
> become the default string type in D, it might as well wrap an array of 
> ubyte/ushort/uint, since direct user manipulation of the underlying array 
> will generally only happen in the rare cases when one wants to deal 
> directly with code units.

Yes, but then see my remark above.


Denis
-- -- -- -- -- -- --
vit esse estrany ☣

spir.wikidot.com


