[review] new string type
spir
denis.spir at gmail.com
Wed Dec 1 04:07:38 PST 2010
On Tue, 30 Nov 2010 23:34:11 +0000 (UTC)
"Lars T. Kyllingstad" <public at kyllingen.NOSPAMnet> wrote:
> On Tue, 30 Nov 2010 13:52:20 -0500, Steven Schveighoffer wrote:
>
> > On Tue, 30 Nov 2010 13:34:50 -0500, Jonathan M Davis
> > <jmdavisProg at gmx.com> wrote:
> >
> > [...]
> >
> >> 4. Indexing is no longer O(1), which violates the guarantees of the
> >> index operator.
> >
> > Indexing is still O(1).
> >
> >> 5. Slicing (other than a full slice) is no longer O(1), which violates
> >> the
> >> guarantees of the slicing operator.
> >
> > Slicing is still O(1).
> >
> > [...]
>
> It feels extremely weird that the indices refer to code units and not
> code points. If I write
>
> auto str = mystring("hæ?");
> writeln(str[1], " ", str[2]);
>
> I expect it to print "æ ?", not "æ æ" like it does now.
If I understand correctly how _charStart works in combination with indexing and slicing, then there is something wrong in the type's interface.
After
auto str = mystring("hæ?");
Either one provides a code unit index and gets a code unit:
writeln(str[1], " ", str[2]); // "� �" (invalid utf code points)
Or one provides a code point index and gets a code point:
writeln(str[1], " ", str[2]); // "æ ?"
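To make the two readings concrete, here is a minimal sketch using plain D strings rather than the proposed type. The helper codePointAt is hypothetical, introduced only for illustration, and I am assuming the source file is saved as UTF-8:

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical helper, for illustration only: returns the i-th code point.
// It decodes the whole string on every call, so it is O(n), not O(1).
dchar codePointAt(string s, size_t i)
{
    return to!dstring(s)[i];
}

void main()
{
    string s = "hæ?";   // UTF-8: 'h' is 1 code unit, 'æ' is 2, '?' is 1
    assert(s.length == 4);                  // length counts code units

    // Code unit indexing: s[1] and s[2] are the two bytes of 'æ'.
    assert(s[1] == 0xC3 && s[2] == 0xA6);

    // Code point indexing: index 1 is 'æ', index 2 is '?'.
    assert(codePointAt(s, 1) == 'æ');
    assert(codePointAt(s, 2) == '?');
    writeln(codePointAt(s, 1), " ", codePointAt(s, 2)); // prints "æ ?"
}
```

Neither reading gives "æ æ": that result only appears when a code unit index is silently mapped back to the start of the enclosing code point.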
But for string manipulation, wouldn't it be better for your string type to systematically wrap a dchar[] array, whatever the original encoding? I mean, so that indexing, slicing, finding, counting, etc. are fast, with decoding being done only once, at string creation time.
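A rough sketch of what I have in mind, not the proposed type itself. The name DecodedString and its members are my own assumptions, and it stores an immutable dstring rather than a mutable dchar[] to keep the example short:

```d
import std.conv : to;
import std.traits : isSomeString;

// Hypothetical wrapper: decodes once at construction, then every index
// is a code point index.
struct DecodedString
{
    private dstring data;

    // Decode once, whatever the source encoding (UTF-8, UTF-16 or UTF-32).
    this(S)(S source) if (isSomeString!S)
    {
        data = to!dstring(source);
    }

    @property size_t length() const { return data.length; }

    // O(1): a plain array lookup, one element per code point.
    dchar opIndex(size_t i) const { return data[i]; }

    // O(1): the slice shares the already decoded buffer.
    DecodedString opSlice(size_t lo, size_t hi) const
    {
        DecodedString result;
        result.data = data[lo .. hi];
        return result;
    }
}

void main()
{
    auto str = DecodedString("hæ?");
    assert(str.length == 3);
    assert(str[1] == 'æ' && str[2] == '?');
    // The same text built from UTF-16 input compares equal.
    assert(str[1 .. 3] == DecodedString("æ?"w));
}
```

The price is memory: four bytes per code point, plus the one-time decoding pass when the string is created.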
> On a side note: It seems to me that the only reason to have char, wchar,
> and dchar as separate types in the language is that arrays of said types
> are UTF-encoded strings. If a type such as the proposed one were to
> become the default string type in D, it might as well wrap an array of
> ubyte/ushort/uint, since direct user manipulation of the underlying array
> will generally only happen in the rare cases when one wants to deal
> directly with code units.
Yes, but then, see remark above.
Denis
-- -- -- -- -- -- --
vit esse estrany ☣
spir.wikidot.com