[review] new string type

Ellery Newcomer ellery-newcomer at utulsa.edu
Wed Dec 1 18:13:35 PST 2010


On 12/01/2010 03:35 PM, Steven Schveighoffer wrote:
> On Tue, 30 Nov 2010 18:31:05 -0500, Ellery Newcomer
>>
>> There definitely is value in being able to index and slice into utf
>> strings without resulting in invalid utf, but I think the fact that it
>> indexes on code unit and returns code point is sufficiently strange
>> that it qualifies as abuse of operator overloading.
>
> Maybe :) The other alternative is to throw an exception if you try to
> access a code unit that is not the beginning of a code point.
>
> That might actually be less weird, I'll try doing that on the next
> iteration.

In my mind, the problem isn't so much that indexing an intermediate code
unit gets you earlier code units. That's a little strange, and I'm not
sure whether greater strictness would be better: on the one hand, less
strictness would be more tolerant of bugs and make them that much more
difficult to detect; on the other hand, if you were doing something like
getting a random or approximate slice into your string, less strictness
would mean that much less annoyance (though I have no idea why you would
want to do that). The real problem is just the difference between
indexing on code unit and returning code point, and the confusion that
it's bound to cause newcomers.
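
To make the two behaviours concrete, here is a rough sketch on top of a
plain string rather than the proposed type. The names isContinuation,
codePointAtLenient and codePointAtStrict are made up for illustration and
are not part of the type under review; std.utf.decode is the only
library call used.

```d
import std.utf : decode;
import std.exception : assertThrown;

// True if c is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuation(char c)
{
    return (c & 0xC0) == 0x80;
}

// Lenient variant: back up to the first code unit of the code point
// that contains code unit i, then decode from there.
dchar codePointAtLenient(string s, size_t i)
{
    while (i > 0 && isContinuation(s[i]))
        --i;
    return decode(s, i);
}

// Strict variant: reject any index that falls inside a code point.
dchar codePointAtStrict(string s, size_t i)
{
    if (isContinuation(s[i]))
        throw new Exception("index is in the middle of a code point");
    return decode(s, i);
}

void main()
{
    string s = "a\u2729b";      // U+2729 takes three UTF-8 code units
    assert(s.length == 5);
    assert(codePointAtLenient(s, 2) == '\u2729'); // backs up to index 1
    assert(codePointAtStrict(s, 1) == '\u2729');  // index 1 is a start
    assertThrown(codePointAtStrict(s, 2));        // index 2 is not
}
```

Either way the index counts code units while the result is a code point,
which is the mismatch I mean.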

>
> I find that iteration over string characters using index is a very rare
> thing anyways, you either use foreach, which should give you dchars, or
> you use something like find, which should never give you an invalid index.
>
> -Steve

find was the argument I had in mind in favor of keeping the operator
overload, since something like

s[find(s,'\u2729') .. s.codeUnits]

is just a bit better than

s.codePointSliceAt(find(s,'\u2729'), s.codeUnits);

I really don't know.
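
For comparison, a sketch of the same kind of slice with the current
built-in strings. Note that Phobos' std.algorithm.find returns the
remaining range rather than an index, so the index-returning find above
is an assumption about the new type; std.string.indexOf is the closest
existing call that yields a code unit index. tailFrom is a made-up
helper name.

```d
import std.algorithm : find;
import std.string : indexOf;

// Slice from the first occurrence of needle to the end of s,
// using a code unit index obtained from indexOf.
string tailFrom(string s, dchar needle)
{
    auto i = s.indexOf(needle);     // code unit index, or -1 if absent
    return i < 0 ? "" : s[i .. $];
}

void main()
{
    string s = "ab\u2729cd";
    // Index-based slicing starts on a code point boundary.
    assert(tailFrom(s, '\u2729') == "\u2729cd");
    // find hands back the same tail directly, with no index involved.
    assert(find(s, '\u2729') == "\u2729cd");
    assert(tailFrom(s, 'z') == "");
}
```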

One thing that strikes me, though, if you're going to keep opIndex, is 
that being able to do

foreach(size_t codeuniti, dchar c; s){

}

would be nice. Actually, it looks like you can do that with current strings.
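
A small sketch of that with a built-in string: the index handed to the
loop body is the code unit offset at which each decoded code point
starts. codePointStarts is just an illustrative helper name.

```d
import std.stdio : writefln;

// Collect the starting code unit index of every code point in s.
size_t[] codePointStarts(string s)
{
    size_t[] starts;
    foreach (size_t i, dchar c; s)  // i: code unit index, c: code point
        starts ~= i;
    return starts;
}

void main()
{
    string s = "a\u2729b";          // U+2729 occupies code units 1..3
    foreach (size_t i, dchar c; s)
        writefln("code unit %s: U+%04X", i, cast(uint) c);

    size_t[] expected = [0, 1, 4];
    assert(codePointStarts(s) == expected);
}
```

The indices skip from 1 to 4 because the middle character is three code
units long, so every index the loop produces is safe to slice on.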

