[review] new string type

Thu Dec 2 13:24:03 PST 2010

On Wed, 01 Dec 2010 21:13:35 -0500, Ellery Newcomer  
<ellery-newcomer at utulsa.edu> wrote:

> On 12/01/2010 03:35 PM, Steven Schveighoffer wrote:
>> On Tue, 30 Nov 2010 18:31:05 -0500, Ellery Newcomer
>>>
>>> There definitely is value in being able to index and slice into utf
>>> strings without resulting in invalid utf, but I think the fact that it
>>> indexes on code unit and returns code point is sufficiently strange
>>> that it qualifies as abuse of operator overloading.
>>
>> Maybe :) The other alternative is to throw an exception if you try to
>> access a code unit that is not the beginning of a code point.
>>
>> That might actually be less weird, I'll try doing that on the next
>> iteration.
>
> In my mind, the problem isn't so much that indexing an intermediate code
> unit gets you the earlier code units. That is a little strange, and I'm
> not sure whether greater strictness would be better: on the one hand,
> less strictness is more tolerant of bugs and makes them that much harder
> to detect; on the other hand, if you were doing something like taking a
> random or approximate slice of your string, less strictness would mean
> that much less annoyance (though I have no idea why you would want to do
> that). The problem is just the difference between the two and the
> confusion that it's bound to cause the noobies.

Yes, it does seem odd, but then again, how often do you need the
individual characters of a string?  I wrote PHP code for about 6 months as
a full-time job before I found I needed to access individual characters,
and then I had to look up how to do it :)  It's just not a common thing.
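
For the rare case where a single character is needed, the two behaviors discussed above (back up to the start of the code point, or throw) could look roughly like this. This is only a sketch: the names isContinuation, lenientAt and strictAt are made up for illustration and are not part of the proposed type. The only library calls used are std.utf.decode and std.exception.enforce.

```d
import std.exception : assertThrown, enforce;
import std.utf : decode;

/// True if s[i] is a UTF-8 continuation byte (bit pattern 10xxxxxx).
/// Hypothetical helper, not part of the proposed string type.
bool isContinuation(string s, size_t i)
{
    return (s[i] & 0xC0) == 0x80;
}

/// Lenient variant: back up to the start of the code point containing i.
dchar lenientAt(string s, size_t i)
{
    while (i > 0 && isContinuation(s, i))
        --i;
    return decode(s, i); // i is a local copy, so the caller's index is untouched
}

/// Strict variant: reject an index that lands inside a code point.
dchar strictAt(string s, size_t i)
{
    enforce(!isContinuation(s, i), "index is not at the start of a code point");
    return decode(s, i);
}

unittest
{
    string s = "a\u2729b";                   // U+2729 takes three code units in UTF-8
    assert(lenientAt(s, 2) == '\u2729');     // mid-sequence index backs up to offset 1
    assert(strictAt(s, 1) == '\u2729');      // start of the sequence is accepted
    assertThrown!Exception(strictAt(s, 2));  // mid-sequence index is rejected
}
```

The lenient version hides off-by-one mistakes; the strict version surfaces them at the cost of an exception in the approximate-slice case.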

Typically, the index you use is calculated by something like find, and
you don't care what its value is, as long as you can store it and it stays
valid.
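
As a small illustration with today's built-in strings, and assuming std.string.indexOf as a stand-in for the "find" mentioned above (it returns a code-unit offset, or -1 when nothing is found), the index is simply stored and fed back into a slice. tailFrom is a hypothetical helper name.

```d
import std.string : indexOf;

/// Returns the part of s starting at the first occurrence of needle,
/// or an empty slice if needle does not occur. Hypothetical helper.
string tailFrom(string s, dchar needle)
{
    // The offset is in code units; its exact value never matters to the
    // caller, only that it marks the start of a code point.
    immutable i = s.indexOf(needle);
    return i < 0 ? s[$ .. $] : s[i .. $];
}

unittest
{
    assert(tailFrom("ab\u2729cd", '\u2729') == "\u2729cd");
    assert(tailFrom("abc", 'x') == "");
}
```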

>> I find that iteration over string characters using index is a very rare
>> thing anyways, you either use foreach, which should give you dchars, or
>> you use something like find, which should never give you an invalid  
>> index.
>>
>> -Steve
>
> find was the counterargument I had in mind for keeping the operator  
> overload, as something like
>
> s[find(s,'\u2729') .. s.codeUnits]
>
> is just a bit better than
>
> s.codePointSliceAt(find(s,'\u2729'), s.codeUnits);
>
> I really don't know.

Ugh, yes, and actually, that reminds me I should define opDollar.
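
A minimal sketch of what that could look like, assuming a wrapper struct around a plain string. SliceableStr is a made-up stand-in, not the actual type under review, and the slice does no boundary validation.

```d
import std.string : indexOf;

// Hypothetical stand-in for the string type under review; only the
// pieces needed to show `$` are included.
struct SliceableStr
{
    string data;

    /// Number of code units; this is what `$` evaluates to inside s[...].
    size_t opDollar()
    {
        return data.length;
    }

    /// Slice by code-unit offsets. No validation here; a real type would
    /// check that both ends fall on code point boundaries.
    SliceableStr opSlice(size_t lo, size_t hi)
    {
        return SliceableStr(data[lo .. hi]);
    }
}

unittest
{
    auto s = SliceableStr("ab\u2729cd");
    immutable i = s.data.indexOf('\u2729'); // code-unit offset of the star
    assert(i == 2);
    assert(s[i .. $].data == "\u2729cd");   // `$` replaces s.codeUnits
}
```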

>
> One thing that strikes me, though, if you're going to keep opIndex, is  
> that being able to do
>
> foreach(size_t codeuniti, dchar c; s){
>
> }
>
> would be nice. Actually, it looks like you can do that with current  
> strings.

At this point, you can't do that except via opApply, and I didn't want to
add that for fear that it would be pointed out as a drawback.
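
For reference, the opApply route might look something like the following. This is a sketch under assumptions: IterableStr is a made-up stand-in for the type under review, and std.utf.decode does the actual decoding.

```d
import std.utf : decode;

// Hypothetical stand-in for the string type under review.
struct IterableStr
{
    string data;

    /// Enables: foreach (size_t i, dchar c; s) { ... }
    int opApply(scope int delegate(ref size_t, ref dchar) dg)
    {
        size_t next = 0;
        while (next < data.length)
        {
            size_t start = next;            // code-unit index of this code point
            dchar c = decode(data, next);   // advances next past the code point
            if (auto result = dg(start, c)) // nonzero means the loop body broke out
                return result;
        }
        return 0;
    }
}

unittest
{
    auto s = IterableStr("a\u2729b");
    size_t[] starts;
    foreach (size_t i, dchar c; s)
        starts ~= i;
    size_t[] expected = [0, 1, 4];          // the star occupies code units 1 to 3
    assert(starts == expected);
}
```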

It would be nice if we could define a way to do that via ranges...

-Steve