[review] new string type
Steven Schveighoffer
schveiguy at yahoo.com
Thu Dec 2 13:24:03 PST 2010
On Wed, 01 Dec 2010 21:13:35 -0500, Ellery Newcomer
<ellery-newcomer at utulsa.edu> wrote:
> On 12/01/2010 03:35 PM, Steven Schveighoffer wrote:
>> On Tue, 30 Nov 2010 18:31:05 -0500, Ellery Newcomer
>>>
>>> There definitely is value in being able to index and slice into utf
>>> strings without resulting in invalid utf, but I think the fact that it
>>> indexes on code unit and returns code point is sufficiently strange
>>> that it qualifies as abuse of operator overloading.
>>
>> Maybe :) The other alternative is to throw an exception if you try to
>> access a code unit that is not the beginning of a code point.
>>
>> That might actually be less weird, I'll try doing that on the next
>> iteration.
>
> in my mind, the problem isn't so much indexing an intermediate code unit
> gets you earlier code units (it's a little strange, and I'm not sure
> whether greater strictness would be better - on the one hand, less
> strictness would be more tolerant of bugs and make it that much more
> difficult to detect them, but on the other hand if you were doing
> something like getting a random or approximate slice into your string,
> less strictness would mean that much less annoyance, though I have no
> idea why you would want to do that) as it is just the difference between
> the two and the confusion that it's bound to cause the noobies.
Yes, it does seem odd, but then again, how often do you need the
individual characters of a string? I wrote PHP code for about 6 months as
a full-time job before I found I needed to access individual characters,
and then I had to look up how to do it :) It's just not a common thing.
Typically, the index you use is calculated from something like find, and
you don't care what it is, as long as it's storable and persistent.
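
To make the stricter alternative concrete, here is a minimal sketch, not the code under review. It assumes a UTF-8 string; the names startsCodePoint and codePointAt are made up for illustration, and std.string.indexOf stands in for the find being discussed. The index is a code unit offset, and access throws if it lands inside a multi-byte sequence:

```d
import std.exception : enforce;
import std.utf : decode;

/// Hypothetical helper: true if code unit i is not a UTF-8
/// continuation byte (bit pattern 10xxxxxx).
bool startsCodePoint(string s, size_t i)
{
    return (s[i] & 0xC0) != 0x80;
}

/// Hypothetical strict access: index by code unit, return the code
/// point that starts there, throw if i is in the middle of a sequence.
dchar codePointAt(string s, size_t i)
{
    enforce(startsCodePoint(s, i), "index is inside a multi-byte sequence");
    size_t pos = i;          // decode advances its index argument
    return decode(s, pos);
}

unittest
{
    import std.exception : assertThrown;
    import std.string : indexOf;

    string s = "a\u2729b";            // U+2729 takes three UTF-8 code units
    assert(s.length == 5);

    auto i = s.indexOf('\u2729');     // code unit index produced by a search
    assert(i == 1);
    assert(codePointAt(s, i) == '\u2729');

    // An index that did not come from a search can land mid-sequence.
    assertThrown(codePointAt(s, i + 1));
}
```

An index that comes out of a search always passes the check, so the exception only fires for indexes computed some other way.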
>> I find that iteration over string characters using index is a very rare
>> thing anyways, you either use foreach, which should give you dchars, or
>> you use something like find, which should never give you an invalid
>> index.
>>
>> -Steve
>
> find was the counterargument I had in mind for keeping the operator
> overload, as something like
>
> s[find(s,'\u2729') .. s.codeUnits]
>
> is just a bit better than
>
> s.codePointSliceAt(find(s,'\u2729'), s.codeUnits);
>
> I really don't know.
Ugh, yes, and actually, that reminds me I should define opDollar.
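
Roughly what I have in mind, as a sketch only: SliceableString is a stand-in name, not the type under review, and the boundary check is an assumption about how slicing might behave. With opDollar defined as the code unit count, the slice above can be written with $ instead of s.codeUnits:

```d
import std.exception : enforce;

/// Hypothetical wrapper used only to illustrate opDollar.
struct SliceableString
{
    string data;

    /// `$` inside s[...] means the number of code units.
    size_t opDollar()
    {
        return data.length;
    }

    /// Slice by code unit index; both ends must fall on a code point
    /// boundary (assumed behaviour for this sketch).
    SliceableString opSlice(size_t lo, size_t hi)
    {
        enforce(onBoundary(lo) && onBoundary(hi),
                "slice would split a code point");
        return SliceableString(data[lo .. hi]);
    }

    /// The end of the string counts as a boundary; anything else must
    /// not be a UTF-8 continuation byte.
    private bool onBoundary(size_t i)
    {
        return i == data.length || (data[i] & 0xC0) != 0x80;
    }
}

unittest
{
    import std.exception : assertThrown;
    import std.string : indexOf;

    auto s = SliceableString("ab\u2729cd");
    auto i = s.data.indexOf('\u2729');      // code unit index 2

    assert(s[i .. $].data == "\u2729cd");   // $ is rewritten to s.opDollar()
    assertThrown(s[i + 1 .. $]);            // starts mid-sequence
}
```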
>
> One thing that strikes me, though, if you're going to keep opIndex, is
> that being able to do
>
> foreach(size_t codeuniti, dchar c; s){
>
> }
>
> would be nice. Actually, it looks like you can do that with current
> strings.
At this point, you can't do that except via opApply, and I didn't want to
add that for fear that it would be pointed out as a drawback.
It would be nice if we could define a way to do that via ranges...
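
For reference, the opApply route would look something like this. It is a sketch with a made-up wrapper name (IndexedString), not the code under review; it hands the loop body the starting code unit index together with the decoded code point:

```d
import std.utf : decode;

/// Hypothetical wrapper used only to illustrate the opApply approach.
struct IndexedString
{
    string data;

    /// Enables foreach (size_t i, dchar c; s): i is the index of the
    /// first code unit of c.
    int opApply(scope int delegate(size_t, dchar) dg)
    {
        size_t i = 0;
        while (i < data.length)
        {
            immutable start = i;
            immutable dchar c = decode(data, i);  // advances i past c
            if (auto result = dg(start, c))
                return result;                    // loop body used break/return
        }
        return 0;
    }
}

unittest
{
    auto s = IndexedString("a\u2729b");
    size_t[] starts;
    dchar[] chars;

    foreach (size_t i, dchar c; s)
    {
        starts ~= i;
        chars ~= c;
    }

    assert(starts == [size_t(0), 1, 4]);  // the middle character spans 1..4
    assert(chars == "a\u2729b"d);
}
```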
-Steve