[review] new string type (take 2)

Fri Jan 14 09:03:28 PST 2011

On 1/14/11 5:06 AM, Steven Schveighoffer wrote:
> On Thu, 13 Jan 2011 23:03:35 -0500, Steven Wawryk <stevenw at acres.com.au>
> wrote:
>
>> On 14/01/11 02:25, Steven Schveighoffer wrote:
>> > On Wed, 12 Jan 2011 04:49:26 -0500, Steven Wawryk
>> > <stevenw at acres.com.au> wrote:
>> >
>> >>
>> >> I like the direction you're taking but have some quibbles about
>> >> details. Specifically, I'd go for a more complete separation into
>> >> random-access code-unit ranges and bidirectional code-point ranges:
>> >
>> > Thanks for taking the time. I will respond to your points, but please
>> > make your rebuttals to the new thread I'm about to create with an
>> > updated string type.
>> >
>> >> I don't see a need for _charStart, opIndex, opSlice and codeUnits. If
>> >> the underlying T[] can be returned by a property, then these can be
>> >> done through the code-unit array, which is random-access.
>> >
>> > But that puts extra pain on the user for not much reason. Currently,
>> > strings slice in one operation, you are proposing that we slice in
>> > three operations:
>> >
>> > 1. get the underlying array
>>
>> myString vs myString.data
>>
>> > 2. slice it
>>
>> Same for both.
>>
>> > 3. reconstruct a string based on the slice.
>>
>> myOtherString = find(myString, 'x');
>> vs
>> myOtherString = find(myString.data, 'x');
>>
>> You may see extra pain. I see extra control. The user is making it
>> explicit at what level (code-unit/code-point/grapheme/whatever) of
>> range he/she wants the called algorithm to be working on.
>
> Exactly, that is what my string type allows. You can either do it at the
> code-point (and probably grapheme, discussion in progress) level, or you
> can do it at the code-unit level. I don't see how restricting the user
> to only doing it at the code-unit level is not more painful.
>
>> > Plus, if you remove opIndex, you are restricting the usefulness of the
>> > range. Note that this string type already will decode dchars out of the
>> > front and back, why not just give that ability to the middle of the
>> > string?
>>
>> Because at the code-point level it *isn't* a random-access range and
>> the index makes no sense at the code-point level, only at the
>> code-unit level. It's encouraging the confusion of 2 distinctly
>> different abstractions or "views" of the same data. All the slicing
>> and indexing you're artificially putting in the code-point range is
>> already available in the code-unit range, and its only benefit is to
>> allow the user to save typing ".data".
>
> I respectfully disagree. A stream built on fixed-sized units, but with
> variable length elements, where you can determine the start of an
> element in O(1) time given a random index absolutely provides
> random-access. It just doesn't provide length.

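I equally respectfully disagree. I think random access is defined as
accessing the ith element in O(1) time. That's not the case here:
finding the start of the code point at a given code-unit index is O(1),
but reaching the ith code point still means stepping over every code
point before it.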
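
To make the distinction concrete, here is a minimal sketch for UTF-8, assuming valid input. The helper names (isContinuation, codePointStart, nthCodePoint) are made up for illustration and are not part of the proposed string type; stride and decode are from std.utf.

```d
import std.utf : decode, stride;

// True if c is a UTF-8 continuation byte (bit pattern 10xxxxxx).
bool isContinuation(char c)
{
    return (c & 0xC0) == 0x80;
}

// Hypothetical helper: back up from code-unit index i to the first code
// unit of the code point containing it. For valid UTF-8 this takes at
// most 3 steps, so it is O(1).
size_t codePointStart(const(char)[] s, size_t i)
{
    while (i > 0 && isContinuation(s[i]))
        --i;
    return i;
}

// Hypothetical helper: decode the n-th code point (0-based). Every
// preceding code point has to be stepped over, so this is O(n).
dchar nthCodePoint(const(char)[] s, size_t n)
{
    size_t i = 0;
    foreach (_; 0 .. n)
        i += stride(s, i);
    return decode(s, i);
}

void main()
{
    // 'a' (1 unit), U+00E9 (2 units), U+20AC (3 units), 'z' (1 unit)
    string s = "a\u00E9\u20ACz";
    assert(s.length == 7);

    // Code-unit index 5 lies inside U+20AC, which starts at index 3.
    size_t i = codePointStart(s, 5);
    assert(i == 3);
    assert(decode(s, i) == '\u20AC');

    // The code point at position 2 is also U+20AC, found by walking.
    assert(nthCodePoint(s, 2) == '\u20AC');
}
```

The first helper is what indexing by code unit can offer; the second is what "the ith element" of a code-point range would have to mean.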

Andrei