string is rarely useful as a function argument

Wed Dec 28 14:39:14 PST 2011

On 12/28/2011 11:12 PM, foobar wrote:
> On Wednesday, 28 December 2011 at 21:17:49 UTC, Timon Gehr wrote:
>>
>> I was educated enough not to make that mistake, because I read the
>> entire language specification before deciding the language was awesome
>> and downloading the compiler. I find it strange that the product
>> should be made less usable because we do not expect users to read the
>> manual. But it is of course a valid point.
>>
>
> It's awfully optimistic to expect people to read the manual.
>

Well, if the alternative is slowly butchering the language, I will be
awfully optimistic about it all day long.

>> There is nothing wrong with operating at the code unit level.
>> Efficient slicing is very desirable.
>>
>
> I agree that it's useful. It is, however, the incorrect abstraction level
> when you need a "string", which is by far the common case in user code.

I would not go as far as to call it 'incorrect'.
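To make the code-unit point concrete: when slice boundaries come from a search, slicing at the code-unit level is both O(1) and correct. In the sketch below, `valueOf` is a hypothetical helper, not something from Phobos. It relies on `std.string.indexOf` returning an offset in code units for a `string`, and on the fact that an ASCII byte such as '=' never occurs inside a multi-byte UTF-8 sequence.

```d
import std.string : indexOf;

// Hypothetical helper: return the part of "key=value" after the first '='.
// indexOf reports a code-unit offset, so the slice is O(1) and, because '='
// is ASCII, it always starts on a code point boundary.
string valueOf(string line)
{
    immutable pos = line.indexOf('=');
    return pos < 0 ? null : line[cast(size_t) pos + 1 .. $];
}

void main()
{
    assert(valueOf("name=Zoë") == "Zoë");
    assert(valueOf("città=Milano") == "Milano"); // 'à' is two code units
    assert(valueOf("no separator") is null);
}
```

Nothing here needs to know where the multi-byte characters are. The search produces a valid code-unit boundary, and the slice simply reuses it.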

> e.g. if I need a name variable in a class:
> codeUnit[] name; // bug!
> string name;     // correct
>

From a pragmatic viewpoint it does not matter, because if string is used
like this, then codeUnit[] does exactly the same thing. Nobody forces
anyone to index or slice into a string variable when they don't need
that functionality. All engineers have to work with leaky abstractions.
Why is it such a big deal?

> I expect that most uses of code-unit arrays should be in the standard
> library anyway, since it provides the string manipulation routines. It
> all boils down to making the common case trivial and the rare case
> possible. You can use the underlying data structure (code units) if you
> need it, but the default "string" is what people expect when thinking
> about what such a type does (a string of letters). D's already 80% there,
> since Phobos already treats strings as bidirectional ranges of
> code points, which is much closer to the mental image of a string of
> letters, so I think this is about bringing the current design to its
> final conclusion.
>

Well, that mental image is just not the right one when dealing with Unicode.
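A short illustration of why code points are still not "letters". This assumes present-day Phobos, which still auto-decodes narrow strings to `dchar` in range primitives and provides `std.uni.byGrapheme`. A single visible "é" written as 'e' plus a combining accent has three different lengths depending on the level you count at:

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

void main()
{
    // One visible letter: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    string s = "e\u0301";

    assert(s.length == 3);                // code units: 1 for 'e' + 2 for U+0301
    assert(s.walkLength == 2);            // code points: Phobos decodes to dchar
    assert(s.byGrapheme.walkLength == 1); // graphemes: what a reader sees
}
```

Iterating by code point fixes the UTF-8 byte problem but still splits this letter in two. Treating a string as a range of code points is therefore also a leaky abstraction, just at a different level.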

>>
>> Exactly. It is acting less and less like an array of code units. But
>> it *is* an array of code units. If the general consensus is that we
>> need a string data type that acts at a different abstraction level by
>> default (with which I'd disagree, but apparently I don't have a
>> popular opinion here), then we need a string type in the standard
>> library to do that. Changing the language so that an array of code
>> units stops behaving like an array of code units is not a solution.
>>
>
> I agree that we should not break T[] for any T and should instead
> introduce a library type. While I personally believe that such a change
> would expose hidden bugs (certainly when unaware programmers treat
> strings as ASCII and the product is later localized), it's a big
> disturbance in people's code, and it's worth considering whether the
> benefit is worth the cost. Perhaps some middle ground could be found such
> that existing code can rely on existing behavior and the new library type
> is opt-in.

What will such a type offer, except that it disallows indexing and slicing?
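For concreteness, here is a minimal sketch of what such an opt-in library type might look like. It assumes the type simply wraps a `string` and exposes only code-point iteration. `Text` and `codeUnits` are hypothetical names, not an existing Phobos API.

```d
import std.range.primitives : empty, front, back, popFront, popBack,
    isBidirectionalRange, hasSlicing, walkLength;

// Hypothetical opt-in string type: a bidirectional range of code points
// (dchar) with no opIndex/opSlice, so callers cannot index or slice it.
struct Text
{
    private string data; // underlying UTF-8 code units, kept private

    this(string s) { data = s; }

    @property bool empty() { return data.empty; }
    @property dchar front() { return data.front; } // decodes one code point
    @property dchar back() { return data.back; }
    void popFront() { data.popFront(); }          // advances one code point
    void popBack() { data.popBack(); }
    @property Text save() { return this; }

    // Explicit escape hatch for code that really needs code units.
    @property string codeUnits() { return data; }
}

static assert(isBidirectionalRange!Text);
static assert(!hasSlicing!Text);
static assert(!__traits(compiles, Text("abc")[0])); // indexing is rejected

void main()
{
    auto t = Text("naïve");
    assert(t.walkLength == 5);       // five code points
    assert(t.codeUnits.length == 6); // six UTF-8 code units
}
```

Apart from the escape hatch, everything this wrapper does comes from the `string` range primitives it forwards to. The only behavioral difference from a plain `string` is that indexing and slicing no longer compile.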