string is rarely useful as a function argument
Timon Gehr
timon.gehr at gmx.ch
Wed Dec 28 14:39:14 PST 2011
On 12/28/2011 11:12 PM, foobar wrote:
> On Wednesday, 28 December 2011 at 21:17:49 UTC, Timon Gehr wrote:
>>
>> I was educated enough not to make that mistake, because I read the
>> entire language specification before deciding the language was awesome
>> and downloading the compiler. I find it strange that the product
>> should be made less usable because we do not expect users to read the
>> manual. But it is of course a valid point.
>>
>
> It's awfully optimistic to expect people to read the manual.
>
Well, if the alternative is slowly butchering the language I will be
awfully optimistic about it all day long.
>> There is nothing wrong with operating at the code unit level.
>> Efficient slicing is very desirable.
>>
>
> I agree that it's useful. It is however the incorrect abstraction level
> when you need a "string" which is by far the common case in user code.
I would not go as far as to call it 'incorrect'.
> e.g. if I need a name variable in a class:
>     codeUnit[] name; // bug!
>     string name;     // correct
>
From a pragmatic viewpoint it does not matter because if string is used
like this, then codeUnit[] does exactly the same thing. Nobody forces
anyone to index or slice into a string variable when they don't need
that functionality. All engineers have to work with leaky abstractions.
Why is it such a big deal?
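A minimal D sketch of the point above, assuming a current Phobos (`std.string.indexOf`). A string held as an opaque value, such as a name, never needs code-point awareness. When slicing is wanted, the code-unit offsets Phobos reports can be used directly as O(1) slice bounds. The sample text "Zoë Smith" is purely illustrative.

```d
// Sketch: code-unit offsets from Phobos feed straight into O(1) slices.
import std.string : indexOf;

void main()
{
    string name = "Zoë Smith";

    // indexOf reports its result as a code-unit offset...
    auto i = name.indexOf(' ');
    assert(i == 4); // "Zoë" occupies 4 UTF-8 code units ('ë' needs two)

    // ...so it can be used as a slice bound: no decoding, no copying.
    assert(name[0 .. i] == "Zoë");
    assert(name[i + 1 .. $] == "Smith");
}
```

Nothing here indexes into the middle of a multi-byte sequence, because the offset comes from a search rather than from counting letters.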
> I expect that most uses of code-unit arrays should be in the standard
> library anyway since it provides the string manipulation routines. It
> all boils down to making the common case trivial and the rare case
> possible. You can use the underlying data structure (code units) if you
> need it but the default "string" is what people expect when thinking
> about what such a type does (a string of letters). D's already 80% there
> since Phobos already treats strings as bi-directional ranges of
> code-points which is much closer to the mental image of a string of
> letters, so I think this is about bringing the current design to its
> final conclusion.
>
Well, that mental image is just not the right one when dealing with Unicode.
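To illustrate why "a range of code points" is still not "a string of letters", here is a small D sketch. It assumes a Phobos recent enough to provide `std.uni.byGrapheme`, which postdates this thread. One rendered letter can be three code units, two code points, and one grapheme at the same time.

```d
// Sketch: one user-perceived letter, counted at three abstraction levels.
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT renders as a single "é".
    string s = "e\u0301";

    assert(s.length == 3);                // UTF-8 code units: 1 + 2
    assert(s.walkLength == 2);            // code points (Phobos decodes to dchar)
    assert(s.byGrapheme.walkLength == 1); // graphemes: what a reader calls a letter
}
```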
>>
>> Exactly. It is acting less and less like an array of code units. But
>> it *is* an array of code units. If the general consensus is that we
>> need a string data type that acts at a different abstraction level by
>> default (with which I'd disagree, but apparently I don't have a
>> popular opinion here), then we need a string type in the standard
>> library to do that. Changing the language so that an array of code
>> units stops behaving like an array of code units is not a solution.
>>
>
> I agree that we should not break T[] for any T and instead introduce a
> library type. While I personally believe that such a change will expose
> hidden bugs (certainly when unaware programmers treat string as ASCII
> and the product is later on localized), it's a big disturbance in
> people's code, and it's worth considering whether the benefit is worth
> the costs. Perhaps some middle ground could be found such that existing
> code can rely on existing behavior and the new library type will be an
> opt-in.
What will such a type offer, except that it disallows indexing and slicing?
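To make the question concrete, here is a minimal D sketch of what such an opt-in library type might look like. It assumes the type simply wraps a `string` and exposes it as a bidirectional range of code points. `Text` and `raw` are hypothetical names, not Phobos types. Apart from omitting `opIndex`, `opSlice` and `length`, the wrapper only forwards to what Phobos already does for plain strings.

```d
// Sketch of a hypothetical opt-in string type (NOT part of Phobos): it exposes a
// bidirectional range of code points and deliberately has no opIndex/opSlice.
import std.range.primitives : back, front, isBidirectionalRange, popBack, popFront, walkLength;

struct Text
{
    private string data;

    @property bool empty() { return data.length == 0; }
    @property dchar front() { return data.front; } // decodes one code point
    @property dchar back() { return data.back; }   // decodes the last code point
    void popFront() { data.popFront(); }
    void popBack() { data.popBack(); }
    @property Text save() { return this; }

    // Hypothetical escape hatch back to the underlying code units.
    string raw() { return data; }
}

static assert(isBidirectionalRange!Text);

void main()
{
    auto t = Text("naïve");
    assert(t.walkLength == 5);                     // counts code points
    assert(t.raw.length == 6);                     // 'ï' is two UTF-8 code units
    static assert(!__traits(compiles, t[0]));      // no indexing
    static assert(!__traits(compiles, t[0 .. 2])); // no slicing
}
```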
More information about the Digitalmars-d mailing list