string is rarely useful as a function argument

Wed Dec 28 22:45:46 PST 2011

On Wednesday, 28 December 2011 at 22:39:15 UTC, Timon Gehr wrote:
> On 12/28/2011 11:12 PM, foobar wrote:
>> On Wednesday, 28 December 2011 at 21:17:49 UTC, Timon Gehr 
>> wrote:
>>>
>>> I was educated enough not to make that mistake, because I 
>>> read the
>>> entire language specification before deciding the language 
>>> was awesome
>>> and downloading the compiler. I find it strange that the 
>>> product
>>> should be made less usable because we do not expect users to 
>>> read the
>>> manual. But it is of course a valid point.
>>>
>>
>> It's awfully optimistic to expect people to read the manual.
>>
>
> Well, if the alternative is slowly butchering the language I 
> will be awfully optimistic about it all day long.
>
>>> There is nothing wrong with operating at the code unit level.
>>> Efficient slicing is very desirable.
>>>
>>
>> I agree that it's useful. It is however the incorrect 
>> abstraction level
>> when you need a "string" which is by far the common case in 
>> user code.
>
> I would not go as far as to call it 'incorrect'.
>
>> i.e. if I need a name variable in a class:
>>   codeUnit[] name; // bug!
>>   string Name;     // correct
>>
>
> From a pragmatic viewpoint it does not matter because if string 
> is used like this, then codeUnit[] does exactly the same thing. 
> Nobody forces anyone to index or slice into a string variable 
> when they don't need that functionality. All engineers have to 
> work with leaky abstractions. Why is it such a big deal?
>
>
>> I expect that most uses of code-unit arrays should be in the 
>> standard
>> library anyway since it provides the string manipulation 
>> routines. It
>> all boils down to making the common case trivial and the rare 
>> case
>> possible.  You can use the underlying data structure (code 
>> units) if you
>> need it but the default "string" is what people expect when 
>> thinking
>> about what such a type does (a string of letters). D's already 
>> 80% there
>> since Phobos already treats strings as bi-directional ranges of
>> code-points which is much closer to the mental image of a 
>> string of
>> letters, so I think this is about bringing the current design 
>> to its
>> final conclusion.
>>
>
> Well, that mental image is just not the right one when dealing 
> with Unicode.
>
>>>
>>> Exactly. It is acting less and less like an array of code 
>>> units. But
>>> it *is* an array of code units. If the general consensus is 
>>> that we
>>> need a string data type that acts at a different abstraction 
>>> level by
>>> default (with which I'd disagree, but apparently I don't have 
>>> a
>>> popular opinion here), then we need a string type in the 
>>> standard
>>> library to do that. Changing the language so that an array of 
>>> code
>>> units stops behaving like an array of code units is not a 
>>> solution.
>>>
>>
>> I agree that we should not break T[] for any T and instead 
>> introduce a
>> library type. While I personally believe that such a change 
>> will expose
>> hidden bugs (certainly when unaware programmers treat string 
>> as ASCII
>> and the product is later on localized), it's a big disturbance 
>> in
>> people's code, and it's worth considering whether the benefit is 
>> worth the
>> costs. Perhaps, some middle ground could be found such that 
>> existing
>> code can rely on existing behavior and the new library type 
>> will be an
>> opt-in.
>
> What will such a type offer, except that it disallows indexing 
> and slicing?

From a pragmatic viewpoint, people can also continue programming 
in C++ instead of investing a lot of effort in learning a new 
language.

The only difference between programming languages is the human 
interface aspect. Anything you can program in D you could also 
do in assembly, yet you prefer D because it's more convenient. In 
that regard, a code-unit array is definitely worse than a string 
type.

A programmer can choose to either change his 'naive' mental image 
or change the programming language. Most will do the latter. 
Computers need to adapt and be human friendly, not vice-versa.
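
To make the distinction argued over in this thread concrete, here is a minimal sketch in D of the three levels at which the same text can be counted: code units (what a `string` indexes and slices), code points (what Phobos range functions see when they iterate a `string`), and graphemes (closest to the "string of letters" mental image). The helper names `codeUnits`, `codePoints` and `graphemes` are made up for illustration and are not part of Phobos, and the sample text is an arbitrary choice.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helpers, named for illustration only.

/// Number of UTF-8 code units: the plain array length.
size_t codeUnits(string s) { return s.length; }

/// Number of code points: Phobos decodes a string to dchar while iterating.
size_t codePoints(string s) { return s.walkLength; }

/// Number of graphemes (user-perceived characters).
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    string precomposed = "Zo\u00EB"; // last letter stored as one code point
    string combining = "Zoe\u0308";  // 'e' followed by a combining diaeresis

    writeln(codeUnits(precomposed), " ", codePoints(precomposed), " ",
            graphemes(precomposed));
    writeln(codeUnits(combining), " ", codePoints(combining), " ",
            graphemes(combining));

    // Slicing operates on code units; this cut is valid text only because
    // it happens to fall on a code point boundary.
    assert(precomposed[0 .. 2] == "Zo");
}
```

Both spellings display as the same three letters, yet they differ in code units and code points and agree only in the grapheme count. That is roughly the gap between the array view and the "string of letters" view.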