string is rarely useful as a function argument

Thu Dec 29 10:01:01 PST 2011

On 12/29/2011 07:45 AM, foobar wrote:
> On Wednesday, 28 December 2011 at 22:39:15 UTC, Timon Gehr wrote:
>> On 12/28/2011 11:12 PM, foobar wrote:
>>> On Wednesday, 28 December 2011 at 21:17:49 UTC, Timon Gehr wrote:
>>>>
>>>> I was educated enough not to make that mistake, because I read the
>>>> entire language specification before deciding the language was awesome
>>>> and downloading the compiler. I find it strange that the product
>>>> should be made less usable because we do not expect users to read the
>>>> manual. But it is of course a valid point.
>>>>
>>>
>>> It's awfully optimistic to expect people to read the manual.
>>>
>>
>> Well, if the alternative is slowly butchering the language I will be
>> awfully optimistic about it all day long.
>>
>>>> There is nothing wrong with operating at the code unit level.
>>>> Efficient slicing is very desirable.
>>>>
>>>
>>> I agree that it's useful. It is, however, the incorrect abstraction level
>>> when you need a "string", which is by far the common case in user code.
>>
>> I would not go as far as to call it 'incorrect'.
>>
>>> e.g. if I need a name variable in a class:
>>> codeUnit[] name; // bug!
>>> string name;     // correct
>>>
>>
>> From a pragmatic viewpoint it does not matter because if string is
>> used like this, then codeUnit[] does exactly the same thing. Nobody
>> forces anyone to index or slice into a string variable when they don't
>> need that functionality. All engineers have to work with leaky
>> abstractions. Why is it such a big deal?
>>
>>
>>> I expect that most uses of code-unit arrays should be in the standard
>>> library anyway since it provides the string manipulation routines. It
>>> all boils down to making the common case trivial and the rare case
>>> possible. You can use the underlying data structure (code units) if you
>>> need it but the default "string" is what people expect when thinking
>>> about what such a type does (a string of letters). D's already 80% there
>>> since Phobos already treats strings as bidirectional ranges of
>>> code-points which is much closer to the mental image of a string of
>>> letters, so I think this is about bringing the current design to its
>>> final conclusion.
>>>
>>
>> Well, that mental image is just not the right one when dealing with
>> Unicode.
>>
>>>>
>>>> Exactly. It is acting less and less like an array of code units. But
>>>> it *is* an array of code units. If the general consensus is that we
>>>> need a string data type that acts at a different abstraction level by
>>>> default (with which I'd disagree, but apparently I don't have a
>>>> popular opinion here), then we need a string type in the standard
>>>> library to do that. Changing the language so that an array of code
>>>> units stops behaving like an array of code units is not a solution.
>>>>
>>>
>>> I agree that we should not break T[] for any T and instead introduce a
>>> library type. While I personally believe that such a change will expose
>>> hidden bugs (certainly when unaware programmers treat string as ASCII
>>> and the product is later localized), it's a big disruption to
>>> people's code, and it's worth considering whether the benefit is worth the
>>> costs. Perhaps some middle ground could be found such that existing
>>> code can rely on existing behavior and the new library type will be an
>>> opt-in.
>>
>> What will such a type offer, except that it disallows indexing and
>> slicing?
>
>
> From a pragmatic viewpoint, people could also continue programming in C++
> instead of investing a lot of effort in learning a new language.
>

I disagree.

Pragmatism: "Dealing with things sensibly and realistically in a way 
that is based on practical rather than theoretical considerations."

In practice, programming in D beats the pants off programming in C++.

> The only difference between programming languages is the human interface
> aspect.

No. There is also the aspect of how well it maps to the machine it will 
run on. An interface always has two sides.

> Anything you can program with D you could also do in assembly
> yet you prefer D because it's more convenient.

I prefer D because it is more productive.

> In that regard, a code-unit array is definitely worse than a string type.
>

A code-unit array type is a string type, albeit a simple one.
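To make that concrete, here is a minimal sketch, assuming a current DMD and Phobos with the default auto-decoding of narrow strings (the sample text "héllo" is arbitrary). The same array answers array questions in code units and range questions in code points:

```d
// Assumes current Phobos, where narrow strings are auto-decoded by the
// range primitives in std.range.primitives.
import std.range.primitives : ElementType, front, walkLength;

void main()
{
    // string is simply an array of immutable UTF-8 code units.
    static assert(is(string == immutable(char)[]));

    string s = "héllo"; // 'é' (U+00E9) is encoded as two UTF-8 code units

    // Array operations work on code units: length and O(1) slicing.
    assert(s.length == 6);
    assert(s[1 .. 3] == "é");

    // Phobos range primitives decode the same array to code points (dchar).
    static assert(is(ElementType!string == dchar));
    assert(s.front == 'h');
    assert(s.walkLength == 5);
}
```

Nothing needs to be wrapped: indexing and slicing are there when you want code units, and the range functions are there when you want code points.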

> A programmer can choose to either change his 'naive' mental image or
> change the programming language.  Most will do the latter.

Either a programmer does not care how D strings work, or he is happy that
they are so simple to work with.

> Computers need to adapt and be human friendly, not vice-versa.

When I meet a computer that adapts itself in order to be human friendly, 
I'll buy you a cookie.