string is rarely useful as a function argument

Timon Gehr timon.gehr at gmx.ch
Fri Dec 30 11:55:42 PST 2011


On 12/30/2011 08:33 PM, Joshua Reusch wrote:
> On 29.12.2011 19:36, Andrei Alexandrescu wrote:
>> On 12/29/11 12:28 PM, Don wrote:
>>> On 28.12.2011 20:00, Andrei Alexandrescu wrote:
>>>> Oh, one more thing - one good thing that could come out of this thread
>>>> is abolition (through however slow a deprecation path) of s.length and
>>>> s[i] for narrow strings. Requiring s.rep.length instead of s.length and
>>>> s.rep[i] instead of s[i] would improve the quality of narrow strings
>>>> tremendously. Also, s.rep[i] should return ubyte/ushort, not
>>>> char/wchar.
>>>> Then, people would access the decoding routines on the needed
>>>> occasions,
>>>> or would consciously use the representation.
>>>>
>>>> Yum.
>>>
>>>
>>> If I understand this correctly, most others don't. Effectively, .rep
>>> just means, "I know what I'm doing", and there's no change to existing
>>> semantics, purely a syntax change.
>>
>> Exactly!
>>
>>> If you change s[i] into s.rep[i], it does the same thing as now. There's
>>> no loss of functionality -- it just stops you from accidentally doing
>>> the wrong thing. Like .ptr for getting the address of an array.
>>> Typically all the ".rep" everywhere would get annoying, so you would
>>> write:
>>> ubyte [] u = s.rep;
>>> and use u from then on.
>>>
>>> I don't like the name 'rep'. Maybe 'raw' or 'utf'?
>>> Apart from that, I think this would be perfect.
>>
>> Yes, I mean "rep" as short for "representation" but upon first sight
>> the connection is tenuous. "raw" sounds great.
>>
>> Now I'm twice sorry this will not happen...
>>
>
> Maybe it could happen if we
> 1. make dstring the default strings type --

Inefficient.
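
To put a number on it (a minimal sketch, not from the original
exchange; the sample text is arbitrary): for ASCII-heavy text a dstring
takes four times the memory of the equivalent UTF-8 string.

```d
import std.conv : to;

void main()
{
    string  s = "hello, world";   // UTF-8: 1 byte per code unit
    dstring d = s.to!dstring;     // UTF-32: 4 bytes per code unit

    // Same number of code units for pure ASCII text...
    assert(s.length == 12 && d.length == 12);

    // ...but four times the storage.
    assert(s.length * char.sizeof  == 12);
    assert(d.length * dchar.sizeof == 48);
}
```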

> code units and characters would be the same

Wrong.
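
A minimal sketch of why (my own example, assuming "character" means
what the user sees on screen): even in UTF-32 a code unit is a code
point, not necessarily a character, because of combining sequences.

```d
void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    // This is displayed as a single accented "e".
    dstring d = "e\u0301"d;

    // One visible character, but two UTF-32 code units.
    assert(d.length == 2);
    assert(d[0] == 'e');
    assert(d[1] == '\u0301');
}
```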

> or 2. forward string.length to std.utf.count and opIndex to
> std.utf.toUTFindex

Inconsistent and inefficient (it blows up the algorithmic complexity).
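
A rough sketch of the complexity problem. The helper charAt is
hypothetical; it only mimics what an opIndex forwarded to
std.utf.toUTFindex would have to do. Each lookup scans from the start
of the string, so an innocent-looking index loop becomes quadratic,
while a decoding foreach stays linear.

```d
import std.utf : count, decode, toUTFindex;

// Hypothetical stand-in for a "character-indexed" s[n].
dchar charAt(string s, size_t n)
{
    size_t i = toUTFindex(s, n); // walks n code points: O(n)
    return decode(s, i);         // decodes the code point at byte i
}

void main()
{
    string s = "na\u00EFve caf\u00E9";

    assert(s.length == 12);  // UTF-8 code units
    assert(count(s) == 10);  // code points, itself an O(n) scan
    assert(charAt(s, 2) == '\u00EF');

    // O(n^2): every charAt call rescans the prefix.
    dchar[] slow;
    foreach (n; 0 .. count(s))
        slow ~= charAt(s, n);

    // O(n): decode once, front to back.
    dchar[] fast;
    foreach (dchar c; s)
        fast ~= c;

    assert(slow == fast);
}
```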

>
> so programmers could use the slices/indexing/length (no laziness
> problems), and if they really want codeunits use .raw/.rep (or better
> .utf8/16/32 with std.string.representation(std.utf.toUTF8/16/32))
>

Anyone who intends to write efficient string processing code needs this. 
Anyone who does not want to write string processing code will not need 
to index into a string -- standard library functions will suffice.
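
For example (a small sketch with made-up input): splitting at an ASCII
delimiter needs no decoding at all. std.string.indexOf hands back a
code unit index, and slicing at that index is O(1) and cannot land
inside a multi-byte sequence, because an ASCII byte never occurs inside
one in UTF-8.

```d
import std.string : indexOf;

void main()
{
    // "gr\u00FC\u00DFe" is 5 code points but 7 UTF-8 code units.
    string s = "gr\u00FC\u00DFe=hello";

    ptrdiff_t eq = s.indexOf('=');  // code unit index, -1 if absent
    assert(eq == 7);

    // Plain code unit slicing: no copying, no decoding.
    assert(s[0 .. eq] == "gr\u00FC\u00DFe");
    assert(s[eq + 1 .. $] == "hello");
}
```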

> But generally I liked the idea of just having an alias for strings...

Me too. I think the way we have it now is optimal. The only reason we 
are discussing this is the fear that uneducated users will write 
code that does not take into account Unicode characters outside the 
ASCII range (code points 0x80 and above). But what is the worst thing 
that can happen?

1. They don't notice. Then it is not a problem, because they are 
obviously only using ASCII characters and it is perfectly reasonable to 
assume that code units and characters are the same thing.

2. They get screwed up string output, look for the reason, patch up 
their code with some functions from std.utf and will never make the same 
mistakes again.
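
A sketch of what case 2 typically looks like (the input and the
truncation length are invented for illustration): a slice by code unit
count cuts a multi-byte sequence in half, and the patch is a single
call into std.utf.

```d
import std.exception : assertThrown;
import std.utf : UTFException, toUTFindex, validate;

void main()
{
    string s = "caf\u00E9s";  // 5 code points, 6 UTF-8 code units

    // The mistake: "keep the first 4 characters" by code unit.
    string bad = s[0 .. 4];   // ends in the middle of U+00E9
    assertThrown!UTFException(validate(bad));

    // The patch: convert the code point count to a code unit index.
    string good = s[0 .. toUTFindex(s, 4)];
    assert(good == "caf\u00E9");
    validate(good);           // does not throw
}
```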


I have *never* seen a user in D.learn complain about it. There might 
have been some I missed, but it is certainly not a prevalent problem. 
Also, just because a user can type .rep does not mean he understands 
Unicode: He is able to make just the same mistakes as before, even more 
so, as the array he is getting back has the _wrong element type_.
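
To make that concrete, here is a sketch that uses the existing
std.string.representation as a stand-in for the proposed .rep (the
proposed property itself does not exist). Indexing the byte array
gives exactly the same partial sequence as indexing the string, and
the ubyte[] no longer gets the decoding foreach that a string has.

```d
import std.string : representation;

void main()
{
    string s = "\u00E9";  // one code point, two UTF-8 code units

    auto u = s.representation;
    static assert(is(typeof(u) == immutable(ubyte)[]));

    // Same mistake, different spelling: both yield the lead byte.
    assert(s[0] == 0xC3 && u[0] == 0xC3);

    // A string can still be decoded on iteration...
    size_t codePoints;
    foreach (dchar c; s)
        ++codePoints;
    assert(codePoints == 1);

    // ...whereas the ubyte[] is just numbers.
    size_t bytes;
    foreach (b; u)
        ++bytes;
    assert(bytes == 2);
}
```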


