Why is string.front dchar?
Maxim Fomin
maxim at maxim-fomin.ru
Mon Jan 20 03:55:54 PST 2014
On Monday, 20 January 2014 at 09:58:07 UTC, Jakob Ovrum wrote:
> On Thursday, 16 January 2014 at 06:59:43 UTC, Maxim Fomin wrote:
>> This is wrong. String in D is de facto (by implementation,
>> spec may say whatever is convenient for advertising D) array
>> of single bytes which can keep UTF-8 code units. No way string
>> type in D is always a string in a sense of code
>> points/characters. Sometimes it happens that string type
>> behaves like 'string', but if you put UTF-16 or UTF-32 text it
>> would remind you what string type really is.
>
> By implementation they are also UTF strings. String literals
> use UTF, `char.init` is 0xFF and `wchar.init` is 0xFFFF,
> foreach over narrow strings with `dchar` iterator variable type
> does UTF decoding etc.
>
> I don't think you know what you're talking about; putting
> UTF-16 or UTF-32 in `string` is utter madness and not trivially
> possible. We have `wchar`/`wstring` and `dchar`/`dstring` for
> UTF-16 and UTF-32, respectively.
>
import std.stdio;

void main()
{
    string s = "о";
    writeln(s.length);
}
This compiles and prints 2, because `.length` counts UTF-8 code units
(bytes), not characters: the Cyrillic "о" is encoded as two code units.
This means that the string type is broken, and broken in exactly the
way I was attempting to explain.
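For comparison outside D (a minimal sketch in Python rather than D,
purely to make the code-unit count explicit), the Cyrillic letter "о"
encodes to two UTF-8 code units:

```python
# The Cyrillic small letter "о" (U+043E) looks like the Latin "o",
# but UTF-8 encodes it as two code units (bytes), which is what
# D's string.length counts.
s = "о"  # Cyrillic, not Latin
print(len(s))                  # 1 code point
print(len(s.encode("utf-8")))  # 2 UTF-8 code units
```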
>> This is attempt to explain problematic design as a wise action.
>
> No, it's not. Please leave crappy, unsubstantiated arguments
> like this out of these forums.
Note that I provided examples of why the design is problematic. The
argument is not unsubstantiated.
>
>>> [1] http://dlang.org/type
>>
>> By the way, the link you provide says char is unsigned 8 bit
>> type which can keep value of UTF-8 code unit.
>
> Not *can*, but *does*. Otherwise it is an error in the program.
> The specification, compiler implementation (as shown above) and
> standard library all treat `char` as a UTF-8 code unit. Treat
> it otherwise at your own peril.
>
But such treatment is nonsense. It is like treating an integer or a
floating-point number as a sequence of bytes. You are essentially
saying that treating `char` as a UTF-8 code unit is OK because the
language treats `char` as a UTF-8 code unit, which is a circular
argument.
> The only problem in the implementation here that you illustrate
> is that `['s', 'ä', 'д']` is of type `int[]`, which is a bug.
> It should be `dchar[]`. The length of `char[]` works as
> intended.
You are saying that the length of `char[]` works as intended, which is
true, but that is exactly what shows the design is broken.
>> Problems with string type can be illustrated as possible
>> situation in domain of integers type. Assume that user wants
>> 'number' type which accepts both integers, floats and doubles
>> and treats them properly. This would require either library
>> solution or a new special type in a language which is
>> supported by both compiler and runtime library, which performs
>> operation at runtime on objects of number type according to
>> their effective type.
>>
>> D designers want to support such feature (to make the language
>> better), but as it happens in other situations, the support is
>> only limited: compiler allows to do
>>
>> alias immutable(int)[] number;
>> number my_number = [0, 3.14, 3.14l];
>
> I don't understand this example. The compiler does *not* allow
> that code; try it for yourself.
It does not allow that code because it is nonsense. However, it does
allow the equivalent nonsense for character types:

alias immutable(int)[] number;
number my_number = [0, 3.14, 3.14l]; // does not compile

alias immutable(char)[] string;
string s = "säд"; // compiles, however "säд" should default to wstring or dstring

The same reasons that prevent a sane person from accepting
`int[] number = [3.14l]` should prevent them from accepting
`string s = "säд"`.
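To make the counts in the example above concrete (again a sketch in
Python rather than D, only to show the arithmetic): "säд" is three
code points but five UTF-8 code units, and the five is what D's
`.length` reports for a `string` holding it:

```python
# "säд" mixes an ASCII letter (s, 1 byte), a Latin-1 supplement
# letter (ä, 2 bytes), and a Cyrillic letter (д, 2 bytes) in UTF-8.
text = "säд"
print(len(text))                           # 3 code points
print(len(text.encode("utf-8")))           # 5 UTF-8 code units (1 + 2 + 2)
print(len(text.encode("utf-16-le")) // 2)  # 3 UTF-16 code units
```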
More information about the Digitalmars-d-learn
mailing list