Why is string.front dchar?

Jakob Ovrum jakobovrum at gmail.com
Mon Jan 20 01:58:05 PST 2014


On Thursday, 16 January 2014 at 06:59:43 UTC, Maxim Fomin wrote:
> This is wrong. String in D is de facto (by implementation, spec 
> may say whatever is convenient for advertising D) array of 
> single bytes which can keep UTF-8 code units. No way string 
> type in D is always a string in a sense of code 
> points/characters. Sometimes it happens that string type 
> behaves like 'string', but if you put UTF-16 or UTF-32 text it 
> would remind you what string type really is.

By implementation they are also UTF strings: string literals are 
UTF-encoded, `char.init` is 0xFF and `wchar.init` is 0xFFFF (0xFF 
never occurs in valid UTF-8, and U+FFFF is a noncharacter), and 
`foreach` over a narrow string with a `dchar` iteration variable 
does UTF decoding, among other things.
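
A minimal sketch of these points, easy to try locally. The helper 
names `codePoints` and `codeUnits` are made up for illustration, 
and the source file is assumed to be saved as UTF-8:

```d
import std.stdio : writeln;

// Hypothetical helper: with a `dchar` loop variable, foreach decodes
// the UTF-8 and runs once per code point.
size_t codePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

// Hypothetical helper: with a `char` loop variable there is no
// decoding, so the loop runs once per code unit.
size_t codeUnits(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

void main()
{
    // The default initializers are deliberately not usable text.
    static assert(char.init == 0xFF);
    static assert(wchar.init == 0xFFFF);

    // 's' takes 1 byte in UTF-8; 'ä' and 'д' take 2 bytes each.
    string s = "säд";
    assert(codeUnits(s) == 5);
    assert(codePoints(s) == 3);
    writeln(codeUnits(s), " code units, ", codePoints(s), " code points");
}
```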

I don't think you know what you're talking about; putting UTF-16 
or UTF-32 in `string` is utter madness and not trivially 
possible. We have `wchar`/`wstring` and `dchar`/`dstring` for 
UTF-16 and UTF-32, respectively.
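
As a rough illustration (a sketch, not the only way to write it), 
the same literal takes on the encoding of the string type it 
initializes, and moving between encodings needs an explicit 
conversion such as `std.conv.to`:

```d
import std.conv : to;

void main()
{
    // One literal, three encodings, chosen by the declared type.
    string  s = "säд";   // UTF-8
    wstring w = "säд";   // UTF-16
    dstring d = "säд";   // UTF-32

    assert(s.length == 5);   // code units, not characters
    assert(w.length == 3);
    assert(d.length == 3);

    // There is no implicit conversion between the string types,
    // so UTF-16 or UTF-32 data does not silently end up in `string`.
    static assert(!is(wstring : string));
    static assert(!is(dstring : string));

    // Transcoding is explicit.
    assert(w.to!string == s);
    assert(s.to!dstring == d);
}
```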

>> Operations on code units are rare, which is why the standard 
>> library instead treats strings as ranges of code points, for 
>> correctness by default. However, we must not prevent the user 
>> from being able to work on arrays of code units, as many 
>> string algorithms can be optimized by not doing full UTF 
>> decoding. The standard library does this on many occasions, 
>> and there are more to come.
>
> This is an attempt to explain problematic design as a wise action.

No, it's not. Please leave crappy, unsubstantiated arguments like 
this out of these forums.

>> [1] http://dlang.org/type
>
> By the way, the link you provide says char is unsigned 8 bit 
> type which can keep value of UTF-8 code unit.

Not *can*, but *does*. Otherwise it is an error in the program. 
The specification, compiler implementation (as shown above) and 
standard library all treat `char` as a UTF-8 code unit. Treat it 
otherwise at your own peril.
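
A small sketch of how this looks from the library side, assuming a 
Phobos version where range primitives on narrow strings decode 
(which is the behaviour discussed in this thread). The byte values 
in `raw` are arbitrary invalid input picked for illustration:

```d
import std.exception : assertThrown;
import std.range : ElementType, front, walkLength;
import std.utf : UTFException, validate;

void main()
{
    string s = "säд";

    // Range primitives decode: a string is a range of dchar.
    static assert(is(typeof(s.front) == dchar));
    static assert(is(ElementType!string == dchar));
    assert(s.walkLength == 3);

    // Indexing still hands back raw code units.
    static assert(is(typeof(s[0]) == immutable(char)));

    // Getting non-UTF-8 bytes into a char[] takes a cast,
    // and std.utf.validate rejects the result.
    ubyte[] raw = [0xFF, 0xFE];
    auto bogus = cast(char[]) raw;
    assertThrown!UTFException(validate(bogus));
}
```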

> UTF is irrelevant because the problem is in D implementation. 
> See 
> http://forum.dlang.org/thread/hoopiiobddbapybbwwoc@forum.dlang.org 
> (in particular 2nd page).
>
> The root of the issue is that D does not provide 'utf' type 
> which would handle correctly strings and characters 
> irrespective of the format. But instead the language pretends 
> that it supports such type by allowing to convert to single 
> byte char array both literals "sad" and "säд". And ['s', 'ä', 
> 'д'] is by the way neither char[], nor wchar[], nor even dchar[] 
> but sequence of integers, which compounds oddities in character 
> types.

The only implementation problem you illustrate here is that 
`['s', 'ä', 'д']` is of type `int[]`, which is a bug. It should be 
`dchar[]`. The length of `char[]` works as intended: it counts 
UTF-8 code units, not characters.
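
For the two literals from your example, a quick sketch of what 
`.length` reports next to the code point count. `std.utf.count` and 
`std.string.representation` are used here as one way to see both 
views; the source file is assumed to be UTF-8:

```d
import std.string : representation;
import std.utf : count;

void main()
{
    // ASCII only: one code unit per code point.
    assert("sad".length == 3);
    assert("sad".count == 3);

    // Two of the three code points need two UTF-8 code units each.
    string s = "säд";
    assert(s.length == 5);   // code units
    assert(s.count == 3);    // code points

    // The underlying bytes: 's', then 'ä' (C3 A4), then 'д' (D0 B4).
    immutable(ubyte)[] expected = [0x73, 0xC3, 0xA4, 0xD0, 0xB4];
    assert(s.representation == expected);
}
```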

> Problems with string type can be illustrated as possible 
> situation in domain of integers type. Assume that user wants 
> 'number' type which accepts both integers, floats and doubles 
> and treats them properly. This would require either library 
> solution or a new special type in a language which is supported 
> by both compiler and runtime library, which performs operation 
> at runtime on objects of number type according to their 
> effective type.
>
> D designers want to support such feature (to make the language 
> better), but as it happens in other situations, the support is 
> only limited: compiler allows to do
>
> alias immutable(int)[] number;
> number my_number = [0, 3.14, 3.14l];

I don't understand this example. The compiler does *not* allow 
that code; try it for yourself.
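
A sketch of how to check this without breaking the build, using 
`__traits(compiles)`. The uppercase `L` suffix stands in for the 
quoted `3.14l`, on the assumption that the lowercase form is not a 
valid floating-point suffix and would fail even earlier:

```d
alias number = immutable(int)[];

void main()
{
    // Integer elements are accepted.
    number ok = [0, 3, 42];
    assert(ok.length == 3);

    // Floating-point elements are rejected: double and real do not
    // implicitly convert to int, so this initializer does not compile.
    static assert(!__traits(compiles, { number n = [0, 3.14, 3.14L]; }));
}
```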

