Why is string.front dchar?
Jakob Ovrum
jakobovrum at gmail.com
Mon Jan 20 01:58:05 PST 2014
On Thursday, 16 January 2014 at 06:59:43 UTC, Maxim Fomin wrote:
> This is wrong. String in D is de facto (by implementation, spec
> may say whatever is convenient for advertising D) array of
> single bytes which can keep UTF-8 code units. No way string
> type in D is always a string in a sense of code
> points/characters. Sometimes it happens that string type
> behaves like 'string', but if you put UTF-16 or UTF-32 text it
> would remind you what string type really is.
By implementation they are also UTF strings: string literals are
UTF-encoded, `char.init` is 0xFF and `wchar.init` is 0xFFFF, and
`foreach` over a narrow string with a `dchar` iteration variable
performs UTF decoding.
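To make the points above concrete, here is a minimal sketch. The helper names `countCodeUnits` and `countCodePoints` are made up for illustration, and the literal uses escapes so the encoding of the source file does not matter:

```d
// Counts UTF-8 code units: a `char` loop variable means no decoding.
size_t countCodeUnits(string s)
{
    size_t n = 0;
    foreach (char c; s)
        ++n;
    return n;
}

// Counts code points: a `dchar` loop variable makes foreach decode UTF-8.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    // Default initializers mentioned in the post.
    static assert(char.init == 0xFF);
    static assert(wchar.init == 0xFFFF);

    // 's' (1 code unit), U+00E4 (2 code units), U+0434 (2 code units).
    string s = "s\u00E4\u0434";

    assert(s.length == 5);
    assert(countCodeUnits(s) == 5);
    assert(countCodePoints(s) == 3);
}
```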
I don't think you know what you're talking about; putting UTF-16
or UTF-32 in `string` is utter madness and not trivially
possible. We have `wchar`/`wstring` and `dchar`/`dstring` for
UTF-16 and UTF-32, respectively.
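As a rough illustration of the three string types, the sketch below stores the same text in each and converts between them with `std.utf.toUTF16` and `std.utf.toUTF32`. The variable names are only for the example:

```d
import std.utf : toUTF16, toUTF32;

void main()
{
    // Same three code points in each encoding.
    string  s8  = "s\u00E4\u0434";   // UTF-8,  immutable(char)[]
    wstring s16 = "s\u00E4\u0434"w;  // UTF-16, immutable(wchar)[]
    dstring s32 = "s\u00E4\u0434"d;  // UTF-32, immutable(dchar)[]

    // .length is always the number of code units of that type.
    assert(s8.length == 5);
    assert(s16.length == 3);
    assert(s32.length == 3);

    // Transcoding is explicit; the element type follows the encoding.
    assert(toUTF16(s8) == s16);
    assert(toUTF32(s8) == s32);
}
```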
>> Operations on code units are rare, which is why the standard
>> library instead treats strings as ranges of code points, for
>> correctness by default. However, we must not prevent the user
>> from being able to work on arrays of code units, as many
>> string algorithms can be optimized by not doing full UTF
>> decoding. The standard library does this on many occasions,
>> and there are more to come.
>
> This is attempt to explain problematic design as a wise action.
No, it's not. Please leave crappy, unsubstantiated arguments like
this out of these forums.
>> [1] http://dlang.org/type
>
> By the way, the link you provide says char is unsigned 8 bit
> type which can keep value of UTF-8 code unit.
Not *can*, but *does*. Otherwise it is an error in the program.
The specification, compiler implementation (as shown above) and
standard library all treat `char` as a UTF-8 code unit. Treat it
otherwise at your own peril.
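One way to see the standard library's expectation is `std.utf.validate`, which throws a `UTFException` when a `char` array is not well-formed UTF-8. A small sketch, with an arbitrary bad input chosen for the example:

```d
import std.exception : assertThrown;
import std.utf : UTFException, validate;

void main()
{
    // Well-formed UTF-8: validate returns normally.
    string good = "s\u00E4\u0434";
    validate(good);

    // 0xFF never occurs in well-formed UTF-8 (it is also char.init),
    // so this array is not a valid UTF-8 string.
    char[] bad = ['a', cast(char) 0xFF, 'b'];
    assertThrown!UTFException(validate(bad));
}
```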
> UTF is irrelevant because the problem is in D implementation.
> See
> http://forum.dlang.org/thread/hoopiiobddbapybbwwoc@forum.dlang.org
> (in particular 2nd page).
>
> The root of the issue is that D does not provide 'utf' type
> which would handle correctly strings and characters
> irrespective of the format. But instead the language pretends
> that it supports such type by allowing to convert to single
> byte char array both literals "sad" and "säд". And ['s', 'ä',
> 'д'] is by the way neither char[], no wchar[], even not dchar[]
> but sequence of integers, which compounds oddities in character
> types.
The only implementation problem you illustrate here is that
`['s', 'ä', 'д']` is of type `int[]`, which is a bug. It should
be `dchar[]`. The length of `char[]` works as intended.
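For the `char[]` length point, a short sketch: `.length` counts code units, while the range primitives from `std.range` decode, so `front` yields a `dchar` and `walkLength` counts code points. The sample text is the one from the quoted post, written with escapes:

```d
import std.range : front, walkLength;

void main()
{
    char[] a = "s\u00E4\u0434".dup;

    // Array length is the number of UTF-8 code units.
    assert(a.length == 5);

    // As a range, a narrow string is iterated by code point.
    static assert(is(typeof(a.front) == dchar));
    assert(a.front == 's');
    assert(a.walkLength == 3);
}
```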
> Problems with string type can be illustrated as possible
> situation in domain of integers type. Assume that user wants
> 'number' type which accepts both integers, floats and doubles
> and treats them properly. This would require either library
> solution or a new special type in a language which is supported
> by both compiler and runtime library, which performs operation
> at runtime on objects of number type according to their
> effective type.
>
> D designers want to support such feature (to make the language
> better), but as it happens in other situations, the support is
> only limited: compiler allows to do
>
> alias immutable(int)[] number;
> number my_number = [0, 3.14, 3.14l];
I don't understand this example. The compiler does *not* allow
that code; try it for yourself.