Implicit enum conversions are a stupid PITA

KennyTM~ kennytm at gmail.com
Sun Mar 28 10:17:25 PDT 2010


On Mar 28, 10 18:56, Yigal Chripun wrote:
> KennyTM~ Wrote:
>
>> On Mar 26, 10 18:52, yigal chripun wrote:
>>> KennyTM~ Wrote:
>>>
>>>> On Mar 26, 10 05:46, yigal chripun wrote:
>>>>>
>>>>> While it's true that '?' has a single Unicode value, that is not true for all sorts of diacritics and combining code-points. So your approach passes the responsibility for that to the end user, who in 99.9999% of cases will not handle it correctly.
>>>>>
>>>>
>>>> Non-issue. Since when can a character literal store more than 1 code-point?
>>>
>>> character != code-point
>>>
>>> D chars are really, as you say, code-points, and not always complete characters.
>>>
>>> Here's a use case for you:
>>> you want to write a fully Unicode-aware search engine.
>>> If you just try to match the given sequence of code-points in the search term, you will miss valid matches since, for instance, you do not take into account permutations of the order of combining marks.
>>> You can't just assume that the code-point value identifies the character.
>>
>> Stop being off-topic. '?' is of type char, not string. A char always
>> holds one octet of a UTF-8 encoded sequence. The numerical content is
>> unique and well-defined*. Therefore adding 4 to '?' also has a meaning.
>>
>> * If you're paranoid, you may ask for the spec to guarantee that the
>> character is in NFC form.
>
> Huh? You jump in in the middle of the conversation and I'm off-topic?
>

Yes. The original discussion is about implicit conversion, which leads to
whether ('x' + 1) is semantically correct. How is this related to a
search engine?

(Technically even this is off-topic. The title said implicit *enum* 
conversion.)
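The combining-marks argument quoted above, and the NFC aside in the footnote, can be made concrete. The following is a Python sketch rather than D; the helper names `nfc`, `naive_match` and `normalized_match` are hypothetical, and the only library call is the standard `unicodedata.normalize`. A plain code-point substring search misses both a decomposed "é" and a reordered pair of combining marks, while normalizing both sides to NFC first finds them.

```python
import unicodedata

# "é" written two ways: one precomposed code point (U+00E9) versus
# "e" followed by the combining acute accent (U+0301).
precomposed = "\u00e9"
decomposed = "e\u0301"

# Combining marks in either order: dot below (U+0323) and dot above (U+0307)
# on "s" are canonically equivalent, but differ as code-point sequences.
order_a = "s\u0323\u0307"
order_b = "s\u0307\u0323"


def nfc(s: str) -> str:
    """Normalize to NFC (canonical reordering, then composition)."""
    return unicodedata.normalize("NFC", s)


def naive_match(text: str, term: str) -> bool:
    """Plain code-point substring search, no normalization."""
    return term in text


def normalized_match(text: str, term: str) -> bool:
    """Substring search after converting both strings to NFC."""
    return nfc(term) in nfc(text)


print(naive_match("caf" + decomposed, precomposed))       # False
print(normalized_match("caf" + decomposed, precomposed))  # True
print(order_a == order_b)                                 # False
print(normalized_match(order_a, order_b))                 # True
```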

> Now, to get back to the topic at hand:
>
> D's current design is:
> char/dchar/wchar are integral types that can contain any value/encoding even though D prefers Unicode. This is not enforced.
> For example, you can increment a valid wchar by 1 and get an invalid wchar.
>

Wrong. Read the specs: http://digitalmars.com/d/1.0/type.html, 
http://digitalmars.com/d/2.0/type.html

  * char  = unsigned 8 bit UTF-8
  * wchar = unsigned 16 bit UTF-16
  * dchar = unsigned 32 bit UTF-32

To contain any encoding, use ubyte.
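Whether incrementing a wchar can yield an invalid wchar can be checked with ordinary codecs. The sketch below is Python, not D: it models a D `wchar` or `char` as a plain integer code unit, and the helper names `utf16_unit_decodes_alone` and `utf8_unit_decodes_alone` are hypothetical. It only illustrates that naming an encoding in the type definition does not by itself keep arithmetic inside the valid values: 0xD7FF + 1 is the high surrogate 0xD800, and 0x7F + 1 is a bare UTF-8 continuation byte, neither of which decodes on its own.

```python
def utf16_unit_decodes_alone(unit: int) -> bool:
    """True if a single 16-bit UTF-16 code unit decodes to a character by
    itself. Surrogates (0xD800..0xDFFF) only work as half of a pair, so the
    codec rejects them when they stand alone."""
    try:
        unit.to_bytes(2, "little").decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


def utf8_unit_decodes_alone(unit: int) -> bool:
    """True if a single 8-bit UTF-8 code unit decodes to a character by
    itself, i.e. it is an ASCII byte (0x00..0x7F)."""
    try:
        bytes([unit]).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# wchar-like arithmetic: the last unit before the surrogate range, plus one.
print(utf16_unit_decodes_alone(0xD7FF))      # True
print(utf16_unit_decodes_alone(0xD7FF + 1))  # False (0xD800, high surrogate)

# char-like arithmetic: 'x' + 1 is still 'y' ...
print(chr(ord("x") + 1))                     # y
# ... but 0x7F + 1 is 0x80, a continuation byte that cannot start a sequence.
print(utf8_unit_decodes_alone(0x7F))         # True
print(utf8_unit_decodes_alone(0x7F + 1))     # False
```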

> Instead, let's have proper, well-defined semantics in D:
>
> Design A:
> char/wchar/dchar are defined to be Unicode code-points for the respective encodings. This is enforced by the language, so if you want to define a different encoding you must use something like bits!8.
> Arithmetic on code-points is defined according to the Unicode standard.
>
> Design B:
> char represents a (perhaps multi-byte) character.
> Arithmetic on this type is *not* defined.
>
> In either case these types should not be treated as plain integral types.
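The two proposals can be contrasted with a small model. This is a hypothetical Python sketch, not anything in D or its standard library: `CodePoint` stands in for Design A and only admits Unicode scalar values (0 to 0x10FFFF, excluding the surrogates), with that range check standing in for whatever rules the proposal has in mind; `Character` stands in for Design B and holds one user-perceived character as one or more code points, with no arithmetic defined at all.

```python
class CodePoint:
    """Hypothetical 'Design A': a code point whose arithmetic is checked so
    the result is always a Unicode scalar value."""

    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        # Scalar values are 0..0x10FFFF minus the surrogate range.
        if not 0 <= value <= 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"U+{value:04X} is not a Unicode scalar value")
        self.value = value

    def __add__(self, offset: int) -> "CodePoint":
        # Re-run the constructor check on the result of the arithmetic.
        return CodePoint(self.value + offset)

    def __str__(self) -> str:
        return chr(self.value)


class Character:
    """Hypothetical 'Design B': one user-perceived character, possibly made
    of several code points. No __add__ is defined, so `Character(...) + 1`
    raises TypeError."""

    __slots__ = ("text",)

    def __init__(self, text: str) -> None:
        self.text = text

    def __str__(self) -> str:
        return self.text


print(CodePoint(ord("x")) + 1)   # y
try:
    CodePoint(0xD7FF) + 1         # would land on the surrogate 0xD800
except ValueError as err:
    print(err)                    # U+D800 is not a Unicode scalar value
try:
    Character("e\u0301") + 1      # "é" as e + combining acute accent
except TypeError:
    print("no arithmetic on Character")
```

Under this model, the checked addition is where Design A would decide what "defined according to the Unicode standard" means; the sketch settles for the scalar-value range only.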




More information about the Digitalmars-d mailing list