Implicit enum conversions are a stupid PITA

Yigal Chripun yigal100 at gmail.com
Sun Mar 28 03:56:43 PDT 2010


KennyTM~ Wrote:

> On Mar 26, 10 18:52, yigal chripun wrote:
> > KennyTM~ Wrote:
> >
> >> On Mar 26, 10 05:46, yigal chripun wrote:
> >>>
> >>> while it's true that '?' has one unicode value for it, it's not true for all sorts of diacritics and combining code-points. So your approach is to pass the responsibility for that to the end user, who in 99.9999% of cases will not handle this correctly.
> >>>
> >>
> >> Non-issue. Since when can a character literal store>  1 code-point?
> >
> > character != code-point
> >
> > D chars are really as you say code-points and not always complete characters.
> >
> > here's a use case for you:
> > you want to write a fully unicode aware search engine.
> > If you just try to match the given sequence of code-points in the search term, you will miss valid matches, since, for instance, you do not take into account permutations of the order of combining marks.
> > you can't just assume that the code-point value identifies the character.
> 
> Stop being off-topic. '?' is of type char, not string. A char always 
> holds an octet of UTF-8 encoded sequence. The numerical content is 
> unique and well-defined*. Therefore adding 4 to '?' also has a meaning.
> 
> * If you're paranoid you may request the spec to ensure the character is 
> in NFC form.

Huh? You jump into the middle of a conversation and I'm the one who's off-topic?

Now, to get back to the topic at hand:

D's current design is:
char/wchar/dchar are integral types that can contain any value in any encoding, even though D prefers Unicode. This is not enforced.
E.g. you can take a valid wchar, increment it by 1, and get an invalid wchar.
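To make that concrete, here is the increment-into-invalid case sketched in Rust rather than D (Rust's `char::from_u32` rejects surrogate values, which makes the breakage easy to observe; the same arithmetic on a D wchar would silently succeed):

```rust
fn main() {
    // 0xD7FF is the last code point before the UTF-16 surrogate range.
    let c: u16 = 0xD7FF;
    assert!(char::from_u32(c as u32).is_some()); // a valid scalar value

    // Incrementing by 1 lands in the surrogate range (0xD800..=0xDFFF),
    // which is not a valid Unicode scalar value on its own -- exactly
    // the "valid wchar + 1 == invalid wchar" problem described above.
    let d = c + 1;
    assert!(char::from_u32(d as u32).is_none());
}
```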

Instead, Let's have proper well defined semantics in D:

Design A:
char/wchar/dchar are defined to be Unicode code-points for the respective encodings. This is enforced by the language, so if you want to use a different encoding you must use something like bits!8.
Arithmetic on code-points is defined according to the Unicode standard.
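A minimal sketch of Design A's checked arithmetic, again in Rust; the `CodePoint` type and its `add` method are hypothetical illustrations, not a proposed D API:

```rust
// Hypothetical code-point wrapper: both construction and arithmetic are
// validated, so an invalid value can never be produced.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CodePoint(char);

impl CodePoint {
    // Only valid Unicode scalar values can be constructed.
    fn new(v: u32) -> Option<CodePoint> {
        char::from_u32(v).map(CodePoint)
    }

    // Arithmetic is defined on the scalar value and re-checked, so
    // stepping into the surrogate range yields None, not garbage.
    fn add(self, n: u32) -> Option<CodePoint> {
        CodePoint::new(self.0 as u32 + n)
    }
}

fn main() {
    let a = CodePoint::new('A' as u32).unwrap();
    assert_eq!(a.add(4), CodePoint::new('E' as u32)); // 'A' + 4 == 'E'

    let edge = CodePoint::new(0xD7FF).unwrap();
    assert_eq!(edge.add(1), None); // would be a surrogate: rejected
}
```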

Design B: 
char represents a (perhaps multi-byte) character. 
Arithmetic on this type is *not* defined.
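Design B could look like the following sketch (Rust again; the `Character` type is a hypothetical illustration). A character owns a complete, possibly multi-code-point grapheme; comparison is defined, but no arithmetic operator exists at all:

```rust
// Hypothetical "character" that owns a complete grapheme cluster,
// e.g. a base letter plus combining marks. It supports comparison
// but deliberately implements no arithmetic operators.
#[derive(Debug, PartialEq)]
struct Character(String);

fn main() {
    // U+00E9 (precomposed "e with acute") vs. 'e' + U+0301 COMBINING
    // ACUTE ACCENT: distinct code-point sequences, so they compare
    // unequal here unless a normalization step (not shown) runs first.
    let precomposed = Character("\u{00E9}".to_string());
    let combining = Character("e\u{0301}".to_string());
    assert_ne!(precomposed, combining);

    // `precomposed + 1` simply would not compile: arithmetic is undefined.
}
```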

In either case these types should not be treated as plain integral types.

