Implicit enum conversions are a stupid PITA
Yigal Chripun
yigal100 at gmail.com
Sun Mar 28 03:56:43 PDT 2010
KennyTM~ Wrote:
> On Mar 26, 10 18:52, yigal chripun wrote:
> > KennyTM~ Wrote:
> >
> >> On Mar 26, 10 05:46, yigal chripun wrote:
> >>>
> >>> While it's true that '?' has a single Unicode value, that is not true for all sorts of diacritics and combining code points. So your approach passes the responsibility for handling that to the end user, who in 99.9999% of cases will not handle it correctly.
> >>>
> >>
> >> Non-issue. Since when can a character literal store > 1 code-point?
> >
> > character != code-point
> >
> > D chars are really, as you say, code points, and not always complete characters.
> >
> > Here's a use case for you:
> > You want to write a fully Unicode-aware search engine.
> > If you just match the given sequence of code points in the search term, you will miss valid matches because, for instance, you do not take into account permutations in the order of combining marks.
> > You can't just assume that the code-point value identifies the character.
>
> Stop being off-topic. '?' is of type char, not string. A char always
> holds one octet of a UTF-8 encoded sequence. The numerical content is
> unique and well-defined*. Therefore adding 4 to '?' also has a meaning.
>
> * If you're paranoid you may request the spec to ensure the character is
> in NFC form.
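To make the combining-mark point from the quoted exchange concrete, here is a small Python sketch (Python rather than D; `nfd`, `naive_match` and `normalized_match` are hypothetical helper names). It uses U+0307 COMBINING DOT ABOVE and U+0323 COMBINING DOT BELOW, whose canonical combining classes differ, so Unicode treats either order as the same text. Matching raw code points misses that equivalence; normalizing both sides to NFD first catches it. A real search engine would also need grapheme-boundary checks, which this sketch omits.

```python
import unicodedata

# "q" with a dot above (U+0307) and a dot below (U+0323), in both orders.
# The marks have different canonical combining classes (230 and 220), so the
# two strings are canonically equivalent even though their code points differ.
document = "q\u0307\u0323"
search_term = "q\u0323\u0307"


def nfd(text: str) -> str:
    """Decompose and put combining marks into canonical order."""
    return unicodedata.normalize("NFD", text)


def naive_match(needle: str, haystack: str) -> bool:
    """Substring search over raw code points."""
    return needle in haystack


def normalized_match(needle: str, haystack: str) -> bool:
    """Substring search after normalizing both sides to NFD."""
    return nfd(needle) in nfd(haystack)


print(naive_match(search_term, document))       # False
print(normalized_match(search_term, document))  # True
```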
Huh? You jump in in the middle of the conversation and I'm the one who's off-topic?
Now, to get back to the topic at hand:
D's current design is:
char/wchar/dchar are integral types that can hold any value in any encoding, even though D prefers Unicode; that preference is not enforced.
For example, you can increment a valid wchar by 1 and get an invalid one.
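A minimal Python sketch of the increment problem, with plain ints standing in for D's wchar (a UTF-16 code unit) and char (a UTF-8 code unit); the helper names are hypothetical. 0xD7FF + 1 is 0xD800, a lone high surrogate, and 0x7F + 1 is 0x80, a UTF-8 continuation byte; neither encodes a character on its own.

```python
# Plain ints stand in for D's wchar (UTF-16 code unit) and char (UTF-8 code unit).

def utf16_unit_is_complete(unit: int) -> bool:
    """True if a lone 16-bit code unit encodes a whole character by itself."""
    try:
        chr(unit).encode("utf-16-le")
        return True
    except UnicodeEncodeError:  # lone surrogates (0xD800-0xDFFF) cannot be encoded
        return False


def utf8_unit_is_complete(unit: int) -> bool:
    """True if a lone 8-bit code unit decodes as a whole character by itself."""
    try:
        bytes([unit]).decode("utf-8")
        return True
    except UnicodeDecodeError:  # lead and continuation bytes (>= 0x80) fail alone
        return False


wchar_value = 0xD7FF  # valid by itself: U+D7FF
char_value = 0x7F     # valid by itself: U+007F

print(utf16_unit_is_complete(wchar_value), utf16_unit_is_complete(wchar_value + 1))  # True False
print(utf8_unit_is_complete(char_value), utf8_unit_is_complete(char_value + 1))      # True False
```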
Instead, let's give these types proper, well-defined semantics in D:
Design A:
char/wchar/dchar are defined to be Unicode code points in their respective encodings. This is enforced by the language, so if you want a different encoding you must use something like bits!8.
Arithmetic on code points is defined according to the Unicode standard.
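One way to picture Design A, sketched in Python with a hypothetical `CodePoint` type: construction rejects surrogates and values past U+10FFFF (much as Rust's `char` can only hold Unicode scalar values), and adding an integer steps through scalar values, jumping over the surrogate range. The Unicode standard does not itself define a `+` operator, so that stepping rule is an assumption about what "defined according to the Unicode standard" could mean.

```python
from __future__ import annotations

from dataclasses import dataclass

SURROGATE_FIRST, SURROGATE_LAST = 0xD800, 0xDFFF
MAX_CODE_POINT = 0x10FFFF
_GAP = SURROGATE_LAST - SURROGATE_FIRST + 1  # 0x800 surrogate values to skip


def _to_index(value: int) -> int:
    """Position of a scalar value in the gap-free sequence of scalar values."""
    return value if value < SURROGATE_FIRST else value - _GAP


def _from_index(index: int) -> int:
    """Inverse of _to_index; out-of-range results are rejected by CodePoint."""
    return index if index < SURROGATE_FIRST else index + _GAP


@dataclass(frozen=True)
class CodePoint:
    """Hypothetical Design A type: can only hold a Unicode scalar value."""

    value: int

    def __post_init__(self) -> None:
        # The invariant the language would enforce in Design A.
        if not 0 <= self.value <= MAX_CODE_POINT:
            raise ValueError(f"{self.value:#x} is outside the Unicode code space")
        if SURROGATE_FIRST <= self.value <= SURROGATE_LAST:
            raise ValueError(f"{self.value:#x} is a surrogate, not a scalar value")

    def __add__(self, offset: int) -> CodePoint:
        # Assumed semantics: step through scalar values, jumping over surrogates.
        if not isinstance(offset, int):
            return NotImplemented
        return CodePoint(_from_index(_to_index(self.value) + offset))

    def __str__(self) -> str:
        return chr(self.value)


print(CodePoint(ord("a")) + 4)             # e
print(hex((CodePoint(0xD7FF) + 1).value))  # 0xe000
```

Adding two `CodePoint` values raises `TypeError`, since `__add__` only accepts an `int` offset.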
Design B:
char represents a (perhaps multi-byte) character.
Arithmetic on this type is *not* defined.
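Design B could look something like the hypothetical `Character` class below, again a Python sketch: it wraps one or more code points, compares canonically equivalent spellings as equal by storing their NFC form, reports its UTF-8 length, and defines no arithmetic at all. It assumes the caller passes exactly one user-perceived character; Python's standard library has no grapheme-cluster segmentation, so that is not checked.

```python
import unicodedata


class Character:
    """Hypothetical Design B type: one user-perceived character that may span
    several code points (and several UTF-8 bytes). No arithmetic is defined."""

    __slots__ = ("_nfc",)

    def __init__(self, text: str) -> None:
        if not text:
            raise ValueError("a Character needs at least one code point")
        # Assumption: text is exactly one grapheme cluster (not checked here).
        # Storing the NFC form makes canonically equivalent spellings compare equal.
        self._nfc = unicodedata.normalize("NFC", text)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Character):
            return NotImplemented
        return self._nfc == other._nfc

    def __hash__(self) -> int:
        return hash(self._nfc)

    def __str__(self) -> str:
        return self._nfc

    def utf8_length(self) -> int:
        """Number of bytes the character occupies when encoded as UTF-8."""
        return len(self._nfc.encode("utf-8"))


precomposed = Character("\u00e9")  # é as a single code point
decomposed = Character("e\u0301")  # e followed by COMBINING ACUTE ACCENT
print(precomposed == decomposed)   # True
print(precomposed.utf8_length())   # 2
# precomposed + 4 raises TypeError, because Character defines no __add__.
```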
In either case these types should not be treated as plain integral types.
More information about the Digitalmars-d mailing list