dchar undefined behaviour
tsbockman via Digitalmars-d
digitalmars-d at puremagic.com
Thu Oct 22 18:31:44 PDT 2015
While working on updating and improving Lionello Lunesu's
proposed fix for DMD issue #259, I have come across a value
range propagation (VRP) issue with the dchar type.
The patch adds VRP-based compile-time evaluation of integer
comparisons, where possible. This causes the following problem:
The compiler will now optimize out attempts to handle invalid,
out-of-range dchar values. For example:

    dchar c = cast(dchar) uint.max;
    if (c > 0x10FFFF)
        writeln("invalid");
    else
        writeln("OK");
With constant folding for integer comparisons, the above now
prints "OK" instead of the correct "invalid". The predicate
(c > 0x10FFFF) is simply *assumed* to be false, because the
starting range.imax for a dchar expression is currently
dchar.max.
So, this leads to the question: is making use of dchar values
greater than dchar.max considered undefined behaviour, or not?
1. If it is UB, then there is quite a lot of D code (including
std.uni) which must be corrected to use uint instead of dchar
when dealing with values which could possibly fall outside the
officially supported range.
2. If it is not UB, then the compiler needs to be updated to stop
assuming that dchar values greater than dchar.max are impossible.
This basically just means removing some of dchar's special
treatment, and running it through more of the same code paths as
uint.
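To illustrate option #1, here is a minimal sketch of how such code
could be restructured: the raw value stays in a uint until it has
been range-checked, and is only cast to dchar afterwards, so VRP's
assumption (dchar <= dchar.max) is never violated. The variable
names are hypothetical, and the check below covers only the
out-of-range case from the example above (std.utf.isValidDchar
would additionally reject surrogates).

```d
import std.stdio : writeln;

void main()
{
    // Untrusted input kept as uint, not dchar (option #1).
    uint raw = uint.max;

    if (raw > 0x10FFFF)
        writeln("invalid");     // out of Unicode's code point range
    else
    {
        // Only cast once the value is known to be in range,
        // so the compiler's range assumption for dchar holds.
        dchar c = cast(dchar) raw;
        writeln("OK");
    }
}
```

With this structure the comparison is performed on a uint, so the
new VRP-based constant folding cannot legitimately eliminate it.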
At the moment, I strongly prefer #2, but I suppose #1 could make
sense if people think code which might have to deal with invalid
code points can be isolated sufficiently from other Unicode
processing.