dchar undefined behaviour

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Sat Oct 24 02:05:43 PDT 2015


On 24-Oct-2015 02:45, Anon wrote:
> On Friday, 23 October 2015 at 21:22:38 UTC, Vladimir Panteleev wrote:
>> That doesn't sound right. In fact, this puts into question why
>> dchar.max is at the value it is now. It might be the current maximum
>> at the current version of Unicode, but this seems like a completely
>> pointless restriction that breaks forward-compatibility with future
>> Unicode versions, meaning that D programs compiled today may be unable
>> to work with Unicode text in the future because of a pointless
>> artificial limitation.
>
> Unless UTF-16 is deprecated and completely removed from all systems
> everywhere, there is no way for the Unicode Consortium to increase the limit
> beyond U+10FFFF. That limit is not arbitrary, but based on the technical
> limitations of what UTF-16 can actually represent. UTF-8 and UTF-32 both
> have room for expansion, but have been defined to match UTF-16's
> limitations.
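The U+10FFFF ceiling follows from UTF-16 surrogate-pair arithmetic. A high surrogate (U+D800..U+DBFF) and a low surrogate (U+DC00..U+DFFF) each carry 10 bits. Together they give 20 bits, which are added to 0x10000, so the largest value a pair can express is 0x10000 + 0xFFFFF = 0x10FFFF. Below is a minimal Python sketch of that decoding step. The function and constant names are hypothetical and serve only as an illustration.

```python
# UTF-16 surrogate ranges (hypothetical constant names, values per UTF-16).
HIGH_SURROGATE_MIN, HIGH_SURROGATE_MAX = 0xD800, 0xDBFF
LOW_SURROGATE_MIN, LOW_SURROGATE_MAX = 0xDC00, 0xDFFF


def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    if not (HIGH_SURROGATE_MIN <= high <= HIGH_SURROGATE_MAX):
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not (LOW_SURROGATE_MIN <= low <= LOW_SURROGATE_MAX):
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate contributes 10 bits; the 20-bit result is offset by 0x10000.
    return 0x10000 + ((high - HIGH_SURROGATE_MIN) << 10) + (low - LOW_SURROGATE_MIN)


# The largest possible pair yields the largest code point UTF-16 can express.
max_utf16 = decode_surrogate_pair(HIGH_SURROGATE_MAX, LOW_SURROGATE_MAX)
print(hex(max_utf16))  # 0x10ffff
```

Any value above 0x10FFFF would need more than 20 payload bits, and no surrogate pair can carry that. This is why the limit cannot be raised without abandoning UTF-16.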

Exactly. UTF-8 is officially limited to U+10FFFF; RFC 3629 imposed that
restriction in 2003. Previously it was expected that the range might
(maybe) expand beyond that, but it was decided to stay with U+10FFFF
pretty much indefinitely because of UTF-16.
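In practice, the restriction is an explicit range check in the encoder. Below is a minimal Python sketch of a single-code-point UTF-8 encoder that rejects anything above U+10FFFF and also rejects surrogates. The name `encode_utf8` is hypothetical. Real code would normally use the language's built-in codec.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point as UTF-8, enforcing the U+10FFFF upper limit."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError(f"outside the Unicode codespace: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogates are not encodable scalar values: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])  # 1 byte: 0xxxxxxx
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])  # 2 bytes
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])  # 3 bytes
    # 4 bytes: U+10FFFF encodes as F4 8F BF BF.
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

The range check, not the byte format, is what enforces the limit. On its own, the 4-byte pattern has 21 payload bits and could carry values up to 0x1FFFFF. That spare capacity is the "room for expansion" mentioned above.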

Also, only ~114k code points have an assigned meaning; we are looking at
900K+ values that are unassigned and reserved today.
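The arithmetic behind the headroom is simple. The codespace is 17 planes of 65,536 code points, or 1,114,112 values in total. Below is a rough sketch that uses the post's approximate figure of ~114k assigned code points, which is an estimate rather than an exact count. It also removes the 2,048 surrogates, which can never be assigned.

```python
# Unicode codespace: 17 planes of 65,536 code points each (U+0000..U+10FFFF).
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000
codespace = PLANES * CODE_POINTS_PER_PLANE  # 1,114,112

# UTF-16 surrogates (U+D800..U+DFFF) can never be assigned to characters.
surrogates = 0xDFFF - 0xD800 + 1  # 2,048

# Approximate number of code points with an assigned meaning, as quoted in the post.
ASSIGNED_APPROX = 114_000

# Remaining values with no assigned meaning. Note: this simple subtraction still
# counts the private-use areas, which are reserved for private agreements.
not_assigned = codespace - surrogates - ASSIGNED_APPROX

print(f"codespace:      {codespace:,}")
print(f"not assigned: ~{not_assigned:,}")
```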

-- 
Dmitry Olshansky

