std.d.lexer requirements
Walter Bright
newshound2 at digitalmars.com
Thu Aug 2 10:59:30 PDT 2012
On 8/2/2012 8:46 AM, Dmitry Olshansky wrote:
>> Keep a 6 character buffer in your consumer. If you read a char with the
>> high bit set, start filling that buffer and then decode it.
>>
> 4 bytes is enough.
>
> Since Unicode 5(?) the range of codepoints was defined to be 0...0x10FFFF
> specifically so that it could be encoded in 4 bytes of UTF-8.
Yeah, but I thought 6 bytes would future-proof it! (Inevitably, the Unicode
committee will add more.)
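
For illustration, here is a minimal sketch of the buffering consumer discussed
above, sized for the 4-byte case. It is written in Python rather than D only to
keep it short. The name decode_stream is hypothetical and is not part of
std.d.lexer or any proposed API. It assumes the input arrives one byte at a
time and leaves the validation of each completed sequence to Python's built-in
UTF-8 codec.

```python
def decode_stream(data):
    """Yield code points from an iterable of bytes (hypothetical helper).

    ASCII bytes pass straight through. A byte with the high bit set starts
    filling a small buffer, which is decoded once the sequence is complete.
    """
    buf = bytearray()  # never holds more than 4 bytes
    need = 0           # continuation bytes still expected

    for b in data:
        if need == 0:
            if b < 0x80:
                yield b                 # fast path: plain ASCII
                continue
            if 0xC2 <= b <= 0xDF:
                need = 1                # 2-byte sequence
            elif 0xE0 <= b <= 0xEF:
                need = 2                # 3-byte sequence
            elif 0xF0 <= b <= 0xF4:
                need = 3                # 4-byte sequence, up to U+10FFFF
            else:
                raise ValueError("invalid UTF-8 lead byte: 0x%02X" % b)
            buf.append(b)
        else:
            if b & 0xC0 != 0x80:
                raise ValueError("expected continuation byte, got 0x%02X" % b)
            buf.append(b)
            need -= 1
            if need == 0:
                # The codec rejects overlong forms and surrogates for us.
                yield ord(bytes(buf).decode("utf-8"))
                buf.clear()

    if need:
        raise ValueError("truncated UTF-8 sequence at end of input")
```

With this layout, lead bytes 0xF8 and above (the old 5- and 6-byte forms) are
rejected outright, so widening the buffer to 6 bytes would only matter if the
encoding itself were extended.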
>
> P.S. Looks like I'm too late for this party ;)
>
>
It affects you strongly, too, so I'm glad to see you join in.