std.d.lexer requirements

Walter Bright newshound2 at digitalmars.com
Thu Aug 2 10:59:30 PDT 2012


On 8/2/2012 8:46 AM, Dmitry Olshansky wrote:
>> Keep a 6 character buffer in your consumer. If you read a char with the
>> high bit set, start filling that buffer and then decode it.
>>
> 4 bytes is enough.
>
> Since Unicode 5(?) the range of codepoints was defined to be 0...0x10FFFF
> specifically so that it could be encoded in 4 bytes of UTF-8.

Yeah, but I thought 6 bytes would future-proof it! (Inevitably, the Unicode
committee will add more.)
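
For illustration, here is a minimal sketch of the buffering scheme discussed above: pass ASCII bytes straight through, and when a byte with the high bit set arrives, collect the rest of its sequence in a small buffer and then decode it. It is written in Python for brevity and is not the proposed std.d.lexer code. The names Utf8Consumer, feed and decode_all are hypothetical, and the 4-byte limit assumes code points stop at 0x10FFFF.

```python
class Utf8Consumer:
    """Decode a UTF-8 byte stream one byte at a time (illustrative sketch)."""

    MAX_LEN = 4  # assumption: code points end at 0x10FFFF, so 4 bytes suffice

    def __init__(self):
        self.buf = bytearray()  # pending bytes of one multi-byte sequence
        self.need = 0           # total length announced by the lead byte

    @staticmethod
    def _sequence_length(lead):
        # The number of leading 1 bits in the lead byte gives the length.
        if lead & 0xE0 == 0xC0:
            return 2
        if lead & 0xF0 == 0xE0:
            return 3
        if lead & 0xF8 == 0xF0:
            return 4
        raise ValueError("invalid UTF-8 lead byte: 0x%02X" % lead)

    def feed(self, byte):
        """Return a code point once one is complete, else None."""
        if not self.buf:
            if byte < 0x80:
                return byte  # high bit clear: plain ASCII, no buffering
            self.need = self._sequence_length(byte)
            self.buf.append(byte)
            return None
        if byte & 0xC0 != 0x80:
            self.buf.clear()
            raise ValueError("expected a UTF-8 continuation byte")
        self.buf.append(byte)
        if len(self.buf) < self.need:
            return None
        try:
            # Strict decoding also rejects overlong forms and surrogates.
            return ord(bytes(self.buf).decode("utf-8"))
        finally:
            self.buf.clear()


def decode_all(data):
    """Feed every byte of `data` and collect the decoded code points."""
    consumer = Utf8Consumer()
    points = []
    for b in data:
        cp = consumer.feed(b)
        if cp is not None:
            points.append(cp)
    return points
```

Calling decode_all on the UTF-8 encoding of a string should give back that string's code points. Going to a 6-byte buffer would only mean raising MAX_LEN and accepting the longer lead-byte patterns.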

> P.S. Looks like I'm too late for this party ;)

It affects you strongly, too, so I'm glad to see you join in.


