Challenge: write a really really small front() for UTF8

Dmitry Olshansky dmitry.olsh at gmail.com
Mon Mar 24 04:47:36 PDT 2014


24-Mar-2014 04:44, Simen Kjærås wrote:
> On 2014-03-24 00:32, Mike wrote:
>> On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
>>> Here's a baseline: http://goo.gl/91vIGc. Destroy!
>>>
>>> Andrei
>>
>> This example only considers encodings of up to 4 bytes, but UTF-8 can
>> encode code points in as many as 6 bytes.  Is that not a concern?
>>
>> Mike
>
> RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to
> conform to constraints in UTF-16, removing all 5- and 6-byte sequences.

More importantly, the Unicode standard itself explicitly fixed the range of
code points to what is representable in UTF-16 (U+0000 through U+10FFFF).
Starting with the 5th version of the standard, if memory serves me right.
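
To make the consequence concrete, here is a minimal sketch of why four bytes are enough once the range stops at U+10FFFF. It is written in C rather than D purely for illustration, and the function names utf8_encoded_length and utf8_sequence_length are hypothetical; they are not taken from Andrei's baseline or from Phobos. The byte ranges follow RFC 3629.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes UTF-8 needs for code point cp,
   or 0 if cp cannot be encoded under RFC 3629 (a UTF-16 surrogate, or
   anything above U+10FFFF). */
static size_t utf8_encoded_length(uint32_t cp)
{
    if (cp < 0x80)                    return 1;
    if (cp < 0x800)                   return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogates are not scalar values */
    if (cp < 0x10000)                 return 3;
    if (cp <= 0x10FFFF)               return 4; /* top of the UTF-16 range */
    return 0;                                   /* would need 5 or 6 bytes: disallowed */
}

/* Hypothetical helper: sequence length implied by a lead byte, or 0 if
   the byte cannot start a well-formed sequence. 0xC0 and 0xC1 would only
   produce overlong forms; 0xF5..0xFF would encode values above U+10FFFF,
   which includes the old 5- and 6-byte lead bytes 0xF8..0xFD. */
static size_t utf8_sequence_length(uint8_t lead)
{
    if (lead < 0x80)                  return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0; /* continuation byte or invalid lead */
}
```

Under these assumptions a front() never has to look past the fourth byte: any lead byte that would announce a longer sequence is simply invalid input.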


-- 
Dmitry Olshansky

