Challenge: write a really really small front() for UTF8
Daniel N
ufo at orbiting.us
Mon Mar 24 05:21:54 PDT 2014
On Monday, 24 March 2014 at 11:48:00 UTC, Dmitry Olshansky wrote:
>> RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to
>> conform to constraints in UTF-16, removing all 5- and 6-byte
>> sequences.
>
> More importantly, the Unicode standard explicitly fixed the range of
> code points to what is representable in UTF-16, starting with
> the 5th version of the standard if memory serves me right.
I did some hacks in C at work using _pext_u32. pext is an
absolutely wonderful instruction, together with its counterpart
pdep (both are part of BMI2).
http://software.intel.com/sites/landingpage/IntrinsicsGuide/
And ridiculously fast according to Agner (latency 3, throughput 1):
http://www.agner.org/optimize/instruction_tables.pdf
I think we should add this as an intrinsic to D as well (if it
isn't there already; I couldn't find it). It could do wonders for
UTF decoding.
I'm currently too busy to submit a complete solution, but please
feel free to use my idea if you think it sounds promising.
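A rough sketch of the idea in C, not a complete solution. It assumes the
input is already valid UTF-8 (no checks for truncated, overlong or
otherwise malformed sequences). The names pext32_soft and utf8_front are
made up for illustration. pext32_soft is a slow portable stand-in so the
example runs anywhere; on a CPU with BMI2 the call would be replaced by
_pext_u32(word, mask) from <immintrin.h>.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for pext (hypothetical helper): collects the bits of
   src selected by mask and packs them into the low bits of the result. */
uint32_t pext32_soft(uint32_t src, uint32_t mask)
{
    uint32_t out = 0;
    uint32_t bit = 1;
    while (mask != 0) {
        uint32_t lowest = mask & (~mask + 1u); /* lowest set bit of mask */
        if (src & lowest)
            out |= bit;
        bit <<= 1;
        mask &= mask - 1u;                     /* clear that bit */
    }
    return out;
}

/* Decodes the first code point of s and stores its byte length in
   *len_out. Assumption: s points at a valid, complete UTF-8 sequence. */
uint32_t utf8_front(const unsigned char *s, unsigned *len_out)
{
    /* Payload bits for 1- to 4-byte sequences, lead byte most significant. */
    static const uint32_t masks[5] = {
        0, 0x7Fu, 0x1F3Fu, 0x0F3F3Fu, 0x073F3F3Fu
    };
    unsigned len = s[0] < 0x80 ? 1 : s[0] < 0xE0 ? 2 : s[0] < 0xF0 ? 3 : 4;
    uint32_t word = 0;

    assert(len_out != 0);
    for (unsigned i = 0; i < len; ++i)
        word = (word << 8) | s[i];             /* big-endian load */
    *len_out = len;
    return pext32_soft(word, masks[len]);      /* strip tag bits in one step */
}
```

The point is that all the shift/mask/or work of stripping the 10xxxxxx
and lead-byte tag bits collapses into one pext with a mask picked by
sequence length. pdep with the same masks would go the other way and
scatter code point bits back into their byte positions when encoding
(the tag bits still have to be or'ed in afterwards).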