[Issue 14519] Get rid of unicode validation in string processing

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Fri May 20 07:20:03 PDT 2016


--- Comment #38 from Martin Nowak <code at dawg.eu> ---
(In reply to Vladimir Panteleev from comment #36)
> Question, is there any overhead in actually verifying the validity of UTF-8
> streams, or is all overhead related to error handling (i.e. inability to be
> nothrow)?

I think it's fairly measurable, because you need to add lots of additional
checks and branches (though highly predictable ones).
While my initial decode implementation
https://github.com/MartinNowak/phobos/blob/1b0edb728c/std/utf.d#L577-L651 has
been transmogrified into 200 lines in the meantime
https://github.com/dlang/phobos/blob/acafd848d8/std/utf.d#L1167-L1369, you can
still use it to benchmark validation.
I ran a lot of benchmarks when introducing that function, and the code path
for decoding remains slow even with the throwing code path moved out of the
normal control flow.
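
To illustrate the kind of additional checks and branches meant above, here is
a minimal sketch in C. It is not the Phobos code; the function names
decode_unchecked and decode_checked and the -1 error return are assumptions
made for illustration only. The first function trusts its input, the second
performs the usual well-formedness checks (lead byte, truncation, continuation
bytes, overlong forms, surrogates, range) and reports errors by return value
instead of throwing.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one code point from s at *i, assuming the
   input is already valid UTF-8. No bounds or well-formedness checks. */
static uint32_t decode_unchecked(const unsigned char *s, size_t *i)
{
    uint32_t c = s[(*i)++];
    if (c < 0x80)                       /* 1-byte sequence (ASCII) */
        return c;
    if (c < 0xE0) {                     /* 2-byte sequence */
        c = (c & 0x1F) << 6 | (uint32_t)(s[*i] & 0x3F);
        *i += 1;
        return c;
    }
    if (c < 0xF0) {                     /* 3-byte sequence */
        c = (c & 0x0F) << 12 | (uint32_t)(s[*i] & 0x3F) << 6
          | (uint32_t)(s[*i + 1] & 0x3F);
        *i += 2;
        return c;
    }
    /* 4-byte sequence */
    c = (c & 0x07) << 18 | (uint32_t)(s[*i] & 0x3F) << 12
      | (uint32_t)(s[*i + 1] & 0x3F) << 6 | (uint32_t)(s[*i + 2] & 0x3F);
    *i += 3;
    return c;
}

/* Hypothetical helper: same job, but validating. Returns 0 and advances *i
   on success; returns -1 and leaves *i untouched on malformed input. */
static int decode_checked(const unsigned char *s, size_t len, size_t *i,
                          uint32_t *out)
{
    /* Smallest code point that legitimately needs n continuation bytes. */
    static const uint32_t min_cp[4] = { 0, 0x80, 0x800, 0x10000 };
    uint32_t c;
    size_t n, k;

    if (*i >= len)
        return -1;                      /* nothing left to decode */
    c = s[*i];
    if (c < 0x80)                { n = 0; }
    else if ((c & 0xE0) == 0xC0) { n = 1; c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { n = 2; c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { n = 3; c &= 0x07; }
    else
        return -1;                      /* stray continuation or bad lead byte */

    if (len - *i <= n)
        return -1;                      /* truncated sequence */
    for (k = 1; k <= n; k++) {
        unsigned char b = s[*i + k];
        if ((b & 0xC0) != 0x80)
            return -1;                  /* not a continuation byte */
        c = c << 6 | (uint32_t)(b & 0x3F);
    }
    if (c < min_cp[n])
        return -1;                      /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF)
        return -1;                      /* UTF-16 surrogate */
    if (c > 0x10FFFF)
        return -1;                      /* beyond the Unicode range */

    *i += n + 1;
    *out = c;
    return 0;
}
```

Timing both functions over the same buffer of known-valid text would give a
rough estimate of the cost of validation alone, independent of how errors are
reported.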

