[Issue 14519] Get rid of unicode validation in string processing

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Fri May 20 07:20:03 PDT 2016


https://issues.dlang.org/show_bug.cgi?id=14519

--- Comment #38 from Martin Nowak <code at dawg.eu> ---
(In reply to Vladimir Panteleev from comment #36)
> Question, is there any overhead in actually verifying the validity of UTF-8
> streams, or is all overhead related to error handling (i.e. inability to be
> nothrow)?

I think it's fairly measurable because you need to add lots of additional checks
and branches (though highly predictable ones).
My initial decode implementation
https://github.com/MartinNowak/phobos/blob/1b0edb728c/std/utf.d#L577-L651 has
been transmogrified into 200 lines in the meantime
https://github.com/dlang/phobos/blob/acafd848d8/std/utf.d#L1167-L1369, but you
can still use it to benchmark validation.
I ran a lot of benchmarks when introducing that function, and the code path
for decoding just remains slow, even with the throwing code path moved out of
the normal control flow.
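
To make the "additional checks and branches" concrete, here is a minimal
sketch in C (not the Phobos code linked above, and not a benchmark). The names
decode_unchecked, decode_checked and DECODE_ERROR are hypothetical. The first
function trusts its input to be well-formed UTF-8; the second adds the
validation branches (lead-byte range, truncation, continuation bytes,
overlong forms, surrogates, upper bound) and returns an error value instead
of throwing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DECODE_ERROR 0xFFFFFFFFu /* hypothetical sentinel, not a valid code point */

/* Decodes one code point starting at s[*i], assuming well-formed UTF-8.
   Only the branches needed to pick the sequence length remain. */
uint32_t decode_unchecked(const unsigned char *s, size_t len, size_t *i)
{
    assert(*i < len); /* debug-only precondition, compiled out with NDEBUG */
    (void)len;
    uint32_t c = s[(*i)++];
    if (c < 0x80)
        return c;
    if (c < 0xE0) {
        c = ((c & 0x1F) << 6) | (uint32_t)(s[*i] & 0x3F);
        *i += 1;
        return c;
    }
    if (c < 0xF0) {
        c = ((c & 0x0F) << 12) | ((uint32_t)(s[*i] & 0x3F) << 6)
          | (uint32_t)(s[*i + 1] & 0x3F);
        *i += 2;
        return c;
    }
    c = ((c & 0x07) << 18) | ((uint32_t)(s[*i] & 0x3F) << 12)
      | ((uint32_t)(s[*i + 1] & 0x3F) << 6) | (uint32_t)(s[*i + 2] & 0x3F);
    *i += 3;
    return c;
}

/* Same decoding, plus validation. On error, *i is left unchanged. */
uint32_t decode_checked(const unsigned char *s, size_t len, size_t *i)
{
    size_t p = *i;
    if (p >= len)
        return DECODE_ERROR;
    uint32_t c = s[p++];
    if (c < 0x80) {
        *i = p;
        return c;
    }

    size_t n;     /* number of continuation bytes expected */
    uint32_t min; /* smallest code point allowed for this length */
    if (c >= 0xC2 && c <= 0xDF)      { n = 1; min = 0x80;    c &= 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { n = 2; min = 0x800;   c &= 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { n = 3; min = 0x10000; c &= 0x07; }
    else return DECODE_ERROR; /* stray continuation or invalid lead byte */

    if (len - p < n)
        return DECODE_ERROR; /* truncated sequence */
    for (size_t k = 0; k < n; ++k) {
        unsigned char b = s[p++];
        if ((b & 0xC0) != 0x80)
            return DECODE_ERROR; /* not a continuation byte */
        c = (c << 6) | (uint32_t)(b & 0x3F);
    }
    if (c < min)
        return DECODE_ERROR; /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF)
        return DECODE_ERROR; /* UTF-16 surrogate */
    if (c > 0x10FFFF)
        return DECODE_ERROR; /* beyond the Unicode range */
    *i = p;
    return c;
}
```

Timing both functions over the same valid input would isolate the cost of the
checks from the cost of error handling, since neither version throws.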

--

