[Issue 14919] New: utf error
via Digitalmars-d-bugs
digitalmars-d-bugs at puremagic.com
Thu Aug 13 23:54:25 PDT 2015
https://issues.dlang.org/show_bug.cgi?id=14919
Issue ID: 14919
Summary: utf error
Product: D
Version: D2
Hardware: x86_64
OS: Linux
Status: NEW
Severity: enhancement
Priority: P1
Component: dmd
Assignee: nobody at puremagic.com
Reporter: code at dawg.eu
Related/Alternative to issue 14519 (see
https://issues.dlang.org/show_bug.cgi?id=14519#c24).
When I `readText` a file, a lot of time is already spent on UTF validation.
But we don't take advantage of that, and instead revalidate UTF in almost
every algorithm.
The idea from issue 14519 to substitute a replacement character for invalid
chars makes the validation a little cheaper (because it avoids the cost of
dmd's exception handling, see issue 12442), but it still incurs a high overhead.
I suggest that we make a clean distinction between unvalidated ubyte[] data and
validated strings, and treat all char[]/wchar[]/dchar[] strings as valid.
The compiler already checks string literals, and a few of the string-reading
functions do so as well. Unfortunately, byLine and readln currently don't
validate UTF.
This could be a much more performant approach to correct UTF handling.
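A minimal sketch of the proposed split, not a definitive design. The helper
names `toValidUTF8` and `countLines` are hypothetical and exist only for this
illustration; `std.utf.validate` and `std.utf.byCodeUnit` are existing Phobos
functions. Raw data stays ubyte[] until it crosses a single validating
boundary, and code that receives char[] assumes validity and works on code
units without decoding again.

```d
import std.algorithm.searching : count;
import std.utf : byCodeUnit, UTFException, validate;

/// Hypothetical boundary function: the only place where raw bytes become text.
/// Throws UTFException if `raw` is not well-formed UTF-8.
const(char)[] toValidUTF8(const(ubyte)[] raw)
{
    auto text = cast(const(char)[]) raw; // reinterpret the bytes, no copy
    validate(text);                      // validate once, at the boundary
    return text;
}

/// Hypothetical consumer: assumes `text` is already valid UTF-8, so it
/// iterates over code units and never decodes or revalidates.
size_t countLines(const(char)[] text)
{
    return text.byCodeUnit.count('\n');
}
```

Under this convention, a reader such as byLine or readln would have to
validate (or hand out ubyte[]) for the assumption in `countLines` to hold.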