Crash in byCodeUnit() <- byDchar() when converting faulty text to HTML
"Nordlöw" via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Jun 15 16:09:23 PDT 2014
I'm using the following snippet to convert a UTF-8 string to HTML:
import std.traits: isSomeChar; // needed by the template constraint below

/** Convert character $(D c) to HTML representation. */
string toHTML(C)(C c) @safe pure if (isSomeChar!C)
{
    import std.conv: to;
    if (c == '&') return "&amp;"; // ampersand
    else if (c == '<') return "&lt;"; // less than
    else if (c == '>') return "&gt;"; // greater than
    else if (c == '\"') return "&quot;"; // double quote
    else if (0 < c && c < 128)
        return to!string(cast(char)c);
    else
        return "&#" ~ to!string(cast(int)c) ~ ";";
}
static if (__VERSION__ >= 2066L)
{
    /** Convert string $(D s) to HTML representation. */
    auto encodeHTML(string s) @safe pure
    {
        import std.utf: byDchar;
        import std.algorithm: joiner, map;
        return s.byDchar.map!toHTML.joiner("");
    }
}
Note that it uses Walter's new std.utf.byDchar.
But it triggers
core.exception.RangeError at std/utf.d(2703): Range violation
----------------
Stack trace:
#1: ?? line (0)
#2: ?? line (0)
#3: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (2703)
#4: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (3232)
#5: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (510)
#6: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3440)
#7: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3540)
#8: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/range.d line (1861)
#9: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2172)
#10: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2843)
#11: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (3167)
#12: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (526)
#13: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/stdio.d line (1168)
for non-UTF-8 input.
Is this intentional?
Line 2703 of std/utf.d is inside byCodeUnit().
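Whether or not the RangeError is intended, one possible workaround is to decode the input manually with std.utf.decode, which throws a UTFException on malformed UTF-8, and substitute U+FFFD REPLACEMENT CHARACTER for each offending byte. This is a sketch under that assumption, not how byDchar itself behaves; the names escapeCodePoint and encodeHTMLLenient are hypothetical, and the escaping mirrors the toHTML function above.

```d
import std.array : appender;
import std.conv : to;
import std.utf : decode, UTFException;

/// Hypothetical helper: HTML-escape one code point (same mapping as toHTML).
string escapeCodePoint(dchar c)
{
    switch (c)
    {
        case '&': return "&amp;";
        case '<': return "&lt;";
        case '>': return "&gt;";
        case '"': return "&quot;";
        default:
            if (0 < c && c < 128)
                return to!string(cast(char) c);
            return "&#" ~ to!string(cast(uint) c) ~ ";";
    }
}

/// Hypothetical lenient encoder: malformed UTF-8 bytes become U+FFFD
/// instead of throwing or triggering a RangeError.
string encodeHTMLLenient(string s)
{
    auto result = appender!string();
    size_t i = 0;
    while (i < s.length)
    {
        immutable start = i;
        dchar c;
        try
        {
            c = decode(s, i); // on success, advances i past one code point
        }
        catch (UTFException)
        {
            i = start + 1;    // skip exactly one offending byte
            c = '\uFFFD';     // U+FFFD REPLACEMENT CHARACTER
        }
        result.put(escapeCodePoint(c));
    }
    return result.data;
}
```

With this, "x\xFFy" encodes to "x&#65533;y" rather than aborting. Restoring the index from `start` in the catch block avoids depending on how far decode may have advanced before throwing.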
When I use byChar() instead, it doesn't crash, but then I get incorrect
conversions.
Could somebody explain the difference between byChar, byWchar and
byDchar?
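The difference shows up in the elements each adapter yields: byChar yields UTF-8 code units (char), byWchar yields UTF-16 code units (wchar) and byDchar yields whole code points (dchar). This likely also explains the incorrect conversions with byChar: toHTML then receives each byte of a multi-byte sequence separately, so "ö" comes out as "&#195;&#182;" instead of "&#246;". A minimal sketch, assuming a Phobos recent enough to provide std.utf.byChar, byWchar and byDchar (2.066 or later); elementValues is a hypothetical helper.

```d
import std.algorithm : map;
import std.array : array;
import std.utf : byChar, byWchar, byDchar;

/// Hypothetical helper: the numeric value of every element a range yields.
uint[] elementValues(R)(R r)
{
    return r.map!(c => cast(uint) c).array;
}

unittest
{
    // U+00F6 'ö': two UTF-8 code units, one UTF-16 code unit, one code point.
    assert(elementValues("\u00F6".byChar)  == [0xC3u, 0xB6u]);
    assert(elementValues("\u00F6".byWchar) == [0x00F6u]);
    assert(elementValues("\u00F6".byDchar) == [0x00F6u]);

    // U+1F600 (outside the BMP): four UTF-8 code units, a UTF-16
    // surrogate pair, and still a single code point.
    assert(elementValues("\U0001F600".byChar)  == [0xF0u, 0x9Fu, 0x98u, 0x80u]);
    assert(elementValues("\U0001F600".byWchar) == [0xD83Du, 0xDE00u]);
    assert(elementValues("\U0001F600".byDchar) == [0x1F600u]);
}
```

Only byDchar hands toHTML complete code points, which is why it is the right adapter for numeric character references; byChar and byWchar are useful when you want to transcode or process code units without decoding.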