Crash in byCodeUnit() <- byDchar() when converting faulty text to HTML

"Nordlöw" via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Jun 15 16:09:23 PDT 2014


I'm using the following snippet to convert a UTF-8 string to HTML:

import std.traits: isSomeChar;

/** Convert character $(D c) to HTML representation. */
string toHTML(C)(C c) @safe pure if (isSomeChar!C)
{
     import std.conv: to;
     if      (c == '&')  return "&amp;";  // ampersand
     else if (c == '<')  return "&lt;";   // less than
     else if (c == '>')  return "&gt;";   // greater than
     else if (c == '\"') return "&quot;"; // double quote
     else if (0 < c && c < 128)
         return to!string(cast(char)c);
     else // numeric character reference, e.g. &#246;
         return "&#" ~ to!string(cast(int)c) ~ ";";
}

static if (__VERSION__ >= 2066L)
{
     /** Convert string $(D s) to HTML representation. */
     auto encodeHTML(string s) @safe pure
     {
         import std.utf: byDchar;
         import std.algorithm: joiner, map;
         return s.byDchar.map!toHTML.joiner("");
     }
}

Note that it uses Walter's new std.utf.byDchar.

But for non-UTF-8 input it triggers

core.exception.RangeError at std/utf.d(2703): Range violation
----------------
Stack trace:
#1: ?? line (0)
#2: ?? line (0)
#3: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (2703)
#4: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/utf.d line (3232)
#5: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (510)
#6: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3440)
#7: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/algorithm.d line (3540)
#8: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/range.d line (1861)
#9: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2172)
#10: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (2843)
#11: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (3167)
#12: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/format.d line (526)
#13: /home/per/opt/x86_64-unknown-linux-gnu/dmd/bin/../import/std/stdio.d line (1168)

Is this intentional?

Line 2703 of utf.d is inside byCodeUnit().
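For comparison, here is a sketch of how malformed input can be handled, assuming a current Phobos in which byDchar is byUTF!dchar and by default substitutes U+FFFD (std.utf.replacementDchar) for invalid sequences instead of failing. If rejecting bad input is preferable, std.utf.validate throws a UTFException up front. The helpers isValidUTF8 and codePoints are hypothetical names used only for illustration.

```d
import std.utf : byDchar, replacementDchar, validate, UTFException;

/// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUTF8(string s)
{
    try
    {
        validate(s); // throws UTFException on malformed UTF-8
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

/// Hypothetical helper: all code points of s. Assuming a current Phobos,
/// malformed sequences come out as replacementDchar (U+FFFD).
dchar[] codePoints(string s)
{
    import std.array : array;
    return s.byDchar.array;
}
```

Validating first keeps toHTML unchanged and makes the failure explicit; relying on the replacement character instead keeps the encoder total, at the cost of silently emitting `&#65533;` for bad bytes.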

When I use byChar() instead, it doesn't crash, but then I get incorrect
conversions.
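A minimal sketch of why byChar gives wrong output, assuming a current Phobos: byChar yields single UTF-8 code units (char), so toHTML sees the two bytes 0xC3 0xB6 of "ö" separately and emits `&#195;&#182;`, whereas byDchar yields the single code point U+00F6 and emits `&#246;`. The toHTML below is the snippet from the post with its entity strings written out; htmlViaChar and htmlViaDchar are hypothetical helpers.

```d
import std.algorithm : joiner, map;
import std.conv : to;
import std.traits : isSomeChar;
import std.utf : byChar, byDchar;

/// The toHTML snippet from the post, with the entities spelled out.
string toHTML(C)(C c) @safe pure if (isSomeChar!C)
{
    if      (c == '&')  return "&amp;";
    else if (c == '<')  return "&lt;";
    else if (c == '>')  return "&gt;";
    else if (c == '"')  return "&quot;";
    else if (0 < c && c < 128)
        return to!string(cast(char) c);
    else // numeric reference built from the element's integer value
        return "&#" ~ to!string(cast(int) c) ~ ";";
}

/// Hypothetical helper: encode one UTF-8 code unit (char) at a time.
auto htmlViaChar(string s)  { return s.byChar.map!toHTML.joiner; }

/// Hypothetical helper: encode one code point (dchar) at a time.
auto htmlViaDchar(string s) { return s.byDchar.map!toHTML.joiner; }
```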

Could somebody explain the difference between byChar, byWchar and
byDchar?
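As far as I understand, the three differ only in the element type they yield when iterating a string: byChar gives UTF-8 code units (char), byWchar UTF-16 code units (wchar), and byDchar whole code points (dchar). A small sketch assuming a current Phobos; unitCounts is a hypothetical helper. "ö" (U+00F6) is two UTF-8 units but one UTF-16 unit, and U+1F600, outside the Basic Multilingual Plane, is four UTF-8 units and a two-unit UTF-16 surrogate pair, so "\u00F6\U0001F600" yields 6, 3 and 2 elements respectively.

```d
import std.range : walkLength;
import std.utf : byChar, byWchar, byDchar;

/// Hypothetical helper: how many elements each range yields for s.
///   [0] byChar  -> UTF-8 code units  (char)
///   [1] byWchar -> UTF-16 code units (wchar)
///   [2] byDchar -> code points       (dchar)
size_t[3] unitCounts(string s)
{
    return [s.byChar.walkLength, s.byWchar.walkLength, s.byDchar.walkLength];
}
```

Since a string is already UTF-8, byChar over a string presumably needs no decoding at all and passes bytes through unchanged, which would be consistent with it not tripping over malformed input.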

