utf.d codeLength asserts false on certain input
Anonymouse
asdf at asdf.net
Tue Mar 27 23:29:57 UTC 2018
My IRC bot is suddenly seeing crashes. It reads characters from a
Socket into a ubyte[] array, then idups parts of that (full
lines) into strings for parsing. Parsing involves slicing such
strings into meaningful segments: sender, event type, target
channel/user, message content, etc. I can assume all of them to
be char[]-compliant except for the content field.
Running it in a debugger I see I'm tripping an assert in utf.d[1]
when calling stripRight on a content slice[2].
> /++
>     Returns the number of code units that are required to encode the
>     code point $(D c) when $(D C) is the character type used to encode it.
> +/
> ubyte codeLength(C)(dchar c) @safe pure nothrow @nogc
> if (isSomeChar!C)
> {
>     static if (C.sizeof == 1)
>     {
>         if (c <= 0x7F) return 1;
>         if (c <= 0x7FF) return 2;
>         if (c <= 0xFFFF) return 3;
>         if (c <= 0x10FFFF) return 4;
>         assert(false); // <--
>     }
>     // ...
This trips it:
> import std.string;
>
> void main()
> {
> string s = "\355\342\256 \342\245\341⮢\256\245
> ᮮ\241饭\250\245".stripRight; // <-- asserts false
> }
The real backtrace:
> #0 _D3std3utf__T10codeLengthTaZQpFNaNbNiNfwZh (c=26663461) at
> /usr/include/dlang/dmd/std/utf.d:2530
> #1 0x000055555578d7aa in
> _D3std6string__T10stripRightTAyaZQrFQhZ14__foreachbody2MFNaNbNiNfKmKwZi (this=0x7fffffff99c0, __applyArg1=@0x7fffffff9978: 26663461, __applyArg0=@0x7fffffff9970: 17) at /usr/include/dlang/dmd/std/string.d:2918
> #2 0x00007ffff7a47014 in _aApplyRcd2 () from
> /usr/lib/libphobos2.so.0.78
> #3 0x000055555578d731 in
> _D3std6string__T10stripRightTAyaZQrFNaNiNfQnZQq (str=...) at
> /usr/include/dlang/dmd/std/string.d:2915
> #4 0x00005555558e0cc7 in
> _D8kameloso3irc17parseSpecialcasesFNaNfKSQBnQBh9IRCParserKSQCf7ircdefs8IRCEventKAyaZv (slice=..., event=...,parser=...) at source/kameloso/irc.d:1184
Should that not be an Exception, as it's based on input? I'm not
sure where the character 26663461 came from. Even so, should it
assert?
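For reference, 26663461 is 0x196DA25, which is above the 0x10FFFF
ceiling that codeLength checks for, so it is not a valid code point
to begin with. A quick check, as a sketch using std.utf.isValidDchar
(the value is copied from the backtrace, nothing else is assumed):

```d
import std.utf : isValidDchar;

void main()
{
    // Value taken from the backtrace; the cast is needed because it
    // lies outside the range of valid dchar values.
    immutable dchar c = cast(dchar) 26_663_461;

    assert(c == 0x196DA25);
    assert(c > 0x10FFFF);      // falls through every branch in codeLength
    assert(!isValidDchar(c));  // not a valid Unicode code point
}
```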
I don't know what to do right now. I'd like to avoid sanitizing
all lines. I could catch an Exception but not so much an
AssertError.
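One possible way to get a catchable Exception instead, sketched here
under the assumption that only the content field needs checking:
std.utf.validate throws a UTFException on malformed input, so the
slice could be checked before it reaches stripRight. The helper name
isValidUTF8 and the sample bytes are made up for illustration.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `line` is well-formed UTF-8.
bool isValidUTF8(const(char)[] line) @safe pure
{
    try
    {
        validate(line);  // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // Bytes as they might arrive from the socket. 0xED starts a
    // three-byte sequence, but 0xE2 is not a continuation byte.
    immutable ubyte[] raw = [0xED, 0xE2, 0xAE, 0x20];
    const line = cast(string) raw;

    assert(!isValidUTF8(line));
    assert(isValidUTF8("plain ASCII"));
}
```

This only detects the bad content; what to do with it afterwards
(drop it, replace it, treat it as another encoding) is a separate
decision.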
[1]: https://github.com/dlang/phobos/blob/master/std/utf.d#L2522
[2]: https://github.com/zorael/kameloso/blob/master/source/kameloso/irc.d#L1184