array.reverse segfaults

Wed Oct 22 08:39:19 PDT 2008

On Wed, Oct 22, 2008 at 9:46 AM, Denis Koroskin <2korden at gmail.com> wrote:
> On Wed, 22 Oct 2008 15:21:03 +0400, Moritz Warning <moritzwarning at web.de>
> wrote:
>
>> On Wed, 22 Oct 2008 13:10:20 +0200, Tomas Lindquist Olsen wrote:
>>
>>> Tomas Lindquist Olsen wrote:
>>>>
>>>> Moritz Warning wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> This piece of code segfaults on Debian Linux (with dmd 1.035). Can
>>>>> someone tell me why?
>>>>>
>>>>> char[] get(char[] str)
>>>>> {
>>>>>     return new char[](4); // four chars, each char.init (0xFF)
>>>>> }
>>>>>
>>>>> void main(char[][] args)
>>>>> {
>>>>>     char[] str = get("abc");
>>>>>     char[] reversed = str.reverse; // <-- access violation
>>>>> }
>>>>
>>>> Simpler version:
>>>>
>>>> void main()
>>>> {
>>>>    char[4] str;   // all four elements are char.init (0xFF)
>>>>    str.reverse;
>>>> }
>>>>
>>>> Crashes in _adReverseChar when trying to memmove (3 - 255) bytes ;)
>>>>
>>>> My best guess is that it just doesn't handle char.init values properly!
>>>
>>> When it tries to get the lower stride, it gets 0xFF from the table, but
>>> it doesn't check if this value is usable.
>>>
>>> Probably just ignoring these invalid bytes would make it work. But I
>>> think the real question is, what should _adReverseChar really do on
>>> invalid UTF-8 input?
>>
>> I think it should do the same as on an invalid pointer: result in
>> undefined behavior (=> segfault).
>
> It should fail an assert(isValidUtf8String(str)) check before the in-place
> reverse, so a debug build would throw on it.
> Release behaviour is open to debate, but I think it should be more
> robust. Given wrong input it may produce wrong output, but a
> segfault? That's going too far.
>
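
To make the "(3 - 255) bytes" remark concrete, the sketch below reproduces only the arithmetic, in modern D. It assumes, as that remark suggests, that the length handed to memmove is the distance between the two cursors minus the stride looked up for the low byte; `shiftLength` is a hypothetical helper, not runtime code. In the `char[4]` test case every byte is `char.init` (0xFF), the cursors start 3 bytes apart, and the stride table returns 0xFF as its invalid-byte marker, so the difference is negative and wraps to nearly `size_t.max`.

```d
// Hypothetical illustration, not the runtime's actual source. Assumption:
// the memmove length is computed as (hi - lo) - stridelo.
size_t shiftLength(ptrdiff_t hiMinusLo, uint stridelo)
{
    // With stridelo == 0xFF (the table's marker for an invalid lead byte)
    // and hiMinusLo == 3, the result is -252, which wraps when cast to
    // size_t and becomes an enormous byte count.
    return cast(size_t)(hiMinusLo - stridelo);
}
```

For example, `shiftLength(3, 0xFF)` yields `size_t.max - 251`, far larger than any real buffer, which is why memmove runs off the end of the array.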

I'd expect it to work like every other piece of code in the runtime
that deals with Unicode and throw a UtfException, or whatever it's called.
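
A minimal sketch of the behaviour proposed in the last two messages: validate first, then reverse in place by code point. It is written against current D2 Phobos, where `std.utf.validate` throws `UTFException` on malformed input; `reverseUtf8` and `reverseBytes` are hypothetical names, not runtime functions. The reversal uses the common two-pass approach: reverse every byte, then restore the byte order inside each multi-byte sequence. After the first pass, each such sequence appears as a run of continuation bytes (10xxxxxx) that ends in its lead byte.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: reverse a UTF-8 string in place by code point.
// Invalid input (e.g. a buffer of char.init == 0xFF) throws UTFException
// instead of reaching any memmove with a wrapped length.
char[] reverseUtf8(char[] s)
{
    validate(s); // throws UTFException on malformed UTF-8

    // Pass 1: reverse all bytes.
    reverseBytes(s);

    // Pass 2: every multi-byte sequence is now backwards, i.e. continuation
    // bytes followed by its lead byte. Because the input was validated,
    // each run of continuation bytes is guaranteed to end in a lead byte.
    size_t i = 0;
    while (i < s.length)
    {
        size_t j = i;
        while ((s[j] & 0xC0) == 0x80) // skip continuation bytes
            ++j;
        reverseBytes(s[i .. j + 1]);  // put lead byte back in front
        i = j + 1;
    }
    return s;
}

// Plain byte-wise in-place reversal.
void reverseBytes(char[] a)
{
    if (a.length < 2)
        return;
    size_t lo = 0, hi = a.length - 1;
    while (lo < hi)
    {
        auto t = a[lo];
        a[lo] = a[hi];
        a[hi] = t;
        ++lo;
        --hi;
    }
}
```

On the `char[4]` from the simpler test case, `reverseUtf8(str[])` throws rather than crashing, while valid input such as `"aé€".dup` comes back as `"€éa"`. The cost is an extra validation pass; a release-mode variant could instead skip bytes whose stride is invalid, as suggested earlier in the thread.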