Bad array indexing is considered deadly

Thu Jun 1 03:13:25 PDT 2017

On 5/31/17 6:42 PM, Jonathan M Davis via Digitalmars-d wrote:
> On Wednesday, May 31, 2017 19:17:16 Moritz Maxeiner via Digitalmars-d wrote:
>> On Wednesday, 31 May 2017 at 13:04:52 UTC, Steven Schveighoffer
>>
>> wrote:
>>> [...]
>>>
>>> What are your thoughts? Have you run into this? If so, how did
>>> you solve it?
>>
>> It is not that accessing the array out of bounds *leading* to
>> data corruption that is the issue here, but that in general you
>> have to assume that the index *being* out of bounds is itself the
>> *result* of *already occurred* data corruption; and if data
>> corruption occurred for the index, you *cannot* assume that
>> *only* the index has been affected. The runtime cannot simply
>> assume the index being out of bounds is not the result of already
>> occurred data corruption, because that is inherently unsafe, so
>> it *must* terminate asap as the default.
>>
>> If you get the index as the input to your process - and thus
>> *know* that it being out of bounds is not the result of previous
>> data corruption - then you should check this yourself before
>> accessing the array and handle it appropriately (e.g. via
>> Exception).
>
> I don't think that you even need to worry about whether memory corruption
> occurred prior to indexing the array with an invalid index. The fact that
> the array was indexed with an invalid index is a bug. What caused the bug
> depends entirely on the code. Whether it's a memory corruption or something
> else is irrelevant. The contract of indexing arrays is that only valid
> indices be passed. If an invalid index has been passed, then the contract
> has been violated, and by definition, there's a bug in the program, so the
> runtime has no choice but to throw an Error or otherwise kill the program.
> Given the contract, the only alternative would be to use assertions and only
> check when not compiling with -release, but that would be a serious problem
> for @safe code, and it really wouldn't help Steven's situation. Either way,
> the contract of indexing arrays is such that passing an invalid index is a
> bug, and no program should be doing it. The reason that the index is invalid
> is pretty much irrelevant to the discussion. It's a bug regardless.

Yes, it's definitely a bug, and that is not something I'm arguing 
against. The correct handling is to throw something, and prevent the 
corruption in doing so.

The problem is that the act of throwing itself makes the program 
unusable after that. I'm not on Nick's side saying that everything 
should be Exception, especially not out of memory.

But the result of throwing an Error means your entire program has now 
been *made* invalid, even if it wasn't before. Therefore you must close 
it. I feel this is a mistake. A bad index can come from anywhere, and to 
assume it's from memory corruption is a huge leap.

What would have been nice is to have a level between Error and 
Exception, and to throw that when a bug such as this occurs. Something 
that a framework can catch, but @safe code couldn't. I feel that when 
these decisions were made, the concept of a single-process fiber-based 
server wasn't planned for.

> We _could_ make it so that the contract of indexing arrays is such that
> you're allowed to pass invalid values, but then the runtime would _always_
> have to check the indices (even in @system code), and arrays in general
> could never be used in code that was nothrow without a bunch of extra
> try-catch blocks. It would be like how auto-decoding and UTFException screws
> over our ability to have nothrow code with strings, only it would be for
> _all_ arrays. So, the result would be annoying for a lot of code as well as
> less efficient.

Right, we can't pick that path now anyway. Too much code would break.

>
> The vast majority of array code is written in a way that invalid indices are
> simple never used, and having it so that indexing an array could throw an
> Exception would cause serious problems for a lot of code - especially when
> the code is already written in a way that such an exception will never be
> thrown (similar to how format can't be nothrow even when you know you've
> passed the correct arguments, and it will never throw).
>
> As such, it really doesn't make sense to force all programs to deal with
> arrays throwing Exceptions due to bad indices. If a program can't guarantee
> that it's going to be passing a valid index to an array, then it needs to
> validate the index first. And if that needs to be done frequently, it makes
> a lot of sense to either create a wrapper function for indexing arrays which
> does the check or to outright wrap arrays such that opIndex on that type
> does the check and throws an Exception before the invalid index is passed to
> the array. And if the wrapper function is @trusted, it _should_ make it so
> that druntime doesn't check the index, avoiding having redundant checks.
>
> I can understand Steven's frustration, but I really think that we're better
> off the way it is now, even if it's not ideal for his current use case.

It just means that D is an inferior platform for a web framework, unless 
you use the process-per-request model so the entire thing doesn't go 
down for one page request. But that obviously is going to cause 
performance problems.

Which is unfortunate, because vibe.d is a great platform for web 
development, other than this. You could go Adam's route and just put the 
blinders on, but I think that's not a sustainable practice.

-Steve