Bad array indexing is considered deadly

Thu Jun 1 11:29:53 PDT 2017

On 6/1/2017 3:26 AM, Steven Schveighoffer wrote:
> On 5/31/17 9:05 PM, Walter Bright wrote:
>> On 5/31/2017 6:04 AM, Steven Schveighoffer wrote:
>>> Technically this is a programming error, and a bug. But memory hasn't
>>> actually been corrupted.
>>
>> Since you don't know where the bad index came from, such a conclusion
>> cannot be drawn.
> 
> You could say that about any error. You could say that about malformed unicode 
> strings, malformed JSON data, file not found. In this mindset, everything should 
> be an Error, and nothing should be recoverable.

What's missing here is looking carefully at a program and deciding which errors 
are input (and environmental) errors and which are program bugs. The former are 
recoverable; the latter are not.

Take malformed Unicode strings, for example. Joel Spolsky wrote about this issue 
long ago, arguing that data in a program should be compartmentalized into 
untrusted and trusted data.

Untrusted data comes from the input, and stays untrusted until it is validated. 
Malformations in untrusted data are recoverable errors. Once the data is 
validated, it becomes trusted data, and any malformation in trusted data is a 
programming bug. It should be clear in a well-designed program which data is 
trusted and which is untrusted. Spolsky suggests using different types for them 
so they are distinct.
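
To make the trusted/untrusted split concrete, here is a minimal sketch in Rust 
(used only for illustration; the same idea carries over to D). It assumes input 
arrives as raw bytes. Rust's standard `String` type guarantees valid UTF-8, so 
`String::from_utf8` serves as the single validation point: malformed bytes come 
back as a recoverable `Err`, and code that takes a `&str` afterward can treat the 
text as trusted without checking it again. The names `accept_input` and 
`word_count` are hypothetical.

```rust
// Untrusted input arrives as raw bytes (Vec<u8>). A Rust `String` is guaranteed
// to hold valid UTF-8, so converting once at the boundary turns untrusted data
// into trusted data. Function names here are hypothetical, for illustration.

/// Validates untrusted bytes exactly once, at the input boundary.
/// Malformed input is an input error: it is reported, and the caller can recover.
fn accept_input(raw: Vec<u8>) -> Result<String, String> {
    String::from_utf8(raw).map_err(|e| format!("rejected malformed UTF-8: {}", e))
}

/// Works only on trusted data: the `&str` type already guarantees valid UTF-8,
/// so no re-validation happens here.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let inputs: Vec<Vec<u8>> = vec![
        b"hello trusted world".to_vec(),
        vec![0x66, 0x6f, 0xff, 0x6f], // 0xff never appears in valid UTF-8
    ];
    for raw in inputs {
        match accept_input(raw) {
            Ok(text) => println!("{} words", word_count(&text)),
            Err(msg) => println!("{}", msg),
        }
    }
}
```

The bad input is rejected where it enters the program; nothing downstream of 
`accept_input` needs to know that malformed UTF-8 was ever possible.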

In your date case, the date was never validated. It was used as an array index, 
and the invalid date overflowed the array bounds. The program was relying on 
array bounds checking to validate the data.
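
A sketch of the alternative for the date case, again in Rust and assuming 
(hypothetically) that the date field carries a month number. The month is 
validated once, where the input enters the program, and bad input is reported 
as an ordinary recoverable error. After that, the type `Month`, whose field is 
private, guarantees the value is in range, so the array index cannot go out of 
bounds unless there is a bug. Rust's indexing panic then plays a role similar to 
D's RangeError. `Month`, `parse`, and `days` are illustrative names, not taken 
from the original program.

```rust
mod date {
    /// Hypothetical type: a month number validated once, at the input boundary.
    /// The field is private, so code outside this module can only obtain a
    /// `Month` through `Month::parse`; every `Month` is therefore in 1..=12.
    #[derive(Clone, Copy, Debug, PartialEq)]
    pub struct Month(u8);

    // Days per month in a non-leap year (leap years ignored in this sketch).
    const DAYS_IN_MONTH: [u8; 12] = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

    impl Month {
        /// Validates untrusted input. A bad month is an input error and is
        /// returned as `Err`, so the caller can recover (e.g. reject the request).
        pub fn parse(field: &str) -> Result<Month, String> {
            match field.trim().parse::<u8>() {
                Ok(n) if (1..=12).contains(&n) => Ok(Month(n)),
                _ => Err(format!("invalid month: {:?}", field)),
            }
        }

        /// Operates on trusted data: the index is in bounds by construction.
        /// If this indexing ever panicked, it would reveal a bug, not bad input.
        pub fn days(self) -> u8 {
            DAYS_IN_MONTH[usize::from(self.0) - 1]
        }
    }
}

use date::Month;

fn main() {
    for field in ["2", "13", "abc"] {
        match Month::parse(field) {
            Ok(m) => println!("month {} has {} days", field, m.days()),
            Err(e) => println!("rejected: {}", e),
        }
    }
}
```

In the D analogue of this design, compiling with bounds checks disabled would 
not remove any input validation, because that check lives in the parsing step 
rather than in the array access.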

I'd argue this is a problematic program design because:

1. It's inefficient. Data should be validated once, in a clear location in the 
program. Arrays appear all over the place and tend to be in hot code paths, so 
validating the same data over and over at every access wastes time.

2. Array bounds checking can be turned off by a compiler switch. Program data 
validation should not be silently disabled in such an unexpected manner.

3. Arrays are a ubiquitous data structure. There is no way to distinguish 
"this is a data validation use" from "this must be valid data".

4. Anyone familiar with D who reads your code would be surprised to learn that 
an array access is doing data validation rather than bug checking.

5. Array accesses are sometimes optimized by removing their bounds checks. That 
should not turn off data validation.

6. @safe code is intended to find programming bugs, not validate input data.

7. Just because code is marked @safe doesn't mean memory corruption is 
impossible. Even if @safe is perfect, programs have @trusted and @system code 
too, and those may have memory corrupting bugs.

8. A failed bounds check does not distinguish an overflow caused by a 
programming bug or memory corruption from one caused by invalid program input.