Bad array indexing is considered deadly
Walter Bright via Digitalmars-d
digitalmars-d at puremagic.com
Thu Jun 1 11:29:53 PDT 2017
On 6/1/2017 3:26 AM, Steven Schveighoffer wrote:
> On 5/31/17 9:05 PM, Walter Bright wrote:
>> On 5/31/2017 6:04 AM, Steven Schveighoffer wrote:
>>> Technically this is a programming error, and a bug. But memory hasn't
>>> actually been corrupted.
>>
>> Since you don't know where the bad index came from, such a conclusion
>> cannot be drawn.
>
> You could say that about any error. You could say that about malformed unicode
> strings, malformed JSON data, file not found. In this mindset, everything should
> be an Error, and nothing should be recoverable.
What's missing here is looking carefully at a program and deciding what are
input (and environmental) errors and what are program bugs. The former are
recoverable, the latter are not.
Take malformed Unicode strings, for example. Joel Spolsky wrote about this issue
long ago: data in a program should be compartmentalized into untrusted and
trusted data.
Untrusted data comes from the input, and stays untrusted until it is validated.
Malformed untrusted data is a recoverable error. Once validated, it becomes
trusted data. Any malformation in trusted data is a programming bug. It should
be clear in a well designed program what data is trusted and what data is
untrusted. Spolsky suggests using different types for them so they are distinct.
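
A minimal sketch of the distinct-types idea in D, under stated assumptions: the
names ValidMonth, validateMonth and monthName are hypothetical, invented for
illustration, and do not come from Spolsky's article or from this thread. The
untrusted value is a plain string; the trusted value can only be obtained by
going through the validating function.

```d
import std.conv : to;
import std.exception : enforce;

/// Hypothetical wrapper type: holding a ValidMonth means validation has
/// already happened. Note that `private` in D is module-private, so this
/// type should live in its own module to keep the constructor out of reach.
struct ValidMonth
{
    private int value = 1;          // invariant: always in 1 .. 12

    private this(int v) { value = v; }

    /// Zero-based index suitable for a 12-element table.
    int index() const { return value - 1; }
}

/// Validates untrusted input exactly once. Failure is an ordinary,
/// recoverable Exception (ConvException for malformed text, Exception
/// from enforce for an out-of-range number).
ValidMonth validateMonth(string untrusted)
{
    immutable m = untrusted.to!int;
    enforce(m >= 1 && m <= 12, "month out of range: " ~ untrusted);
    return ValidMonth(m);
}

/// Accepts only trusted data. A bounds failure in here would be a program
/// bug, not an input error.
string monthName(ValidMonth m)
{
    static immutable string[12] names = [
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    ];
    return names[m.index];
}

void main()
{
    import std.stdio : writeln;

    try
    {
        writeln(monthName(validateMonth("13")));
    }
    catch (Exception e)
    {
        // Bad input is reported and the program carries on.
        writeln("rejected input: ", e.msg);
    }
}
```

With this shape, a function signature that takes ValidMonth documents that it
expects trusted data, and one that takes string documents that it does not.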
For your date case, the date was not validated, and was used to index an array,
where the invalid date exceeded the array bounds. The program was relying on the
array bounds checking to validate the data.
I'd argue this is a problematic program design because:
1. It's inefficient. Data should be validated once in a clear location in the
program. Arrays appear all over the place, and tend to be in hot locations.
Validating the same data over and over is highly inefficient.
2. Array bounds checking can be turned off by a compiler switch. Program data
validation should not be silently disabled in such an unexpected manner.
3. Arrays are a ubiquitous data structure. They are used all over the place.
There is no way to distinguish "this is a data validation use" and "this must be
valid data".
4. Anyone familiar with D who reads your code would be surprised to learn that
an array access is doing data validation rather than bug checking.
5. Array accesses are sometimes optimized by removing the bounds check. This
should not turn off data validation.
6. @safe code is intended to find programming bugs, not validate input data.
7. Just because code is marked @safe doesn't mean memory corruption is
impossible. Even if @safe is perfect, programs have @trusted and @system code
too, and those may have memory corrupting bugs.
8. It cannot distinguish an out-of-bounds index caused by a programming bug or
memory corruption from one caused by invalid program input.
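
The alternative the points above argue for can be sketched roughly as follows.
This is an illustration under assumptions, not the code from the date case
being discussed: checkDate, dayOfYear and the daysInMonth table are
hypothetical names, and leap years are ignored. Input is validated once at the
boundary with a recoverable Exception; past that point, the assert and the
array bounds check exist only to catch program bugs.

```d
import std.exception : enforce;

/// Hypothetical table: days in each month of a non-leap year.
immutable int[12] daysInMonth = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31];

/// Input boundary: the one place where a bad date is an expected condition.
/// enforce is an ordinary library call that throws Exception, so it is not
/// removed by -release or -boundscheck=off.
void checkDate(int month, int day)
{
    enforce(month >= 1 && month <= 12, "invalid month");
    enforce(day >= 1 && day <= daysInMonth[month - 1], "invalid day");
}

/// Interior code: assumes the date was already validated. A failure of the
/// assert or of the bounds check here means a bug in the caller.
int dayOfYear(int month, int day)
{
    assert(month >= 1 && month <= 12, "caller passed an unvalidated month");

    int total = day;
    foreach (m; 0 .. month - 1)
        total += daysInMonth[m];
    return total;
}

void main()
{
    import std.stdio : writeln;

    try
    {
        checkDate(13, 1);               // bad input: throws Exception
        writeln(dayOfYear(13, 1));      // not reached for this input
    }
    catch (Exception e)
    {
        // Recoverable: report the bad input and continue.
        writeln("bad date rejected: ", e.msg);
    }

    checkDate(3, 1);
    writeln(dayOfYear(3, 1));
}
```

Note that the catch clause names Exception, not Throwable, so a bounds-check or
assert failure inside dayOfYear would still terminate the program rather than
be treated as bad input.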