Bad array indexing is considered deadly

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Thu Jun 1 12:04:19 PDT 2017


On Thu, Jun 01, 2017 at 11:29:53AM -0700, Walter Bright via Digitalmars-d wrote:
[...]
> Untrusted data comes from the input, and stays untrusted until it is
> validated. Malformed untrusted data are recoverable. Once it is
> validated, it becomes trusted data. Any malformations in trusted data
> are programming bugs. It should be clear in a well designed program
> what data is trusted and what data is untrusted. Spolsky suggests
> using different types for them so they are distinct.
> 
> For your date case, the date was not validated, and was fed into an
> array, where the invalid date overflowed the array bounds. The program
> was relying on the array bounds checking to validate the data.

+1.  I think this is the root of the problem.  Data that comes from
outside sources must never, ever be trusted until it has been validated.
Any errors that occur during validation are recoverable, because you
*know* they are caused by wrong data from outside.

Once the data is validated, any further errors involving that data are
program bugs: either your validation code was incorrect / incomplete, or
there is a program logic error that led to an inconsistent state. In
this case, aborting the program is the only sane response, especially in
an online services setting, because your broken validation code may have
let through maliciously-crafted data that can lead to an exploit (better
nip it in the bud before the exploit proceeds any further), or the
internal program logic is inconsistent, so proceeding further is
effectively undefined behavior.
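
To make the distinction concrete, here is a rough sketch of how the two
kinds of failure might be kept apart in D. The handler, its length limit
and the checksum invariant are all made up for illustration; the only
real library pieces are `enforce` (throws an `Exception`) and `assert`
(throws `AssertError`, which derives from `Error`, in non-release
builds).

```d
import std.exception : enforce;
import std.stdio : writeln;

/// Hypothetical request handler: `payload` arrives from outside the program.
size_t handleRequest(string payload)
{
    // Validation: a failure here is the sender's fault, so throw a
    // recoverable Exception.
    enforce(payload.length > 0 && payload.length <= 64, "bad payload length");

    // Past this point the payload is treated as trusted.
    size_t checksum = 0;
    foreach (char c; payload)
        checksum += c;

    // Invariant on validated data: if this fails, the program itself is
    // broken. assert throws AssertError (an Error, not an Exception).
    assert(checksum <= 64 * 255, "checksum invariant violated");
    return checksum;
}

void main()
{
    foreach (payload; ["hello", ""])
    {
        try
        {
            writeln("ok: ", handleRequest(payload));
        }
        catch (Exception e) // deliberately not Throwable or Error
        {
            writeln("rejected: ", e.msg);
        }
    }
}
```

The point of catching `Exception` rather than `Throwable` is that a bad
request gets rejected and the loop carries on, whereas a failed
invariant propagates out and takes the program down.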

Feeding unvalidated, tainted data directly into inner program logic like
indexing an array is a bad idea.  The data ought to be validated first.
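
For the date case, a minimal sketch of what "validate first" might look
like follows. The names `parseMonth`, `monthName` and `monthNames` are
invented for this example and are not taken from the code under
discussion; `to!int`, `ConvException` and `enforce` are the usual
Phobos facilities.

```d
import std.conv : to, ConvException;
import std.exception : enforce;

immutable string[12] monthNames = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Validation boundary: turns raw external text into a month in 1 .. 12.
/// Throws a recoverable Exception on malformed input.
int parseMonth(string raw)
{
    int month;
    try
    {
        month = raw.to!int;
    }
    catch (ConvException)
    {
        throw new Exception("month is not a number: " ~ raw);
    }
    enforce(month >= 1 && month <= 12, "month out of range: " ~ raw);
    return month;
}

/// Inner logic: only ever meant to see validated months. An out-of-range
/// value here is a bug, so the array bounds check is a last line of
/// defence, not the validator.
string monthName(int validatedMonth)
{
    return monthNames[validatedMonth - 1];
}

void main()
{
    import std.stdio : writeln;

    foreach (raw; ["6", "13", "abc"])
    {
        try
        {
            writeln(monthName(parseMonth(raw)));
        }
        catch (Exception e)
        {
            writeln("rejected input: ", e.msg);
        }
    }
}
```

With this split, the array index is never asked to do the validator's
job: bad input is turned away in `parseMonth`, and a bounds failure in
`monthName` can only mean the program is wrong.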

I like Spolsky's idea of using separate types for tainted / verified
input. Let the compiler statically verify that you at least made an
attempt at validating your program's inputs (though obviously it can
only go so far -- the compiler can't guarantee that your validation code
is actually correct).  The problem, though, is that D currently doesn't
have tainted types, so for example you can't tell at a glance whether a
given string is untrusted user input or validated data; it's all just
`string`.  I wonder if tainted types could be something worth adding
either to the language or to Phobos.
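
Just to sketch what a library-level version might look like (purely
hypothetical: `Tainted`, `validate` and `greet` are names I made up, and
nothing like this exists in Phobos as far as I know), a thin wrapper
struct already gets part of the way there:

```d
import std.exception : enforce;

/// Hypothetical wrapper for unvalidated external data. Not part of Phobos.
struct Tainted(T)
{
    // Note: `private` in D is per module, so this only protects the
    // payload from code living in other modules.
    private T payload;

    this(T raw) { payload = raw; }

    /// The only way out: run a validator that returns true for acceptable
    /// data. Throws a recoverable Exception if the data is rejected.
    T validate(scope bool delegate(const T) isValid)
    {
        enforce(isValid(payload), "input failed validation");
        return payload;
    }
}

/// Inner logic takes plain `string`, so a Tainted!string cannot be
/// passed in by accident.
string greet(string userName)
{
    return "Hello, " ~ userName;
}

void main()
{
    import std.stdio : writeln;

    auto input = Tainted!string("alice"); // pretend this came from the network
    // greet(input); // does not compile: Tainted!string is not string
    auto name = input.validate(s => s.length > 0 && s.length <= 32);
    writeln(greet(name));
}
```

This only proves that *some* validator ran, not that it was the right
one, which is the limitation mentioned above. It does at least make the
untrusted/trusted boundary visible in function signatures.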


[...]
> 8. It does not distinguish array overflow from programming bugs /
> corruption from invalid program input.

Yes, I think this conflation is the root cause of this problem.
Validation should be explicit, and separate from inner program logic.
Mixing the two together only serves to confuse the issue.


T

-- 
If you think you are too small to make a difference, try sleeping in a closed room with a mosquito. -- Jan van Steenbergen

