Bad array indexing is considered deadly

Thu Jun 1 17:30:39 PDT 2017

On Thu, Jun 01, 2017 at 10:09:36PM +0000, cym13 via Digitalmars-d wrote:
> On Thursday, 1 June 2017 at 19:04:19 UTC, H. S. Teoh wrote:
> > I like Spolsky's idea of using separate types for tainted / verified
> > input. Let the compiler statically verify that you at least made an
> > attempt at validating your program's inputs (though obviously it can
> > only go so far -- the compiler can't guarantee that your validation
> > code is actually correct).  The problem, though, is that D currently
> > doesn't have tainted types, so for example you can't tell at a
> > glance whether a given string is untrusted user input or validated
> > data, it's all just `string`.  I wonder if tainted types could be
> > something worth adding either to the language or to Phobos.
> 
> I'm not familiar with the idea, do we need more than the following?
> 
> struct Tainted {
>     T _basetype;
>     alias _basetype this;
> }
> 
> 
> void main(string[] args) {
>     auto ts = Tainted!string("Hello");
>     writeln(ts);
> }
> 
> It's a PoC, ok, but it lets you use ts like any variable of the base
> type, it lets you convert one easily to the other, but this conversion
> has to be explicit. So, real question, what more do we need?
[...]

Actually, I re-read Spolsky's blog post[1] again, and apparently he didn't
actually recommend using the type system for enforcing this, but a
naming convention that would make code stick out when it's doing
something funny.

[1] https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/

So, for example, you'd name all tainted strings with the prefix `us`,
and all functions that return tainted strings are prefixed with `us`,
including any string identifiers you might use to refer to the tainted
data.  E.g.:

	string usName = usGetParam(httpRequest, "name");
	...
	database.cache("usName", usName);
	...
	string usData = database.read("usName");
	...
	// sEscapeHtmlUs means it converts unsafe data (...Us) to safe
	// data (s...) by escaping dangerous characters.
	string sData = sEscapeHtmlUs(usData);
	...
	// sWrite means it requires safe data
	sWrite(html, "<p>Your name is %s</p>", sData);

The idea is that if you see a line of code where the prefixes don't
match, then you immediately know there's a problem. For example:

	// Uh-oh, we assigned unsafe data to a variable that should only
	// hold safe data.
	string sName = usGetParam(httpRequest, "name");

	// Uh-oh, we wrote unsafe data into a database field that should
	// only contain safe data.
	database.cache("sName", usName);

	// Uh-oh, we're printing unsafe data via a function that assumes
	// its input is safe.
	sWrite(html, "<p>Your name is %s</p>", usData);

This is not bad, since with some practice you could immediately identify
code that's probably wrong (mixing s- and us- prefixes wrongly, or
identifier with no prefix, meaning the code needs to be reviewed and the
identifier renamed accordingly).

The problem is that this is still in the realm of coding by convention.
What I had in mind was more along the lines of what you proposed, that
you'd actually use the type system to enforce a distinction between safe
and unsafe data, so that the compiler will reject code that tries to mix
the two without an explicit conversion.

I haven't thought too deeply about how to actually implement this, but
here's my initial idea: any function that reads data from external
sources (network, filesystem, environment) will return Tainted!string or
Tainted!(T[]) rather than string or T[]. Unlike what you proposed above,
the Tainted wrapper will *not* allow implicit conversion to the
underlying type, because otherwise it defeats the purpose (pass
Tainted!T to a function that expects T, and the compiler will
automatically cast it to T for you: no good).  So you cannot pass this
data directly to a function that expects string or T[].  However, they
will allow some way of accessing the wrapped data, so that the
validation function can inspect the data to ensure that it's OK, then
explicitly cast it to the underlying type.

Sketch of code:

	struct Tainted(T)
	{
		// Note: outside code cannot directly access payload.
		private T payload;

		T validate(alias isClean)()
			if (is(typeof(isClean(T.init)) == bool))
		{
			// Do not allow isClean to escape references to
			// payload (?is this correct usage?). Requires
			// -dip1000.
			scope _p = payload;

			if (isClean(_p))
				return payload;
			throw new Exception("Bad data");
		}

		T cleanse(alias cleaner)()
			if (is(typeof(cleaner(T.init)) == T))
		{
			// Prevent cleaner() from cheating and simply
			// returning the payload (?necessary?). Requires
			// -dip1000. The idea being to force the
			// creation of safe data from the payload, e.g.,
			// a HTML-escaped string from a raw string.
			scope _p = payload;

			return cleaner(_p);
		}
	}

	// Note: returns Tainted!T instead of T.
	Tainted!T readParam(T)(HttpRequest req, string paramName);

	// Note: requires string, not Tainted!string
	void writeToOutput(string s);

	void handleRequest(HttpRequest req)
	{
		string[7] daysOfWeek = [
			"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
		];

		// Returns Tainted!int
		auto day = req.readParam!int("dayOfWeek");

		// Compile error: cannot index array with Tainted!int
		//writeToOutput(daysOfWeek[day]);

		// Check range and return unwrapped int if OK, throw
		// Exception otherwise.
		auto checkedDay = day.validate!(d => d >= 0 && d < daysOfWeek.length);

		writeToOutput(daysOfWeek[checkedDay]); // OK

		// Returns Tainted!string
		auto name = req.readParam!string("name");

		// Compile error: cannot pass Tainted!string to writeToOutput.
		//writeToOutput(name);

		// Unwrap to string if does not contain meta-characters,
		// throw Exception otherwise.
		auto safeName = name.validate!hasNoMetaCharacters;

		writeToOutput(safeName); // OK

		// Cleanse the string by escaping metacharacters.
		auto escapedName = name.cleanse!escapeHtmlMetaChars;
		writeToOutput(escapedName); // OK
	}

This is just a rough sketch, of course.  A more complete implementation
would have to consider what to do about code that obtains unsafe data
directly from OS interfaces like core.stdc.stdlib.fread that isn't
wrapped by Tainted.

Also, it would have to address what to do about functions like
File.rawRead(), that writes to a user-provided buffer, since the caller
could just read the tainted data directly from the buffer, bypassing any
Tainted protections.

T

-- 
I'm still trying to find a pun for "punishment"...