Interpolated strings and SQL

Timon Gehr timon.gehr at gmx.ch
Tue Jan 9 22:23:01 UTC 2024


On 1/9/24 21:01, Walter Bright wrote:
> On 1/9/2024 4:35 AM, Timon Gehr wrote:
>> However, let's for the sake of argument assume that, miraculously, 
>> `execi` can read the format string at compile time, then:
> 
> Adam's implementation of execi() also runs at run time, not compile time.
> ...

Adam's `execi` partially runs at compile time and partially of course it 
will ultimately run at run time (like code generated by a metaprogram 
tends to do).

The SQL statement is prepared at compile time. Therefore, by 
construction, it cannot depend on any runtime parameters, preventing an 
SQL injection. (And it can be checked at compile time, like people are 
already doing with less convenient syntax).

> 
>> - With this signature, if you pass a manually-constructed string to 
>> it, it would just accept the SQL injection.
> 
> It was just a proof of concept piece of code.

So is Adam's example code. In any case:

I am talking about the function _signature_. Whatever crazy advanced 
thing you do in the implementation, the signature that DIP1027 expects 
`execi` to have is fundamentally significantly less safe.

> execi could check for 
> format strings that contain ?n sequences. It could also check the number 
> of %s formats against the number of arguments.
> ...

That does not fix the security issue.

> 
>  > But you get a useful error message that exactly pinpoints what the 
> problem is.
>  > Also, they could be supported, which is the point.
>> - It does not give a proper error message for nested istrings.
> 
> execi could be extended to reject arguments that contain %s sequences. 

And now suddenly you can no longer store anything that looks like a 
format string in your data base.

> Or, if there was an embedded istring, the number of %s formats can be 
> checked against the number of arguments.

Maybe at runtime. But why introduce this failure mode in the first place?

> An embedded istring would show a mismatch.
> ...

The error message would be phrased in overly general terms and hence be 
confusing.

> I expect that use of nested istrings would be exceedingly rare. If they 
> are used, wrapping them in text() will make work.

Depends on how exactly they are used. For the SQL case, not allowing 
them is a decent option.

> Besides, would a 
> nested istring in an sql call be intended as part of the sql format, or 
> would a text string be the intended result?
> ...

Whatever it is, with DIP1036e and compile-time SQL construction, user 
data does not make it into the SQL expression sent to the database.

> 
>> - It has to manually parse the format string. It iterates over each 
>> character of the original format string.
> 
> Correct. And it does not need to iterate over and remove all the 
> Interpolation arguments.

Adam's implementation does the filtering at compile time.

The function body will be something like:

auto statement = Statement(db, "...?1...?2...?3..."); // replace ... by 
query
int number = 0;
statement.bind(++number, firstArg);
statement.bind(++number, secondArg);
statement.bind(++number, thirdArg);


But yes, DIP1036e does make some concessions and it will indeed pass 
empty struct arguments in case the function is not inlined (could use 
pragma(inline, true) to avoid it.)

> Nor does it need the extra two arguments, which 
> aren't free of cost.
> ...

Are you really going to argue that some extra empty struct arguments are 
in some way more expensive than runtime query construction including 
format string parsing and query construction using GC strings?

But anyway, if you think interpolation is not worth runtime overhead 
that would perhaps need to be mitigated using additional features or an 
improved calling convention, that's up to you, but then DIP1027 loses too.

> 
>> - It (ironically!) constructs a new format string, the original one 
>> was useless.
> 
> Yes, it converts the format specifiers to the sql ones. Why is this a 
> problem?
> ...

You argued earlier like it is in some way an ironic benefit of DIP1027 
that the DB interface requires something that is similar to a format 
string under the hood. Well, it does not require the kind of format 
string that DMD is generating.

> 
>> - If you pass a bad format string to it (for example, by specifying a 
>> manual format), it will just do nonsense, while DIP1036e avoids bad 
>> format strings by construction.
> 
> What happens when ?3 is included in a DIP1036 istring? `i"string ?3 
> ($betty)" ? I didn't see any check for that.

That's a fair point in general, but I was specifically talking about the 
format string that you pass into the function that accepts the istring, 
not similar kinds of strings that may or may not be generated in the 
implementation.

In any case, DIP1027 istrings can also create a format string with `?3`, 
and there no way to check within `execi` if that `?3` came from 
malicious data that was read as input to the program or was put there by 
an incompetent programmer.

> Of course, one could add such a check to the 1036 execi.
> ...

With DIP1036e the check could be done at compile time.

> printf format strings are checked by the compiler,

As a one-off special case that only supports a specific kind of format 
string.

> and writef format strings are checked by writef.

`writef` allows the format string to be passed as a template parameter 
if compile-time parsing and checking is requested. DIP1027 does not 
naturally support this.

> execi is also capable of being extended 
> to check the format string to ensure the format matches the args.

With DIP1027, you'd have to do it at runtime.


More information about the Digitalmars-d mailing list