Interpolated strings and SQL

Nickolay Bukreyev buknik95 at ya.ru
Wed Jan 10 06:20:22 UTC 2024


On Tuesday, 9 January 2024 at 23:21:34 UTC, Walter Bright wrote:
> A compile time way is DIP1027 can be modified to reject any 
> arguments that consist of tuples with other than one element. 
> This would eliminate nested istring tuples at compile time.

To sum up, it works with nested istrings poorly; it may even be 
sensible to forbid them entirely for DIP1027. Glad we’ve reached 
a consensus on this point. This case doesn’t seem crucial at the 
moment though; now we can focus on more relevant questions.

> DIP1036 cannot detect other problems with the string literals. 
> It seems like a lot of complexity to deal with only one issue 
> with malformed strings at compile time rather than runtime.

DIP1036 provides full CTFE capabilities at your disposal. You can 
validate _anything_ about a format string; any 
compile-time-executable hypothetical `validateSql(query)` will 
fit. I guess none of the examples presented so far featured such 
validation because it usually tends to be long and not 
illustrative.

However, another Adam’s example [does 
perform](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/07-html.d#L13) non-trivial compile-time validation. Here is how it is [implemented](https://github.com/adamdruppe/interpolation-examples/blob/a8a5d4d4ee37ee9ae3942c4f4e8489011c3c4673/lib/html.d#L97).

>> Constructing it at compile time is essential so that we can 
>> validate the generated SQL and abort compilation, as Paolo 
>> [demonstrated](https://forum.dlang.org/post/qbtbyxcglwijjbeygtvi@forum.dlang.org).
>
> That only checks one aspect of correctness - nested string 
> interpolations.

They check a lot more. I agree it is hard to spot the error 
messages in the linked post so I’ll copy them here:

     relation "snapshotsssss" does not exist. SQL: select size_mm, 
size_px from snapshotsssss where snapshot_id = $1

     role "dummyuser" can't select on table "snapshots". SQL: 
select size_mm, size_px from snapshots where snapshot_id = $1

As you can see, they check sophisticated business logic expressed 
in terms of relational databases. And all of that happens at 
compile time. Isn’t that a miracle?

>> I explained 
>> [here](https://forum.dlang.org/post/qkvxnbqjefnvjyytfana@forum.dlang.org) why these two arguments are valuable. Aren’t free of cost—correct unless you enable inlining. `execi` may require some changes (like `filterOutEmpty` I showed above) to make them free of cost, but it is doable.
>
> You'd have to also make every formatted writer a template,

Err… every formatted writer has to be a template anyway, doesn’t 
it? It needs to accept argument lists that may contain values of 
arbitrary types.

> …and add the filter to them.

Yeah. I admit this is a problem. As a rule of thumb, the most 
obvious code should yield the best results. With DIP1036, this is 
not the case at the moment: when you pass an interpolation 
sequence to a function not specifically designed for it, it 
wastes more stack space than necessary and passes useless junk in 
registers.

Others have mentioned that DIP1027 performs much worse in terms 
of speed (due to runtime parsing). While that is undoubtable, I 
think DIP1036 should be tweaked to behave as good as possible.

There was an idea in this thread to improve the ABI so that it 
ignores empty structs, but I’m rather sceptical about it.

Instead, let us note there are basically two patterns of usage 
for istrings:

1. Passing to a function that processes an istring and does 
something non-trivial. `execi` is a good example.
2. Passing to a function that simply stringifies every fragment, 
one after another. `writeln` is a good example.

Something counterintuitive, case 1 is easier to address: the 
function already traverses the received sequence and transforms 
it. So it is only necessary to write it in such way that it is 
inline-friendly.

By the way, what functions do we have in Phobos that fall into 
the case-2 category? `write`/`writeln`, `std.conv.text`, 
`std.logger.core.log`, and… is that all? Must be something else!..

Turns out there are only a handful of relevant functions in the 
entire stdlib. It shouldn’t be hard to put a filter in each of 
them. It also hints they are probably not that common in the wild.

However, when one encounters a third-party `write`-like function 
that is unaware of `InterpolationHeader`/etc., they should have a 
means to fix it from outside, i.e., without touching its source 
and ideally without writing a wrapper by hand. Unfortunately, I 
could not come up with a satisfactory solution for this. Will 
keep thinking. Perhaps someone else manages to find it faster.

---

An idea in a different direction. Currently, 
`InterpolationHeader`/etc. structs interoperate with `write`-like 
functions seamlessly (at the expense of passing zero-sized 
arguments) due to the fact they all have an appropriate 
`toString` method. If we remove those methods (and do nothing 
else), then `write(i"a$(x)b")` would produce something like:

     
InterpolationHeader()InterpolatedLiteral!"a"()InterpolatedExpression!"x"()42InterpolatedLiteral!"b"()InterpolationFooter()

The program, rather than introducing a silent inefficiency, 
immediately tells the user they need to account for these types.

---

And one more idea. Current implementation of DIP1036 can emit 
empty chunks—i.e., `InterpolatedLiteral!""`—see for example 
`i"$(x)"`. If I was making a guess why it does so, I would say it 
strives to produce consistent, regular sequences. On the one 
hand, it might ease the job of interpolation-sequence handlers: 
they can count on the fact that expressions and literals always 
alternate inside a sequence. On the other, they have to check if 
a literal is empty and drop it if it is so it actually makes 
their job harder.

I do not know whether not producing empty literals in the first 
place would be a positive or negative change. But it is something 
worth to consider.

---

Slightly off-topic: when I was thinking about this, I was 
astonished by the fact istrings can work with 
`readf`/`formattedRead`/`scanf`. Just wanted to share this 
observation.

```d
readf(i" $(&x) $(&y)");
```

> The compiler will indeed reject it (The error message would be 
> a bit baffling to those who don't know what Interpolation types 
> are)

This is true. I suppose the docs should mention 
`InterpolationHeader` and friends when talking about istrings, 
explain what an istring is lowered to, and show examples. Then a 
programmer who’ve read the docs will have a mental association 
between “istring” and “InterpolationHeader/Footer/etc.” Those who 
don’t read the docs—well, they won’t have. Only googling will 
save them.

To be honest, I’m not concerned about this point too much.

> along with any attempt to call execi() with a pre-constructed 
> string.
>
> The end result is that to do manipulation with istring tuples, 
> the programmer is alternately faced with adding Interpolation 
> elements or filtering them out. Is that really what we want?

I’d argue it is wonderful that `execi` cannot be called with a 
pre-constructed string. The API should provide another function 
instead—say, `execDynamicStatement(Sqlite, string, Args...)`. 
`execi` should be used for statically known SQL with interpolated 
arguments, and `execDynamicStatement`—for arbitrary SQL 
constructed at runtime. A verbose name is intentional to 
discourage its usage in favour of `execi`.

> P.S. most keyboarding bugs result from neglecting to add needed 
> syntax, not typing extra stuff.

That makes sense. Though you’ll never guess what beast can be 
spawned by uncareful refactoring. Extra protection won’t harm, 
especially if it’s zero-cost.

P.S. Zero-initialization of variables is one of D’s cool 
features, indeed.


More information about the Digitalmars-d mailing list