enum Format

Steven Schveighoffer schveiguy at gmail.com
Thu Jan 11 02:22:18 UTC 2024


On Wednesday, 10 January 2024 at 19:53:48 UTC, Walter Bright 
wrote:
>> And you can get rid of the runtime overhead by adding a 
>> `pragma(inline, true)` `writeln` overload. (I guess with DMD 
>> that will still bloat the executable,
>
> I didn't mention the other kind of bloat - the rather massive 
> number and size of template names being generated that go into 
> the object file, as well as all the uncalled functions 
> generated only to be removed by the linker.

Yes, DIP1036e has a lot of extra templates generated, and the 
mangled name is going to be large.

Let's skip for a moment the template that writeln will generate 
(which I agree isn't ideal, but also is somewhat par for the 
course).

This shouldn't be a huge problem for the interpolation *types* 
because the type doesn't get included in the binary. It is a big 
problem for the `toString` function, because that *is* included.

However, we can mitigate the ones that return `null`:

```d
string __interpNull() => null;

struct InterpolatedExpression(string expr)
{
   alias toString = __interpNull;
}

// ... and so on
```

I tested this and it does work. So this reduces all the 
`toString` member functions from `InterpolatedExpression` (and 
`InterpolationPrologue` and `InterpolationEpilog`, but those are 
not templated structs anyway) to one function in the binary.
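
A minimal sketch of how one might check the sharing claim. The name `interpNull` is just a stand-in for the helper above, and the two expression strings are made up for illustration:

```d
// One free function shared by every instantiation.
string interpNull() => null;

struct InterpolatedExpression(string expr)
{
    // An alias, not a new member function: no per-instantiation symbol.
    alias toString = interpNull;
}

unittest
{
    InterpolatedExpression!"a + b" first;
    InterpolatedExpression!"c * d" second;

    // Both instantiations answer with null...
    assert(first.toString() is null);
    assert(second.toString() is null);

    // ...and both names resolve to the same function.
    assert(&InterpolatedExpression!"a + b".toString is &interpNull);
    assert(&InterpolatedExpression!"c * d".toString is &interpNull);
}
```

Compile with `-unittest -main` to run the checks.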

But we can't do this for `InterpolatedLiteral` (which, by the 
way, is improperly described in Atila's DIP: the associated 
`toString` member function should return the literal).

We can possibly do a couple of things here to mitigate this:

1. We can modify how `std.format` works so it will accept the 
following as a `toString` hook:

```d
struct S
{
    enum toString = "I am an S";
}
```

This means, no function calls, no extra long symbols in the 
binary (since it's an enum, it should not go in), and I think 
even the compilation will be faster.

2. We modify `std.format` to be aware of `InterpolatedLiteral` 
types, and avoid depending on the `toString` API. After all, we 
own both Phobos and druntime, we can coordinate the release.
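
To make option 1 concrete, here is a rough sketch of the check a formatter could perform. `render` is a hypothetical helper, not anything that exists in `std.format`, and `F` is only there to show that the ordinary member-function hook would keep working:

```d
struct S
{
    enum toString = "I am an S";   // compile-time constant, no function
}

struct F
{
    string toString() const { return "I am an F"; }   // conventional hook
}

// Hypothetical helper: prefer an enum `toString`, else call the member.
string render(T)(const T value)
{
    static if (is(typeof(T.toString) == string)
            && __traits(compiles, { enum string s = T.toString; }))
        return T.toString;         // read the constant, no call needed
    else
        return value.toString();   // fall back to the usual call
}

unittest
{
    assert(render(S()) == "I am an S");
    assert(render(F()) == "I am an F");
}
```

The `__traits(compiles, ...)` test is there so that only values readable at compile time take the first branch; a plain `string` field would still go through the fallback.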

And as a further suggestion, though this is kind of off-topic, we 
may look into ways to have templates that *don't* make it into 
the binary explicitly. Basically, they are marked as shims or 
forwarders by the library author, and just serve as a way to 
write nicer syntax. This could help in more than just the 
interpolation DIP.

>
> As far as I can tell, the only advantage of DIP1036 is the use 
> of inserted templates to "key" the tuples to specific 
> functions. Isn't that what the type system is supposed to do? 
> Maybe the real issue is that a format string should be a 
> different type than a conventional string.

No. While I agree that having a different *type* makes it more 
useful and easier to hook, there is a fundamental problem being 
solved with the compile-time literals being passed to the 
function. Namely, tremendous power is available to validate, 
parse, prepare, etc. string data at compile time, for use during 
runtime. This simply *is not possible* with 1027.

The runtime benefits are huge:
* No need to allocate anything (`@nogc`, `-betterC`, etc. all 
available)
* You get compiler errors instead of runtime errors (if you put 
in the work)
* It's possible to generate "perfect forwarding" to another function 
that does use another form. For example, `printf`.
* If you inline the call, it can be as if you called the 
forwarded function directly with the exactly correct parameters.
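
As an illustration of the compile-time side of the points above, here is a small sketch that derives a `printf`-style format string purely from the argument types. `Lit` is a hypothetical stand-in for the DIP's literal type, `printfFormat` is an invented name, and only `int` and `double` are handled:

```d
// Hypothetical stand-in for InterpolatedLiteral: the text lives in the type.
struct Lit(string text) {}

// Builds the format string from the types alone, so it can run in CTFE.
string printfFormat(Args...)()
{
    string fmt;
    foreach (A; Args)   // unrolled at compile time over the type list
    {
        static if (is(A == Lit!s, string s))
        {
            // Literal text: double any '%' so printf prints it verbatim.
            foreach (char c; s)
            {
                if (c == '%') fmt ~= "%%";
                else fmt ~= c;
            }
        }
        else static if (is(A == int))
            fmt ~= "%d";
        else static if (is(A == double))
            fmt ~= "%g";
        else
            static assert(false, "no printf specifier for " ~ A.stringof);
    }
    return fmt;
}

// Evaluated entirely at compile time; nothing is allocated at runtime.
enum fmt = printfFormat!(Lit!"x = ", int, Lit!" (100%)\n")();
static assert(fmt == "x = %d (100%%)\n");
```

An unsupported argument type trips the `static assert`, which is the "compiler errors instead of runtime errors" case from the list.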

And I want to continue to point out that a constructed "format 
string" mechanism is just inferior, regardless of whether it is 
another type, as long as you don't need formatting specifiers 
(and arguably, it's just a difference in taste otherwise). The 
compiler parsed it out, it knows the separate pieces. Giving 
those pieces directly to the library is both the most efficient 
way, and also the most obvious way. The "format string" 
mechanism, while making sense for writef, *must* add an element 
of complexity to the receiving function, since it now has to know 
what "language" the translated string is. e.g. with DIP1027, one 
must know that `%s` is special and what it represents, and the 
user must know to escape `%s` to avoid miscommunication. With 
1036e, there is no format string, so there is no complication 
there, or confusion. The value being passed is right where you 
would expect it, and you don't have to parse a separate thing to 
know.
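
A toy comparison of the two receiving styles, to show where the extra complexity lands. Neither function is the actual lowering of either DIP; `joinFormat` and `joinPieces` are invented names, and `joinFormat` assumes the caller supplies enough arguments for every `%s`:

```d
import std.conv : text, to;

// Format-string style: the receiver has to know that "%s" is a
// placeholder and that "%%" means a literal percent sign.
string joinFormat(Args...)(string fmt, Args args)
{
    string[] rendered;
    foreach (arg; args)
        rendered ~= arg.to!string;

    string result;
    size_t next;
    for (size_t i = 0; i < fmt.length; ++i)
    {
        immutable bool hasNext = i + 1 < fmt.length;
        if (fmt[i] == '%' && hasNext && fmt[i + 1] == 's')
        {
            result ~= rendered[next++];   // substitute the next value
            ++i;
        }
        else if (fmt[i] == '%' && hasNext && fmt[i + 1] == '%')
        {
            result ~= '%';                // unescape "%%"
            ++i;
        }
        else
            result ~= fmt[i];
    }
    return result;
}

// Piecewise style: values arrive in position, nothing to parse or escape.
string joinPieces(Args...)(Args args)
{
    return text(args);
}

unittest
{
    // The caller must escape the literal "%s" in the first style...
    assert(joinFormat("use %%s for %s", "strings") == "use %s for strings");
    // ...but not in the second.
    assert(joinPieces("use %s for ", "strings") == "use %s for strings");
}
```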

Note in YAIDIP, this was done partly through an interpolation 
header, which had all the compile-time information, and then 
strings and interpolated data were interspersed. I find this also 
a workable solution, and could even do without the strings being 
passed interspersed (as I said, we have control over `writeln` 
and `text`), but I think the ordering of the tuple to match what 
the actual string literal looks like is so intuitive, and we 
would be losing that if we did some kind of "format header" 
mechanism.

-Steve

