Wanted: Format character for source code literal

Fri May 7 23:51:13 UTC 2021

On Thursday, 6 May 2021 at 08:49:16 UTC, Berni44 wrote:
> On Wednesday, 5 May 2021 at 19:53:10 UTC, Q. Schroll wrote:
>> The new `format` implementation could do three things when 
>> encountering `%D` for formatting an object of a type with 
>> custom formatting:
>
> For me, this seems to be the wrong way to think about it. 
> `format` doesn't encounter specifiers, but objects (in the 
> wider sense). And in case of structs, classes and so on it 
> delegates the handling of formatting to them, without even 
> looking at the specifier (with the exception of `%s` which 
> sometimes plays a special role).

The role of `%s` is special, but not too special either. It just 
gives a best effort result where other formats would just fail. 
The task to return a string representation that can be 
interpreted back is nothing to be delegated to a user-defined 
routine.

> It's then up to that struct or class to define the meaning of 
> `%D` for that specific struct or class.

This makes `%D` unreliable for meta-programming. And this is 
_the_ problem I have with this, because creating a 
compiler-readable string from an object is a meta-programming 
tool. I have no idea _what else_ you'd even do with it.

Here's the showstopper: Adding a `toString` that accepts format 
specifiers becomes a potentially breaking change as it will 
change the meaning of `%D` silently.
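
To illustrate the concern, here is a minimal sketch. `Meters` is a hypothetical user type, and `%D` stands in for the proposed specifier, which Phobos does not define today. Because the type has a spec-aware `toString`, `format` hands the whole specifier over to it, so whatever the type does with `'D'` wins:

```d
import std.format : format, formattedWrite, FormatSpec;
import std.range.primitives : put;

// Hypothetical user type, written without `%D` as "source literal" in mind.
struct Meters
{
    double value;

    // Spec-aware toString: `format` passes the parsed specifier along,
    // including characters Phobos itself assigns no meaning to.
    void toString(W)(ref W sink, scope const ref FormatSpec!char spec) const
    {
        if (spec.spec == 'D')
            put(sink, "distance");               // arbitrary user-chosen meaning
        else
            formattedWrite(sink, "%s m", value); // everything else: plain text
    }
}
```

With this type, `format("%D", Meters(3))` yields whatever the author of `Meters` decided, which need not be anything a compiler can read back.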

>> Because `%D` for `bool`, integers ([...]), `floats`, arrays, 
>> and AAs is nothing different from `%s`.
>
> That's not true: bytes need a cast, longs a trailing 'L',

It depends on what you want to do with it. If you want the immediate 
type of the literal to be what you plugged in, then yes. If being 
equal suffices, `"1"` and `"true"` are the same.
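
A small sketch of the distinction between "same type" and "equal value", assuming `format` is evaluated at compile time and the result is mixed back in (the names `byteText` and `boolText` are made up for the example):

```d
import std.format : format;

// `%s` drops the type: a byte is printed as a bare integer literal.
enum byteText = format("%s", cast(byte) 5);
enum boolText = format("%s", true);

// Mixed back in, the byte's text is typed `int`, not `byte` ...
static assert(is(typeof(mixin(byteText)) == int));
// ... although the value still compares equal to the original.
static assert(mixin(byteText) == cast(byte) 5);

// `bool` keeps its type because `true` is itself a bool literal,
// and in terms of plain equality, 1 and true are the same.
static assert(is(typeof(mixin(boolText)) == bool));
static assert(1 == true);
```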

> like reals, floating point numbers are truncated with `%s` and 
> don't provide the correct value

_That,_ on the other hand, _is_ a problem. I don't know how big 
that problem practically is because `real` cannot even be 
formatted at CTFE and `double` and `float` aren't that common of 
things at compile-time. I guess the only sane result for floating 
point values is `%a` with sufficient digits anyway, and that is 
far from what `%s` gives, even if you add a gigantic precision.
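
A rough way to see the difference is to format a `double` and parse the text back. The helper `roundTrip` below is made up for this example; it uses C's `strtod` for parsing because that accepts hexadecimal floats:

```d
import core.stdc.stdlib : strtod;
import std.format : format;
import std.string : toStringz;

/// Formats `x` with the given specifier and parses the text back.
double roundTrip(string spec, double x)
{
    return strtod(format(spec, x).toStringz, null);
}

void main()
{
    immutable double third = 1.0 / 3.0;

    // `%s` keeps only a few significant digits, so the value changes.
    assert(roundTrip("%s", third) != third);

    // `%a` writes the mantissa in hexadecimal and loses nothing.
    assert(roundTrip("%a", third) == third);
}
```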

Fixing `%s` for floating point values, in the sense that the 
representation has enough decimals to accurately represent the 
number, is a breaking change.

> and so on. There are a lot of subtle differences

The problem of strings and chars is obvious, the case for exact 
types is, too. Floating point types didn't cross my mind, but 
please elaborate, what else is it? I'm honestly interested.

If `%(%s%)` does not give you a proper char or string literal, 
I'd consider it a bug.
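
For strings, the array trick can be sketched like this. The helper names `quoted` and `unquoted` are made up; the behaviour relies on Phobos escaping string elements inside `%(...%)` unless the `-` flag is given:

```d
import std.format : format;

/// Wraps `s` in a one-element array so that `%(%s%)` formats it as an
/// escaped, quoted element.
string quoted(string s)
{
    return format("%(%s%)", [s]);
}

/// Same trick with the `-` flag, which switches the escaping off.
string unquoted(string s)
{
    return format("%-(%s%)", [s]);
}
```

Here `quoted("a\nb")` produces the text of a string literal, with surrounding quotes and a backslash escape for the newline, while `unquoted("a\nb")` returns the characters unchanged.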

> and that's why I think it would be a good thing to have this 
> new format character.

I agree with you that a new format is necessary to achieve this 
if done with a format character to begin with. I do question 
whether format characters are the right approach. To me, this 
looks more like a code generation tool than value formatting.

>> The only part where you'd need something different than `%s` 
>> is characters, strings. That would be handy to have, I must 
>> admit. You can mimic it using arrays tho
>
> That was actually the starting point for me that led me to a 
> desire for having `%D`: `%s` for arrays tries to mimic the 
> intended result of `%D` (but fails at several places to do so 
> correctly) and therefore treats characters and strings special. 
> This led to the abuse of the `-`-flag (in `%-(...%)`) which 
> now causes a lot of problems. I thought long about how this 
> could be fixed: With `%D` available, a smoother transition 
> would be possible, because people using `%s` inside of 
> `%(...%)` could just replace it with `%D` to get the current 
> result and that eventually will make it possible to give `%s` 
> (and the `-`-flag) its correct meaning back. (Of course this 
> still needs deprecation cycles and maybe a preview switch or 
> what else - it's still not easy.)

The `%-(...%)` is a hack, but it can be questioned whether removing 
it is even worth the trouble. Removing it just breaks existing 
code. The minus otherwise has no meaning for arrays. It's just weird.

>> And it's almost perfect! It works for character types, numeric 
>> types, arrays, and AAs, too.
>
> As I wrote above: That might look so at first sight, but it 
> isn't the case.

Right. I was a little enthusiastic about it.

>> The `$` only has that meaning if it's preceded by a number. 
>> `%`*N*`$`*…c* has a meaning for *N* a number and *c* a 
>> character possibly preceded by other formatting stuff. But 
>> `%$` is undefined in the sense that it is an error to use it.
>
> But people will start to use it with width and other parameters 
> and will report issues. Let alone that it will complicate the 
> format spec parser significantly and thus might even introduce 
> more bugs. I'm sorry, but with `%$` you'd be opening Pandora's 
> box.

It requires a single check: Is the `%` character followed by `$`? 
The whole point of `%$` would be that it is not customizable. You 
cannot add any specification. If something comes before `$`, it 
isn't `%$`, and if something comes after it, that's not part of 
the format specifier, but just text.
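
As a sketch of how small that check is (the function name is made up, and this is not how the Phobos parser is actually structured):

```d
/// Hypothetical check: does the specifier starting at `fmt[i]` consist
/// of exactly `%$`? Anything after the `$` is ordinary text.
bool startsLiteralSpec(string fmt, size_t i)
{
    return i + 1 < fmt.length && fmt[i] == '%' && fmt[i + 1] == '$';
}
```

A positional specifier such as `%1$s` fails the check because a digit sits between `%` and `$`, so the existing parsing path would handle it as before.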

---

I've been thinking about this a little. What is your goal? Maybe 
we're talking at cross purposes. I guess you want a format 
specifier that formats any _built-in_ type in a way that 
represents the object precisely. In a sense, you want a good `%s` 
and not a not-really-the-best-effort `%s`. My understanding was 
you want to represent objects as strings in a way that can be 
used by the compiler to reconstruct the object, and for what else 
than meta-programming would one do that? It's in a sense trivial 
for built-in types because it's a finite set of types.

Thinking about it, you can easily wrap objects in a struct and 
make it do The Right Thing™. It doesn't complicate the `format` 
implementation.
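
A minimal sketch of such a wrapper, assuming only `bool`, `string` and integers of at most 32 bits need to be covered. `SourceLiteral` and `sourceLiteral` are made-up names, and floating point, wide strings and 64-bit integers are deliberately left out:

```d
import std.format : format;
import std.traits : isIntegral;

/// Hypothetical wrapper: formats `value` as D source text.
struct SourceLiteral(T)
{
    T value;

    string toString() const
    {
        static if (is(T == bool))
            return value ? "true" : "false";
        else static if (is(T == string))
            // Reuse the escaping that `%(...%)` applies to elements.
            return format("%(%s%)", [value]);
        else static if (isIntegral!T && T.sizeof <= 4)
            // An explicit cast pins the exact type of the literal.
            return format("cast(%s) %s", T.stringof, value);
        else
            static assert(false, "unsupported type " ~ T.stringof);
    }
}

/// Convenience function so that `T` is inferred from the argument.
auto sourceLiteral(T)(T value)
{
    return SourceLiteral!T(value);
}
```

Used as `format("%s", sourceLiteral(cast(byte) 5))`, this goes through the ordinary `%s` path and the wrapper's `toString`, so `format` itself needs no new specifier.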