Wanted: Format character for source code literal

Q. Schroll qs.il.paperinik at gmail.com
Wed May 5 19:53:10 UTC 2021


## Discussion

On Wednesday, 5 May 2021 at 08:46:05 UTC, Berni44 wrote:
> On Tuesday, 4 May 2021 at 18:02:50 UTC, Q. Schroll wrote:
>> So you're stuck between a rock and a hard place: Give `%D` 
>> preference over custom format specifiers rendering those that 
>> use `%D` invalid or let `%D` do its custom stuff if 
>> *potentially* supported rendering `%D` useless in generic code 
>> where most of its use-cases would lie.
>
> I fear, I can't follow you. Seems like I don't get your point. 
> Maybe you can give an example?

I'm speaking of aggregate types (structs, classes, etc.) that 
implement `toString` that takes a `FormatSpec` parameter 
alongside the sink to describe the format according to which it 
should be formatted. An example is `std.typecons.Tuple`, which, 
apart from `%s`, accepts `%(...%)` and `%(...%|...%)`. If you try 
to format it with `%D`, it throws a `FormatException`. But like 
any aggregate type, it could start accepting `%D` tomorrow.
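
To make the `Tuple` case concrete, here is a small sketch. It assumes a Phobos version in which `%D` has not been given any meaning, which was the state of affairs at the time of writing; the variable names are only for illustration.

```D
import std.exception : assertThrown;
import std.format : format, FormatException;
import std.typecons : tuple;

void main()
{
    auto t = tuple(1, 2);

    // Tuple's custom toString applies the nested format to its fields.
    assert(format("%(%s and %s%)", t) == "1 and 2");

    // Any specifier other than %s or a nested one is rejected by
    // Tuple's own toString, not by format itself.
    assertThrown!FormatException(format("%D", t));
}
```

The exception comes from the type's formatting code, which is why the type alone decides what `%D` means for it.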

The new `format` implementation could do three things when 
encountering `%D` for formatting an object of a type with custom 
formatting:
1. Because it accepts custom formatting, use it, even if it fails 
(throws `FormatException`).
2. Because it accepts custom formatting, `try` it. If it fails 
(i.e. throws `FormatException`), fall back to the non-custom `%D` 
behavior. (If it succeeds, use the successful result.)
3. Ignore the custom formatting because `%D` is special.

None of these solutions is great.
1. `%D` cannot be relied upon in generic code, i.e. where the 
type of what you're formatting isn't up to you but to someone 
else. _Relied upon_ means in the way you intend `%D` to be used: 
a compiler-readable representation of the object.
2. The custom formatting could fail in ways other than throwing a 
`FormatException`. (Still the best of the three.)
3. It breaks code, at least theoretically. Also, even if today no 
one actually uses `%D`, it might be the perfect match for a 
future aggregate type, but you blocked it.
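
Option 2 can be sketched from the outside as follows. This is only an illustration, not how `format` is or would be implemented: `Celsius` and `sourceLiteral` are made-up names, and the `%s` fallback merely stands in for whatever the built-in `%D` behavior would be.

```D
import std.format : format, formattedWrite, FormatException, FormatSpec;

// Hypothetical user-defined type whose toString only understands %s.
struct Celsius
{
    double degrees;

    void toString(W)(ref W sink, scope const ref FormatSpec!char fmt) const
    {
        if (fmt.spec != 's')
            throw new FormatException("Celsius only supports %s");
        formattedWrite(sink, "%s C", degrees);
    }
}

// Hypothetical helper: try the custom formatting first, fall back if
// the type rejects the specifier.
string sourceLiteral(T)(T value)
{
    try
        return format("%D", value);
    catch (FormatException)
        return format("%s", value); // stand-in for a built-in %D
}

void main()
{
    assert(sourceLiteral(Celsius(21.5)) == "21.5 C");
}
```

Note that only `FormatException` is caught; a `toString` that fails differently still propagates its error.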

>>> In my opinion, the main idea behind this formatting routines 
>>> is, to have a simple and short way for formatting output. We 
>>> could use your idea for every other format character too, 
>>> like: `format("%s = %s", character('𝜋'), 
>>> scientificFloatingPoint(3.14))`. We don't do that, because 
>>> it's more convenient to write `format("%c = %e", '𝜋', 3.14)`.
>>
>> Yes, you could. But you could use format specifiers like 
>> `%-3.8f` *without losses* to get to the same result.
>
> ??? Again I'm stuck. What has `%-3.8f` with what I wrote above 
> to do?

Er, you started with scientific notation stuff. My point is that 
introducing _new constructs_ in the format specification, such as 
width and precision, would not be an issue if they weren't there 
already, but introducing a format specification _character_ with 
special meaning is.

>> And that's *the* difference between introducing a format 
>> specifier character that should have generic meaning and 
>> introducing, well, anything else. There was no problem 
>> introducing separators like `%,3d` and neither would there be 
>> a problem introducing `%y` for `int` or `double` (whatever it 
>> does), or, for a concrete example, `%S` for `bool` to return 
>> `TRUE` instead of `true`.
>>
>> The problem is introducing *generic* format specifier 
>> *characters*.
>
> What is the difference between "generic" (which as far as I 
> understand you oppose) and adding `%D` for bool, integers, 
> floats, characters, strings, arrays and AAs (which you sound as 
> being OK with, and which is, what I plan to do)?

Because `%D` for `bool`, integers (note that according to Walter, 
`bool` is an integer type), floats, arrays, and AAs is nothing 
different from `%s`. The only part where you'd need something 
different from `%s` is characters and strings. That would be 
handy to have, I must admit. [You can mimic it using arrays 
tho](https://run.dlang.io/is/vPOnNx):
```D
import std.format : format;
// `%(%s%)` formats each array element as a quoted, escaped literal.
auto str = format("prefix %s %(%s%) %s postfix", "before",
    ["a\nbc"], "after");
assert(str == `prefix before "a\nbc" after postfix`);
```

And it's almost perfect! It works for character types, numeric 
types, arrays, and AAs, too. Only for user-defined types, you 
have no control, because it does what the user-defined `toString` 
implementation defines `%s` to do. In fact, `%s` might not even 
work with a user-defined type! It could throw an exception (a 
`FormatException` if it's reasonable).

The only thing it doesn't do is respect `wstring` and `dstring` 
literals. I cannot really estimate if that would be a problem, 
but I guess for the most part, it wouldn't.
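
A quick check of what that means in practice, as far as I understand the current behavior of the array trick:

```D
import std.format : format;

void main()
{
    // All three string types produce the same quoted text; the `w` and
    // `d` suffixes of the original literals are not reproduced.
    assert(format("%(%s%)", ["abc"])  == `"abc"`);
    assert(format("%(%s%)", ["abc"w]) == `"abc"`);
    assert(format("%(%s%)", ["abc"d]) == `"abc"`);
}
```

Mixing the result back in therefore gives a `string` literal, which only matters where the type isn't converted implicitly.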

>> What we could do is special casing `%$` to mean what you want. 
>> Currently, no matter what type you're formatting, `%$` is an 
>> error in `FormatSpec`. You can give it semantics, no problem, 
>> including one that ignores custom formatting. Even better, 
>> `%$` looks like it's a special case and not some odd-but-legal 
>> custom specifier.
>
> Using `$` would cause real troubles, because it's already used 
> for positional arguments. What would `format("%1$d", 'a');` be 
> supposed to produce? `'a'd` or `97`?

The `$` only has that meaning if it's preceded by a number. 
`%`*N*`$`*…c* has a meaning for *N* a number and *c* a character 
possibly preceded by other formatting stuff. But `%$` is 
undefined in the sense that it is an error to use it.
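
For reference, this is the existing positional use of `$`, which always has a number in front of it:

```D
import std.format : format;

void main()
{
    // %2$s picks the second argument, %1$s the first.
    assert(format("%2$s %1$s", "a", "b") == "b a");

    // %1$d is the first argument formatted with %d, so a character
    // is printed as its code point value.
    assert(format("%1$d", 'a') == "97");
}
```

So `format("%1$d", 'a')` already has a settled answer, and a bare `%$` would not collide with it.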

>> Changing the meaning of `%D` begs for trouble.
>
> `%D` has currently no meaning, so we cannot change it; we can 
> just add it.

`%D` *potentially* has a meaning for existing (or future) 
user-defined types. On the other hand, `%$` has not, because it's 
not up to a user-defined type to define its meaning but to 
`format` (`FormatSpec` to be precise) because currently, 
`FormatSpec` does not support `%$` to begin with.

> I hope, we can figure this out somehow - I sense, that you've 
> got an important point, but I don't understand it. Seems like 
> we are talking past each other.

I guess you thought primarily about the built-in types while I 
primarily thought about user-defined types. I'm happy to clarify.

## Implementation

Now, let's talk about the implementation. It's far easier to talk 
about that in terms of a function. Let's call it `unMixin` 
because the goal is that `mixin(unMixin(obj))` results in `obj` 
or a copy of `obj`. On the other hand, we cannot expect 
`unMixin(mixin(str))` to return `str` because `str` could contain 
unnecessary information and even if it doesn't, it can contain 
context-dependent information that `unMixin` cannot generally 
retrieve.

Simplest example: If `unMixin(1)` returns `"1"`, we're good for 
`1`. If it returns `"cast(int) 1"`, we're also good.
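
A minimal sketch of that example, restricted to `int` on purpose. Both functions are hypothetical and only illustrate the round-trip property, not a proposed implementation:

```D
import std.conv : to;

// Hypothetical, int-only versions of unMixin.
string unMixin(int value)
{
    return value.to!string;
}

string unMixinVerbose(int value)
{
    return "cast(int) " ~ value.to!string;
}

// Either result mixes back in to the original value.
static assert(mixin(unMixin(1)) == 1);
static assert(mixin(unMixinVerbose(1)) == 1);

// The other direction loses information: the cast is not recovered.
static assert(unMixin(mixin("cast(int) 1")) == "1");

void main() {}
```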

