Wanted: Format character for source code literal
Q. Schroll
qs.il.paperinik at gmail.com
Wed May 5 19:53:10 UTC 2021
## Discussion
On Wednesday, 5 May 2021 at 08:46:05 UTC, Berni44 wrote:
> On Tuesday, 4 May 2021 at 18:02:50 UTC, Q. Schroll wrote:
>> So you're stuck between a rock and a hard place: Give `%D`
>> preference over custom format specifiers rendering those that
>> use `%D` invalid or let `%D` do its custom stuff if
>> *potentially* supported rendering `%D` useless in generic code
>> where most of its use-cases would lie.
>
> I fear, I can't follow you. Seems like I don't get your point.
> Maybe you can give an example?
I'm speaking of aggregate types (structs, classes, etc.) that
implement `toString` that takes a `FormatSpec` parameter
alongside the sink to describe the format according to which it
should be formatted. An example is `std.typecons.Tuple` which
apart from `%s` accepts `%(...%)` and `%(...%|...%)`. If you try
to format it with `%D`, it throws a `FormatException`. But like
any aggregate type, it could start accepting `%D` tomorrow.
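To make this concrete, here is a minimal sketch of such a type. `Celsius` is a hypothetical struct made up for illustration (it is not from Phobos). Like `Tuple`, its `toString` inspects the `FormatSpec` and rejects every specifier it does not know, which today includes `%D`.

```D
import std.conv : to;
import std.exception : assertThrown;
import std.format : format, FormatException, FormatSpec;

// Hypothetical type for illustration; not part of Phobos.
struct Celsius
{
    int degrees;

    // Custom formatting hook: `format` passes the sink and the parsed spec.
    void toString(scope void delegate(const(char)[]) sink,
                  scope const ref FormatSpec!char fmt) const
    {
        if (fmt.spec != 's')
            throw new FormatException("Celsius only supports %s");
        sink(degrees.to!string);
        sink(" C");
    }
}

unittest
{
    assert(format("%s", Celsius(21)) == "21 C");
    // Today `%D` is rejected by the type itself ...
    assertThrown!FormatException(format("%D", Celsius(21)));
    // ... but a later version of `Celsius` could handle 'D' in `toString`.
}
```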
The new `format` implementation could do three things when
encountering `%D` for formatting an object of a type with custom
formatting:
1. Because it accepts custom formatting, use it, even if it fails
(throws `FormatException`).
2. Because it accepts custom formatting, `try` it. If it fails
(i.e. throws `FormatException`), fall back to non-custom `%D`
behavior. (If it succeeds, use the successful result.)
3. Ignore the custom formatting because `%D` is special.
None of these solutions is great.
1. This means `%D` cannot be relied upon in generic code, i.e. where
the type of what you're formatting isn't up to you but someone
else. _Relied upon_ means in the way you intend `%D` to be used:
as a compiler-readable representation of the object.
2. The custom formatting could fail in ways other than throwing a
`FormatException`. (Still the best option.)
3. This breaks code, at least theoretically. Also, even if today no
one actually uses `%D`, it might be the perfect match for a
future aggregate type, but you blocked it.
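Option 2 can be approximated in user code today, which may help to see its shape. `formatOrFallback` and `LegacyId` below are hypothetical names invented for this sketch. A real implementation would live inside `format` and would produce the built-in `%D` output itself instead of taking a fallback string from the caller.

```D
import std.conv : to;
import std.format : format, FormatException, FormatSpec;

// Hypothetical type whose custom formatting only understands `%s`.
struct LegacyId
{
    int id;

    void toString(scope void delegate(const(char)[]) sink,
                  scope const ref FormatSpec!char fmt) const
    {
        if (fmt.spec != 's')
            throw new FormatException("LegacyId only supports %s");
        sink("#");
        sink(id.to!string);
    }
}

// Hypothetical helper sketching option 2: try the custom formatting first
// and fall back only if it throws a FormatException.
string formatOrFallback(T)(string fmt, T value, lazy string fallback)
{
    try
    {
        return format(fmt, value);
    }
    catch (FormatException)
    {
        return fallback;
    }
}

unittest
{
    // `%s` is supported, so the custom result wins.
    assert(formatOrFallback("%s", LegacyId(7), "LegacyId(7)") == "#7");
    // `%D` is rejected, so the fallback is used instead.
    assert(formatOrFallback("%D", LegacyId(7), "LegacyId(7)") == "LegacyId(7)");
}
```

Only `FormatException` is caught here. Any other exception thrown by a custom `toString` escapes unchanged, which is one way this approach can still fail.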
>>> In my opinion, the main idea behind this formatting routines
>>> is, to have a simple and short way for formatting output. We
>>> could use your idea for every other format character too,
>>> like: `format("%s = %s", character('𝜋'),
>>> scientificFloatingPoint(3.14))`. We don't do that, because
>>> it's more convenient to write `format("%c = %e", '𝜋', 3.14)`.
>>
>> Yes, you could. But you could use format specifiers like
>> `%-3.8f` *without losses* to get to the same result.
>
> ??? Again I'm stuck. What has `%-3.8f` with what I wrote above
> to do?
Er, you started with the scientific notation stuff. My point is that
introducing _new constructs_ in the format specification, such as
width and precision, would not be an issue if they weren't there
already, but introducing a format specification _character_ with
special meaning is.
>> And that's *the* difference between introducing a format
>> specifier character that should have generic meaning and
>> introducing, well, anything else. There was no problem
>> introducing separators like `%,3d` and neither would there be
>> a problem introducing `%y` for `int` or `double` (whatever it
>> does), or, for a concrete example, `%S` for `bool` to return
>> `TRUE` instead of `true`.
>>
>> The problem is introducing *generic* format specifier
>> *characters*.
>
> What is the difference between "generic" (which as far as I
> understand you oppose) and adding `%D` for bool, integers,
> floats, characters, strings, arrays and AAs (which you sound as
> being OK with, and which is, what I plan to do)?
The difference is that `%D` for `bool`, integers (note that according
to Walter, `bool` is an integer type), floats, arrays, and AAs is no
different from `%s`. The only place where you'd need something
different from `%s` is characters and strings. That would be handy
to have, I must admit. [You can mimic it using arrays
tho](https://run.dlang.io/is/vPOnNx):
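```D
import std.format : format;

// Wrapping the string in an array makes `%(%s%)` print it quoted and escaped.
auto str = format("prefix %s %(%s%) %s postfix", "before", ["a\nbc"], "after");
assert(str == `prefix before "a\nbc" after postfix`);
```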
And it's almost perfect! It works for character types, numeric
types, arrays, and AAs, too. Only for user-defined types, you
have no control, because it does what the user-defined `toString`
implementation defines `%s` to do. In fact, `%s` might not even
work with a user-defined type! It could throw an exception (a
`FormatException` if it's reasonable).
The only thing it doesn't do is respect `wstring` and
`dstring` literals. I cannot really estimate whether that would be a
problem, but I guess for the most part, it wouldn't.
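The array trick can be wrapped into a small helper. `asLiteral` is a hypothetical name chosen for this sketch and is not a Phobos function. It simply puts the value into a one-element array so that `%(%s%)` formats it as an element.

```D
import std.format : format;

// Hypothetical helper: wrap the value in a one-element array so that
// `%(%s%)` formats it as an element, i.e. quoted and escaped where applicable.
string asLiteral(T)(T value)
{
    return format("%(%s%)", [value]);
}

unittest
{
    assert(asLiteral(42) == "42");
    assert(asLiteral("a\nbc") == `"a\nbc"`);
    assert(asLiteral(["x", "y"]) == `["x", "y"]`);
    // No `w` suffix is emitted, so the string's type is lost.
    assert(asLiteral("abc"w) == `"abc"`);
}
```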
>> What we could do is special casing `%$` to mean what you want.
>> Currently, no matter what type you're formatting, `%$` is an
>> error in `FormatSpec`. You can give it semantics, no problem,
>> including one that ignores custom formatting. Even better,
>> `%$` looks like it's a special case and not some odd-but-legal
>> custom specifier.
>
> Using `$` would cause real troubles, because it's already used
> for positional arguments. What would `format("%1$d", 'a');` be
> supposed to produce? `'a'd` or `97`?
The `$` only has that meaning if it's preceded by a number.
`%`*N*`$`*…c* has a meaning for *N* a number and *c* a character
possibly preceded by other formatting stuff. But `%$` is
undefined in the sense that it is an error to use it.
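A short sketch of the current behavior, as far as I can tell, for built-in argument types. The third check assumes that the rejection of a bare `%$` surfaces as a `FormatException` when the argument is an `int`.

```D
import std.exception : assertThrown;
import std.format : format, FormatException;

unittest
{
    // A number before `$` selects an argument by position.
    assert(format("%2$s %1$s", "a", "b") == "b a");
    // Today a character formatted with `%d` prints its code point.
    assert(format("%1$d", 'a') == "97");
    // A bare `%$` (no number in front) is rejected for an `int` argument.
    assertThrown!FormatException(format("%$", 1));
}
```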
>> Changing the meaning of `%D` begs for trouble.
>
> `%D` has currently no meaning, so we cannot change it; we can
> just add it.
`%D` *potentially* has a meaning for existing (or future)
user-defined types. On the other hand, `%$` has not, because it's
not up to a user-defined type to define its meaning but to
`format` (`FormatSpec` to be precise) because currently,
`FormatSpec` does not support `%$` to begin with.
> I hope, we can figure this out somehow - I sense, that you've
> got an important point, but I don't understand it. Seems like
> we are talking past each other.
I guess you thought primarily about the built-in types while I
primarily thought about user-defined types. I'm happy to clarify.
## Implementation
Now, let's talk about the implementation. It's far easier to talk
about that in terms of a function. Let's call it `unMixin`
because the goal is that `mixin(unMixin(obj))` results in `obj`
or a copy of `obj`. On the other hand, we cannot expect
`unMixin(mixin(str))` to return `str` because `str` could contain
unnecessary information and even if it doesn't, it can contain
context-dependent information that `unMixin` cannot generally
retrieve.
Simplest example: If `unMixin(1)` returns `"1"`, we're good for
`1`. If it returns `"cast(int) 1"`, we're also good.
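A minimal sketch of that round trip, restricted to `int` only. This `unMixin` is a hypothetical stand-in written for illustration, not a proposed implementation. It deliberately returns the `cast(int)` form to show that the output does not have to be the shortest possible source text.

```D
import std.conv : to;

// Hypothetical `unMixin` for `int` only; returns D source text for `value`.
string unMixin(int value)
{
    return "cast(int) " ~ value.to!string;
}

unittest
{
    enum code = unMixin(1);
    static assert(code == "cast(int) 1");
    // mixin(unMixin(obj)) gives back the original value.
    static assert(mixin(code) == 1);
    // The other direction does not round-trip: "1" comes back in a longer form.
    static assert(unMixin(mixin("1")) != "1");
}
```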