Indentation-aware multi-line string literals (and/or an equivalent compile-time function)
WraithGlade
wraithglade at protonmail.com
Wed Feb 26 23:18:10 UTC 2025
Hello good people of the D forum!
For many years (off and on, whenever it occurs to me) I've wished
that programming languages offered a particular feature, one that
would likely be very simple to implement and yet very useful in
many contexts.
I'm not even aware of any programming language that has this
feature, despite how simple and widely useful it would be, so I
think this is a great opportunity for improving the D language!
Basically, the idea is that there should be a variant of raw
("WYSIWYG") strings (a.k.a. multi-line strings when the string
contains newlines) that is aware of the indentation level of the
code it is used in and compensates accordingly, so that the
programmer can write the text in a way that *visually* respects
the current indentation level of the code.
I think it should perhaps also remove leading and trailing
newlines (but not internal newlines), so that it is also useful
for cleanly embedding larger bodies of text in code without them
looking crammed in vertically.
Here is a comparative example using one possible syntax:
```
void main()
{
    //Traditional multi-line string syntax (ugly, jarring):
    enum string s1 = `Line 1
Line 2
Line 3`;

    //Indentation-aware multi-line string syntax (clean):
    enum string s2 = ``
        Line 1
        Line 2
        Line 3
    ``;

    static assert (s1 == s2);
}
```
Another possible approach would be to add a compile-time-usable
string function that, when chained onto a string literal,
accomplishes the same effect as the above without any runtime
overhead. That case could look like so:
```
enum string s = `
    Line 1
    Line 2
    Line 3
`.unindent;
```
(or something like that)
I'm not certain which approach would be better, but I am certain
that it would be a widely useful feature to have included with
D's standard library and documentation since it is such a
desirable and common use case.
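On the library-function route, Phobos already ships something close: `std.string.outdent`, which removes the indentation shared by all non-blank lines and can be evaluated at compile time. The minimal sketch below (the `banner` helper is hypothetical, for illustration only) forces compile-time evaluation by storing the result in an `enum`. Note that `outdent` does not trim the leading and trailing newlines, so the result still begins with the empty line that follows the opening backtick.

```d
import std.stdio : write;
import std.string : outdent;

// Hypothetical helper, used only to show the string nested inside indented code.
string banner()
{
    // Storing the result in an `enum` forces compile-time evaluation (CTFE),
    // so the outdenting costs nothing at run time.
    enum text = `
        Line 1
        Line 2
        Line 3
    `.outdent;
    return text;
}

void main()
{
    // outdent keeps the blank first line and the final newline;
    // the whitespace-only last line (before the closing backtick) becomes empty.
    static assert(banner() == "\nLine 1\nLine 2\nLine 3\n");
    write(banner());
}
```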
**Ideas for implementation and possible nuances:**
The algorithm for interpreting the indentation of such
indentation-aware string literals could work by scanning
backwards from the start of the opening delimiter of the string
to find the first non-whitespace character of that line, and then
using that to determine what the current indentation level is.
If the indentation level is ambiguous due to a mix of spaces and
tabs, then the compiler can simply report an error and refuse to
compile that literal until the mixing of spaces and tabs is
removed.
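Only the compiler knows where the opening delimiter sits, so the backward scan described above cannot be done by a library function. As a stand-in, here is a minimal sketch of a hypothetical `unindent` (not part of Phobos) that instead treats the smallest indentation among non-blank lines as the current level, and throws if that leading whitespace mixes spaces and tabs. Because it is CTFE-friendly, calling it in an `enum` initializer turns that exception into a compile error, which roughly matches the proposed "refuse to compile" behavior. It assumes `\n` line endings.

```d
import std.array : join, split;
import std.stdio : writeln;

/// Hypothetical `unindent` (not part of Phobos): removes the indentation
/// shared by all non-blank lines. Lines are split on '\n' only.
/// Throws if leading whitespace mixes spaces and tabs; when evaluated at
/// compile time (e.g. in an `enum` initializer) that becomes a compile error.
string unindent(string text)
{
    string[] lines = text.split("\n");

    size_t common = size_t.max; // smallest indentation seen so far
    bool sawSpace, sawTab;

    foreach (line; lines)
    {
        // Count the leading spaces/tabs of this line.
        size_t n = 0;
        while (n < line.length && (line[n] == ' ' || line[n] == '\t'))
            ++n;

        if (n == line.length)
            continue; // blank or whitespace-only: does not set the level

        foreach (c; line[0 .. n])
        {
            if (c == ' ') sawSpace = true;
            else sawTab = true;
        }
        if (n < common)
            common = n;
    }

    if (sawSpace && sawTab)
        throw new Exception("unindent: indentation mixes spaces and tabs");

    foreach (ref line; lines)
    {
        bool blank = true;
        foreach (c; line)
            if (c != ' ' && c != '\t') { blank = false; break; }
        // Whitespace-only lines become empty; others lose the common indent.
        line = blank ? "" : line[common .. $];
    }
    return lines.join("\n");
}

void main()
{
    // `enum` forces compile-time evaluation; relative indentation is kept.
    enum s = unindent("    Line 1\n      Line 2\n    Line 3");
    static assert(s == "Line 1\n  Line 2\nLine 3");
    writeln(s);

    // At run time, mixed indentation surfaces as an ordinary exception.
    bool rejected = false;
    try
    {
        auto r = unindent("\tLine 1\n    Line 2");
    }
    catch (Exception e)
    {
        rejected = true;
    }
    assert(rejected);
}
```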
Removing all leading and trailing newlines (except the natural
ending newline of the last line, and without deleting any
*internal* newlines) would make it easier to keep large bodies of
text readable. Being able to pad the text with extra blank lines
helps in such cases, and the right amount of padding may vary.
Alternatively, perhaps only the first one or two leading and/or
trailing newlines could be removed, in order to enforce a
standard number of newlines around the included text body.
Another idea is that the literal and/or compile-time function
could be parameterized so that whether or not to trim/strip the
leading and/or trailing newlines (and/or other things) could be
specified.
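The parameterization idea could look something like this minimal sketch: a hypothetical `unindent` whose template flags `trimLeading` and `trimTrailing` (both names are assumptions) choose whether leading and trailing newlines are removed. It delegates the indentation work to Phobos's `std.string.outdent`, which throws on inconsistent indentation, and it runs at compile time when its result is stored in an `enum`.

```d
import std.stdio : writeln;
import std.string : outdent;

/// Hypothetical parameterized `unindent` (not part of Phobos). It reuses
/// `std.string.outdent` for the indentation, then optionally trims leading
/// and/or trailing '\n' characters according to the template flags.
string unindent(bool trimLeading = true, bool trimTrailing = true)(string text)
{
    string s = text.outdent;
    static if (trimLeading)
    {
        while (s.length && s[0] == '\n')
            s = s[1 .. $];
    }
    static if (trimTrailing)
    {
        while (s.length && s[$ - 1] == '\n')
            s = s[0 .. $ - 1];
    }
    return s;
}

void main()
{
    // Default: trim newlines at both ends.
    enum a = `
        Line 1
        Line 2
        `.unindent;
    static assert(a == "Line 1\nLine 2");

    // Keep the natural ending newline of the last line.
    enum b = `
        Line 1
        Line 2
        `.unindent!(true, false);
    static assert(b == "Line 1\nLine 2\n");

    writeln(a);
}
```

Compile-time flags keep the choice free at run time; a runtime argument would work just as well when the string is built dynamically.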
**A few use case examples:**
- Using D as an ad-hoc text templating system for markup
languages such as HTML, without the inline text in the
generating D code looking ugly.
- Code generation for other programming languages (similar to the
above item) and any related compiled and interpreted uses where D
acts as a generator, keeping the text cleaner.
- Handling moderate to large bodies of text, such as those found
in many terminal-based programs and/or hobbyist video games (e.g.
roguelikes) and many other application contexts, in a form clean
enough that it is less often necessary to maintain separate text
files that have to be loaded at run time.
I think this could be very useful, even though it seems so simple.
Seemingly trivial workflow factors like this can have a much
bigger effect on what one uses a tool for than one might expect.
**My own current use case context:**
For example, I myself am planning on eventually converting my
personal website (which is 100% static and currently uses only
straight HTML & CSS to avoid computational waste and arbitrary
formatting restrictions) to be generated from D source files
instead of working directly with HTML files and all their myriad
limitations and oddities.
I searched for "static site generators" and "web templating
systems" for that use case but was very put off by the fact that
they were nearly always over-engineered, riddled with
dependencies, vendor-locked, and/or made lots of rigid
assumptions about the format and contents of the pages and
directories they generate. In contrast, a D-based system for
generating a static web page would be far cleaner since it would
allow completely arbitrary computational generality instead of
falling victim to the "inner platform effect" anti-pattern.
That (using D to generate the site and any arbitrary other files
I want) is what I plan to do regardless of whether this feature
makes it in, but I was reminded of my long-time desired feature
of indentation-aware strings in a programming language when I
realized that the only real shortcoming (in my mind, for what I
want) in generating my site from D code is that multi-line
strings would look ugly. I will work around that when the time
comes.
It's true of course that I could import and/or load strings from
files separately, but that is often (for many cases at least) not
as pleasant as having the strings and their usage context
*directly available in the code right alongside their context*,
which would be made much cleaner by having indentation-aware
multi-line strings of any possible length.
I suspect many other people would get good use out of it too.
The fact that I've never seen the feature in any other language,
despite it being so obviously useful, would also give D a chance
to be among the first (or at least the few) languages to offer
it!
Indentation-aware multi-line strings would also be a very natural
fit for D's already strong support for arbitrary nesting of
constructs, such as its ability to put an `import` statement at
any point inside functions or code blocks. Thus, the idea is also
very naturally "D-like" in that respect, I think.
It would be really useful to have that built in to the language!
What do you guys and gals think?
More information about the dip.ideas
mailing list