The state of string interpolation...one year later
Jonathan Marler
johnnymarler at gmail.com
Sun Mar 17 06:01:35 UTC 2019
It's been about a year since I submitted an implementation for
interpolated strings:
https://github.com/dlang/dmd/pull/7988
In that time, various people have been popping up asking about
it. There has been a lot of discussion around this feature on the
forums, and the place we left off was with Andrei saying that we
should:
* continue to explore alternative library solutions
* focus on improving existing features instead of adding new
features
At the request of Andrei, I implemented a small library solution
as well (https://github.com/dlang/phobos/pull/6339) but the
leadership never followed up with it. And that's ok, they only
have so much time and they need to prioritize how they feel is
best.
With that, I read through some discussion and thought it could be
helpful to summarize my thoughts on the matter since people
continue to ask questions about it.
In my mind, there's really only one reason for string
interpolation...
Better Syntax
In many ways syntax isn't that important. There's a lot of
subjectivity around it, but sometimes a change can make it
objectively better. Any time you can make syntax objectively
better, you're making code easier to read, write and maintain.
Better syntax means it's easier to write "correct code" and
harder to write "incorrect code".
I recall Atila arguing that the syntax without string
interpolation wasn't that bad. Then he provided this example
(https://forum.dlang.org/post/jahvdekidbugougmyhgb@forum.dlang.org):
text("a is", a, ", b is ", b, " and the sum is: ", a + b)
Ironically, his example had a mistake, but it was hard to notice.
Look at the same example with string interpolation:
text("a is$a, b is $b and the sum is: $(a + b)")
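For anyone who didn't spot it: the first literal, "a is", has no trailing space, so the value of `a` is glued to the word before it. A minimal sketch that makes this visible, assuming `a` and `b` are ints (the helper name `describe` is only for illustration):

```d
import std.conv : text;

// Builds the message exactly as in the quoted example.
// `describe` is a hypothetical helper, not something from the thread.
string describe(int a, int b)
{
    return text("a is", a, ", b is ", b, " and the sum is: ", a + b);
}

void main()
{
    import std.stdio : writeln;

    // "a is" has no trailing space, so the 1 lands directly after "is".
    assert(describe(1, 2) == "a is1, b is 2 and the sum is: 3");
    writeln(describe(1, 2));
}
```

In the interpolated version the same slip shows up as `is$a` sitting right next to `is $b`, which is much easier to catch by eye.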
You could say that "better syntax" is one of the main reasons D
exists. It's the main if not the only reason for a lot of
features like UFCS and foreach.
Andrei's biggest critique is that we should first try to implement
this in a library...and he's completely right to ask that
question. The problem is that over the years, no one's been able
to achieve a library solution that results in a nice syntax.
Having a poor syntax is a bad sign for a feature that only exists
to improve syntax. However, even if we could make the syntax
better, there are still a handful of reasons why a library
solution can't measure up to language support. Including the
"poor syntax", the following are the 5 CONS I see with a library
solution:
CON 1. The syntax is "not nice". This defeats the entire point
of interpolated strings. Saying that interpolated strings aren't
popular because people are not using a library for them is like
saying Elvis isn't popular because people don't like Elvis
impersonators. People not liking a poor imitation of something
doesn't say anything about how they feel about the genuine
article.
CON 2. No real error messages. What's one of the most annoying
parts of mixins? Error messages. When you get an error in a
mixin, you don't get a line of code to go fix, you get an
"imaginary" line that doesn't exist. With library solutions, you
can't point syntax errors inside interpolated strings to source
locations. That information is not available to the language.
When you get a string, you don't know where each character inside
that string originated from, only the compiler knows that.
CON 3. Performance. No matter what we do, a library solution
will never be as fast as a language solution. The reason
performance is especially important here is that bad
performance means developers will have to choose between better
syntax and faster compilation. We already see this today with
templates and mixins. With a language implementation, developers
can have both.
CON 4. IDE/Editor Support. A library solution won't be able to
have IDE/Editor support for syntax highlighting, auto-complete,
etc. With language support, when the editor sees an interpolated
string, it will be able to highlight the code inside it just like
normal code.
CON 5. Full solution requires full copy of lexer/parser. One big
problem with a library solution is that it will be hard for a
library to delimit interpolated expressions. For full support, it
will need to have a full implementation of the compiler's
lexer/parser. Without that, it will have limitations on the kind
of code that can be inside an interpolated string. Take the
following (contrived) example:
foreach (i; 0 .. 10)
{
    mixin(interp(`$( i ~ ")" ) entry $(array[i])`));
}
The library solution needs to parse that interpolated string but
needs to know that the right paren at `")"` is actually just a
string literal inside the expression and not a right paren to
delimit the end of the expression. This is a contrived example,
but if you have anything less than a full lexer/parser then
developers are going to have a hard time being able to know what
can and can't go inside an interpolated expression. By having
interpolated strings as a part of the language, the
implementation has full access to the lexer/parser, so it doesn't
need to force any limitation on the syntax available inside
interpolated string expressions.
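To make the delimiting problem concrete, here is a rough sketch of what a library might do if it only counted parentheses. The function `naiveExprEnd` is hypothetical and is not taken from any of the library PRs; it just shows how a scanner that knows nothing about string literals cuts the expression off too early:

```d
// Hypothetical helper: returns the index of the ')' that a naive scanner
// believes closes the "$(" starting at `start`. It only counts parentheses
// and knows nothing about string literals, character literals or comments.
size_t naiveExprEnd(string s, size_t start)
{
    assert(s[start .. start + 2] == "$(");
    int depth = 0;
    foreach (i; start + 1 .. s.length)
    {
        if (s[i] == '(')
            ++depth;
        else if (s[i] == ')' && --depth == 0)
            return i;
    }
    return s.length; // no closing paren found
}

void main()
{
    string src = `$( i ~ ")" ) entry $(array[i])`;
    size_t end = naiveExprEnd(src, 0);

    // The scanner stops at the ')' inside the string literal,
    // so the "expression" it extracts is cut off mid-literal.
    assert(src[2 .. end] == ` i ~ "`);
}
```

Handling string literals fixes this one case, but then come character literals, comments, nested token strings and so on, which is how a library ends up needing most of a lexer.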
---
Now I'm not saying that the CONS of the library solution justify
the addition of interpolated strings to the language. I focused
on that because that is Andrei's main sticking point. Even if
everyone agrees that library solutions don't work (and we can't
enhance the language to make them work), we still need to show
that the feature is going to be popular/useful enough to justify
a new type of string literal. The usefulness of the feature
needs to outweigh the work to support it. The more features we
add to D, the more developers need to learn to understand it.
That being said, I consider the implementation and complexity it
adds to be quite minimal (see the PR for more details). As for
the usefulness, I can say personally I would use this feature to
replace almost all my usages of writefln/format and writeln which
would be a big shift for my projects. Instead of:
writefln("My name is %s and my age is %s and my favorite hex is %s", name, age, favnum);
I will be writing:
writeln(i"My name is $name and my age is $age and my favorite hex is $(favnum.formatHex)");
When I generate code, instead of:
return
    returnType ~ ` ` ~ name ~ `(` ~ type ~ ` left, ` ~ type ~ ` right)
{
    return cast(` ~ returnType ~ `)(left ` ~ op ~ ` right);
}
`;
It will be
return text(iq{
$returnType $name($type left, $type right)
{
return cast($returnType)(left $op right);
}
});
When I generate HTML documents in my cgi library, instead of:
writeln(`<html><body>
<title>`, title, `</title>
<name>`, name, `</name><age>`, age, `</age>
<a href="`, link, `">`, linkName, `</a>
</body></html>
`);
or even:
writefln(`<html><body>
<title>%s</title>
<name>%s</name><age>%s</age>
<a href="%s">%s</a>
</body></html>
`, title, name, age, link, linkName);
It will be:
writeln(i`<html><body>
<title>$title</title>
<name>$name</name><age>$age</age>
<a href="$link">$linkName</a>
</body></html>
`);
When I first saw interpolated strings I didn't immediately
realize the benefit of them. Using them eliminates the problem
of keeping format strings in sync with arguments. It also avoids
the "noise problem" you get when you alternate between string
literals and expressions inside a function call, e.g.
`writeln("a is", a, ", b is ", b)`. That pretty much sums up the
benefits in my mind.
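As a small illustration of the "in sync" problem, here is a sketch using `std.format.format` with a runtime format string. The wrapper `render` and the sample values are made up for the example. A specifier added without a matching argument still compiles and only fails when the line actually runs:

```d
import std.exception : assertThrown;
import std.format : format, FormatException;

// Hypothetical wrapper so the same arguments can be tried against
// different format strings.
string render(string fmt, string name, int age)
{
    return format(fmt, name, age);
}

void main()
{
    // In sync: two specifiers, two arguments.
    assert(render("%s is %s", "Ada", 36) == "Ada is 36");

    // Out of sync: a third specifier was added but no argument for it.
    // This compiles, and throws only at run time.
    assertThrown!FormatException(render("%s is %s and likes %s", "Ada", 36));
}
```

With an interpolated string there is no separate argument list to drift out of step, because each expression sits at the spot where its value appears.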
So what's next? I'm curious where leadership currently stands.
What are their thoughts on the library solutions that have been
presented? What do they think of the 5 CONS I've presented that
all library solutions will have? What's their opinion on the
usefulness of the feature? For me personally, I am surprised at
the amount of interest this feature continues to garner. I think
the feature is a net positive for D, but then again I don't think
it's a "make or break" feature. Just a "nice addition". Anyway
those are my thoughts. Sorry for the long post. I hope it's
helpful and ultimately makes D better.