The state of string interpolation...one year later

Sun Mar 17 06:01:35 UTC 2019

It's been about a year since I submitted an implementation for 
interpolated strings:

https://github.com/dlang/dmd/pull/7988

In that time, various people have been popping up asking about 
it. There has been a lot of discussion around this feature on the 
forums and the place we left off was with Andrei saying that we 
should:

* continue to explore alternative library solutions
* focus on improving existing features instead of adding new 
features

At the request of Andrei, I implemented a small library solution 
as well (https://github.com/dlang/phobos/pull/6339) but the 
leadership never followed up with it.  And that's ok, they only 
have so much time and they need to prioritize how they feel is 
best.

With that, I read through some discussion and thought it could be 
helpful to summarize my thoughts on the matter since people 
continue to ask questions about it.

In my mind, there's really only one reason for string 
interpolation...

     Better Syntax

In many ways syntax isn't that important.  There's a lot of 
subjectivity around it, but sometimes a change can make it 
objectively better.  Any time you can make syntax objectively 
better, you're making code easier to read, write and maintain.  
Better syntax means it's easier to write "correct code" and 
harder to write "incorrect code".

I recall Atila arguing that the syntax without string 
interpolation wasn't that bad. Then he provided this example 
(https://forum.dlang.org/post/jahvdekidbugougmyhgb@forum.dlang.org):

     text("a is", a, ", b is ", b, " and the sum is: ", a + b)

Ironically, his example had a mistake, but it was hard to notice. 
Look at the same example with string interpolation:

     text("a is$a, b is $b and the sum is: $(a + b)")
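
For anyone who would rather see the mistake than hunt for it, here 
is a small runnable sketch of the first example using 
`std.conv.text`. The function names `posted` and `intended` and the 
values 1 and 2 are placeholders chosen for illustration, not part 
of the original post:

```d
import std.conv : text;
import std.stdio : writeln;

// The example as posted: note there is no space after "a is".
string posted(int a, int b)
{
    return text("a is", a, ", b is ", b, " and the sum is: ", a + b);
}

// What was presumably intended, with the space added.
string intended(int a, int b)
{
    return text("a is ", a, ", b is ", b, " and the sum is: ", a + b);
}

void main()
{
    // 1 and 2 are placeholder values chosen for illustration.
    writeln(posted(1, 2));   // a is1, b is 2 and the sum is: 3
    writeln(intended(1, 2)); // a is 1, b is 2 and the sum is: 3
}
```

In the comma-separated form the missing space hides between a 
closing quote and a comma. In the interpolated form it sits in the 
middle of the sentence, where `is$a` looks wrong at a glance.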

You could say that "better syntax" is one of the main reasons D 
exists.  It's the main if not the only reason for a lot of 
features like UFCS and foreach.

Andrei's biggest critique is that we should first try to implement 
this in a library...and he's completely right to ask for that.  
The problem is that over the years, no one's been able to achieve 
a library solution that results in a nice syntax.  Having a poor 
syntax is a bad sign for a feature that only exists to improve 
syntax.  However, even if we could make the syntax better, there 
are still a handful of reasons why a library solution can't 
measure up to language support.  Including the "poor syntax", the 
following are the 5 CONS I see with a library solution:

CON 1. The syntax is "not nice".  This defeats the entire point 
of interpolated strings. Saying that interpolated strings aren't 
popular because people are not using a library for them is like 
saying Elvis isn't popular because people don't like Elvis 
impersonators.  People not liking a poor imitation of something 
doesn't say anything about how they feel about the genuine 
article.

CON 2. No real error messages.  What's one of the most annoying 
parts of mixins?  Error messages.  When you get an error in a 
mixin, you don't get a line of code to go fix, you get an 
"imaginary" line that doesn't exist.  With library solutions, you 
can't point syntax errors inside interpolated strings to source 
locations.  That information is not available to library code.  
When you get a string, you don't know where each character inside 
that string originated from, only the compiler knows that.

CON 3. Performance.  No matter what we do, no library solution 
will ever compile as fast as a language solution. The reason why 
performance is especially important here is that bad performance 
means developers will have to choose between better syntax and 
faster compilation.  We already see this today with templates and 
mixins.  With a language implementation, developers can have 
both.

CON 4. IDE/Editor Support.  A library solution won't be able to 
have IDE/Editor support for syntax highlighting, auto-complete, 
etc.  With language support, when the editor sees an interpolated 
string, it will be able to highlight the code inside it just like 
normal code.

CON 5. Full solution requires full copy of lexer/parser.  One big 
problem with a library solution is that it will be hard for a 
library to delimit interpolated expressions.  For full support, it 
will need to have a full implementation of the compiler's 
lexer/parser.  Without that, it will have limitations on the kind 
of code that can be inside an interpolated string.  Take the 
following (contrived) example:

foreach (i; 0 .. 10)
{
     mixin(interp(`$( i ~ ")" ) entry $(array[i])`));
}

The library solution needs to parse that interpolated string but 
needs to know that the right paren at `")"` is actually just a 
string literal inside the expression and not a right paren to 
delimit the end of the expression.  This is a contrived example, 
but if you have anything less than a full lexer/parser then 
developers are going to have a hard time knowing what can and 
can't go inside an interpolated expression.  By having 
interpolated strings as a part of the language, the 
implementation has full access to the lexer/parser, so it doesn't 
need to force any limitation on the syntax available inside 
interpolated string expressions.
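
To make the delimiting problem concrete, here is a minimal sketch 
of what a library without a real lexer might do: count parentheses 
and stop at the first one that balances. `naiveExpr` is a 
hypothetical helper written for this illustration, not code from 
either PR or from any existing library:

```d
import std.stdio : writeln;

// Hypothetical helper: given the text that follows "$(", return the
// expression up to the matching ')' by counting parentheses only.
// It knows nothing about string literals, comments, and so on.
string naiveExpr(string s)
{
    int depth = 1;
    foreach (i, c; s)
    {
        if (c == '(')
            depth++;
        else if (c == ')' && --depth == 0)
            return s[0 .. i];
    }
    return null; // no closing paren found
}

void main()
{
    // The text that follows the first "$(" in the example above.
    auto rest = ` i ~ ")" ) entry $(array[i])`;

    // The scan stops at the ')' inside the string literal, so the
    // "expression" it finds is cut off in the middle of a literal.
    writeln("[", naiveExpr(rest), "]");
    assert(naiveExpr(rest) == ` i ~ "`);
}
```

Teaching the helper about `"..."` literals fixes this one case, but 
then it also needs backtick strings, `q{}` token strings, character 
literals, comments and so on, which is how a library ends up 
carrying its own copy of the lexer.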

---

Now I'm not saying that the CONS of the library solution justify 
the addition of interpolated strings to the language.  I focused 
on them because they are Andrei's main sticking point.  Even if 
everyone agrees that library solutions don't work (and we can't 
enhance the language to make them work), we still need to show 
that the feature is going to be popular/useful enough to justify 
a new type of string literal.  The usefulness of the feature 
needs to outweigh the work to support it.  The more features we 
add to D, the more developers need to learn to understand it.  
That being said, I consider the implementation and complexity it 
adds to be quite minimal (see the PR for more details). As for 
the usefulness, I can say personally I would use this feature to 
replace almost all my usages of writefln/format and writeln which 
would be a big shift for my projects.  Instead of:

     writefln("My name is %s and my age is %s and my favorite hex is %s", name, age, favnum);

I will be writing:

     writeln(i"My name is $name and my age is $age and my favorite hex is $(favnum.formatHex)");

When I generate code, instead of:

     return
         returnType ~ ` ` ~ name ~ `(` ~ type ~ ` left, ` ~ type ~ ` right)
         {
             return cast(` ~ returnType ~ `)(left ` ~ op ~ ` right);
         }
     `;

It will be:

     return text(iq{
         $returnType $name($type left, $type right)
         {
             return cast($returnType)(left $op right);
         }
     });

When I generate HTML documents in my cgi library, instead of:

     writeln(`<html><body>
     <title>`, title, `</title>
     <name>`, name, `</name><age>`, age, `</age>
     <a href="`, link, `">`, linkName, `</a>
     </body></html>
`);

or even:

     writefln(`<html><body>
     <title>%s</title>
     <name>%s</name><age>%s</age>
     <a href="%s">%s</a>
     </body></html>
`, title, name, age, link, linkName);

It will be:

     writeln(i`<html><body>
     <title>$title</title>
     <name>$name</name><age>$age</age>
     <a href="$link">$linkName</a>
     </body></html>
`);

When I first saw interpolated strings I didn't immediately 
realize the benefit of them.  Using them eliminates the problem 
of keeping format strings in sync with arguments.  It also avoids 
the "noise problem" you get when you alternate between string 
literals and expressions inside a function call, i.e. 
`writeln("a is", a, ", b is ", b)`. That pretty much sums up the 
benefits in my mind.
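
As a small illustration of the "in sync" problem, the sketch below 
uses the runtime form of `std.format.format`. The names and values 
are placeholders, and the exact exception message is not 
something I rely on here, only that a `FormatException` is thrown 
when a specifier has no matching argument:

```d
import std.exception : assertThrown;
import std.format : format, FormatException;

void main()
{
    string name = "Ann"; // placeholder values for illustration
    int age = 30;

    // In sync: two specifiers, two arguments.
    assert(format("%s is %s", name, age) == "Ann is 30");

    // Out of sync: a third %s was added to the format string but no
    // third argument was passed.  This still compiles; the mismatch
    // only shows up as an exception when the line actually runs.
    assertThrown!FormatException(
        format("%s is %s and likes %s", name, age));
}
```

With an interpolated string there is no separate argument list, so 
there is nothing that can drift out of step in the first place.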

So what's next? I'm curious where leadership currently stands.  
What are their thoughts on the library solutions that have been 
presented? What do they think of the 5 CONS I've presented that 
all library solutions will have?  What's their opinion on the 
usefulness of the feature? For me personally, I am surprised at 
the amount of interest this feature continues to garner.  I think 
the feature is a net positive for D, but then again I don't think 
it's a "make or break" feature.  Just a "nice addition".  Anyway 
those are my thoughts. Sorry for the long post.  I hope it's 
helpful and ultimately makes D better.