DIP 1027--String Interpolation--Final Review Feedback Thread

Mon Feb 3 08:58:03 UTC 2020

On Monday, 3 February 2020 at 06:54:14 UTC, Walter Bright wrote:
> On 2/2/2020 7:06 PM, Adam D. Ruppe wrote:
>>> Mixing Conventional Format Arguments With Interpolated Strings
>> 
>> This DIP proposes leaving all % unmodified in the string, yet 
>> it also injects % characters into the string. This is a 
>> mistake - it forces user code to be aware of implementation 
>> details and carefully encode all special characters. Web 
>> programmers learned the hard way the problems of sloppy 
>> encoding.
>
> A % is injected only in the case of $Argument where %s is 
> injected. This is the default format. If other formats are 
> desired, ${FormatString} is the syntax. If the user wants a %% 
> in the rest of the string, he can add it. The user is expected 
> to know what the intended target of the format string is and 
> cater to it.
>
>
>> As it stands, consuming functions have no way to tell whether 
>> the first %s from `i"%s $foo"` is meant to go with the 
>> subsequent argument `foo`, or whether the latter (injected) %s 
>> is; it will throw off all further processing.
>>
>> The DIP must be amended to specify that ALL % characters in 
>> the i"" string are replaced with %% in the yielded string.
>
> It is entirely up to the user to use %% where appropriate. 
> Your proposed change would inadvertently wed it to the printf 
> format.
>
>
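To make the ambiguity concrete: under the lowering described above (a literal `%s` is left in place and `$foo` injects another `%s`), `i"%s $foo"` would become the argument list `"%s %s", foo`. The DIP is not implemented, so this minimal sketch writes that lowered form out by hand and feeds it to Phobos's `std.format.format`, assuming a printf-style consumer. It contrasts the result with the "escape every literal % as %%" rule proposed above. `foo` is just a placeholder `int`.

```d
import std.exception : assertThrown;
import std.format : format, FormatException;

void main()
{
    int foo = 42; // placeholder argument

    // Hand-written lowering of i"%s $foo" under the rules described above:
    // the literal %s stays as-is and $foo injects a second %s.
    // std.format sees two specifiers but only one argument, so it throws.
    assertThrown!FormatException(format("%s %s", foo));

    // Hand-written lowering if every literal % were first escaped to %%:
    // the literal text survives and foo lines up with the injected %s.
    assert(format("%%s %s", foo) == "%s 42");
}
```

With the escaping rule, the consumer can always pair each unescaped specifier with exactly one injected argument; without it, that pairing depends on what the user happened to type in the literal part.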
>>> W and D Interpolated Strings
>> 
>> These should work; the DIP's rationale is poor. This is an 
>> arbitrary limitation and an inconsistency with the rest of the 
>> language.
>
> I doubt anyone would use it. We can always add it later if 
> desired, but removing it would be painful. It's not optimal to 
> add features unless there is a clear and present need for it.
>
> The original specification carefully treated the various string 
> encodings equally. But as 20 years have passed, it's become 
> very clear that UTF-8 is the hands-down winner and the W and D 
> formats are aberrations.
>

Sorry, but UTF-16 is real and in use. D's capacity to process it 
directly is one of its strong points (even if it is somewhat 
neglected in Phobos). A lot of corpora are encoded in UTF-16, and 
being forced to process them as UTF-8 is annoying, because the 
conversion step is more costly than is often realized: it rules 
out memory-mapping files and slicing them directly, introducing 
memory allocations where none would otherwise be necessary. For 
this reason, wstring should not be neglected. For dstring it is 
indeed harder to make a case.
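The memory-mapping point can be sketched with Phobos's `std.mmfile.MmFile`: a UTF-16 file is mapped and its bytes are reinterpreted as `wchar` code units, so slicing out a substring allocates nothing. This is a minimal sketch assuming a hypothetical file (`corpus_utf16.bin`) that holds native-byte-order UTF-16 with no BOM; a real corpus would also need BOM and endianness handling.

```d
import std.file : remove, tempDir, write;
import std.mmfile : MmFile;
import std.path : buildPath;

void main()
{
    // Hypothetical corpus file: raw UTF-16 code units, native byte order, no BOM.
    immutable path = buildPath(tempDir, "corpus_utf16.bin");
    wstring corpus = "Hello, UTF-16 corpus"w;
    write(path, corpus);            // wstring converts implicitly to const(void)[]
    scope (exit) remove(path);

    auto mm = new MmFile(path);     // read-only mapping of the whole file
    scope (exit) destroy(mm);       // runs first: unmap before the file is removed

    // Reinterpret the mapped bytes as UTF-16 code units: no decoding, no copy.
    auto text = cast(const(wchar)[]) mm[];
    auto word = text[7 .. 13];      // slicing only adjusts pointer and length
    assert(word == "UTF-16"w);
    assert(word.ptr == text.ptr + 7); // the slice points into the mapping itself
}
```

Converting the same file to UTF-8 first would require decoding it into a freshly allocated buffer before any slicing could happen.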