dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Mon Nov 15 10:19:20 UTC 2021

On Friday, 12 November 2021 at 10:42:15 UTC, kdevel wrote:
> On Thursday, 11 November 2021 at 07:58:54 UTC, Ola Fosheim 
> Grøstad wrote:
>> On Thursday, 11 November 2021 at 01:31:46 UTC, Elronnd wrote:
>>> I agree this should be required.  If you want something which 
>>> is not valid UTF-8, _do not put it into a string_.  Use 
>>> ubyte[].
>>
>> Exactly.
>
> [...]
>
>> The compiler could do such checks in an extra-solid-debug-mode.
>
> This requires lots of changes or additions
>
> ```
> import std.stdio;
> import std.file;
>
> void main ()
> {
>    // valid filename in some OS
>    ubyte [] filename = [ 'a', 0x80, 'b', '\0' ];
>    auto s = readText (filename);
> }
> ```
>
> This does not yet compile:
>
>    [...]
>           R = ubyte[]`
>      must satisfy one of the following constraints:
>    `       isSomeChar!(ElementType!R)
>           is(StringTypeOf!R)`

One idea that has come up is compile-time checking of strings.

But thinking about the garbage in, garbage out concept in general, 
maybe functions should really just accept data, and it's the 
caller's responsibility to make sure it's valid.

This becomes a philosophical discussion, but it could be 
interesting (increased compile times ofc, but could be worth it). 
This would be more of a D3 thing. The Erlang path is fail fast: 
fix the error at its root.

Don't get me wrong, I understand why Phobos is the way it is now, 
and it works. This is more in the "ideas to explore" category. One 
might say "but what about external data, I don't know if that's 
valid". The answer there would be to sanitize it before passing 
it to the function. That would also be better from a 
composability viewpoint.
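
To make "sanitize it before passing it to the function" concrete, 
here is a rough sketch, not a proposal for Phobos. The helper names 
`sanitizeExternal`, `strictExternal` and `countCodePoints` are made 
up for illustration; only `std.encoding.sanitize` and 
`std.utf.validate` are existing library functions.

```d
import std.encoding : sanitize;
import std.utf : validate, UTFException;

// Hypothetical boundary helper: untrusted bytes become a string here,
// with every invalid sequence replaced by a replacement character.
string sanitizeExternal(const(ubyte)[] raw)
{
    auto s = cast(string) raw.idup; // reinterpret the bytes as UTF-8
    return sanitize(s);
}

// Hypothetical fail-fast variant: reject bad input instead of repairing it.
string strictExternal(const(ubyte)[] raw)
{
    auto s = cast(string) raw.idup;
    validate(s); // throws UTFException on invalid UTF-8
    return s;
}

// A "short and friendly" function: it assumes valid UTF-8 and checks nothing.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    import std.stdio : writeln;

    immutable ubyte[] raw = [0x61, 0x80, 0x62]; // 'a', invalid byte, 'b'

    // Repair at the boundary, then call the function without worrying.
    writeln(countCodePoints(sanitizeExternal(raw)));

    // Or fail fast at the boundary.
    try
    {
        auto s = strictExternal(raw);
        writeln(s);
    }
    catch (UTFException e)
    {
        writeln("rejected: ", e.msg);
    }
}
```

Whether to repair or reject is then decided once, by the caller, at 
the point where the data enters the program.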

In summary: Keep the functions themselves short and friendly. 
Make the data going in correct. Put the constraints outside the 
function.
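
One possible way to put the constraint outside the function is to 
carry it in the parameter type. This is only a sketch under my own 
assumptions: `ValidUtf8` and `codePoints` are hypothetical names, 
nothing like them exists in Phobos as far as this post is concerned, 
and `std.utf.validate` is the only library call used.

```d
import std.utf : validate, UTFException;

// Hypothetical wrapper: holding a ValidUtf8 means the check already ran.
// Note: `private` is module-level in D, so this only protects the
// invariant against code in other modules.
struct ValidUtf8
{
    private string payload;

    // Intended as the only way in; throws UTFException on invalid input.
    static ValidUtf8 check(string s)
    {
        validate(s);
        return ValidUtf8(s);
    }

    string get() const
    {
        return payload;
    }
}

// The function body stays short: the constraint is in the signature.
dchar[] codePoints(ValidUtf8 text)
{
    dchar[] result;
    foreach (dchar c; text.get)
        result ~= c;
    return result;
}

void main()
{
    import std.stdio : writeln;

    writeln(codePoints(ValidUtf8.check("a\u00E9")).length);

    immutable ubyte[] bad = [0x61, 0x80, 0x62]; // 'a', invalid byte, 'b'
    try
    {
        auto v = ValidUtf8.check(cast(string) bad);
        writeln(codePoints(v).length);
    }
    catch (UTFException e)
    {
        writeln("rejected before reaching codePoints: ", e.msg);
    }
}
```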

Pros and cons, as with everything ofc.