dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Guillaume Piolat first.last at gmail.com
Wed Nov 10 11:47:16 UTC 2021


On Wednesday, 10 November 2021 at 10:23:31 UTC, Ola Fosheim 
Grøstad wrote:
> On Friday, 5 November 2021 at 10:13:13 UTC, Guillaume Piolat 
> wrote:
>> Well you only know that it is meant to be utf8 in the context 
>> of the auto-decoding foreach (which must still exist). string 
>> in actual programs may contain binary files, or strings in 
>> other codepage encodings.
>
> I had a look at the [documentation]( 
> https://dlang.org/spec/arrays.html#strings ) today, and it said:
>
> «char[] strings are in UTF-8 format.»
>
> I would assume that this is normative? Maybe change the 
> documentation to use more forceful specification language so 
> that it says: «char[] strings MUST be in UTF-8 format.»

I'm not sure what is intended.

The import expression, import("file.stuff"), yields a string.
So there is at least one gap, as it is often used with binary 
files that aren't UTF-8.

Also look at the signature of std.utf.validate: 
https://dlang.org/phobos/std_utf.html#validate
It takes a string, so if the spec is read as normative it could 
never fail.
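
To illustrate, here is a minimal sketch of why that function only makes sense if a string may hold non-UTF-8 data. std.utf.validate returns void and throws UTFException on malformed input. The helper isValidUtf8 and the sample bytes are made up for this example.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: wraps std.utf.validate, which returns void
// and throws UTFException when the input is not well-formed UTF.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Arbitrary bytes, e.g. what a binary file might contain.
    immutable(ubyte)[] raw = [0xFF, 0xD8, 0xFF, 0xE0];
    string binary = cast(string) raw; // compiles, no check is performed

    assert(isValidUtf8("Grøstad"));
    assert(!isValidUtf8(binary));
}
```

If every string were guaranteed to be UTF-8, the second assert could not be written at all.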

It seems that in practice it doesn't have to be UTF-8 until you 
use something that assumes it is. Which is OK for me.
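
A rough sketch of that behaviour, assuming the foreach semantics discussed in this thread (decoding throws on invalid sequences). The helper countDecoded and the sample string are hypothetical. The exception is caught as a plain Exception so the example does not depend on the exact exception type.

```d
import std.utf : byDchar, replacementDchar;

// Hypothetical helper: counts code points with the auto-decoding
// foreach and returns -1 if decoding throws.
long countDecoded(string s)
{
    try
    {
        long n = 0;
        foreach (dchar c; s) // decodes UTF-8, this is where invalid data matters
            ++n;
        return n;
    }
    catch (Exception)
    {
        return -1;
    }
}

void main()
{
    string s = "\xFF\xFEab"; // hex escapes put raw, non-UTF-8 bytes in a string

    // Nothing here assumes UTF-8, so nothing complains.
    assert(s.length == 4);
    assert(s[0] == 0xFF);
    size_t bytes = 0;
    foreach (char c; s) // iterates code units, no decoding
        ++bytes;
    assert(bytes == 4);

    // The decoding foreach does assume UTF-8.
    assert(countDecoded("ab") == 2);
    assert(countDecoded(s) == -1);

    // std.utf.byDchar substitutes replacementDchar instead of throwing.
    size_t replaced = 0;
    foreach (dchar c; s.byDchar)
        if (c == replacementDchar)
            ++replaced;
    assert(replaced > 0);
}
```

Length, indexing and the char foreach all treat the string as plain bytes. Only the decoding paths care about the encoding.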


More information about the Digitalmars-d mailing list