dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Ola Fosheim Grøstad ola.fosheim.grostad at gmail.com
Wed Nov 10 10:23:31 UTC 2021


On Friday, 5 November 2021 at 10:13:13 UTC, Guillaume Piolat 
wrote:
> Well, you only know that it is meant to be UTF-8 in the context 
> of the auto-decoding foreach (which must still exist). A string 
> in an actual program may contain binary file data or text in 
> other code-page encodings.

I had a look at the 
[documentation](https://dlang.org/spec/arrays.html#strings) today, 
and it said:

«char[] strings are in UTF-8 format.»

I would assume that this is normative? If so, maybe change the 
documentation to use more forceful specification language, so that 
it says: «char[] strings MUST be in UTF-8 format.»

So, I think a malformed ```string``` should be considered a type 
error. It would be good if the compiler checked this statically 
where possible (e.g. literals) and simply assumed it to hold when 
decoding strings (like in a ```foreach``` loop).

In C++ I use ```span<uint8_t>``` for raw string-slices and 
```span<char8_t>``` for UTF-8 string-slices. I find that to be 
quite clear, since these are distinct types in C++.

(newbies need a wrapper that is foolproof)




