dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Elronnd elronnd at elronnd.net
Thu Nov 11 01:31:46 UTC 2021


On Wednesday, 10 November 2021 at 10:23:31 UTC, Ola Fosheim 
Grøstad wrote:
> I had a look at the
> [documentation](https://dlang.org/spec/arrays.html#strings)
> today, and it said:
>
> «char[] strings are in UTF-8 format.»
>
> I would assume that this is normative? Maybe change the 
> documentation to use more forceful specification language so 
> that it says: «char[] strings MUST be in UTF-8 format.»
>
> So, I think a messed up ```string``` should be considered a 
> type error and it would be good if the compiler checked this 
> statically where possible (e.g. literals) and simply assumed it 
> to hold when parsing strings (like in a ```for``` loop).

I agree this should be required.  If you want something which is 
not valid UTF-8, _do not put it into a string_.  Use ubyte[].

I would go further: require a runtime check on casts from ubyte[] 
to char[] (expensive), and on slicing char[] (cheap).  (If you 
abuse unions you are on your own; but obviously that is not 
allowed in @safe code, so it has the same limitations as e.g. 
bounds checking.)
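
To make the cost difference concrete, here is a rough sketch of the two 
checks written as library functions.  This is only an illustration of 
the idea, not a proposal for how the compiler would implement it: 
toChars, isBoundary and checkedSlice are hypothetical names, and the 
only existing API used is std.utf.validate and UTFException.  The cast 
check has to walk the whole array; the slice check only looks at the 
two end points, on the assumption that the string being sliced is 
already valid UTF-8.

```d
import std.utf : validate, UTFException;

// Hypothetical helper: checked conversion from raw bytes to char[].
// validate walks the whole array, so this is the expensive, O(n) check.
char[] toChars(ubyte[] bytes)
{
    auto s = cast(char[]) bytes;
    validate(s); // throws UTFException if s is not valid UTF-8
    return s;
}

// Hypothetical helper: true if index i is a code point boundary in s.
// Assumes s is already valid UTF-8.
bool isBoundary(const(char)[] s, size_t i)
{
    // The end of the string is a boundary; otherwise the byte at i
    // must not be a continuation byte (bit pattern 10xxxxxx).
    return i == s.length || (s[i] & 0xC0) != 0x80;
}

// Hypothetical helper: checked slice.  Only the two end points are
// inspected, so this is the cheap, O(1) check.
inout(char)[] checkedSlice(inout(char)[] s, size_t lo, size_t hi)
{
    if (!isBoundary(s, lo) || !isBoundary(s, hi))
        throw new UTFException("slice splits a UTF-8 code point");
    return s[lo .. hi];
}

unittest
{
    import std.exception : assertThrown;

    ubyte[] good = [0x47, 0x72, 0xC3, 0xB8]; // "Grø" encoded as UTF-8
    auto s = toChars(good);
    assert(checkedSlice(s, 0, 2) == "Gr");
    assert(checkedSlice(s, 2, 4) == "ø");

    // Index 3 is in the middle of the two-byte sequence for 'ø'.
    assertThrown!UTFException(checkedSlice(s, 0, 3));

    // 0xFF can never appear in valid UTF-8.
    ubyte[] bad = [0x47, 0xFF];
    assertThrown!UTFException(toChars(bad));
}
```

If checks along these lines held everywhere a char[] is created, a 
foreach over a string could assume its input is valid.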

