dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
Elronnd
elronnd at elronnd.net
Thu Nov 11 01:31:46 UTC 2021
On Wednesday, 10 November 2021 at 10:23:31 UTC, Ola Fosheim
Grøstad wrote:
> I had a look at the [documentation](
> https://dlang.org/spec/arrays.html#strings ) today, and it said:
>
> «char[] strings are in UTF-8 format.»
>
> I would assume that this is normative? Maybe change the
> documentation to use more forceful specification language so
> that it says: «char[] strings MUST be in UTF-8 format.»
>
> So, I think a messed up ```string``` should be considered a
> type error and it would be good if the compiler checked this
> statically where possible (e.g. literals) and simply assumed it
> to hold when parsing strings (like in a ```for``` loop).
I agree this should be required. If you want something which is
not valid UTF-8, _do not put it into a string_. Use ubyte[].
I would go further: require a runtime check on casts from ubyte[]
to char[] (expensive, since the whole array has to be validated),
and on slicing char[] (cheap, since only the two slice boundaries
need to be looked at). (If you abuse unions you are on your own;
but obviously that is not allowed in @safe code, so it has the same
limitations as e.g. bounds checking.)
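
A rough sketch of what those two checks could look like if written
as library code today. The helper names (`toCheckedChars`,
`isBoundary`, `checkedSlice`) are hypothetical and are not part of
Phobos or the compiler; only `std.utf.validate`, `UTFException`,
`enforce` and `assertThrown` are existing library symbols. The slice
check assumes the array being sliced is already valid UTF-8.

```d
import std.exception : enforce, assertThrown;
import std.utf : validate, UTFException;

// Hypothetical helper: checked conversion from raw bytes to char[].
// O(n), because every byte has to be validated.
char[] toCheckedChars(ubyte[] bytes)
{
    auto s = cast(char[]) bytes; // reinterprets the memory, no copy
    validate(s);                 // throws UTFException if malformed
    return s;
}

// Hypothetical helper: true if index i does not point into the
// middle of a code point. UTF-8 continuation bytes are 0b10xxxxxx.
bool isBoundary(const(char)[] s, size_t i)
{
    return i == s.length || (s[i] & 0xC0) != 0x80;
}

// Hypothetical helper: O(1) checked slice. Assumes s is valid
// UTF-8, so only the two ends need to be inspected.
const(char)[] checkedSlice(const(char)[] s, size_t lo, size_t hi)
{
    enforce(lo <= hi && hi <= s.length, "slice out of range");
    enforce(isBoundary(s, lo) && isBoundary(s, hi),
            "slice splits a code point");
    return s[lo .. hi];
}

unittest
{
    ubyte[] good = [0x68, 0x69];        // "hi"
    assert(toCheckedChars(good) == "hi");

    ubyte[] bad = [0xC3, 0x28];         // lead byte, no continuation
    assertThrown!UTFException(toCheckedChars(bad));

    string s = "Grøstad";               // 'ø' is 0xC3 0xB8
    assert(checkedSlice(s, 0, 4) == "Grø");
    assertThrown!Exception(checkedSlice(s, 0, 3)); // cuts 'ø' in half
}
```

The cast check walks the whole array, while the slice check reads
at most two bytes, which is where the expensive/cheap difference
comes from.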