dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
Elronnd
elronnd at elronnd.net
Thu Nov 4 07:51:11 UTC 2021
On Thursday, 4 November 2021 at 02:26:20 UTC, Walter Bright wrote:
> https://issues.dlang.org/show_bug.cgi?id=22473
>
> I've tried to fix this before, but too many people objected.
>
> Are we fed up with this yet? I sure am.
>
> Who wants to take up this cudgel and fix the durned thing once
> and for all?
>
> (It's unclear if it would even break existing code.)
Assuming the comment by Ali on the linked bug is right, I think
the current behaviour is correct.
Your complaints:
> It can't be turned off
Sure it can. You can choose to iterate in another fashion; say,
by creating your own iterator which folds invalid UTF-8 into
replacement characters.
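For illustration, a minimal sketch of such an iterator, assuming Phobos is available. It leans on `std.utf.byDchar`, which substitutes `replacementDchar` (U+FFFD) for sequences it cannot decode; `decodeLossy` is a hypothetical helper name, not something from the bug report.

```d
import std.utf : byDchar, replacementDchar;

/// Hypothetical helper: decode `s` into code points, substituting
/// U+FFFD for anything that is not valid UTF-8.
dchar[] decodeLossy(const(char)[] s)
{
    dchar[] result;
    // byDchar yields replacementDchar instead of throwing.
    foreach (dchar c; s.byDchar)
        result ~= c;
    return result;
}

void main()
{
    // 'a', a lone continuation byte (invalid on its own), then 'b'.
    immutable ubyte[] raw = [0x61, 0x80, 0x62];
    auto s = cast(string) raw;
    assert(decodeLossy(s) == "a\uFFFDb"d);
}
```

Anyone who wants lossy decoding can opt in this way at the call site, while the plain foreach keeps reporting the error.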
> it throws
Is it better to produce an incorrect result?
A high-quality, non-throwing mechanism for error handling exists.
It consists of an _optional_ value which must be explicitly
unwrapped. Unlike a replacement character, it is also an
out-of-band signal; with an in-band replacement, how will I
distinguish invalid UTF-8 from a correctly-encoded replacement
character?
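A rough sketch of what the optional-value approach could look like, assuming `std.typecons.Nullable` as the optional type. `tryDecode` is a hypothetical name; it is built on `std.utf.decode` with `Yes.useReplacementDchar` and then checks whether the input really contained an encoded U+FFFD.

```d
import std.typecons : Nullable, Yes;
import std.utf : decode, replacementDchar;

/// Hypothetical helper: decode one code point starting at s[i] and
/// advance i. Returns a null Nullable for invalid UTF-8 instead of
/// throwing. The caller must ensure i < s.length.
Nullable!dchar tryDecode(const(char)[] s, ref size_t i)
{
    immutable start = i;
    immutable dchar c = decode!(Yes.useReplacementDchar)(s, i);
    // U+FFFD is genuine only if the consumed bytes are its real
    // encoding (EF BF BD); otherwise it marks a decoding failure.
    if (c == replacementDchar && s[start .. i] != "\uFFFD")
        return Nullable!dchar.init;
    return Nullable!dchar(c);
}

void main()
{
    immutable ubyte[] raw = [0x80];   // lone continuation byte: invalid
    size_t i = 0;
    assert(tryDecode(cast(string) raw, i).isNull);

    size_t j = 0;
    auto ok = tryDecode("\uFFFD", j); // correctly encoded U+FFFD
    assert(!ok.isNull && ok.get == replacementDchar);
}
```

The caller has to check `isNull` before using the value, so an invalid sequence and a real U+FFFD can no longer be confused.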
> it may allocate with the gc
So? If that is the sort of thing you care about, then you will
mark your code @nogc and find an alternate solution. Lots of core
language features allocate, like arrays and hash tables.
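As one example of an alternate solution, here is a sketch that iterates code units by explicitly asking for `char`, so nothing is decoded, nothing can throw, and nothing is allocated. `countCodePointStarts` is a hypothetical name, and counting non-continuation bytes is just one task that happens to need no decoding.

```d
/// Hypothetical helper: count the bytes that are not UTF-8
/// continuation bytes (10xxxxxx). For valid UTF-8 this equals the
/// number of code points; stray continuation bytes are not counted.
size_t countCodePointStarts(scope const(char)[] s) @safe pure nothrow @nogc
{
    size_t n;
    // Asking for char iterates raw code units; no decoding happens.
    foreach (char c; s)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}

void main() @safe
{
    // 'é' takes two bytes in UTF-8 but has one starting byte.
    assert(countCodePointStarts("héllo") == 5);
}
```

The compiler enforces the `@nogc nothrow` attributes, so any accidental allocation or throwing call in the loop would be a compile-time error.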
> it's slow
In the hot path it's the same speed. In the slow path,
performance doesn't matter. In any case, it's useless to give an
incorrect result faster.
(Notably, this is not exactly _auto_ decoding; it is explicitly
requested decoding. And your proposed modification doesn't
change that fact.)
What is (potentially) questionable imo is that given foreach (c;
a), c will be inferred to be dchar; you have to explicitly ask
for char. Perhaps that default should be reversed. (This will
definitely break code, though, and may not be worth it.)
If you want an iterator that generates replacement characters for
invalid UTF-8, just create one. But the default translation
should be faithful, and that means not generating any result if
none can be generated.