dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

Elronnd elronnd at elronnd.net
Thu Nov 4 07:51:11 UTC 2021


On Thursday, 4 November 2021 at 02:26:20 UTC, Walter Bright wrote:
> https://issues.dlang.org/show_bug.cgi?id=22473
>
> I've tried to fix this before, but too many people objected.
>
> Are we fed up with this yet? I sure am.
>
> Who wants to take up this cudgel and fix the durned thing once 
> and for all?
>
> (It's unclear if it would even break existing code.)

Assuming the comment by Ali on the linked bug is right, I think 
the current behaviour is correct.

Your complaints:

> It can't be turned off

Sure it can.  You can choose to iterate in another fashion; say, 
by creating your own iterator which folds invalid utf8 into 
replacement characters.
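
A minimal sketch of such an iterator, assuming Phobos is available. The helper name lossyDchars is made up for illustration; it only wraps std.utf.byDchar, which by default substitutes U+FFFD for invalid sequences instead of throwing.

```d
import std.utf : byDchar;

// Hypothetical helper (the name is an assumption, not part of Phobos).
// Returns a lazy range of dchar over `s`. Invalid UTF-8 is folded into
// the replacement character U+FFFD rather than raising an exception.
auto lossyDchars(const(char)[] s)
{
    return s.byDchar;
}
```

Writing foreach (c; lossyDchars(text)) then opts into the lossy behaviour explicitly, while a plain foreach (dchar c; text) keeps throwing on bad input.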

> it throws

Is it better to produce an incorrect result?

A high-quality, non-throwing mechanism for error handling exists.
It consists of an _optional_ value which must be explicitly
unwrapped.  Unlike a replacement character, it is also an
out-of-band signal; with replacement, how will I distinguish
invalid utf8 from a correctly-encoded replacement character?
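
To make the optional idea concrete, here is a rough sketch, not a proposal for the actual language change. tryDecode is a hypothetical name; it is built on std.utf.decode and std.typecons.Nullable, and for brevity it catches the exception internally, which a real implementation would presumably avoid.

```d
import std.typecons : Nullable, nullable;
import std.utf : decode, UTFException;

// Hypothetical helper: decode the code point starting at `index` and
// advance `index` past it. On invalid UTF-8 the result is empty (null)
// and `index` moves forward by one code unit so the caller can resume.
Nullable!dchar tryDecode(const(char)[] s, ref size_t index)
{
    immutable start = index;
    try
    {
        return nullable(decode(s, index));
    }
    catch (UTFException)
    {
        index = start + 1; // skip the offending code unit
        return Nullable!dchar.init;
    }
}
```

With this shape a genuine U+FFFD in the input comes back as a non-null value, while a decoding failure comes back as null, so the two cases stay distinguishable.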

> it may allocate with the gc

So?  If that is the sort of thing you care about, then you will 
use @nogc and find an alternate solution.  Lots of core language 
features allocate, like arrays and hash tables.

> it's slow

In the hot path it's the same speed.  In the slow path, 
performance doesn't matter.  In any case, it's useless to give an 
incorrect result faster.


(Notably, this is not exactly _auto_ decoding; it is explicitly 
requested decoding.  And your proposed modification doesn't 
change that fact.)


What is (potentially) questionable imo is that given foreach (c; 
a), c will be inferred to be dchar; you have to explicitly ask 
for char.  Perhaps that default should be reversed.  (This will 
definitely break code, though, and may not be worth it.)

If you want an iterator that generates replacement characters for 
invalid utf8, just create one.  But the default translation 
should be faithful, and that means not generating any result if 
none can be generated.

