dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead

kdevel kdevel at vogtner.de
Mon Nov 8 23:20:22 UTC 2021


On Monday, 8 November 2021 at 12:02:12 UTC, Ola Fosheim Grøstad 
wrote:
[...]
> ReplacementChar is not the result of an approximation failure, 
> it is corruption of the input (or maybe a foreign encoding).

Note that I can write the replacement character '�' in this very 
line, since it is a valid Unicode code point (U+FFFD). It even 
round-trips correctly. I think the iconv library [1] has a nice 
approach: among other conditions, it stops the conversion when it 
encounters an invalid input sequence.

The ideal conversion, one that neither throws nor uses the 
replacement character, would IMHO generate a list of pairs of 
ranges, named "left" and "right". Left contains successfully parsed 
data, right contains invalid data. For valid UTF-8 input this list 
has only one element: the left range of that pair contains the 
conversion and the right one is empty. From this representation one 
can easily compute all required presentations.
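
To make the idea concrete, here is a rough sketch of such a 
left/right list. It is written in Python rather than D only to keep 
it short, and the names split_utf8 and with_replacement are made up 
for this illustration. It leans on the start/end offsets that 
Python's UnicodeDecodeError reports, so where exactly an invalid run 
ends is whatever that decoder decides, not something proposed here.

```python
def split_utf8(data: bytes) -> list[tuple[str, bytes]]:
    """Split data into (left, right) pairs.

    left  -- text decoded from a valid UTF-8 stretch
    right -- the invalid bytes directly after it (empty if none)
    """
    pairs = []
    pos = 0
    while True:
        try:
            # The rest of the input is valid: final pair, empty right.
            pairs.append((data[pos:].decode("utf-8"), b""))
            return pairs
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice data[pos:].
            start, end = pos + err.start, pos + err.end
            pairs.append((data[pos:start].decode("utf-8"), data[start:end]))
            pos = end
            if pos == len(data):
                # Input ended with invalid bytes: no trailing empty pair.
                return pairs


def with_replacement(pairs: list[tuple[str, bytes]]) -> str:
    """Derive the 'replacement character' presentation from the pairs."""
    return "".join(left + ("\ufffd" if right else "") for left, right in pairs)


pairs = split_utf8(b"ab\xffcd")
print(pairs)
print(with_replacement(pairs))
```

A throwing variant would fall out of the same list: raise as soon as 
a pair with a non-empty right part shows up. The repeated slicing 
copies the tail of the input each time, which is fine for a sketch 
but not something to do on large inputs.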

[1] https://man7.org/linux/man-pages/man3/iconv.3.html

