dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
kdevel
kdevel at vogtner.de
Mon Nov 8 23:20:22 UTC 2021
On Monday, 8 November 2021 at 12:02:12 UTC, Ola Fosheim Grøstad
wrote:
[...]
> ReplacementChar is not the result of an approximation failure,
> it is corruption of the input (or maybe a foreign encoding).
As this line shows, I can write down the replacement character '�'
because it is a valid Unicode code point (U+FFFD). It even
round-trips correctly. I think the iconv library [1] has a nice
approach: among other conditions, it stops the conversion when it
encounters an invalid input sequence.
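Both observations can be checked in a few lines. What follows is only a minimal sketch in Python (picked for brevity; it does not call iconv, and the name `first_invalid_offset` is made up for illustration). It shows U+FFFD surviving an encode/decode round trip, that a substituted U+FFFD cannot be told apart from a genuine one afterwards, and that a strict decoder can stop and report where the invalid sequence begins, which is roughly the information iconv leaves behind when it stops.

```python
REPLACEMENT = "\ufffd"

# U+FFFD is an ordinary code point: it encodes to three bytes and decodes back.
encoded = REPLACEMENT.encode("utf-8")
round_trips = encoded.decode("utf-8") == REPLACEMENT

# After substitution, a genuine U+FFFD and a corrupt byte look identical.
genuine = "a\ufffdb".encode("utf-8").decode("utf-8", errors="replace")
corrupt = b"a\xffb".decode("utf-8", errors="replace")
ambiguous = genuine == corrupt


def first_invalid_offset(data: bytes):
    """Return the byte offset of the first invalid sequence, or None if valid.

    Hypothetical helper: it mimics the "stop at the invalid sequence" idea
    using Python's strict decoder, not the iconv API itself.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None
```

For instance, `first_invalid_offset(b"ab\xffcd")` points at the stray byte, while valid input yields `None`.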
The ideal conversion, one that neither throws nor uses the
replacement character, is IMHO to generate a list of pairs of
ranges, named "left" and "right". Left contains the successfully
parsed data, right the invalid data. For valid UTF-8 input this
list has only one element: the left range of that pair contains
the conversion and the right one is empty. From this representation
one can easily compute all required presentations.
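The left/right representation could be sketched roughly as below. This is an illustration under assumptions, not an existing library API: the names `split_utf8` and `with_replacement` are hypothetical, Python stands in for D, and what counts as one "invalid sequence" is whatever Python's UTF-8 decoder reports.

```python
def split_utf8(data: bytes) -> list[tuple[str, bytes]]:
    """Split data into (left, right) pairs.

    left  : text decoded successfully up to the next invalid sequence
    right : the raw invalid bytes that stopped the decoder (b"" if none)
    """
    pairs = []
    pos = 0
    while True:
        try:
            tail = data[pos:].decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice that was decoded.
            start, end = pos + err.start, pos + err.end
            pairs.append((data[pos:start].decode("utf-8"), data[start:end]))
            pos = end
            continue
        # Remaining input is valid; keep it unless it is an empty trailer.
        if tail or not pairs:
            pairs.append((tail, b""))
        return pairs


def with_replacement(pairs) -> str:
    """One derived presentation: substitute U+FFFD for each invalid range."""
    return "".join(left + ("\ufffd" if right else "") for left, right in pairs)
```

Valid input yields a single pair with an empty right side, as described above. A throwing presentation could be derived the same way by raising as soon as any right side is non-empty. The repeated slicing copies the remaining input after every error, so this is meant to show the shape of the result rather than an efficient implementation.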
[1] https://man7.org/linux/man-pages/man3/iconv.3.html