dmd foreach loops throw exceptions on invalid UTF sequences, use replacementDchar instead
Patrick Schluter
Patrick.Schluter at bbox.fr
Sat Nov 6 08:33:07 UTC 2021
On Saturday, 6 November 2021 at 06:17:55 UTC, Alexey wrote:
> On Saturday, 6 November 2021 at 05:36:07 UTC, H. S. Teoh wrote:
>>
>> Unfortunately, codepoint != grapheme. This was the fundamental
>> error with autodecoding that made it so bad. It costs us a
>> performance hit but doesn't even produce the right results in
>> return.
>>
>> And even more unfortunately, grapheme segmentation is an
>> extremely convoluted (i.e. slow) operation that normally you
>> would *not* want to do unless your code absolutely has to.
>>
>>
>> T
>
> ```D
> struct graphstring
> {
>     grapheme[] grapheme_elements;
> }
>
> struct grapheme
> {
>     dchar[] codepoints;
> }
> ```
> Would this really be _that_ slow? Also, there is no need to do
> error checks on every action a user may perform on
> graphstrings: concatenation and slicing, for instance, need no
> checks. Checks are only needed on conversions from and to other
> string/ubyte[] types.
This is 1 grapheme A̶͙̜͚̫̬̻ͅ
(U+0041 U+0336 U+0359 U+0345 U+031c U+035a U+032b U+032c U+033b)
but 9 codepoints (9 dchar, 9 wchar, 17 char: 0x41 0xcc 0xb6 0xcd
0x99 0xcd 0x85 0xcc 0x9c 0xcd 0x9a 0xcc 0xab 0xcc 0xac 0xcc 0xbb).
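
Those counts can be checked with a few lines of D. This is only a minimal sketch, assuming Phobos' `std.uni.byGrapheme` for the segmentation; the names `Counts` and `countUnits` are made up for illustration and are not part of any library.

```D
import std.conv : to;
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

// Hypothetical helper type, for illustration only.
struct Counts
{
    size_t chars;     // UTF-8 code units
    size_t wchars;    // UTF-16 code units
    size_t dchars;    // UTF-32 code units, i.e. code points
    size_t graphemes; // grapheme clusters as segmented by std.uni
}

// Hypothetical helper: counts the same text at four levels.
Counts countUnits(string s)
{
    return Counts(
        s.length,                // char count of the UTF-8 encoding
        s.to!wstring.length,     // transcode to UTF-16, count wchar
        s.to!dstring.length,     // transcode to UTF-32, count dchar
        s.byGrapheme.walkLength  // walk the grapheme clusters
    );
}

void main()
{
    // 'A' followed by the eight combining marks listed above.
    enum zalgoA = "A\u0336\u0359\u0345\u031c\u035a\u032b\u032c\u033b";

    auto c = countUnits(zalgoA);
    assert(c == Counts(17, 9, 9, 1));
    writeln(c);
}
```

Note that `byGrapheme` has to inspect every code point to find the cluster boundaries, which is the cost being discussed in the quoted text.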