coreutils with D trials, wc, binary vs well formed utf
Dukc
ajieskola at gmail.com
Mon May 24 22:31:19 UTC 2021
> Is there a(n easy-ish) way to fix up that wc.d source in the
> blog to fallback to byte stream mode when a utf-8 reader fails
> an encoding?
Rewrite `toLine`:
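```
Line toLine(char[] l) pure
{
    import std.algorithm : filter;
    import std.array : array;
    import std.utf : byDchar, replacementDchar;

    // byDchar yields replacementDchar (U+FFFD) for malformed input;
    // filter those out so they are not counted.
    auto valid = l.byDchar.filter!(c => c != replacementDchar).array;

    // byCodePoint, splitter and walkLength are assumed to be imported
    // at module level, as in the original wc.d.
    return Line(valid.byCodePoint.walkLength,
                valid.splitter.walkLength);
}
```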
This just ignores malformed UTF without counting it in. It has
one problem though: for some reason it seems to ignore one
character after a malformed code unit. I don't know why.
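
If you really want the byte-mode fallback rather than skipping bad input, something along these lines might work. It is only a rough sketch and not tested against the blog's program: `toLineWithFallback` is a made-up name, `Line` is assumed to have the same two fields as in wc.d, and the word counting in byte mode (runs of bytes separated by ASCII whitespace) is my own assumption about what "byte stream mode" should mean.

```d
import std.algorithm : splitter;
import std.ascii : isWhite;
import std.range : walkLength;
import std.utf : UTFException, validate;

// Assumed to mirror the Line struct from the blog's wc.d.
struct Line
{
    size_t chars;
    size_t words;
}

// Hypothetical replacement for toLine: counts code points when the line
// is valid UTF-8, and raw bytes otherwise.
Line toLineWithFallback(char[] l) pure
{
    try
    {
        validate(l); // throws UTFException on malformed UTF-8
        return Line(l.walkLength, l.splitter.walkLength);
    }
    catch (UTFException)
    {
        // Byte mode: every code unit counts as one character and words
        // are runs of bytes separated by ASCII whitespace.
        size_t words = 0;
        bool inWord = false;
        foreach (char c; l) // iterates code units, no decoding
        {
            immutable white = isWhite(c);
            if (!white && !inWord)
                ++words;
            inWord = !white;
        }
        return Line(l.length, words);
    }
}
```

Note that the fallback is decided per line, so a file with a single bad line is still counted by code point everywhere else. It should be usable wherever the blog's code calls `toLine`.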