coreutils with D trials, wc, binary vs well formed utf

Dukc ajieskola at gmail.com
Mon May 24 22:31:19 UTC 2021


> Is there a(n easy-ish) way to fix up that wc.d source in the 
> blog to fallback to byte stream mode when a utf-8 reader fails 
> an encoding?

Rewrite `toLine`:

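```
Line toLine(char[] l) pure
{
   import std.algorithm : filter, splitter;
   import std.array : array;
   import std.range : walkLength;
   import std.uni : byCodePoint;
   import std.utf : byDchar, replacementDchar;

   // byDchar decodes malformed sequences to replacementDchar;
   // drop those before counting characters and words.
   auto valid = l.byDchar.filter!(c => c != replacementDchar).array;
   return Line(valid.byCodePoint.walkLength, valid.splitter.walkLength);
}
```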

This simply drops malformed UTF-8 without counting it. It has 
one problem, though: for some reason it also seems to drop one 
character after a malformed code unit. I don't know why.
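
If you would rather fall back to counting raw bytes for a line that fails to decode, as the question asks, something along these lines might work. This is only a sketch: `toLineWithFallback` is a made-up name, `Line` is assumed to match the two-counter struct from the blog's wc.d, and "byte stream mode" is assumed to mean one character per code unit with words separated by ASCII whitespace.

```d
import std.algorithm : splitter;
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byCodePoint;
import std.utf : UTFException, validate;

// Assumed to mirror the Line struct in the blog's wc.d.
struct Line
{
    size_t chars;
    size_t words;
}

// Hypothetical helper: count code points for well-formed UTF-8,
// otherwise count raw bytes for this line.
Line toLineWithFallback(const(char)[] l) pure
{
    import std.ascii : isWhite;

    try
    {
        validate(l); // throws UTFException on malformed UTF-8
        return Line(l.byCodePoint.walkLength, l.splitter.walkLength);
    }
    catch (UTFException)
    {
        // Byte mode: every code unit counts as one character, and
        // words are runs of bytes that are not ASCII whitespace.
        size_t words = 0;
        bool inWord = false;
        foreach (char c; l) // explicit char: iterate code units, no decoding
        {
            immutable white = isWhite(c);
            if (!white && !inWord)
                ++words;
            inWord = !white;
        }
        return Line(l.length, words);
    }
}

void main()
{
    const(char)[] good = "héllo wörld\n";
    const(char)[] bad = ['a', 'b', '\xFF', ' ', 'c', 'd', '\n'];

    writeln(toLineWithFallback(good)); // decoded as UTF-8
    writeln(toLineWithFallback(bad));  // counted as bytes
}
```

Note that the fallback is decided per line, so a file with a single bad byte still gets code point counts for all of its other lines. Validating first also means each well-formed line is decoded twice, which may matter for large inputs.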

