proposal string std.utf:sanitizeUTF(string) which returns an always valid UTF8 string

Timothee Cour via Digitalmars-d digitalmars-d at puremagic.com
Sun Dec 18 18:29:16 PST 2016


I keep running into issues caused by auto-decoding (arguably a significant
design flaw of Phobos) when using strings from external sources, which may
not be 100% valid UTF-8. For example, see the stack trace [1] triggered by
getSomeExternalString().splitLines.

Could we have something like `sanitizeUTF` in std.utf, to allow for a
simple fix when running into such UTF-8 issues? See the proposed
implementation [2]. The fix would then be:
```
getSomeExternalString().splitLines,
=>
getSomeExternalString().sanitizeUTF.splitLines,
```


[1]
core.exception.AssertError at std/utf.d(2254): Assertion failure
----------------
??:? _d_assert [0x4f4e63]
??:? void std.utf.__assert(int) [0x53a304]
??:? pure nothrow @nogc @safe ubyte
std.utf.codeLength!(char).codeLength(dchar) [0xa5d78191]
??:? pure nothrow @nogc @safe int
std.string.stripRight!(immutable(char)[]).stripRight(immutable(char)[]).__foreachbody2(ref
ulong, ref dchar) [0xa5c42bd9]
??:? _aApplyRcd2 [0x4f9bd1]
??:? pure @nogc @safe immutable(char)[]
std.string.stripRight!(immutable(char)[]).stripRight(immutable(char)[])
[0xa5c42b5c]
??:? pure @property @nogc @safe immutable(char)[]
std.algorithm.iteration.stripRight.MapResult.front() [0xa5cda053]
??:? pure @safe immutable(char)[]
std.array.join!(std.algorithm.iteration.stripRight.MapResult,
immutable(char)[]).join(std.algorithm.iteration.stripRight.MapResult,
immutable(char)[]) [0xa5cda39a]


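[2] sanitizeUTF proposal:
// TODO: rangeify to make it work in more situations
string sanitizeUTF(string a){
  import std.array : Appender;
  import std.typecons : Yes;
  import std.utf : decodeFront;
  Appender!string b;
  // decode one code point at a time; an invalid sequence yields U+FFFD
  // instead of throwing, and decodeFront pops the consumed code units off a
  while(a.length){
    b~=decodeFront!(Yes.useReplacementDchar)(a);
  }
  return b.data;
}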
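
For illustration, here is a minimal, self-contained sketch of how the proposed
function could be exercised. The input bytes and the main() driver are made up
for this example, and std.utf.validate is used only as a convenient way to
check well-formedness. It is not part of the proposal itself.

```d
import std.array : Appender;
import std.exception : assertThrown;
import std.typecons : Yes;
import std.utf : decodeFront, validate, UTFException;

// Same logic as the proposal in [2], repeated so the example compiles on its own.
string sanitizeUTF(string a)
{
    Appender!string b;
    while (a.length)
        b ~= decodeFront!(Yes.useReplacementDchar)(a);
    return b.data;
}

void main()
{
    // Hypothetical "external" data: 0x80 is a lone continuation byte,
    // so this is not well-formed UTF-8.
    immutable(ubyte)[] raw = [0x61, 0x80, 0x62];
    string bad = cast(string) raw;

    // The unsanitized string is rejected by validate ...
    assertThrown!UTFException(validate(bad));

    // ... while the sanitized one passes (validate throws on invalid input).
    string good = bad.sanitizeUTF;
    validate(good);

    // Already valid input comes back unchanged.
    assert("plain ascii".sanitizeUTF == "plain ascii");
}
```

Note that this always allocates a new string, even when the input is already
valid; a rangeified version (see the TODO above) could avoid that.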