Need to do some "dirty" UTF-8 handling

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Jun 25 15:17:36 PDT 2011


On 26.06.2011 1:49, Nick Sabalausky wrote:
> "Andrej Mitrovic"<andrej.mitrovich at gmail.com>  wrote in message
> news:mailman.1215.1309019944.14074.digitalmars-d-learn at puremagic.com...
>> I've had a similar requirement some time ago. I've had to copy and
>> modify the phobos function std.utf.decode for a custom text editor
>> because the function throws when it finds an invalid code point. This
>> is way too slow for my needs. I'm actually displaying invalid code
>> points with special marks (just like Scintilla), so I need decoding to
>> work as fast as possible.
>>
>> The new function simply replaces throwing exceptions with flagging a
>> boolean.
> I think I may end up doing something like that :/
>
> I was hoping to be able to do something vaguely sensible like this:
>
> string newStr;
> foreach(dchar dc; str)
> {
>      if(isValidDchar(dc))
>          newStr ~= dc;
>      else
>          newStr ~= 'X';
> }
> str = newStr;
>
> But that just blows up in my face.
>
>
std.encoding to the rescue?
It looks like a well-established module that was forgotten for some reason.

And here I'm wondering what a function named sanitize could do :)
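
A minimal, untested sketch of what I have in mind, assuming
std.encoding.sanitize replaces each invalid sequence with the encoding's
replacement character (U+FFFD for UTF-8, as far as I can tell) rather than
throwing. The sample bytes and variable names are made up for illustration:

    import std.encoding : isValid, sanitize;
    import std.stdio : writeln;

    void main()
    {
        // Build a deliberately broken UTF-8 string: 0xFF can never
        // appear in well-formed UTF-8.
        char[] buf = ['a', 'b', cast(char) 0xFF, 'c'];
        string str = buf.idup;

        // isValid reports the problem without throwing.
        assert(!isValid(str));

        // sanitize returns a copy with the bad sequence replaced,
        // so the result can be decoded safely afterwards.
        str = sanitize(str);
        assert(isValid(str));

        writeln(str);
    }

If you really want 'X' instead of the replacement character, a pass over
the sanitized string with a plain foreach(dchar dc; str) should no longer
throw, since the input is valid by then.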

-- 
Dmitry Olshansky


