Need to do some "dirty" UTF-8 handling

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Jun 25 17:04:04 PDT 2011


On 26.06.2011 3:25, Nick Sabalausky wrote:
> "Dmitry Olshansky"<dmitry.olsh at gmail.com>  wrote in message
> news:iu5n32$2vjd$1 at digitalmars.com...
>> On 26.06.2011 1:49, Nick Sabalausky wrote:
>>> "Andrej Mitrovic"<andrej.mitrovich at gmail.com>   wrote in message
>>> news:mailman.1215.1309019944.14074.digitalmars-d-learn at puremagic.com...
>>>> I've had a similar requirement some time ago. I've had to copy and
>>>> modify the phobos function std.utf.decode for a custom text editor
>>>> because the function throws when it finds an invalid code point. This
>>>> is way too slow for my needs. I'm actually displaying invalid code
>>>> points with special marks (just like Scintilla), so I need decoding to
>>>> work as fast as possible.
>>>>
>>>> The new function simply replaces throwing exceptions with flagging a
>>>> boolean.
>>> I think I may end up doing something like that :/
>>>
>>> I was hoping to be able to do something vaguely sensible like this:
>>>
>>> string newStr;
>>> foreach(dchar dc; str)
>>> {
>>>       if(isValidDchar(dc))
>>>           newStr ~= dc;
>>>       else
>>>           newStr ~= 'X';
>>> }
>>> str = newStr;
>>>
>>> But that just blows up in my face.
>>>
>>>
>> std.encoding to the rescue?
>> It looks like a well established module that was forgotten for some
>> reason.
>>
>> And here I'm wondering what a function named sanitize could do :)
>>
> Ahh, I didn't even notice that module.

Same here. Just a couple of days(!) ago I somehow managed to find 
decode in the wrong place (in std.encoding instead of std.utf). It 
looked useful, but I had never heard of it. Seriously, how many totally 
irrelevant old modules do we have around here? (hint: std.gregorian!)
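
For what it's worth, a minimal sketch of the std.encoding route, assuming 
isValid and sanitize behave as their documentation describes (sanitize 
substitutes a replacement sequence for each invalid one and never throws). 
The helper name scrub is made up for illustration:

```d
import std.encoding : isValid, sanitize;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): return a valid UTF-8 copy of s.
// Valid input is passed through; invalid sequences are replaced by
// std.encoding.sanitize instead of raising an exception.
string scrub(string s)
{
    return isValid(s) ? s : sanitize(s);
}

void main()
{
    // 0xFF can never appear in well-formed UTF-8, so this string is "dirty".
    immutable(ubyte)[] raw = [0x61, 0x62, 0xFF, 0x63];
    string dirty = cast(string) raw;

    assert(!isValid(dirty));

    string clean = scrub(dirty);
    assert(isValid(clean));

    // Safe to decode now: foreach over dchar will not throw on clean.
    foreach (dchar dc; clean)
        writeln(cast(uint) dc);
}
```

That trades the custom 'X' marker from the loop above for whatever 
replacement std.encoding picks, so it only fits if you don't care which 
character stands in for the bad bytes.
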
> Even if it's imperfect and goes away, it looks like it'll at least get the
> job done for me. And the encoding conversions should even give me an easy
> way to save at least some of the invalid chars (which wasn't really a
> requirement of mine, but it'll still be nice).
>
>
Yeah, given the amount of necessary work in the Phobos realm, it could 
hang around for quite some time ;)

-- 
Dmitry Olshansky


