Need to do some "dirty" UTF-8 handling
Jonathan M Davis
jmdavisProg at gmx.com
Sat Jun 25 18:20:18 PDT 2011
On 2011-06-25 17:04, Dmitry Olshansky wrote:
> On 26.06.2011 3:25, Nick Sabalausky wrote:
> > "Dmitry Olshansky"<dmitry.olsh at gmail.com> wrote in message
> > news:iu5n32$2vjd$1 at digitalmars.com...
> >
> >> On 26.06.2011 1:49, Nick Sabalausky wrote:
> >>> "Andrej Mitrovic"<andrej.mitrovich at gmail.com> wrote in message
> >>> news:mailman.1215.1309019944.14074.digitalmars-d-learn at puremagic.com...
> >>>
> >>>> I've had a similar requirement some time ago. I've had to copy and
> >>>> modify the phobos function std.utf.decode for a custom text editor
> >>>> because the function throws when it finds an invalid code point. This
> >>>> is way too slow for my needs. I'm actually displaying invalid code
> >>>> points with special marks (just like Scintilla), so I need decoding to
> >>>> work as fast as possible.
> >>>>
> >>>> The new function simply replaces throwing exceptions with flagging a
> >>>> boolean.
> >>>
> >>> I think I may end up doing something like that :/
> >>>
> >>> I was hoping to be able to do something vaguely sensible like this:
> >>>
> >>> string newStr;
> >>> foreach(dchar dc; str)
> >>> {
> >>>     if(isValidDchar(dc))
> >>>         newStr ~= dc;
> >>>     else
> >>>         newStr ~= 'X';
> >>> }
> >>> str = newStr;
> >>>
> >>> But that just blows up in my face.
> >>
> >> std.encoding to the rescue?
> >> It looks like a well established module that was forgotten for some
> >> reason.
> >>
> >> And here I'm wondering what a function named sanitize could do :)
> >
> > Ahh, I didn't even notice that module.
>
> Same here. It was just a couple of days(!) ago that I somehow managed to
> find decode in the wrong place (in std.encoding instead of std.utf). And it
> looked useful, but I had never heard about it. Seriously, how many totally
> irrelevant old modules do we have around here? (hint: std.gregorian!)
>
> > Even if it's imperfect and goes away, it looks like it'll at least get
> > the job done for me. And the encoding conversions should even give me an
> > easy way to save at least some of the invalid chars (which wasn't really
> > a requirement of mine, but it'll still be nice).
>
> Yeah, given the amount of necessary work in the Phobos realm it could
> hang around for quite some time ;)
Oh, it'll probably be around for a while. It'll take time before a replacement
is devised. After all, std.stream is still around, isn't it? And there's
supposedly an actual plan for how its replacement will be implemented. There's
no such plan for std.encoding. I just thought that I should point out that
it's likely to be replaced at some point (hopefully with something much
better).
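
For anyone wanting to try that route in the meantime, here is a minimal sketch
of the std.encoding approach Dmitry mentioned. It assumes that
std.encoding.sanitize and std.encoding.isValid behave as documented (malformed
sequences get replaced with a valid replacement sequence). cleanDirtyUtf8 is a
hypothetical helper name used only for illustration; it is not something in
Phobos.

```d
import std.encoding : isValid, sanitize;
import std.stdio : writeln;

// Hypothetical helper (not part of Phobos): a thin wrapper that returns a
// copy of s in which malformed UTF-8 sequences have been replaced.
string cleanDirtyUtf8(string s)
{
    return sanitize(s);
}

void main()
{
    // 0xFF can never occur in well-formed UTF-8, so this string is "dirty".
    immutable(ubyte)[] raw = [0x61, 0xFF, 0x62];
    string dirty = cast(string) raw;

    assert(!isValid(dirty));

    string clean = cleanDirtyUtf8(dirty);
    assert(isValid(clean));
    writeln(clean);
}
```

Unlike the foreach loop quoted above, this never decodes the string by
iterating over it as dchar, so the invalid bytes should not trigger a decoding
exception. Note that it does not give you control over the replacement
character, so it won't substitute 'X' the way Nick's loop was meant to.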
- Jonathan M Davis