Need to do some "dirty" UTF-8 handling

Jonathan M Davis jmdavisProg at gmx.com
Sat Jun 25 15:35:04 PDT 2011


On 2011-06-25 15:17, Dmitry Olshansky wrote:
> On 26.06.2011 1:49, Nick Sabalausky wrote:
> > "Andrej Mitrovic"<andrej.mitrovich at gmail.com>  wrote in message
> > news:mailman.1215.1309019944.14074.digitalmars-d-learn at puremagic.com...
> > 
> >> I had a similar requirement some time ago. I had to copy and
> >> modify the Phobos function std.utf.decode for a custom text editor
> >> because the function throws when it finds an invalid code point, and
> >> that is far too slow for my needs. I'm actually displaying invalid code
> >> points with special marks (just like Scintilla), so I need decoding to
> >> work as fast as possible.
> >> 
> >> The new function simply sets a boolean flag instead of throwing an
> >> exception.
> > 
> > I think I may end up doing something like that :/
> > 
> > I was hoping to be able to do something vaguely sensible like this:
> > 
> > string newStr;
> > foreach(dchar dc; str)
> > {
> >     if(isValidDchar(dc))
> >         newStr ~= dc;
> >     else
> >         newStr ~= 'X';
> > }
> > str = newStr;
> > 
> > But that just blows up in my face.
> 
> std.encoding to the rescue?
> It looks like a well-established module that was forgotten for some reason.

It's also likely going away. It was an experiment of sorts, which Andrei 
considers a failure. We need something to replace it, but as I understand it, 
std.encoding doesn't solve all of the problems that it's supposed to, and the 
ones it does solve aren't necessarily solved in the best way. So an improved 
replacement will need to be devised, but I wouldn't expect std.encoding 
to stick around in the long run.

- Jonathan M Davis
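The approaches discussed in this thread can be combined into a sketch: decode without throwing, report bad input through a boolean (as Andrej describes), and substitute a marker (as in Nick's loop). Nick's loop presumably fails because `foreach (dchar dc; str)` decodes, and throws on malformed UTF-8, before `isValidDchar` ever sees a value. This is a minimal sketch, not Phobos code. `decodeNoThrow` and `replaceInvalid` are hypothetical names, and the decoder is hand-written rather than a modified copy of std.utf.decode. It assumes a simple recovery policy: on any malformed sequence it skips exactly one byte. It rejects bad lead bytes, missing continuation bytes, truncated sequences, overlong forms, UTF-16 surrogates and values above U+10FFFF.

```d
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): decodes one code point from
/// `s` starting at `index` and advances `index`. It never throws: on a
/// malformed sequence it sets `valid` to false, skips one byte, and
/// returns U+FFFD.
dchar decodeNoThrow(const(char)[] s, ref size_t index, out bool valid)
    @safe pure nothrow @nogc
{
    enum dchar replacement = '\uFFFD';
    immutable ubyte b0 = cast(ubyte) s[index];

    // ASCII fast path: a single byte that is always valid.
    if (b0 < 0x80)
    {
        ++index;
        valid = true;
        return b0;
    }

    // Sequence length from the lead byte, plus the smallest code point
    // allowed to use that length (anything lower is an overlong form).
    size_t len = 0; // 0 means "not a valid lead byte"
    uint cp = 0;
    uint minValue = 0;
    if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; minValue = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minValue = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minValue = 0x10000; }

    // The whole sequence must fit in the input.
    bool ok = len != 0 && index + len <= s.length;

    // Every following byte must be a continuation byte (10xxxxxx).
    for (size_t i = 1; ok && i < len; ++i)
    {
        immutable ubyte b = cast(ubyte) s[index + i];
        ok = (b & 0xC0) == 0x80;
        cp = (cp << 6) | (b & 0x3F);
    }

    // Reject overlong forms, surrogates and values past U+10FFFF.
    ok = ok && cp >= minValue && !(cp >= 0xD800 && cp <= 0xDFFF)
            && cp <= 0x10FFFF;

    valid = ok;
    index += ok ? len : 1; // on error, skip only the offending byte
    return ok ? cast(dchar) cp : replacement;
}

/// Hypothetical helper: copies `str`, replacing each malformed byte
/// with `marker`, in the spirit of the loop quoted above.
string replaceInvalid(string str, dchar marker = 'X')
{
    string newStr;
    size_t i = 0;
    while (i < str.length)
    {
        bool valid;
        immutable dchar dc = decodeNoThrow(str, i, valid);
        newStr ~= valid ? dc : marker; // appending a dchar re-encodes it
    }
    return newStr;
}

void main()
{
    // "a", a stray continuation byte (80), "é" (C3 A9), then a
    // truncated three-byte sequence (E2 82).
    immutable ubyte[] raw = [0x61, 0x80, 0xC3, 0xA9, 0xE2, 0x82];
    writeln(replaceInvalid(cast(string) raw)); // prints aXéXX
}
```

Because the sketch skips one byte per error, the truncated `E2 82` produces one marker per byte (`XX`). Collapsing such a run into a single marker would need a different recovery rule.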


More information about the Digitalmars-d-learn mailing list