Regex and UTF-8

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Nov 18 08:33:29 PST 2011


On 18.11.2011 17:58, Andrea Fontana wrote:
> I built a data access layer in C++. This layer works with MongoDB, where
> strings are always encoded using UTF-8. I've ported this layer to D using
> SWIG. Strings are written correctly to the console, but when I use std.regex
> it sometimes throws an exception:
>
> core.exception.UnicodeException@src/rt/util/utf.d(290): invalid UTF-8 sequence
>
> The byte sequence (for better understanding) is:
> [83, 195, 179, 32]
>
> And the string was "Sò " (with accented o and a space)
>
> I'm not a UTF expert, so is this an invalid UTF-8 encoding, or is it a bug in
> utf.d?
>

Which version of std.regex are you using: the one from git master or
the one in the latest release?
If it's the former, I'm willing to look into this over the weekend,
provided you can supply a string + pattern pair that fails like this.
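
As a quick sanity check on the quoted bytes themselves, independent of std.regex and utf.d, the sequence can be run through any strict UTF-8 decoder. The sketch below uses Python only because it keeps the check short; it says nothing about what D does with the same input, and the variable names are made up for illustration.

```python
# Bytes quoted in the report: [83, 195, 179, 32]
raw = bytes([83, 195, 179, 32])

# A strict decode raises UnicodeDecodeError on any malformed UTF-8 sequence.
text = raw.decode("utf-8", errors="strict")

# List the decoded code points: 0x53 is one byte, 0xC3 0xB3 is a two-byte
# sequence, 0x20 is one byte.
code_points = [f"U+{ord(ch):04X}" for ch in text]
print(code_points)  # ['U+0053', 'U+00F3', 'U+0020']
```

The four bytes decode without error, so on their own they are well-formed UTF-8. They yield U+00F3 (o with acute) rather than U+00F2 (o with grave), which suggests the bytes or the quoted string were transcribed slightly differently. If the exception is real, the input is probably reaching std.regex in some other form (for example a slice that cuts a multi-byte sequence), which is why a concrete failing string + pattern pair matters.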


-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list