Regex and UTF-8

Andrea Fontana advmail at katamail.com
Fri Nov 18 05:58:31 PST 2011


I built a data access layer in C++. This layer works with MongoDB, where
strings are always encoded using UTF-8. I've ported this layer to D using
SWIG. Strings are written correctly to the console, but when I use std.regex
it sometimes throws an exception:

core.exception.UnicodeException@src/rt/util/utf.d(290): invalid UTF-8 sequence

The byte sequence (for better understanding) is:
[83, 195, 179, 32]

And the string was "Sò " (with accented o and a space)

I'm not a UTF expert, so is this a wrong UTF-8 encoding, or is it a bug in
utf.d?
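
As a sanity check, the bytes can be tested by hand against the UTF-8
encoding rules. Below is a minimal sketch in C++ (the language of the
original layer). The helper name is_valid_utf8 is hypothetical: it is not
part of any library and not what utf.d does internally. It only checks lead
bytes, continuation bytes, overlong forms, surrogates and the U+10FFFF
limit.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from any library): returns true if `bytes`
// is a well-formed UTF-8 sequence.
bool is_valid_utf8(const std::vector<std::uint8_t>& bytes) {
    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes;
    // anything below that is an overlong encoding.
    static const std::uint32_t min_cp[] = {0x0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    while (i < bytes.size()) {
        const std::uint8_t lead = bytes[i];
        std::size_t extra = 0;   // continuation bytes expected after `lead`
        std::uint32_t cp = 0;    // code point being assembled

        if (lead < 0x80)                { extra = 0; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        else return false;       // stray continuation or invalid lead byte

        // The sequence must not be cut off at the end of the buffer.
        if (extra > bytes.size() - i - 1) return false;

        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t c = bytes[i + k];
            if ((c & 0xC0) != 0x80) return false;  // must look like 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp[extra]) return false;             // overlong form
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        if (cp > 0x10FFFF) return false;                  // out of range

        i += extra + 1;
    }
    return true;
}
```

Calling is_valid_utf8({83, 195, 179, 32}) returns true: 83 is 'S', the pair
195, 179 (0xC3 0xB3) is a well-formed two-byte sequence that decodes to
U+00F3, and 32 is a space. Note that U+00F3 is o with an acute accent; o
with a grave accent (U+00F2) would be 195, 178. Either way the bytes
themselves are valid UTF-8, so the problem may lie in how the string crosses
the SWIG boundary (for example its length or slicing) rather than in the
encoding.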
