<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">

<HTML>

<HEAD>

  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=UTF-8">

  <META NAME="GENERATOR" CONTENT="GtkHTML/4.1.92">

</HEAD>

<BODY>

It seems to be related to toLower as well...<BR>

<BR>

Here is the line that throws the exception:<BR>

<BR>

s = replace(s, regex(`[^"a-zA-Z0-9àòèéìù\.]`, "g"), " ").toLower();<BR>

<BR>

Here s is a string containing that byte sequence...<BR>

<BR>

I'm using dmd 2.056.<BR>

<BR>
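For reference, the byte sequence quoted below can be checked on its own. This is only a minimal sketch, assuming a recent Phobos (std.utf.validate, std.regex.replaceAll and std.uni.toLower; dmd 2.056 may behave differently), and the helper names isValidUtf8 and normalize are hypothetical. Note that the bytes 195, 179 decode to U+00F3 (o with acute accent), not U+00F2 (o with grave accent), so that character is outside the character class and gets replaced by a space:<BR>
<PRE>
```d
import std.conv : to;
import std.regex : regex, replaceAll;
import std.uni : toLower;
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `bytes` is a well-formed UTF-8 sequence.
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // validate throws UTFException on malformed input.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// Hypothetical helper mirroring the failing line with the current
/// std.regex API (replaceAll instead of replace with the "g" flag).
string normalize(string s)
{
    auto re = regex(`[^"a-zA-Z0-9àòèéìù\.]`);
    return replaceAll(s, re, " ").toLower();
}

unittest
{
    // The byte sequence from the original report.
    immutable ubyte[] raw = [83, 195, 179, 32];
    assert(isValidUtf8(raw));

    // It decodes to 'S', U+00F3 and a space.
    auto s = cast(string) raw;
    assert(s.to!dstring == "S\u00F3 "d);

    // U+00F3 and the space are both outside the class, so each
    // becomes a space; 'S' is then lower-cased.
    assert(normalize(s) == "s  ");
}
```
</PRE>
The checks can be run with dmd -unittest -main -run check.d (the file name is arbitrary). If they pass, the bytes themselves are valid UTF-8, which would point at the library version in use and not at the data.<BR>
<BR>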

On Fri, 18/11/2011 at 20:33 +0400, Dmitry Olshansky wrote:

<BLOCKQUOTE TYPE=CITE>

<PRE>

On 18.11.2011 17:58, Andrea Fontana wrote:

> I built a data access layer in C++. This layer works with MongoDB, where

> strings are always encoded as UTF-8. I've ported this layer to D using

> SWIG. The string is printed correctly on the console, but when I use std.regex

> it sometimes throws an exception:

>

> core.exception.UnicodeException@src/rt/util/utf.d(290): invalid

> UTF-8 sequence

>

> The byte sequence (for better understanding) is:

> [83, 195, 179, 32]

>

> And the string was "Sò " (with accented o and a space)

>

> I'm not a UTF expert, so is this an invalid UTF-8 encoding, or is it a bug in

> utf.d?

>

Which version of std.regex are you using - the one from git master or 

the one in the latest release?

If it's the former then I'm willing to look into this over the weekend, 

if you can get a hold of a pair: string + pattern that fails like this.

</PRE>

</BLOCKQUOTE>

</BODY>

</HTML>