Regex and utf8

Koroskin Denis 2korden at gmail.com
Sun Jul 20 13:13:20 PDT 2008


On Sun, 20 Jul 2008 23:45:34 +0400, Walter Bright  
<newshound1 at digitalmars.com> wrote:

> Roman Balitskiy wrote:
>> When I try to parse cyrillic text I get "Error: 4invalid UTF-8  
>> sequence". I use dmd 1.030 on Ubuntu 8.04 with utf8 locale. I have  
>> tried the upcoming gdc 0.25 with the same results.
>>  	if (auto m = std.regexp.search(`ab&#1078;def`, `[&#1078;]`))   //  
>> Here is cyrillic letter 'je'
>> 		writefln("%s[%s]%s", m.pre, m.match(0), m.post);
>>
>
>
> The back quotes are for wysiwyg strings, and the UTF translation doesn't  
> happen. Try using "" strings instead.

Nope, that doesn't help. However, removing the square brackets from the pattern does.


