Regex and utf8

Walter Bright newshound1 at digitalmars.com
Sun Jul 20 15:23:23 PDT 2008


Koroskin Denis wrote:
> On Sun, 20 Jul 2008 23:45:34 +0400, Walter Bright 
> <newshound1 at digitalmars.com> wrote:
> 
>> Roman Balitskiy wrote:
>>> When I try to parse Cyrillic text I get "Error: 4invalid UTF-8 
>>> sequence". I use dmd 1.030 on Ubuntu 8.04 with a UTF-8 locale. I have 
>>> tried the upcoming gdc 0.25 with the same results.
>>>      if (auto m = std.regexp.search(`ab&#1078;def`, `[&#1078;]`))   
>>> // Here is cyrillic letter 'je'
>>>         writefln("%s[%s]%s", m.pre, m.match(0), m.post);
>>>
>>
>>
>> The back quotes are for wysiwyg strings, and the UTF translation 
>> doesn't happen. Try using "" strings instead.
> 
> Nope, it doesn't help. However, removing square brackets does.

That's a bug with the regex engine, then. Who wants to put it in 
bugzilla? <g>


