regex issue

Dmitry Olshansky dmitry.olsh at gmail.com
Tue Mar 20 03:28:09 PDT 2012


On 19.03.2012 23:24, Jay Norwood wrote:
> On Monday, 19 March 2012 at 13:55:39 UTC, Dmitry Olshansky wrote:
>> That's right, however counting is completely separate from regex,
>> you'd want to use std.algorithm count:
>> count(match(....,"\n"));
>>
>> or more unicode-friendly:
>> count(match(...., regex("$","m"))); //note the multi-line flag
>
Ehm, I forgot the "g" (global) flag myself, so it should be

count(match(...., regex("$","gm")));

and

count(match(...., regex("\n","g")));
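For completeness, here is a small self-contained sketch of the same counting. It uses matchAll, which (as far as I know) is the current Phobos spelling for iterating over every match, so it does the job the "g" flag does for match(). The function names are just illustrative, not part of Phobos.

```d
import std.algorithm.searching : count;
import std.regex : matchAll, regex;

// Counts literal '\n' characters by walking every regex match.
// matchAll iterates over all non-overlapping matches, which is what
// the "g" flag asks of the older match() call.
size_t countNewlinesRegex(string input)
{
    return count(matchAll(input, regex("\n")));
}

// Unicode-friendlier variant: in multi-line mode ("m") the "$" anchor
// matches at the end of each line, not only at the end of the input.
// Its total is not guaranteed to equal the '\n' count (e.g. a last line
// without a trailing newline), so treat it as a sketch.
size_t countLineEndsRegex(string input)
{
    return count(matchAll(input, regex(`$`, "m")));
}
```

Both functions return the number of matches rather than a single match object, which is what the `l_cnt = 1` surprise below comes from.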

Note that if your task is to split a buffer on exactly the '\n' byte, then a loop
with memchr is about as fast as it gets; no amount of magic compiler
optimization would make the other, generic approaches faster (even in theory).
The most they *could* do is narrow the gap.
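A minimal sketch of such a memchr loop, assuming all you need is the number of '\n' bytes (the function name is hypothetical). memchr comes from the C library via core.stdc.string; each call scans ahead to the next newline and the loop resumes just past it.

```d
import core.stdc.string : memchr;

// Counts '\n' bytes in buf using repeated memchr calls.
size_t countNewlinesMemchr(const(char)[] buf)
{
    size_t n = 0;
    const(char)* p = buf.ptr;
    const(char)* end = buf.ptr + buf.length;
    while (p < end)
    {
        // Search the remaining bytes for the next newline.
        auto hit = cast(const(char)*) memchr(p, '\n', cast(size_t)(end - p));
        if (hit is null)
            break;      // no more newlines in the buffer
        ++n;
        p = hit + 1;    // continue right after the newline found
    }
    return n;
}
```

Since it works on raw bytes, this is only correct for splitting on the single '\n' byte; it knows nothing about other Unicode line terminators.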

> This only sets l_cnt to 1
>
> void wcp_cnt_match1 (string fn)
> {
> string input = cast(string)std.file.read(fn);
> enum ctr = ctRegex!("$","m");
> ulong l_cnt = std.algorithm.count(match(input,ctr));
> }
>
> This works ok, but though concise it is not very fast
>
> void wcp (string fn)
> {
> string input = cast(string)std.file.read(fn);
> ulong l_cnt = std.algorithm.count(input,"\n");
> }
>
>

BTW, I suggest separating the I/O from the actual work, or better yet, timing both
separately via std.datetime.StopWatch.
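Something along these lines, as a rough sketch. It assumes a recent Phobos, where StopWatch lives in std.datetime.stopwatch and peek() returns a core.time.Duration. The Timings struct and timeLineCount are made-up names for illustration.

```d
import std.algorithm.searching : count;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read;
import core.time : Duration;

// Hypothetical result holder: I/O time and counting time kept apart.
struct Timings
{
    Duration io;    // time spent reading the file
    Duration work;  // time spent counting newlines in memory
    size_t lines;
}

Timings timeLineCount(string fn)
{
    Timings t;
    auto sw = StopWatch(AutoStart.yes);
    auto input = cast(string) read(fn);  // I/O only
    t.io = sw.peek();

    sw.reset();                          // keeps running, restarts from zero
    t.lines = count(input, '\n');        // pure in-memory work
    t.work = sw.peek();
    return t;
}
```

Swapping the count(...) line for a regex or memchr version lets you compare the counting strategies without the file read skewing the numbers.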

> This fails to build, so I'd guess is missing \p
>
> void wcp (string fn)
> {
> enum ctr = ctRegex!("\p{WhiteSpace}","m");
> }
>
> ------ Build started: Project: a7, Configuration: Release Win32
> ------
> Building Release\a7.exe...
> a7.d(210): undefined escape sequence \p
>

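Not a bug: inside an ordinary "..." string literal, \p is read by the compiler as an
escape sequence, and D has no such escape, so compilation fails before std.regex
ever sees the pattern.
How do you think \n works in your non-regex examples? ;)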
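To get the backslash through to the regex engine, use a WYSIWYG string or double the backslash. A small sketch follows; the enum and function names are illustrative, and it assumes std.regex accepts the WhiteSpace property name, as in the original snippet.

```d
import std.algorithm.searching : count;
import std.regex : matchAll, regex;

// All three literals produce the same characters: '\', 'p', '{', ...
enum wsPatternRaw    = r"\p{WhiteSpace}";   // r"..." WYSIWYG string
enum wsPatternTick   = `\p{WhiteSpace}`;    // backquoted WYSIWYG string
enum wsPatternEscape = "\\p{WhiteSpace}";   // backslash escaped for the compiler

// Checked at compile time: the spellings are identical strings.
static assert(wsPatternRaw == wsPatternTick && wsPatternTick == wsPatternEscape);

// Counts whitespace characters, one match per character.
size_t countWhitespace(string input)
{
    return count(matchAll(input, regex(wsPatternRaw)));
}
```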


-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list