regex issue

Jay Norwood jayn at prismnet.com
Tue Mar 20 08:26:02 PDT 2012


On Tuesday, 20 March 2012 at 10:28:11 UTC, Dmitry Olshansky wrote:
> Note that if your task is to split buffer by exactly '\n' byte 
> then loop with memchr is about as fast as it gets, no amount of 
> magic compiler optimizations would make other generic ways 
> better (even theoretically). What they *could* do is bring the 
> difference lower.
>

ok, I'll use memchr.
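
For reference, a minimal sketch of what such a memchr loop could look like. The function name countNewlines and the choice of a const(ubyte)[] parameter are my own assumptions for illustration, not code from this thread; memchr itself is the C library function exposed through core.stdc.string.

```d
import core.stdc.string : memchr;

/// Counts '\n' bytes in buf by calling C's memchr repeatedly.
/// (countNewlines is a hypothetical helper name, not from the thread.)
size_t countNewlines(const(ubyte)[] buf)
{
    size_t count = 0;
    const(ubyte)* p = buf.ptr;
    size_t remaining = buf.length;

    while (remaining > 0)
    {
        // memchr returns a pointer to the first '\n' in the next
        // `remaining` bytes, or null if there is none.
        auto hit = cast(const(ubyte)*) memchr(p, '\n', remaining);
        if (hit is null)
            break;
        ++count;
        // Skip past the newline that was just found.
        remaining -= cast(size_t)(hit - p) + 1;
        p = hit + 1;
    }
    return count;
}
```

It could be called on the raw file contents, e.g. countNewlines(cast(const(ubyte)[]) std.file.read(fn)), which avoids treating the buffer as UTF-8 text at all.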

>> This works ok, but though concise it is not very fast
>>
>> void wcp (string fn)
>> {
>> string input = cast(string)std.file.read(fn);
>> ulong l_cnt = std.algorithm.count(input,"\n");
>> }
>>
>>
>
> BTW I suggest to separate I/O from actual work or better yet, 
> time both separately via std.datetime.StopWatch.

I'm timing with the stopwatch.  I have separate functions where 
I've measured an empty function and just the file reads with an 
empty loop, so I can see the deltas.  All of these are executed 
inside a parallel foreach loop, so 7 threads are reading different 
files, and since that is the end target, the overall measurement in 
that context is more meaningful to me.  The file I/O is on the 
order of 25ms for chunk reads or 30ms for full file reads in these 
results, as it is all reads of about 20MB for the full test from 
a 510 series SSD over SATA3.  The reads are done in parallel by 
the threads in the thread pool.  Each file is 2MB, so any total 
times you see in my comments are for 10 tasks executed in a 
parallel foreach loop, with the file read portion previously 
timed at around 30ms.
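
A rough sketch of timing the read and the count separately, for a single file and without the parallel foreach. It assumes a current Phobos, where StopWatch lives in std.datetime.stopwatch (in 2012 it was std.datetime.StopWatch with a slightly different interface); the function name timeWcp is hypothetical.

```d
import core.time : Duration;
import std.algorithm.searching : count;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read;
import std.stdio : writefln;

/// Reads fn, counts its newlines, and reports how long each step took.
/// (timeWcp is a hypothetical name used only for this sketch.)
size_t timeWcp(string fn)
{
    auto sw = StopWatch(AutoStart.yes);

    // Step 1: file I/O only.
    string input = cast(string) read(fn);
    Duration readTime = sw.peek();

    // Step 2: the actual work. reset() zeroes the elapsed time and
    // the stopwatch keeps running.
    sw.reset();
    size_t lines = count(input, "\n");
    Duration countTime = sw.peek();

    writefln("%s: read %s, count %s, %s lines",
             fn, readTime, countTime, lines);
    return lines;
}
```

Inside a parallel foreach the two durations would be per task, so they would still need to be compared against the overall wall-clock time for the whole loop.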
>
>> This fails to build, so I'd guess is missing \p
>>
>> void wcp (string fn)
>> {
>> enum ctr = ctRegex!("\p{WhiteSpace}","m");
>> }
>>
>> ------ Build started: Project: a7, Configuration: Release Win32
>> ------
>> Building Release\a7.exe...
>> a7.d(210): undefined escape sequence \p
>>
>
> Not a bug, a compiler escape sequence.
> How do you think \n works in your non-regex examples ? ;)

Yes, thanks.  I read your other link and that was helpful.  I 
think I presumed that escape handling was something belonging 
to stdio, while regex would have its own valid escapes that would 
include \p.  But I see now that string literals have their own 
set of escapes, which the compiler processes before regex ever 
sees the pattern.
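
A small sketch of the fix as I understand it: put the pattern in a WYSIWYG string (r"..." or backticks) or double the backslash, so that \p reaches the regex engine untouched. The helper name countWhitespace is made up for illustration, and I am assuming std.regex accepts the property name exactly as I wrote it in the failing example.

```d
import std.range : walkLength;
import std.regex : ctRegex, matchAll;

// The compiler processes escapes in ordinary string literals:
// "\n" is one byte, while the raw string r"\n" is a backslash plus 'n'.
static assert("\n".length == 1);
static assert(r"\n".length == 2);

// A raw string and a doubled backslash give the same pattern text.
static assert(r"\p{WhiteSpace}" == "\\p{WhiteSpace}");

/// Counts whitespace characters using a compile-time regex.
/// (countWhitespace is a hypothetical helper, not from the thread;
/// the property name is assumed to be accepted as spelled here.)
size_t countWhitespace(string input)
{
    auto ws = ctRegex!(r"\p{WhiteSpace}", "m");
    return input.matchAll(ws).walkLength;
}
```

With "\p{WhiteSpace}" in an ordinary literal the compiler rejects \p before regex is involved at all, which matches the "undefined escape sequence \p" error from the build log.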


