regex issue
Jay Norwood
jayn at prismnet.com
Tue Mar 20 08:26:02 PDT 2012
On Tuesday, 20 March 2012 at 10:28:11 UTC, Dmitry Olshansky wrote:
> Note that if your task is to split a buffer by exactly the '\n'
> byte, then a loop with memchr is about as fast as it gets; no
> amount of magic compiler optimization would make other generic
> approaches faster (even theoretically). What they *could* do is
> narrow the difference.
>
OK, I'll use memchr.
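Here is a minimal sketch of the memchr loop Dmitry describes. It counts '\n' bytes in a buffer that has already been read into memory (for example with std.file.read), using memchr from core.stdc.string. The helper name countNewlines is hypothetical and does not come from the thread.

```d
import core.stdc.string : memchr;
import std.stdio : writeln;

/// Hypothetical helper: counts '\n' bytes by jumping from one hit to the next with memchr.
size_t countNewlines(const(ubyte)[] buf) @nogc nothrow
{
    size_t count = 0;
    const(ubyte)* p = buf.ptr;
    const(ubyte)* end = buf.ptr + buf.length;
    while (p < end)
    {
        // memchr scans at most (end - p) bytes and returns null when no '\n' remains.
        auto hit = cast(const(ubyte)*) memchr(p, '\n', cast(size_t)(end - p));
        if (hit is null)
            break;
        ++count;
        p = hit + 1; // resume just past the newline we found
    }
    return count;
}

void main()
{
    auto text = "one\ntwo\nthree\n";
    writeln(countNewlines(cast(const(ubyte)[]) text)); // 3
}
```

Each memchr call starts just past the previous hit, so the buffer is scanned only once. To split instead of count, record the slice between successive hits.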
>> This works, but though concise, it is not very fast:
>>
>> void wcp (string fn)
>> {
>> string input = cast(string)std.file.read(fn);
>> ulong l_cnt = std.algorithm.count(input,"\n");
>> }
>>
>>
>
> BTW, I suggest separating I/O from the actual work, or better yet,
> timing both separately via std.datetime.StopWatch.
I'm timing with the StopWatch. I have separate functions where
I've measured an empty function and just the file reads with an
empty loop, so I can see the deltas. All of these run inside a
parallel foreach loop, so 7 threads are reading different files;
since that is the end target, the overall measurement in that
context is more meaningful to me. In these results, file I/O takes
on the order of 25 ms for chunked reads or 30 ms for full-file
reads, covering about 20 MB in total for the full test, read from
a 510 series SSD over SATA 3. The reads are done in parallel by
the threads in the thread pool, and each file is 2 MB. So any
total times in my comments are for 10 tasks executed in a
parallel foreach loop, with the file-read portion previously
timed at around 30 ms.
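The following sketch shows one way to time the I/O and counting phases separately inside a parallel foreach. It assumes a current Phobos, where StopWatch lives in std.datetime.stopwatch; the 2012 std.datetime.StopWatch was later superseded by it. The names timeFiles and FileTiming are hypothetical. Each file is read and counted on a TaskPool with 7 worker threads, matching the setup described above.

```d
import core.time : Duration;
import std.algorithm : count;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.file : read;
import std.parallelism : TaskPool;
import std.stdio : writefln;

/// Hypothetical per-file result: line count plus the two phase timings.
struct FileTiming
{
    size_t lines;
    Duration readTime;   // time spent in std.file.read (I/O)
    Duration countTime;  // time spent counting '\n' (CPU)
}

/// Hypothetical helper: reads and counts every file in parallel,
/// timing the I/O and counting phases separately per file.
FileTiming[] timeFiles(string[] files, size_t threads)
{
    auto results = new FileTiming[files.length];
    auto pool = new TaskPool(threads);
    scope (exit) pool.finish(true); // block until the worker threads have exited

    // One file per work unit; each iteration writes only its own results[i].
    foreach (i, fn; pool.parallel(files, 1))
    {
        auto sw = StopWatch(AutoStart.yes);
        auto input = cast(string) read(fn);
        results[i].readTime = sw.peek();

        sw.reset(); // restarts from zero and keeps running
        results[i].lines = input.count('\n');
        results[i].countTime = sw.peek();
    }
    return results;
}

void main(string[] args)
{
    // Hypothetical usage: the files to process are passed on the command line.
    auto files = args[1 .. $];
    auto wall = StopWatch(AutoStart.yes);
    auto results = timeFiles(files, 7);
    wall.stop();

    foreach (i, r; results)
        writefln("%s: %s lines, read %s ms, count %s ms", files[i], r.lines,
                 r.readTime.total!"msecs", r.countTime.total!"msecs");
    writefln("wall time: %s ms", wall.peek.total!"msecs");
}
```

The per-file numbers isolate the read and count costs from each other. The wall-clock figure is the in-context measurement Jay cares about.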
>
>> This fails to build, so I'd guess \p support is missing:
>>
>> void wcp (string fn)
>> {
>> enum ctr = ctRegex!("\p{WhiteSpace}","m");
>> }
>>
>> ------ Build started: Project: a7, Configuration: Release Win32
>> ------
>> Building Release\a7.exe...
>> a7.d(210): undefined escape sequence \p
>>
>
> Not a bug: \p is being parsed as a compiler (string-literal) escape
> sequence. How do you think \n works in your non-regex examples? ;)
Yes, thanks. I read your other link, and it was helpful. I
think I presumed that escape handling belonged to stdio, while
regex would have its own set of valid escapes that included \p.
But I see now that string literals have their own set of escapes,
which the compiler processes before regex ever sees the string.
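The fix is to get the backslash past the compiler's string-literal processing so that std.regex actually receives \p. Below is a minimal sketch, assuming a current Phobos with regex and matchAll from std.regex. D's WYSIWYG literals (`...` or r"...") and an escaped backslash ("\\p...") all produce the same string. The same WYSIWYG spelling applies to the ctRegex pattern in the original snippet. The helper countWhitespace is hypothetical, and the sketch uses the canonical Unicode property name White_Space.

```d
import std.range : walkLength;
import std.regex : matchAll, regex;
import std.stdio : writeln;

// All three spellings denote the same string. A plain "\p{White_Space}"
// (single backslash in a normal string) is what the compiler rejects,
// because \p is not a D escape sequence.
static assert(`\p{White_Space}` == r"\p{White_Space}");
static assert(r"\p{White_Space}" == "\\p{White_Space}");

/// Hypothetical helper: counts whitespace code points via a Unicode property class.
size_t countWhitespace(string text)
{
    // Backquoted WYSIWYG literal: the backslash reaches std.regex untouched.
    auto re = regex(`\p{White_Space}`);
    return matchAll(text, re).walkLength;
}

void main()
{
    writeln(countWhitespace("one two\tthree\nfour")); // 3
}
```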
More information about the Digitalmars-d-learn mailing list