Java > Scala
David Eagen
spam_me_here at mailinator.com
Thu Dec 1 21:17:30 PST 2011
"Andrei Alexandrescu" <SeeWebsiteForEmail at erdani.org> wrote in message
news:jb8hvh$2sdl$1 at digitalmars.com...
>> This is a good benchmark for I/O and a practical regex. David, could you
>> please send (privately if you want) the file or some statistics about it
>> (bytes, lines, a representative sample)? Thanks!
>>
> One more thing before I forget - you may want to use byLine() for input.
> In case the issue turns out to be related to I/O, it's much better we
> improve byLine() instead of the streams library.
>
I implemented the various suggestions (File.byLine, writeln instead of
writefln, std.algorithm.sort), except for using FReD. FReD wouldn't compile on
the Linux box I am using. The error was:
/phobos/std/file.d(537): Error: undefined identifier package c.stdio
Previous timing:
real 4m21.255s
user 4m14.216s
sys 0m5.940s
New timing after the changes:
real 2m15.840s
user 2m12.700s
sys 0m2.760s
So, it's nearly twice as fast but still the slowest of the four.
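
As a rough check of the "nearly twice as fast" figure, the two wall-clock times above can be converted to seconds and divided. This is only a sketch in Python rather than D; the helper name to_seconds is made up for illustration, and the only inputs are the two "real" values quoted in this message.

```python
def to_seconds(stamp):
    """Convert a `time`-style value such as '4m21.255s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# "real" times copied from the two timing runs quoted in the message
before = to_seconds("4m21.255s")
after = to_seconds("2m15.840s")

# Ratio of old wall-clock time to new wall-clock time
speedup = before / after
print(f"before={before:.3f}s after={after:.3f}s speedup={speedup:.2f}x")
```

The ratio comes out at about 1.92, which matches "nearly twice as fast".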
I was able to compile with FReD on a 32-bit Windows system, where it ran
15% faster than std.regex on these same test files. I would love to try
the precompiled regex code for FReD, but the compiler runs out of memory
when I try it.
The source files are /var/log/syslog files from sendmail on a Solaris 10
box. I can't make them available because they are mail logs from our company,
but here are the sizes and line counts along with example entries.
$ wc -l syslog syslog.0 syslog.1 syslog.2
280618 syslog
331609 syslog.0
535035 syslog.1
543241 syslog.2
1690503 total
-rw-r--r-- 1 david david 86244537 2011-11-30 21:26 syslog.0
-rw-r--r-- 1 david david 146156778 2011-11-30 21:26 syslog.1
-rw-r--r-- 1 david david 143481904 2011-11-30 21:26 syslog.2
-rw-r--r-- 1 david david 73030898 2011-11-30 21:26 syslog
The entries look like this:
Oct 27 03:10:01 thehost sendmail[3248]: [ID 801593 mail.info]
p9R8A0MJ003245: to=user at somewhere.com, delay=00:00:01, xdelay=00:00:01,
mailer=esmtp, pri=120773, relay=some.host.com. [5.6.7.8], dsn=2.0.0,
stat=Sent (ok 1319703001 qp 25319 the.mail.host.com!1319703000!80184558!1)
Oct 27 03:10:04 thehost sendmail[3289]: [ID 801593 mail.info]
p9R8A3Nr003289: from=sender at senderbox, size=765, class=0, nrcpts=1,
msgid=<201110270810.p9R8A3QA021419 at senderbox>, proto=ESMTP, daemon=MTA,
relay=senderbox.foo.com [1.2.3.4]
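
For anyone who wants to reproduce the test on their own logs, here is a minimal sketch of the kind of per-line regex work involved. It is written in Python, not D, and it is not the attached program: the attachment was scrubbed, so what it actually extracts is unknown. The sketch assumes the goal is to tally the relay= host on each line; the pattern and the names RELAY, SAMPLE and count_relays are all made up for illustration. The sample entries are the two above, rejoined onto single lines as they would appear in the real file.

```python
import re
from collections import Counter

# The two example entries from the message, each rejoined onto one line.
SAMPLE = [
    "Oct 27 03:10:01 thehost sendmail[3248]: [ID 801593 mail.info] "
    "p9R8A0MJ003245: to=user at somewhere.com, delay=00:00:01, "
    "xdelay=00:00:01, mailer=esmtp, pri=120773, "
    "relay=some.host.com. [5.6.7.8], dsn=2.0.0, "
    "stat=Sent (ok 1319703001 qp 25319 "
    "the.mail.host.com!1319703000!80184558!1)",
    "Oct 27 03:10:04 thehost sendmail[3289]: [ID 801593 mail.info] "
    "p9R8A3Nr003289: from=sender at senderbox, size=765, class=0, "
    "nrcpts=1, msgid=<201110270810.p9R8A3QA021419 at senderbox>, "
    "proto=ESMTP, daemon=MTA, relay=senderbox.foo.com [1.2.3.4]",
]

# Assumed pattern: host name after "relay=", an optional trailing dot,
# then the IPv4 address in square brackets.
RELAY = re.compile(r"relay=(\S+?)\.? \[(\d{1,3}(?:\.\d{1,3}){3})\]")


def count_relays(lines):
    """Count how many lines mention each relay host."""
    counts = Counter()
    for line in lines:
        match = RELAY.search(line)
        if match:
            host, _ip = match.groups()
            counts[host] += 1
    return counts


counts = count_relays(SAMPLE)
# Print hosts in sorted order, mirroring the sort step mentioned earlier.
for host in sorted(counts):
    print(host, counts[host])
```

On a real file the list would be replaced by iterating over the open file line by line, which is the step File.byLine covers on the D side.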
-Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: relayhosts.d
Type: application/octet-stream
Size: 1487 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/digitalmars-d/attachments/20111201/c5876a65/attachment-0001.obj>