Java > Scala

David Eagen spam_me_here at mailinator.com
Thu Dec 1 21:17:30 PST 2011


"Andrei Alexandrescu" <SeeWebsiteForEmail at erdani.org> wrote in message 
news:jb8hvh$2sdl$1 at digitalmars.com...
>> This is a good benchmark for I/O and a practical regex. David, could you
>> please send (privately if you want) the file or some statistics about it
>> (bytes, lines, a representative sample)? Thanks!
>>
> One more thing before I forget - you may want to use byLine() for input. 
> In case the issue turns out to be related to I/O, it's much better we 
> improve byLine() instead of the streams library.
>

I implemented the various suggestions (File.byLine, writeln instead of 
writefln, std.algorithm.sort), except for using FReD. FReD wouldn't compile 
on the Linux box I am using. The error was:

/phobos/std/file.d(537): Error: undefined identifier package c.stdio
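
For reference, the three changes that did go in have roughly the shape 
below. This is only a minimal sketch, not the attached relayhosts.d: the 
names tallyLines and sortedKeys are made up for illustration, and tallying 
whole lines is a stand-in for whatever key the real program extracts.

```d
import std.algorithm : sort;
import std.stdio : File, writeln;

// Tally identical lines from one file. byLine() reuses its internal buffer,
// so each line is copied with idup before it is stored as a key.
size_t[string] tallyLines(string path)
{
    size_t[string] counts;
    foreach (line; File(path).byLine())
        counts[line.idup]++;
    return counts;
}

// Return the keys in ascending order using std.algorithm.sort.
string[] sortedKeys(size_t[string] counts)
{
    auto keys = counts.keys;
    sort(keys);
    return keys;
}

void main(string[] args)
{
    foreach (path; args[1 .. $])
    {
        auto counts = tallyLines(path);
        foreach (key; sortedKeys(counts))
            writeln(key, "\t", counts[key]); // writeln: no format string to parse
    }
}
```

The idup copy matters: without it every key would alias the buffer that 
byLine() overwrites on the next iteration.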

Previous timing:
real    4m21.255s
user    4m14.216s
sys     0m5.940s

New timing after the changes:
real    2m15.840s
user    2m12.700s
sys     0m2.760s


So it's nearly twice as fast (about 1.9x by wall-clock time) but still the 
slowest of the four.

I was able to compile with FReD on a 32-bit Windows system, where it ran 
15% faster than std.regex on these same test files. I would love to try the 
precompiled regex code for FReD, but compilation fails with an 
out-of-memory error when I try it.

The source files are /var/log/syslog files from sendmail on a Solaris 10 
box. I can't make them available because they are mail logs from our 
company, but here are the sizes and line counts along with example entries.

$ wc -l syslog syslog.0 syslog.1 syslog.2
   280618 syslog
   331609 syslog.0
   535035 syslog.1
   543241 syslog.2
  1690503 total

-rw-r--r-- 1 david david  86244537 2011-11-30 21:26 syslog.0
-rw-r--r-- 1 david david 146156778 2011-11-30 21:26 syslog.1
-rw-r--r-- 1 david david 143481904 2011-11-30 21:26 syslog.2
-rw-r--r-- 1 david david  73030898 2011-11-30 21:26 syslog

The entries look like this:

Oct 27 03:10:01 thehost sendmail[3248]: [ID 801593 mail.info] 
p9R8A0MJ003245: to=user@somewhere.com, delay=00:00:01, xdelay=00:00:01, 
mailer=esmtp, pri=120773, relay=some.host.com. [5.6.7.8], dsn=2.0.0, 
stat=Sent (ok 1319703001 qp 25319 the.mail.host.com!1319703000!80184558!1)
Oct 27 03:10:04 thehost sendmail[3289]: [ID 801593 mail.info] 
p9R8A3Nr003289: from=sender@senderbox, size=765, class=0, nrcpts=1, 
msgid=<201110270810.p9R8A3QA021419@senderbox>, proto=ESMTP, daemon=MTA, 
relay=senderbox.foo.com [1.2.3.4]
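
To give an idea of the per-line regex work on entries like these, here is a 
small sketch that pulls out the relay= field. The function name 
extractRelay and the pattern are my own assumptions for illustration, not 
necessarily what relayhosts.d does, and matchFirst is the call from current 
std.regex, which may differ from the API available when this was written.

```d
import std.regex : Regex, matchFirst, regex;
import std.stdio : writeln;

// Return the host that follows "relay=" in a sendmail log entry,
// or null when the entry has no relay field.
string extractRelay(const(char)[] entry, Regex!char pattern)
{
    auto m = matchFirst(entry, pattern);
    return m.empty ? null : m[1].idup;
}

void main()
{
    // Assumed pattern: capture everything after "relay=" up to the next
    // space or comma. Built once and reused for every entry.
    auto relayPattern = regex(`relay=([^ ,]+)`);

    // Tail of the second sample entry above.
    auto entry = "daemon=MTA, relay=senderbox.foo.com [1.2.3.4]";
    writeln(extractRelay(entry, relayPattern));
}
```

Note that this pattern keeps the trailing dot in a value such as 
"some.host.com." from the first sample entry; a real program would probably 
want to normalize that before counting.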


-Dave

-------------- next part --------------
A non-text attachment was scrubbed...
Name: relayhosts.d
Type: application/octet-stream
Size: 1487 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/digitalmars-d/attachments/20111201/c5876a65/attachment-0001.obj>

