Java > Scala

Jonathan M Davis jmdavisProg at gmx.com
Fri Dec 2 02:15:33 PST 2011


On Thursday, December 01, 2011 23:17:30 David Eagen wrote:
> "Andrei Alexandrescu" <SeeWebsiteForEmail at erdani.org> wrote in message
> news:jb8hvh$2sdl$1 at digitalmars.com...
> 
> >> This is a good benchmark for I/O and a practical regex. David, could you
> >> please send (privately if you want) the file or some statistics about it
> >> (bytes, lines, a representative sample)? Thanks!
> > 
> > One more thing before I forget - you may want to use byLine() for input.
> > In case the issue turns out to be related to I/O, it's much better we
> > improve byLine() instead of the streams library.
> 
> I implemented the various suggestions (File.byLine, writeln instead of
> writefln, std.algorithm.sort), except using FReD. FReD wouldn't compile on
> the Linux box I am using. The error was:
> 
> /phobos/std/file.d(537): Error: undefined identifier package c.stdio
> 
> Previous timing:
> real    4m21.255s
> user    4m14.216s
> sys     0m5.940s
> 
> New timing after the changes:
> real    2m15.840s
> user    2m12.700s
> sys     0m2.760s
> 
> 
> So, it's nearly twice as fast but still the slowest of the four.
> 
> I was able to compile with FReD on a 32-bit Windows system, and it performed
> 15% faster than std.regex on these same test files. I would love to try the
> precompiled regex code for FReD, but compiling it fails with an out-of-memory
> error.
> 
> The source files are /var/log/syslog files from sendmail on a Solaris 10
> box. I can't make them available because they are mail logs from our company
> but here are the sizes and line counts along with example entries.
> 
> $ wc -l syslog syslog.0 syslog.1 syslog.2
>    280618 syslog
>    331609 syslog.0
>    535035 syslog.1
>    543241 syslog.2
>   1690503 total
> 
> -rw-r--r-- 1 david david  86244537 2011-11-30 21:26 syslog.0
> -rw-r--r-- 1 david david 146156778 2011-11-30 21:26 syslog.1
> -rw-r--r-- 1 david david 143481904 2011-11-30 21:26 syslog.2
> -rw-r--r-- 1 david david  73030898 2011-11-30 21:26 syslog
> 
> The entries look like this:
> 
> Oct 27 03:10:01 thehost sendmail[3248]: [ID 801593 mail.info]
> p9R8A0MJ003245: to=user at somewhere.com, delay=00:00:01, xdelay=00:00:01,
> mailer=esmtp, pri=120773, relay=some.host.com. [5.6.7.8], dsn=2.0.0,
> stat=Sent (ok 1319703001 qp 25319 the.mail.host.com!1319703000!80184558!1)
> Oct 27 03:10:04 thehost sendmail[3289]: [ID 801593 mail.info]
> p9R8A3Nr003289: from=sender at senderbox, size=765, class=0, nrcpts=1,
> msgid=<201110270810.p9R8A3QA021419 at senderbox>, proto=ESMTP, daemon=MTA,
> relay=senderbox.foo.com [1.2.3.4]

The performance boost would likely be minimal, since the vast majority of the
speed problem lies in std.regex, but I would point out that endsWith can take
multiple arguments, so several endsWith checks can be combined into one call.
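A minimal sketch of the multi-argument form, assuming a hypothetical suffix check (the benchmark program itself isn't shown in this thread, so `isRotatedChained` and `rotationSuffix` are illustrative names only). When given several needles, `std.algorithm.endsWith` returns 0 if none of them matches, and otherwise the 1-based index of the needle that matched.

```d
import std.algorithm : endsWith;
import std.stdio : writeln;

// Hypothetical check: chained form, up to three separate endsWith calls.
bool isRotatedChained(string name)
{
    return name.endsWith(".0") || name.endsWith(".1") || name.endsWith(".2");
}

// Same check as a single variadic call. The result is 0 when no suffix
// matches, otherwise the 1-based index of the suffix that matched.
auto rotationSuffix(string name)
{
    return name.endsWith(".0", ".1", ".2");
}

void main()
{
    foreach (name; ["syslog", "syslog.0", "syslog.2"])
        writeln(name, ": chained=", isRotatedChained(name),
                ", variadic=", rotationSuffix(name));
}
```

Besides being shorter, the variadic form reports which suffix matched, information that the chained boolean expression discards.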

- Jonathan M Davis


More information about the Digitalmars-d mailing list