Can I speed up this log parsing script further?

Daniel Kozak via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Jun 9 01:58:38 PDT 2017


On Fri, Jun 9, 2017 at 9:34 AM, uncorroded via Digitalmars-d-learn <
digitalmars-d-learn at puremagic.com> wrote:

> Hi guys,
>
> I am a beginner in D. As a project, I converted a log-parsing script in
> Python which we use at work, to D. This link was helpful - (
> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ ) I
> compiled it with dmd and ldc. The log file is 52 MB. With dmd (not release
> build), it takes 1.1 sec and with ldc, it takes 0.3 sec.
>
> The Python script (run with system python, not Pypy) takes 0.75 sec. The D
> and Python functions are here and on pastebin ( D -
> https://pastebin.com/SeUR3wFP , Python - https://pastebin.com/F5JbfBmE ).
>
> Basically, I am reading a line, checking for 2 constants. If either one is
> found, some processing is done on line and stored to an array for later
> analysis. I tried reading the file entirely in one go using std.file :
> readText and using std.algorithm : splitter for lazily splitting newline
> but there is no difference in speed, so I used the byLine approach
> mentioned in the linked blog. Is there a better way of doing this in D?
>
There is no difference in speed because you do not process your data
lazily, so you make many allocations. That is the main reason why it is so
slow. I could improve it, but I would need to see some example of the data
you are trying to parse.

But some general rules:
1.) instead of ~= you should use std.array.appender
2.) instead of std.string.split you could use std.algorithm.splitter or
std.algorithm.findSplit
3.) instead of indexOf I would use std.algorithm.startsWith (in case it is
at the beginning of the line)
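
As a rough sketch of how those three rules could fit together (not a rewrite
of your pastebin code, since I have not seen your data): the line format
"LEVEL TIMESTAMP message", the markers "ERROR" and "WARN", the Entry struct
and the parseLines function are all made up for illustration.

```d
import std.algorithm : findSplit, startsWith;
import std.array : appender;
import std.stdio : File, writeln;

// Hypothetical record type; the real fields depend on your log format.
struct Entry
{
    string level;
    string timestamp;
    string message;
}

// Hypothetical stand-ins for the two constants you are checking for.
enum levelError = "ERROR";
enum levelWarn = "WARN";

// Accepts any range of lines, e.g. File.byLine or an array of strings.
Entry[] parseLines(R)(R lines)
{
    // Rule 1: appender grows its buffer in chunks instead of checking
    // capacity on every ~=.
    auto entries = appender!(Entry[])();

    foreach (line; lines)
    {
        // Rule 3: startsWith only looks at the prefix, indexOf would scan
        // the whole line. It returns 0 when neither marker matches.
        if (!line.startsWith(levelError, levelWarn))
            continue;

        // Rule 2: findSplit returns slices of the line, no new array.
        auto head = line.findSplit(" ");    // head[0] = level
        auto rest = head[2].findSplit(" "); // rest[0] = timestamp, rest[2] = message

        // byLine reuses its buffer, so copy only what is kept.
        entries.put(Entry(head[0].idup, rest[0].idup, rest[2].idup));
    }
    return entries.data;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        writeln("usage: parse <logfile>");
        return;
    }
    auto entries = parseLines(File(args[1]).byLine);
    writeln(entries.length, " matching lines");
}
```

Here the only allocations per matching line are the three idup copies and the
occasional growth of the appender buffer. If you need every field of a line
rather than just the first two, std.algorithm.splitter would be the lazy
replacement for split.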