Can I speed up this log parsing script further?
uncorroded via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Fri Jun 9 09:28:08 PDT 2017
On Friday, 9 June 2017 at 14:19:48 UTC, Daniel Kozak wrote:
> import std.stdio;
> import std.array : appender, array;
> import std.algorithm : findSplit, splitter, joiner, canFind, map;
> import std.typecons : tuple, Tuple;
> import std.conv : to;
> import std.range : dropOne, dropExactly, takeExactly, chain;
>
> alias push_type = Tuple!(int, char[], int, bool, bool);
> alias npush_type = Tuple!(char[], int, char[]);
>
> void read_log(string filename) {
>     File file = File(filename, "r");
>     auto npushed = appender!(npush_type[])();
>     auto pushed = appender!(push_type[])();
>     foreach (line; file.byLine) {
>         if (auto findResult = line.findSplit(" SYNC_PUSH: ")) {
>             auto rel = findResult[2];
>             auto att = rel.splitter(" ");
>
>             auto firstVal = att.front.to!int;
>             auto secondVal = att.dropExactly(2).takeExactly(2)
>                                 .joiner(" ").to!(char[]).dup;
>             auto thirdVal = att.dropExactly(5).front.to!int;
>             auto fourthVal = findResult[2].canFind("PA-SOC_POP");
>             auto fifthVal = findResult[2].canFind("CU-SOC_POP");
>             pushed.put(tuple(firstVal, secondVal, thirdVal,
>                              fourthVal, fifthVal));
>             continue;
>         }
>         if (auto findResult = line.findSplit(" SOC_NOT_PUSHED: ")) {
>             auto leftPart = findResult[0].splitter(" ")
>                                          .dropExactly(2)
>                                          .takeExactly(2);
>             auto rightPart = findResult[2].splitter(" ").takeExactly(2);
>             auto firstVal = chain(leftPart.front,
>                                   leftPart.dropOne.front).to!(char[]);
>             auto thirdVal = rightPart.front.to!(char[]).dup;
>             auto secondVal = rightPart.dropOne.front.to!int;
>             npushed.put(tuple(firstVal, secondVal, thirdVal));
>             continue;
>         }
>     }
>     // Doing more stuff with these arrays later. For now, just
>     // printing lengths
>     writeln(npushed.data.length);
>     writeln(pushed.data.length);
> }
>
Hi Daniel,
Thanks a lot for the code. I tested it on our production log file
and it takes 0.2 sec (a 50% improvement over the old time).
I tried using just the appender, and it did not make a significant
difference. On the other hand, just changing the inner loop to
use std.algorithm and std.range seems to make a big difference. Is
there a good resource to read about the good stuff in
std.algorithm and std.range? I tried going through the library docs,
but they are too exhaustive!
Thanks :)