Can I speed up this log parsing script further?

uncorroded via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Fri Jun 9 09:28:08 PDT 2017


On Friday, 9 June 2017 at 14:19:48 UTC, Daniel Kozak wrote:
> import std.stdio;
> import std.array: appender, array;
> import std.algorithm : findSplit, splitter, joiner, canFind, map;
> import std.typecons : tuple, Tuple;
> import std.conv : to;
> import std.range : dropOne, dropExactly, takeExactly, chain;
>
> alias push_type = Tuple!(int, char[], int, bool, bool);
> alias npush_type = Tuple!(char[], int, char[]);
>
> void read_log(string filename) {
>     File file = File(filename, "r");
>     auto npushed = appender!(npush_type[])();
>     auto pushed = appender!(push_type[])();
>     foreach (line; file.byLine) {
>         if (auto findResult = line.findSplit(" SYNC_PUSH: ")) {
>             auto rel = findResult[2];
>             auto att = rel.splitter(" ");
>
>             auto firstVal = att.front.to!int;
>             auto secondVal = att.dropExactly(2).takeExactly(2).joiner(" ").to!(char[]).dup;
>             auto thirdVal = att.dropExactly(5).front.to!int;
>             auto fourthVal = findResult[2].canFind("PA-SOC_POP");
>             auto fifthVal = findResult[2].canFind("CU-SOC_POP");
>             pushed.put(tuple(firstVal, secondVal, thirdVal, fourthVal, fifthVal));
>             continue;
>         }
>         if (auto findResult = line.findSplit(" SOC_NOT_PUSHED: ")) {
>             auto leftPart = findResult[0].splitter(" ").dropExactly(2).takeExactly(2);
>             auto rightPart = findResult[2].splitter(" ").takeExactly(2);
>             auto firstVal = chain(leftPart.front, leftPart.dropOne.front).to!(char[]);
>             auto thirdVal = rightPart.front.to!(char[]).dup;
>             auto secondVal = rightPart.dropOne.front.to!int;
>             npushed.put(tuple(firstVal, secondVal, thirdVal));
>             continue;
>         }
>     }
>     // Doing more stuff with these arrays later. For now, just printing lengths
>     writeln(npushed.data.length);
>     writeln(pushed.data.length);
> }
>

Hi Daniel,

Thanks a lot for the code. I tested it on our production log file
and it takes 0.2 sec (a 50% improvement over the old time).
I also tried using just the appender, and that alone did not make a
significant difference. Changing the inner loop to use std.algorithm
and std.range, on the other hand, made a big difference. Is there a
good resource for learning the useful parts of std.algorithm and
std.range? I tried going through the library docs, but they are too
exhaustive!
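
For reference, here is a minimal sketch of the range-based pattern used in the inner loop above. The helper name firstAndSixth and the sample log line are made up for illustration and are not taken from the real log. The point it tries to show is that splitter is lazy: it hands out slices of the line one field at a time, so no per-line array of fields is built. Whether that accounts for the measured speedup is an assumption, not something measured here.

```d
import std.algorithm : findSplit, splitter;
import std.conv : to;
import std.range : dropExactly;
import std.stdio : writeln;
import std.typecons : Tuple, tuple;

// Hypothetical helper: returns the 1st and 6th space-separated fields
// that follow the " SYNC_PUSH: " marker, both parsed as int.
Tuple!(int, int) firstAndSixth(const(char)[] line) {
    auto found = line.findSplit(" SYNC_PUSH: ");
    if (!found)
        throw new Exception("no SYNC_PUSH marker in line");

    // Lazy: each field is a slice of `line`, produced on demand.
    auto fields = found[2].splitter(" ");
    auto first = fields.front.to!int;
    // dropExactly works on a copy of the range, so `fields` is unchanged.
    auto sixth = fields.dropExactly(5).front.to!int;
    return tuple(first, sixth);
}

void main() {
    // Made-up sample line with six fields after the marker.
    auto result = firstAndSixth("ts host SYNC_PUSH: 42 a b c d 7");
    writeln(result[0], " ", result[1]);
}
```

The sketch assumes at least six fields follow the marker; dropExactly does not check for a shorter line.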

Thanks :)

