Text editing [Was: Re: #line decoder]

bearophile bearophileHUGS at lycos.com
Thu Sep 25 09:36:17 PDT 2008


Sergey Gromov:
> you probably actually want "new string[][first.length]", otherwise you 
> simply get three huge strings.

Yes, I have not seem them missing commas in the test run output (this tells me to always avoid writeln and always use my putr, that puts "" around strings when they printed inside containers).
The timings of the loader3/loader4 versions now show that the ArrayBuilder is quite useful.

The version 4 too has the same bug, the new versions:

// loader3
import std.stream;
import std.string: split;
void main() {
    auto fin = new BufferedFile("data2.txt");
    string[] first = fin.readLine().split();
    auto cols = new string[][first.length];
    foreach (col, el; first)
        cols[col] ~= el.dup;
    foreach (string line; fin)
        foreach (col, el; line.split())
            cols[col] ~= el.dup;
}


// loader4
import std.stream, std.gc;
import std.string: split;
void main() {
    auto fin = new BufferedFile("data2.txt");
    string[] first = fin.readLine().split();
    auto cols = new string[][first.length];
    foreach (col, el; first)
        cols[col] ~= el.dup;
    disable();
    foreach (string line; fin)
        foreach (col, el; line.split())
            cols[col] ~= el.dup;
    enable();
}

And just to be sure and have a more real test, I have added a benchmark that shows the final array creation too (the final append is very quick, less than 0.01 s), derived from loader10:

// loader10b
import std.stream;
import std.string: split;
import d.string: xsplit;
import std.gc;
import d.builders: ArrayBuilder;
void main() {
    auto fin = new BufferedFile("data2.txt");
    string[] first = fin.readLine().split();
    auto cols = new ArrayBuilder!(string)[first.length];
    foreach (col, el; first)
        cols[col] ~= el.dup;
    disable();
    foreach (string line; fin)
        foreach (col, el; line.xsplit())
            cols[col] ~= el.dup;
    enable();
    string[][] truecols;
    foreach (col; cols)
        truecols ~= col.toarray;
}


Updated timings:
Timings, data2.txt, warm timings, best of 3:
  loader1:  23.05 s
  loader2:   3.00 s
  loader3:  44.79 s
  loader4:  39.28 s
  loader5:  21.31 s
  loader6:   7.20 s
  loader7:   7.51 s
  loader8:   8.45 s
  loader9:   5.46 s
  loader10:  3.73 s
  loader10b: 3.88 s
  loader11: 82.54 s
  loader12: 38.87 s

Bye and thank you,
bearophile


More information about the Digitalmars-d-announce mailing list