Text editing [Was: Re: #line decoder]

Sergey Gromov snake.scaly at gmail.com
Sat Sep 27 05:13:55 PDT 2008


Thu, 25 Sep 2008 12:36:17 -0400,
bearophile wrote:
> Updated timings:
> Timings, data2.txt, warm timings, best of 3:
>   loader1:  23.05 s
>   loader2:   3.00 s
>   loader3:  44.79 s
>   loader4:  39.28 s
>   loader5:  21.31 s
>   loader6:   7.20 s
>   loader7:   7.51 s
>   loader8:   8.45 s
>   loader9:   5.46 s
>   loader10:  3.73 s
>   loader10b: 3.88 s
>   loader11: 82.54 s
>   loader12: 38.87 s

And, for completeness' sake, a straightforward implementation in C++:

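#include <algorithm>
#include <cctype>
#include <cstring>
#include <fstream>
#include <string>
#include <vector>

using namespace std;

// Wrappers around isspace: they avoid undefined behaviour for negative
// char values and give find_if an unambiguous function to call.
static bool isSpace(char c)
{
    return isspace(static_cast<unsigned char>(c)) != 0;
}

static bool isNotSpace(char c)
{
    return !isSpace(c);
}

int main()
{
    char buf[1024] = "";
    char *pos, *end;
    ifstream ifs("data2.txt");
    // result, stored column by column
    vector<vector<string> > result;
    // the first line determines the number of columns
    ifs.getline(buf, sizeof buf);
    pos = buf;
    // gcount() also counts the extracted '\n', so use strlen() instead
    end = buf + strlen(buf);
    // loop over the tokens
    while (pos != end)
    {
        char *wordEnd = find_if(pos, end, isSpace);
        result.push_back(vector<string>());
        result.back().push_back(string(pos, wordEnd));
        pos = find_if(wordEnd, end, isNotSpace);
    }
    // rest of the lines; stop as soon as a read fails
    while (ifs.getline(buf, sizeof buf))
    {
        pos = buf;
        end = buf + strlen(buf);
        // loop over the tokens
        size_t col = 0;
        while (pos != end)
        {
            char *wordEnd = find_if(pos, end, isSpace);
            // a row wider than the header gets a new column
            if (col == result.size())
                result.push_back(vector<string>());
            result[col].push_back(string(pos, wordEnd));
            pos = find_if(wordEnd, end, isNotSpace);
            ++col;
        }
    }
}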

On my system:
C++:    3.93 s
best D: 2.36 s
Python: 3.20 s

Replacing the inner vector<> with a list<> makes matters worse: 5.68 s
instead of 3.93 s.  I also guessed that the problem was the vectors
copying all the strings on every reallocation, so I tried replacing the
strings with boost::shared_ptr<string>, which took 13.35 s.  I'm not
sure I can optimize this code any further without rewriting it in C.
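
A rough way to check the reallocation guess is to count how often a
vector<string> actually reallocates while it grows.  The sketch below
was not part of the measurements above; countReallocations is a
hypothetical helper, and it only observes capacity changes, not the
cost of each copy.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: push n short strings into a vector and count how
// often its capacity changes, i.e. how often the existing elements have
// to be relocated into a new buffer.
std::size_t countReallocations(std::size_t n)
{
    std::vector<std::string> column;
    std::size_t reallocations = 0;
    std::size_t lastCapacity = column.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        column.push_back("token");
        if (column.capacity() != lastCapacity)
        {
            // the buffer was replaced by a larger one
            ++reallocations;
            lastCapacity = column.capacity();
        }
    }
    return reallocations;
}
```

Calling countReallocations(100000) and printing the result gives the
number of buffer changes for one column.  The exact value depends on
the standard library's growth factor, but because push_back is
amortized constant time, the count should stay small compared to n,
which would be consistent with the shared_ptr variant not helping.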


More information about the Digitalmars-d-announce mailing list