Making byLine faster: we should be able to delegate this

Mon Mar 23 11:32:29 PDT 2015

On 3/23/15 10:43 AM, rumbu wrote:
> On Monday, 23 March 2015 at 15:00:07 UTC, John Colvin wrote:
>
>> What would be really great would be a performance test suite for
>> phobos. D is reaching a point where "It'll probably be fast because we
>> did it right" or "I remember it being fast-ish 3 years ago when I
>> wrote a small toy test" isn't going to cut it. Real data is needed,
>> with comparisons to other languages where possible.
>
> I made the same test in C# using a 30MB plain ASCII text file. Compared
> to the fastest method proposed by Andrei, the results are not the best:
>
> D:
> readText.representation.count!(c => c == '\n') - 428 ms
> byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms
>
> C#:
> File.ReadAllLines.Length - 216 ms;
>
> Win64, D 2.066.1, Optimizations were turned on in both cases.
>
> The .net code is clearly not performance oriented
> (http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26),
> I suspect that .net runtime is performing some optimizations under the
> hood.

At this point it comes down to the performance of std.algorithm.count,
which could and should be improved. The following loop runs 2.5x faster
than count and brings the timing into the range of wc -l, which is
probably near the lower bound achievable:

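   import core.stdc.string : memchr;
   import std.file : readText;
   import std.string : representation;

   size_t linect = 0;  // running count of '\n' bytes
   auto bytes = args[1].readText.representation;
   for (auto p = bytes.ptr, lim = p + bytes.length;; )
   {
     // Let memchr jump straight to the next newline, if one is left.
     auto r = cast(immutable(ubyte)*) memchr(p, '\n', lim - p);
     if (!r) break;
     ++linect;
     p = r + 1;  // resume just past the newline that was found
   }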
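
For anyone who wants to experiment, here is a minimal sketch of the same memchr loop packaged as a standalone function. The name countNewlines is hypothetical (it is not a Phobos function), and the explicit p < lim guard is an addition so that an empty slice never reaches memchr with a null pointer. It illustrates the technique under those assumptions and is not a proposed implementation of count.

```d
import core.stdc.string : memchr;

/// Hypothetical helper, not part of Phobos: counts '\n' bytes in a buffer
/// by letting memchr skip ahead to each newline.
size_t countNewlines(const(ubyte)[] bytes)
{
    size_t n = 0;
    auto p = bytes.ptr;
    const lim = p + bytes.length;
    while (p < lim)  // also covers the empty-slice case
    {
        // memchr returns null once no newline remains in [p, lim).
        auto r = cast(const(ubyte)*) memchr(p, '\n', lim - p);
        if (r is null) break;
        ++n;
        p = r + 1;  // continue just past the newline that was found
    }
    return n;
}
```

Something like countNewlines(args[1].readText.representation) should give the same number as the loop above, and it can be checked against count!(c => c == '\n') on the same bytes.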

Would anyone want to put some work into accelerating count?

Andrei