howto count lines - fast
Patrick Schluter via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Wed May 31 12:57:15 PDT 2017
On Tuesday, 30 May 2017 at 23:41:01 UTC, H. S. Teoh wrote:
> On Tue, May 30, 2017 at 08:02:38PM +0000, Nitram via
> Digitalmars-d-learn wrote:
>> After reading
>> https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/ , i was wondering how fast one can do a simple "wc -l" in D.
>>
> size_t lineCount3(string filename)
> {
> import std.mmfile;
>
> auto f = new MmFile(filename);
> auto data = cast(ubyte[]) f[];
> size_t c;
>
> foreach (i; 0 .. data.length)
> {
> if (data[i] == '\n') c++;
> }
> return c;
> }
>
> // real 0m0.242s
> // user 0m1.151s
> // sys 0m0.057s
You should try something more like
auto data = cast(ulong[]) f[];
foreach (i; 0 .. data.length/ulong.sizeof)
and then using bitfiedling to count the number of \n in the
loaded ulong. This divides by 8 the number of load instructions.
The counting of \n in the loaded word then only uses registers.
It is also possible to use bit fiddling to detect and count the
characters in that ulong. I don't know if it is really faster then
Here a function to detect if a given character is in an ulong
auto detect(alias CHAR)(ulong t)
{
enum ulong u = CHAR;
enum mask1 = u|(u<<8)|(u<<16)|(u<<24UL)|(u<<32UL);
enum mask = (mask1<<32)|mask1;
return ((t^mask) - 0x0101010101010101UL) & ~(t^mask) &
0x8080808080808080UL;
}
The returned value is 0 if the character is not in t. And the
highest bit of each byte is set if it contained the character. If
the CPU has a fast popcnt it should be easy to count.
More information about the Digitalmars-d-learn
mailing list