Making byLine faster: we should be able to delegate this
rumbu via Digitalmars-d
digitalmars-d at puremagic.com
Mon Mar 23 14:13:16 PDT 2015
On Monday, 23 March 2015 at 19:25:08 UTC, Tobias Pankrath wrote:
>> I made the same test in C# using a 30MB plain ASCII text file.
>> Compared to fastest method proposed by Andrei, results are not
>> the best:
>>
>> D:
>> readText.representation.count!(c => c == '\n') - 428 ms
>> byChunk(4096).joiner.count!(c => c == '\n') - 1160 ms
>>
>> C#:
>> File.ReadAllLines.Length - 216 ms;
>>
>> Win64, D 2.066.1, Optimizations were turned on in both cases.
>>
>> The .net code is clearly not performance oriented
>> (http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26),
>> I suspect that .net runtime is performing some optimizations
>> under the hood.
>
> Does the C# version validate the input? Using std.file.read
> instead of readText.representation halves the runtime on my
> machine.
Source code is available at the link above. Since the C# version
works internally with streams and UTF-16 chars, the pseudocode
looks like this:
---
initialize a LIST with 16 items;
while (!eof)
{
    read 4096 bytes into a buffer;
    decode them to UTF-16 in a wchar[] buffer;
    while (moredata in the buffer)
    {
        read from buffer until (\n or \r\n or \r);
        discard end of line;
        if (nomorespace in LIST)
            double its size;
        add the line to LIST;
    }
}
return number of items in the LIST.
---
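For comparison, the pseudocode above can be sketched in D roughly as
follows. This is only an illustration, not the .NET implementation:
the function name countLinesViaList is made up, the UTF-16 decoding
step is skipped (the input is assumed to be plain ASCII, as in the
test file), and a D dynamic array stands in for the C# list.

```d
import std.stdio : File;

/// Counts lines by reading 4096-byte chunks and storing every line
/// in a growing array, mirroring the pseudocode (minus UTF-16 decoding).
size_t countLinesViaList(string path)
{
    string[] lines;          // stands in for the C# list
    char[] current;          // line being assembled, may span chunks
    bool pendingCR = false;  // previous byte was '\r'

    foreach (ubyte[] chunk; File(path, "rb").byChunk(4096))
    {
        foreach (b; chunk)
        {
            immutable c = cast(char) b;
            if (pendingCR)
            {
                pendingCR = false;
                if (c == '\n')
                    continue; // second half of "\r\n", line already stored
            }
            if (c == '\r' || c == '\n')
            {
                lines ~= current.idup; // store the line without its terminator
                current.length = 0;
                pendingCR = (c == '\r');
            }
            else
            {
                current ~= c;
            }
        }
    }
    if (current.length)
        lines ~= current.idup; // last line without a trailing terminator
    return lines.length;
}
```

Every line is copied and kept alive until the function returns, which
is exactly the overhead that makes this approach look unsuitable for
a plain line count.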
This code is clearly not the best fit for the task, so, as I
suspected, something else must be going on. I looked into the
jitted code, and it seems that the .NET runtime is smart enough to
recognize this pattern and does the following instead:
- the file is mapped into memory using CreateFileMapping
- no decoding is performed, since \r and \n are ASCII
- no list is created
- it searches incrementally for \r, \r\n, \n using CompareStringA
with LOCALE_INVARIANT and increments a counter at each end of line
- no temporary memory is allocated, since the search is
performed directly on the mapping
- the count is returned.
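
The same strategy (map the file, skip decoding, build no list, just
count terminators) can be sketched in D with std.mmfile. This is a
minimal sketch under assumptions, not a reconstruction of the jitted
code: the function name countLineEndsMapped is hypothetical, a plain
byte loop replaces CompareStringA, and it counts line terminators, so
a final line without a terminator is not counted. Mapping an empty
file may throw, depending on the platform.

```d
import std.mmfile : MmFile;

/// Counts \n, \r\n and \r line terminators directly on a read-only
/// memory mapping of the file, without decoding or allocating lines.
size_t countLineEndsMapped(string path)
{
    auto mm = new MmFile(path);   // read-only mapping of the whole file
    scope (exit) destroy(mm);     // unmap and close when done

    auto data = cast(const(ubyte)[]) mm[];
    size_t n = 0;
    for (size_t i = 0; i < data.length; ++i)
    {
        if (data[i] == '\n')
        {
            ++n;
        }
        else if (data[i] == '\r')
        {
            ++n;
            // treat "\r\n" as a single terminator
            if (i + 1 < data.length && data[i + 1] == '\n')
                ++i;
        }
    }
    return n;
}
```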