std.csv Performance Review

Sun Jun 4 00:04:37 PDT 2017

On Sunday, 4 June 2017 at 06:54:46 UTC, Patrick Schluter wrote:
> On Sunday, 4 June 2017 at 06:15:24 UTC, H. S. Teoh wrote:
>> On Sun, Jun 04, 2017 at 05:41:10AM +0000, Jesse Phillips via 
>> (Note that this is much less of a limitation than it seems; 
>> for example you could use std.mmfile to memory-map the file 
>> into your address space so that it doesn't actually have to 
>> fit into memory, and you can still take slices of it. The OS 
>> will manage the paging from/to disk for you. Of course, it 
>> will be slower when something has to be paged from disk, but 
>> IME this is often much faster than if you read the data into 
>> memory yourself.
>
> If the file is in the file cache of the kernel, memory mapping 
> does not need to reload the file as it is already in memory. In 
> fact, calling mmap() changes only the sharing of the pages in 
> general. That's where most of the performance win from memory 
> mapping comes from.

To be precise, what mmap spares is the copying of data. If the 
file is in the file cache, the open/read/write/close syscalls are 
also served from the same cached pages, but each call to read 
must copy the data from kernel memory into the process's buffer. 
So the gain from mmap comes from avoiding that copy and from 
avoiding the read/write/seek syscalls. Loading the physical file 
into memory is the same in both cases.
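
To make the difference concrete, here is a minimal sketch in Python (not D, and not taken from std.csv) that walks the same file both ways. The function names and the 64 KiB chunk size are made up for illustration, and Python adds overhead of its own, so treat it as a picture of where the copy and the syscalls happen rather than as a benchmark.

```python
import mmap
import os


def checksum_with_read(path, chunk_size=64 * 1024):
    """Sum every byte using read(): one syscall per chunk, and each
    chunk is copied from the kernel's page cache into a new buffer."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, chunk_size)  # syscall + copy into user space
            if not chunk:  # empty result means end of file
                break
            total += sum(chunk)
    finally:
        os.close(fd)
    return total


def checksum_with_mmap(path):
    """Sum every byte through a read-only mapping: the cached pages are
    mapped into the process, so no per-chunk read() call is issued.
    Note: mapping an empty file raises ValueError."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            view = memoryview(mm)  # zero-copy view of the mapped bytes
            try:
                return sum(view)
            finally:
                view.release()  # must be released before the map closes
```

Both functions return the same value for the same file; only the path the bytes take to reach the process differs.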


>
> This Stack Overflow [1] discussion links to a RealWorldTech 
> discussion with Linus Torvalds explaining it in detail. On 
> Windows and Solaris the mechanism is the same.
>
> [1] 
> https://stackoverflow.com/questions/5902629/mmap-msync-and-linux-process-termination/6219962#6219962