Why GNU coreutils/dd is creating a dummy file more efficiently than D's For loop?

Thu May 23 09:44:15 UTC 2019

On Thursday, 23 May 2019 at 09:09:05 UTC, BoQsc wrote:
> This code of D creates a dummy 47,6 MB text file filled with 
> Nul characters in about 9 seconds
>
> import std.stdio, std.process;
>
> void main() {
>
> 	writeln("Creating a dummy file");
> 	File file = File("test.txt", "w");
>
>    for (int i = 0; i < 50000000; i++)
> 	{
> 		file.write("\x00");
> 	}
>    file.close();
>
> }
>
>
> While GNU coreutils dd can create 500mb dummy Nul file in a 
> second.
> https://github.com/coreutils/coreutils/blob/master/src/dd.c
>
> What are the explanations for this?

When benchmarking, it's important to provide both the source code 
and how you compile and run it. In this case, though, I think I 
can already point you in the right direction:

I'll assume you used something like this:

dd if=/dev/zero of=testfile bs=1M count=500

Note in particular the block size argument (bs). I set it to 1M, 
but by default it's 512 bytes. If you run the command above under 
strace, you'll see a series of write() calls, each writing 1 MiB 
of null bytes to testfile. That's the main difference between your 
code and what dd does: it doesn't write 1 byte at a time. This 
results in far fewer system calls, and system calls are expensive.

To go fast, read/write bigger chunks.
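
As a rough illustration of the point above, here is a minimal sketch 
of the same dummy-file program writing one large zero-filled buffer 
per call instead of one byte per call. The function name writeZeros 
and the 1 MiB default chunk size are my own choices for the example, 
not anything from dd or from the original program. As far as I know, 
std.stdio.File goes through C stdio buffering, so the one-byte loop 
probably does not issue one system call per byte. The chunked version 
still avoids the per-call overhead of 50 million write() calls.

```d
import std.stdio : File, writeln;

/// Writes `total` NUL bytes to `path`, at most `chunkSize` bytes per call.
/// (Hypothetical helper for illustration; the name is an assumption.)
void writeZeros(string path, size_t total, size_t chunkSize = 1024 * 1024)
{
    assert(chunkSize > 0, "chunkSize must be positive");

    // A freshly allocated ubyte[] is zero-initialised in D.
    auto buffer = new ubyte[](chunkSize);

    auto file = File(path, "wb");
    scope (exit) file.close();

    size_t remaining = total;
    while (remaining > 0)
    {
        // The last chunk may be shorter than chunkSize.
        immutable n = remaining < chunkSize ? remaining : chunkSize;
        file.rawWrite(buffer[0 .. n]);
        remaining -= n;
    }
}

void main()
{
    writeln("Creating a dummy file");
    writeZeros("test.txt", 50_000_000);
}
```

With the default chunk size this makes 48 rawWrite calls for the 
50,000,000 bytes, instead of 50,000,000 calls to write.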

I may be wrong though; maybe you tested with a bs of 1 byte. So 
test for yourself and, if necessary, provide all the information, 
not just pieces, so that we are able to reproduce your test :)
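
To make the "test for yourself" part concrete, here is a small timing 
sketch that writes the same amount of data with several chunk sizes, 
roughly mirroring dd's bs argument. The helper name timeZeroWrite and 
the chosen chunk sizes are assumptions for illustration only. The 
numbers you get will depend on your compiler flags, file system and 
machine, so treat it as a starting point and not as a proper benchmark.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : File, writefln;

/// Writes `total` NUL bytes to `path` in chunks of `chunkSize` bytes
/// and returns how long it took. (Hypothetical helper for illustration.)
Duration timeZeroWrite(string path, size_t total, size_t chunkSize)
{
    assert(chunkSize > 0, "chunkSize must be positive");
    auto buffer = new ubyte[](chunkSize); // zero-initialised

    auto sw = StopWatch(AutoStart.yes);
    auto file = File(path, "wb");
    for (size_t written = 0; written < total;)
    {
        immutable left = total - written;
        immutable n = left < chunkSize ? left : chunkSize;
        file.rawWrite(buffer[0 .. n]);
        written += n;
    }
    file.close(); // flush stdio buffers inside the timed region
    return sw.peek;
}

void main()
{
    enum size_t total = 50_000_000;
    immutable size_t[] chunkSizes = [1, 512, 1024 * 1024];

    foreach (chunkSize; chunkSizes)
    {
        immutable elapsed = timeZeroWrite("test.txt", total, chunkSize);
        writefln("chunk size %7s bytes: %s", chunkSize, elapsed);
    }
}
```

When posting results, include the compiler and the flags you used (for 
example whether optimisations were enabled) next to the timings.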