d2 file input performance

Heywood Floyd soul8o8 at gmail.com
Sun Aug 28 21:39:53 PDT 2011


Christian Köstlin Wrote:
> after some optimizing i got better, but was still way slower than c++. 
> so i started some small microbenchmarks regarding fileio: 
> https://github.com/gizmomogwai/performance in c++, java and d2.
> 
> christian


Hello!

Thanks for your effort in putting this together!


I found this interesting and played around with some of your code examples.
My findings differ somewhat from yours, so I thought I'd post them.



From what I can tell, G++ generates code that is almost twice as fast (~1.9x) as DMD's in the fread()/File example. The D code does handle errors encountered by fread(), but that alone certainly can't explain such a dramatic difference in speed.

It would be very interesting to see how GDC and LDC perform in these tests!! (I don't have them installed.)



Anyway, here are my notes:

I concentrated on the G++ fread example and the DMD File example, as they seem comparable enough. However, I made some changes to the benchmark in order to "level" the playing field:

  1) Made sure both C++ and D used a 1 KB (fixed-size) buffer
  2) Made sure the underlying setvbuf() buffer is the same (64 KB)
  3) Made sure the read data has an actual side effect by printing out the accumulated data after each file. (Cheapo CRC)

The last point, 3, in particular made the G++ example considerably slower, perhaps hinting that G++ is otherwise optimizing away work that has no visible effect. The second point, 2, seemed to have no effect on C++, but it helped somewhat for D. This may hint at C++ doing its own buffering or something. (?) In that case the benchmark is bogus.
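To make the three points concrete, here is a minimal C++ sketch of the same setup: a 1 KB read buffer, a 64 KB stdio buffer set via setvbuf(), and a byte sum that is returned so the caller can print it. The names sumFileBytes and writeTestFile are made up for illustration and are not part of the original benchmark.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes `count` bytes of `value` so the reader has input.
void writeTestFile(const char* path, std::size_t count, unsigned char value) {
  std::FILE* f = std::fopen(path, "wb");
  assert(f && "could not create test file");
  for (std::size_t i = 0; i < count; ++i) {
    std::fputc(value, f);
  }
  std::fclose(f);
}

// Hypothetical reader: sums every byte of the file ("cheapo CRC").
unsigned long sumFileBytes(const char* path) {
  std::FILE* f = std::fopen(path, "rb");
  assert(f && "could not open input file");
  // Point 2: 64 KB stdio buffer; setvbuf() must come before any read.
  std::setvbuf(f, NULL, _IOFBF, 64 * 1024);
  // Point 1: fixed 1 KB application-level buffer.
  unsigned char buf[1024];
  unsigned long sum = 0;
  std::size_t n;
  while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
    for (std::size_t i = 0; i < n; ++i) {
      sum += buf[i];
    }
  }
  std::fclose(f);
  // Point 3: the caller is expected to print or otherwise use this value.
  return sum;
}
```

Calling writeTestFile("sample.bin", 3000, 2) and then printing sumFileBytes("sample.bin") gives a quick sanity check before timing anything.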

Anyway, these are the results:

	(G++ 4.2.1, fread()+crc, osx)
	G++        1135 ms    (no flags)
	G++         399 ms    -O1
	G++         368 ms    -O2
	G++         368 ms    -O3
	G++nofx     156 ms    -O3 (Disqualified!)

	(DMD 2.054, rawRead()+crc, osx)
	DMD         995 ms    (no flags)
	DMD         913 ms    -O
	DMD         888 ms    -release
	DMD         713 ms    -release -O -inline
	DMD         703 ms    -release -O
	DMD         693 ms    -release -O -inline -noboundscheck

Well, I suppose a possible (and to me plausible) explanation is that G++'s optimizations are a lot more extensive than DMD's.

Skipping printing out the CRC-value ("nofx") makes the C++ code more than twice as fast (156 ms vs. 368 ms at -O3). Note that the code calculating the CRC-value is still in place, the value is just not printed out, and surely printing 10 lines can hardly account for a 200 ms increase. (?) I think it's safe to assume code is simply being dropped by the compiler here, as it's not having any side effect.
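Printing is not the only way to give the result a side effect. One common alternative, sketched here as an assumption rather than something the benchmark does, is to store the value into a volatile variable. A volatile store counts as observable behaviour in C++, so the compiler has to keep the computation that feeds it. The names g_sink, checksum and consume are hypothetical.

```cpp
#include <cstddef>

// Hypothetical sink: the compiler may not drop a store to a volatile object.
volatile unsigned long g_sink = 0;

// Same "cheapo CRC" as in the benchmark: a plain sum of all bytes.
unsigned long checksum(const unsigned char* data, std::size_t n) {
  unsigned long crc = 0;
  for (std::size_t i = 0; i < n; ++i) {
    crc += data[i];
  }
  return crc;
}

// Makes `value` observable without doing any I/O inside the timed region.
void consume(unsigned long value) {
  g_sink = value;
}
```

With consume(crc) in place of the print, the timed loop contains no output calls, yet the optimizer can no longer treat the accumulation as dead code.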

My gut feeling is that DMD is not inlining, at least not to the extent G++ is, and inlining should matter a lot here since we're making a function call for every single byte. (Adding -inline to -release -O even seems to make the D code slightly slower: 713 ms vs. 703 ms. Weird.) But of course I don't really know. Again, GDC and LDC would be interesting to see here.

Finally, to this I must add the size of the generated binary:

    G++   15 KB
    DMD   882 KB

Yikes. I believe there's nothing (large enough) to hide behind for DMD there.



That's it!
Kind regards
/HF





Here's the modified code: (Original https://github.com/gizmomogwai/performance)

// - - - - - - 8< - - - - - - 

import 	std.stdio,
		std.datetime,
		core.stdc.stdio;
	
struct FileReader
{
private:
	File file;
	
	enum BUFFER_SIZE = 1024;
	ubyte[BUFFER_SIZE] readBuf;
	size_t pos, len;
	
	this(string name){
		file = File(name, "rb");
		//setbuf(file.getFP(), null); // No buffer
		setvbuf(file.getFP(), null, _IOFBF, BUFFER_SIZE * 64);
	}

	bool fillBuffer()
	{
		auto tmpBuf = file.rawRead(readBuf);
		len = tmpBuf.length;
		pos = 0;
		return len > 0;
	}
	
public:	
	int read()
	{
		if(pos == len){
			if(fillBuffer() == false)
				return -1;
		}
		return readBuf[pos++];
	}
}

size_t readBytes()
{
	size_t count = 0;
	ulong crc = 0;
	for (int i=0; i<10; i++) {
		auto file = FileReader("/tmp/shop_with_ids.pb");	
		auto data = file.read();
		while(data != -1){
			count++;
			crc += data;
			data = file.read();
		}
		writeln(crc);
	}
	return count;
}


int main(string[] args) {
  auto sw = StopWatch(AutoStart.no);
  sw.start();
  auto count = readBytes();
  sw.stop();
  writeln("<tr><td>d2-6-B</td><td>", count, "</td><td>", sw.peek().msecs, "</td><td>using std.stdio.File </td></tr>");
  return 0;
}


// - - - - - - 8< - - - - - - 


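#include "stopwatch.h"
#include <cassert>   // assert() in the constructor
#include <iostream>
#include <string>    // std::string constructor argument
#include <stdio.h>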

class StdioFileReader {
private:
  FILE* fFile;
  static const size_t BUFFER_SIZE = 1024;
  unsigned char fBuffer[BUFFER_SIZE];
  unsigned char* fBufferPtr;
  unsigned char* fBufferEnd;

public:
  StdioFileReader(std::string s) : fFile(fopen(s.c_str(), "rb")), fBufferPtr(fBuffer), fBufferEnd(fBuffer) {
    assert(fFile);
    //setbuf(fFile, NULL); // No buffer
    setvbuf(fFile, NULL, _IOFBF, BUFFER_SIZE * 64);
  }
  ~StdioFileReader() {
    fclose(fFile);
  }

  int read() {
    bool finished = fBufferPtr == fBufferEnd;
    if (finished) {
      finished = fillBuffer();
      if (finished) {
	return -1;
      }
    }
    return *fBufferPtr++;
  }

private:
  bool fillBuffer() {
    size_t l = fread(fBuffer, 1, BUFFER_SIZE, fFile);
    fBufferPtr = fBuffer;
    fBufferEnd = fBufferPtr+l;
    return l == 0;
  }
};

size_t readBytes() {
  size_t res = 0;
  unsigned long crc = 0;
  for (int i=0; i<10; i++) {
    StdioFileReader r("/tmp/shop_with_ids.pb");
    int read = r.read();

    while (read != -1) {
      ++res;
      crc += read;
      read = r.read();
    }
    std::cout << crc << "\n"; // Comment out for "nofx"
  }
  return res;
}

int main(int argc, char** args) {
  StopWatch sw;
  sw.start();
  size_t count = readBytes();
  sw.stop();
  std::cout << "<tr><td>cpp-1-B</td><td>" << count << "</td><td>" << sw.delta() << "</td><td>straight forward implementation using fread with buffering.</td></tr>" << std::endl;
  return 0;
}
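The C++ listing depends on stopwatch.h from the original repository, which is not reproduced in this post. For anyone who wants to compile it standalone, here is a minimal stand-in. It assumes only what the listing uses (start(), stop(), and delta() yielding milliseconds) and it assumes a C++11 compiler for std::chrono, which the G++ 4.2.1 used above does not provide. The real header may well look different.

```cpp
#include <chrono>

// Hypothetical replacement for the StopWatch class declared in stopwatch.h.
class StopWatch {
private:
  typedef std::chrono::steady_clock Clock;
  Clock::time_point fStart;
  Clock::time_point fStop;

public:
  void start() { fStart = Clock::now(); }
  void stop() { fStop = Clock::now(); }

  // Elapsed time between start() and stop(), in whole milliseconds.
  long long delta() const {
    return std::chrono::duration_cast<std::chrono::milliseconds>(fStop - fStart).count();
  }
};
```

steady_clock is used because it never jumps backwards, so delta() cannot go negative if the system time is adjusted mid-run.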









