Question about CPU caches and D context pointers

Mon Feb 17 19:15:58 PST 2014

I've had this question at the back of my mind for a while, and I 
know it's probably related to back-end optimizations, but I'm 
taking a chance to see if anyone knows anything.

I know how insignificant the speed difference may be, but keep 
in mind this is to further my low-level understanding. Here's an 
example to illustrate the question, because it's quite 
complicated (to me):

#1 contextual function
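struct Contents {
	ubyte[] m_buffer;

	// D structs can't declare a default constructor, so the
	// capacity (e.g. 4092) is passed in; reserve() allocates the
	// block up front without changing the slice's length.
	this(size_t capacity){
		m_buffer.reserve(capacity);
	}

	void rcv(string str){
		// a string is immutable(char)[], so append its raw bytes
		m_buffer ~= cast(const(ubyte)[]) str;
	}

	void flush(){
		// placeholder for the real destination
		send_4092_bytes_of_data_to_final_heap_buffer(m_buffer);
		// empty the slice but keep its memory block for reuse
		m_buffer.length = 0;
		m_buffer.assumeSafeAppend();
	}

}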

vs..
#2 context-less function

void rcv(string str){
	send_small_bytes_of_data_to_final_heap_buffer(str);
}

The first case is the struct. When entering the rcv() function, 
the pointer and length of m_buffer are reached through the 
struct's hidden this pointer (and sit right next to the call frame 
if the struct itself lives on the stack). That's pretty damn fast 
to access, because the CPU will most likely keep them in the L1 
cache through the whole routine. However, it's not obvious to me 
whether the memory m_buffer points to will stay in the CPU cache 
if there are 5 or so consecutive calls to this same routine in the 
same thread. Also note that it will flush to another buffer, so 
there are more heap round trips between buffers if the CPU cache 
isn't used efficiently.

The second case (context-less) just sends the string right through 
to the final allocation procedure (another buffer). The string 
stays a function parameter, so its pointer and length are on the 
stack (or in registers), and thus in the CPU cache through every 
call frame until the malloc takes place (one heap round trip 
regardless of any optimization).
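One detail worth checking in the second case: a D `string` parameter is a slice, i.e. a length and a pointer (two machine words), so only that pair is passed in registers or on the stack; the characters themselves stay wherever the caller put them (static data, the GC heap, etc.). The sketch below uses a hypothetical helper, `firstCharAddress`, to show that the callee sees the caller's character data rather than a copy.

```d
import std.stdio : writeln;

// Hypothetical helper: where does the callee's view of the characters start?
immutable(char)* firstCharAddress(string str)
{
    return str.ptr; // only the (length, pointer) pair was passed in
}

void main()
{
    string msg = "some data to send";
    // A slice is two machine words, regardless of how long the string is.
    writeln("string.sizeof: ", string.sizeof, ", size_t.sizeof: ", size_t.sizeof);
    // The callee points at the caller's characters, not at a copy.
    writeln("same characters: ", firstCharAddress(msg) == msg.ptr);
}
```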

So, is there any chance that the region m_buffer points to stays 
in the CPU cache if there are thousands of consecutive calls to 
the struct's rcv(), or do I have to keep the data on the stack and 
send it straight to the allocator? Is there an easy way to 
visualize how the CPU cache fills and empties itself, or to 
guarantee that heap data stays in there without using the stack?
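One common way to visualize cache behavior is to time dependent loads over working sets of increasing size: while the working set fits in a cache level, each access is cheap, and the time per access tends to jump as it outgrows L1, L2 and L3. A rough D sketch of that experiment, assuming a random single-cycle index chain (so hardware prefetching can't hide the misses) and `MonoTime` for timing; `makeChain` and `nsPerStep` are hypothetical helper names, and the sizes and step count are arbitrary. Absolute numbers will vary by machine and run.

```d
import core.time : MonoTime;
import std.random : Random, randomShuffle;
import std.stdio : writefln;

// Hypothetical helper: random single-cycle chain over n slots,
// so next[i] is the slot visited after slot i.
size_t[] makeChain(size_t n, ref Random rng)
{
    auto order = new size_t[n];
    foreach (i, ref v; order)
        v = i;
    randomShuffle(order, rng);
    auto next = new size_t[n];
    foreach (i; 0 .. n)
        next[order[i]] = order[(i + 1) % n];
    return next;
}

// Hypothetical helper: follow the chain `steps` times and return the
// average nanoseconds per access; `last` receives the final slot.
double nsPerStep(const(size_t)[] next, size_t steps, out size_t last)
{
    size_t idx = 0;
    auto start = MonoTime.currTime;
    foreach (_; 0 .. steps)
        idx = next[idx]; // each load depends on the previous one
    auto elapsed = MonoTime.currTime - start;
    last = idx;
    return cast(double) elapsed.total!"nsecs" / steps;
}

void main()
{
    auto rng = Random(42);
    size_t checksum = 0;
    // Working sets from 4 KiB to 16 MiB, growing 4x each time.
    for (size_t bytes = 4 * 1024; bytes <= 16 * 1024 * 1024; bytes *= 4)
    {
        auto next = makeChain(bytes / size_t.sizeof, rng);
        size_t last;
        auto ns = nsPerStep(next, 2_000_000, last);
        checksum += last; // consume the result so the loop isn't optimized away
        writefln("%9d bytes: %6.2f ns per access", bytes, ns);
    }
    writefln("checksum: %d", checksum);
}
```

Plotting ns per access against working-set size gives a step-shaped curve on most machines; the steps line up roughly with the cache sizes.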

Sorry if the question seems complicated. I read everything Ulrich 
Drepper had to say in "What Every Programmer Should Know About 
Memory", and I still have a bit of a hard time visualizing the 
question myself.