Small Buffer Optimization for string and friends

Michel Fortin michel.fortin at michelf.com
Sun Apr 8 07:59:15 PDT 2012


On 2012-04-08 05:56:38 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail at erdani.org> said:

> Walter and I discussed today about using the small string optimization 
> in string and other arrays of immutable small objects.
> 
> On 64 bit machines, string occupies 16 bytes. We could use the first 
> byte as discriminator, which means that all strings under 16 chars need 
> no memory allocation at all.
> 
> It turns out statistically a lot of strings are small. According to a 
> variety of systems we use at Facebook, the small buffer optimization is 
> king - it just works great in all cases. In D that means better speed, 
> better locality, and less garbage.
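The size arithmetic in the quoted proposal can be checked at compile time. This is a minimal sketch: the second group of checks assumes a 64-bit target, selected with the predefined D_LP64 version identifier, and the name maxInlineChars is illustrative only.

```d
// A D slice such as `string` is a (length, pointer) pair: two machine words.
static assert(string.sizeof == 2 * size_t.sizeof);

version (D_LP64)
{
    static assert(string.sizeof == 16);      // 8-byte length + 8-byte pointer
    // Hypothetical name: reserving one byte as a discriminator leaves 15 bytes
    // for characters, i.e. "strings under 16 chars".
    enum maxInlineChars = string.sizeof - 1;
    static assert(maxInlineChars == 15);
}

void main()
{
    string s = "abcd";
    // .length and .ptr are the two fields a small-string layout would overlay.
    assert(s.length == 4);
    assert(s.ptr !is null);
}
```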

Small buffer optimization is a very good thing to have indeed. But… how 
can you preserve existing semantics? For instance, let's say you have 
this:

	string s = "abcd";

which is easily representable as a small string. Do you use the small 
buffer optimization in the assignment? That seems like a definite yes.

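But as soon as you take a pointer to that string, you break the 
immutability guarantee: the characters now live inside s itself, so 
reassigning s overwrites memory that p still treats as immutable: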

	immutable(char)[] s = "abcd";
	immutable(char)* p = s.ptr;
	s = "defg"; // assigns to where?

There's also the issue of this code being legal currently:

	immutable(char)* getPtr(string s) {
		return s.ptr;
	}

If you pass a small string to getPtr, it'll be copied to the local 
stack frame and you'll be returning a pointer to that local copy.

You could mitigate this by throwing an error when trying to get the 
pointer to a small string, but then you would also have to disallow 
taking the pointer of a const(char)[] that points to it:

	const(char)* getPtr2(const(char)[] s) {
		return s.ptr;
	}

	const(char)* getAbcdPtr() {
		string s = "abcd";
	
		// s implicitly converted to a regular const(char)[] pointing to the
		// local stack frame
		const(char)* c = getPtr2(s);

		// c points to the storage of s, which is the local stack frame
		return c;
	}

So it's sad, but I am of the opinion that the only way to implement 
small buffer optimization is to have a higher-level abstraction, a 
distinct type for such small strings.
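A distinct type along those lines might look like the following. This is a minimal sketch under stated assumptions, not an existing Phobos type: SmallString is a hypothetical name, and for clarity it keeps a separate discriminator flag instead of packing the tag into the 16 bytes of a string, so it is larger than string. The point is that it exposes no ptr and has no alias this, so none of the escaping examples above compile against it; getting a string out requires an explicit toString(), which copies small contents to the GC heap.

```d
// Hypothetical sketch: `SmallString` is an illustrative name, not a library type.
struct SmallString
{
    enum maxSmall = 15;

    private char[maxSmall] buf;   // inline storage for short strings
    private ubyte smallLen;       // number of chars used in buf
    private bool isSmall;         // discriminator (kept separate for clarity)
    private string heap;          // used only when !isSmall

    this(string s)
    {
        if (s.length <= maxSmall)
        {
            buf[0 .. s.length] = s[]; // copy the characters inline
            smallLen = cast(ubyte) s.length;
            isSmall = true;
        }
        else
        {
            heap = s; // immutable chars can be shared without copying
        }
    }

    size_t length() const
    {
        return isSmall ? smallLen : heap.length;
    }

    // Explicit conversion: small contents are duplicated onto the GC heap,
    // so the result never points into this struct, which may be on the stack.
    string toString() const
    {
        if (isSmall)
            return buf[0 .. smallLen].idup;
        return heap;
    }
}

void main()
{
    auto a = SmallString("abcd");   // fits inline, no allocation
    auto b = a;                     // value copy: b has its own buffer
    assert(b.length == 4);
    assert(b.toString() == "abcd");

    auto big = SmallString("this one is longer than fifteen chars");
    assert(big.toString() == "this one is longer than fifteen chars");
}
```

Because assigning a SmallString copies the whole buffer, reassigning one value can never change characters another piece of code is looking at, which is exactly the guarantee the in-place optimization of string would break.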


-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/
