Is mimicking a reference type with a struct reliable?

Denis Koroskin 2korden at gmail.com
Sat Oct 16 09:23:50 PDT 2010


On Sat, 16 Oct 2010 20:16:40 +0400, Steven Schveighoffer  
<schveiguy at yahoo.com> wrote:

> On Sat, 16 Oct 2010 11:52:29 -0400, Denis Koroskin <2korden at gmail.com>  
> wrote:
>
>> First I'd like to say that I don't really like (or rather use) Appender  
>> because it always allocates (at least an internal Data instance) even  
>> when I provide my own buffer.
>> I mean, why would I use Appender if it still allocates? Okay, you have  
>> to store a reference to an internal representation so that Appender  
>> would feel like a reference type.
>
> Appender needs to be a reference type.  If it's not then copying the  
> appender will stomp data.  Let's say appender is not a reference type,  
> you might expect the data members to look like:
>
> struct Appender(T)
> {
>     uint capacity;
>     T[] data;
> }
>
> Now, if you copy an appender to another instance, it gets its *own* copy  
> of capacity.  You append to a1, no problems.  You then append to a2 and  
> it overwrites the data you put in a1.
>
> It might be possible to do an unsafe appender that uses a pointer to a  
> stack variable for its implementation.  But returning such an appender  
> would escape stack data.  This would however obviate the need to  
> allocate extra data on the heap.
>
> A final option is to disable the copy constructor of such an unsafe  
> appender, but then you couldn't pass it around.
>
> What do you think?  If you think it's worth having, suggest it on the  
> phobos mailing list, and we'll discuss.
>
> Note that Appender is supposed to be fast at *appending* not  
> initializing itself.  In that respect, it's very fast.
>
>>  I'm not sure it's worth the trade-off, and as such I defined and use  
>> my own set of primitives that don't allocate when a buffer is provided:
>>
>> void put(T)(ref T[] array, ref size_t offset, const(T) value)
>> {
>>      ensureCapacity(array, offset + 1);
>>      array[offset++] = value;
>> }
>>
>> void put(T)(ref T[] array, ref size_t offset, const(T)[] value)
>> {
>>      // Same but for an array
>> }
>>
>> void ensureCapacity(ref char[] array, size_t minCapacity)
>> {
>>     // ...
>> }
>
> I'm not sure what ensureCapacity does, but if it does what I think it  
> does (use the capacity property of arrays), it's probably slower than  
> Appender, which has a dedicated variable for capacity.
>

No, it doesn't use capacity, it uses length as a capacity instead:

void ensureCapacity(T)(ref T[] array, size_t minCapacity)
{
	size_t capacity = array.length;
	if (minCapacity < capacity) {
		return;
	}
	
	// need resize
	capacity *= 2;
	
	if (capacity < 16) {
		capacity = 16;
	}

	if (capacity < minCapacity) {
		capacity = minCapacity;
	}

	array.length = capacity;
}

The usage pattern is as follows:

dchar[] toUTF32(string s, dchar[] buffer = null)
{
	size_t size = 0;
	foreach (dchar d; s) {
		buffer.put(size, d);
	}

	return buffer[0..size];
}

>> Back to my original question, can we mimick a reference behavior with a  
>> struct? I thought why not until I hit this bug:
>>
>> import std.array;
>> import std.stdio;
>>
>> void append(Appender!(string) a, string s)
>> {
>> 	a.put(s);
>> }
>>
>> void main()
>> {
>> 	Appender!(string) a;
>> 	string s = "test";
>> 	
>> 	append(a, s); // <
>> 	
>> 	writeln(a.data);	
>> }
>>
>> I'm passing an appender by value since it's supposed to have a  
>> reference type behavior and passing 4 bytes by reference is an overkill.
>>
>> However, the code above doesn't work for a simple reason: structs lack  
>> default ctors. As such, an appender is initialized to null internally,  
>> when I call append a copy of it gets initialized (lazily), but the  
>> original one remains unchanged. Note that if you append to appender at  
>> least once before passing by value, it will work. But that's sad. Not  
>> only it allocates when it shouldn't, I also have to initialize it  
>> explicitly!
>>
>> I think far better solution would be to make it non-copyable.
>>
>> TL;DR Reference semantic mimicking with a struct without default ctors  
>> is unreliable since you must initialize your object lazily. Moreover,  
>> you have to check that you struct is not initialized yet every single  
>> function call, and that's error prone and bad for code clarity and  
>> performance. I'm opposed of that practice.
>
> This is a point I've brought up before.  As of yet there is no  
> solution.  There have been a couple of ideas passed around, but there  
> hasn't been anything decided.  The one idea I remember (but didn't  
> really like) is to have the copy constructor be able to modify the  
> original.  This makes it possible to allocate the underlying  
> implementation in Appender for example, even on the data being passed.   
> There are lots of problems with this solution, and I don't think it got  
> much traction.
>
> I think the default constructor solution is probably never going to  
> happen.  It's very nice to always have a default fast way to initialize  
> structs, and there is precedence (C# has the same rule).
>
> My suggestion would be to have it be an actual reference type -- i.e. a  
> class.  I don't see any issues with that.  In that respect, you could  
> even have it be stack-allocated, since you have emplace.  But I don't  
> have a say in that.  I was the last one to update Appender, since it had  
> a bug-ridden design and needed to be fixed, but I tried to change as  
> little as possible.
>
> -Steve


More information about the Digitalmars-d mailing list