Is mimicking a reference type with a struct reliable?

Steven Schveighoffer schveiguy at yahoo.com
Sat Oct 16 09:16:40 PDT 2010


On Sat, 16 Oct 2010 11:52:29 -0400, Denis Koroskin <2korden at gmail.com>  
wrote:

> First I'd like to say that I don't really like (or rather use) Appender  
> because it always allocates (at least an internal Data instance) even  
> when I provide my own buffer.
> I mean, why would I use Appender if it still allocates? Okay, you have  
> to store a reference to an internal representation so that Appender  
> would feel like a reference type.

Appender needs to be a reference type.  If it's not then copying the  
appender will stomp data.  Let's say appender is not a reference type, you  
might expect the data members to look like:

struct Appender(T)
{
    uint capacity;
    T[] data;
}

Now, if you copy an appender to another instance, it gets its *own* copy  
of capacity.  You append to a1, no problems.  You then append to a2 and it  
overwrites the data you put in a1.

It might be possible to do an unsafe appender that uses a pointer to a  
stack variable for its implementation.  But returning such an appender  
would escape stack data.  This would however obviate the need to allocate  
extra data on the heap.

A final option is to disable the copy constructor of such an unsafe  
appender, but then you couldn't pass it around.

What do you think?  If you think it's worth having, suggest it on the  
phobos mailing list, and we'll discuss.

Note that Appender is supposed to be fast at *appending* not initializing  
itself.  In that respect, it's very fast.

>  I'm not sure it's worth the trade-off, and as such I defined and use my  
> own set of primitives that don't allocate when a buffer is provided:
>
> void put(T)(ref T[] array, ref size_t offset, const(T) value)
> {
>      ensureCapacity(array, offset + 1);
>      array[offset++] = value;
> }
>
> void put(T)(ref T[] array, ref size_t offset, const(T)[] value)
> {
>      // Same but for an array
> }
>
> void ensureCapacity(ref char[] array, size_t minCapacity)
> {
>     // ...
> }

I'm not sure what ensureCapacity does, but if it does what I think it does  
(use the capacity property of arrays), it's probably slower than Appender,  
which has a dedicated variable for capacity.

> Back to my original question, can we mimick a reference behavior with a  
> struct? I thought why not until I hit this bug:
>
> import std.array;
> import std.stdio;
>
> void append(Appender!(string) a, string s)
> {
> 	a.put(s);
> }
>
> void main()
> {
> 	Appender!(string) a;
> 	string s = "test";
> 	
> 	append(a, s); // <
> 	
> 	writeln(a.data);	
> }
>
> I'm passing an appender by value since it's supposed to have a reference  
> type behavior and passing 4 bytes by reference is an overkill.
>
> However, the code above doesn't work for a simple reason: structs lack  
> default ctors. As such, an appender is initialized to null internally,  
> when I call append a copy of it gets initialized (lazily), but the  
> original one remains unchanged. Note that if you append to appender at  
> least once before passing by value, it will work. But that's sad. Not  
> only it allocates when it shouldn't, I also have to initialize it  
> explicitly!
>
> I think far better solution would be to make it non-copyable.
>
> TL;DR Reference semantic mimicking with a struct without default ctors  
> is unreliable since you must initialize your object lazily. Moreover,  
> you have to check that you struct is not initialized yet every single  
> function call, and that's error prone and bad for code clarity and  
> performance. I'm opposed of that practice.

This is a point I've brought up before.  As of yet there is no solution.   
There have been a couple of ideas passed around, but there hasn't been  
anything decided.  The one idea I remember (but didn't really like) is to  
have the copy constructor be able to modify the original.  This makes it  
possible to allocate the underlying implementation in Appender for  
example, even on the data being passed.  There are lots of problems with  
this solution, and I don't think it got much traction.

I think the default constructor solution is probably never going to  
happen.  It's very nice to always have a default fast way to initialize  
structs, and there is precedence (C# has the same rule).

My suggestion would be to have it be an actual reference type -- i.e. a  
class.  I don't see any issues with that.  In that respect, you could even  
have it be stack-allocated, since you have emplace.  But I don't have a  
say in that.  I was the last one to update Appender, since it had a  
bug-ridden design and needed to be fixed, but I tried to change as little  
as possible.

-Steve


More information about the Digitalmars-d mailing list