Is mimicking a reference type with a struct reliable?

Denis Koroskin 2korden at gmail.com
Sat Oct 16 09:59:46 PDT 2010


Sorry, I misclicked a button and sent the message prematurely.

On Sat, 16 Oct 2010 20:16:40 +0400, Steven Schveighoffer  
<schveiguy at yahoo.com> wrote:
>
> A final option is to disable the copy constructor of such an unsafe  
> appender, but then you couldn't pass it around.
>
> What do you think?  If you think it's worth having, suggest it on the  
> phobos mailing list, and we'll discuss.
>

It's still possible to pass it by reference, or even by pointer. You know,
that's what you actually do right now: you are passing a Data* (a pointer
to the internal state, wrapped in an Appender struct).
Passing by pointer might actually be a good idea (because you can default
it to null). One of the reasons I use "T[] buffer = null" as a buffer
parameter is that you aren't forced to provide one; null is also a valid
buffer. Many functions would benefit from accepting an optional Appender
(e.g. converting from utf8 to utf16), but we shouldn't force callers to
provide one.
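
A minimal sketch of that optional-pointer idea, assuming the Appender from
std.array. The function name joinWords and its behaviour are hypothetical and
only illustrate the calling convention:

```d
import std.array : Appender;
import std.stdio : writeln;

// Hypothetical helper: writes into *sink when one is supplied,
// otherwise falls back to a local Appender.
string joinWords(string[] words, Appender!string* sink = null)
{
    Appender!string local;
    if (sink is null)
        sink = &local;

    foreach (i, w; words)
    {
        if (i > 0)
            sink.put(' ');
        sink.put(w);
    }
    return sink.data;
}

void main()
{
    writeln(joinWords(["a", "b"])); // no appender supplied

    Appender!string ap;
    ap.put("x:");
    joinWords(["y"], &ap);          // caller-owned appender is reused
    writeln(ap.data);
}
```

Because the callee works through a pointer, there is no copy of the Appender
whose lazy initialization could get lost.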

> Note that Appender is supposed to be fast at *appending* not  
> initializing itself.  In that respect, it's very fast.
>

This makes it useless for appending small amounts of data.

>>  I'm not sure it's worth the trade-off, and as such I defined and use  
>> my own set of primitives that don't allocate when a buffer is provided:
>>
>> void put(T)(ref T[] array, ref size_t offset, const(T) value)
>> {
>>      ensureCapacity(array, offset + 1);
>>      array[offset++] = value;
>> }
>>
>> void put(T)(ref T[] array, ref size_t offset, const(T)[] value)
>> {
>>      // Same but for an array
>> }
>>
>> void ensureCapacity(ref char[] array, size_t minCapacity)
>> {
>>     // ...
>> }
>
> I'm not sure what ensureCapacity does, but if it does what I think it  
> does (use the capacity property of arrays), it's probably slower than  
> Appender, which has a dedicated variable for capacity.
>
>> Back to my original question, can we mimic reference behavior with a  
>> struct? I thought why not until I hit this bug:
>>
>> import std.array;
>> import std.stdio;
>>
>> void append(Appender!(string) a, string s)
>> {
>> 	a.put(s);
>> }
>>
>> void main()
>> {
>> 	Appender!(string) a;
>> 	string s = "test";
>> 	
>> 	append(a, s); // <
>> 	
>> 	writeln(a.data);	
>> }
>>
>> I'm passing an appender by value since it's supposed to have a  
>> reference type behavior and passing 4 bytes by reference is an overkill.
>>
>> However, the code above doesn't work for a simple reason: structs lack  
>> default ctors. As such, an appender is initialized to null internally,  
>> when I call append a copy of it gets initialized (lazily), but the  
>> original one remains unchanged. Note that if you append to appender at  
>> least once before passing by value, it will work. But that's sad. Not  
>> only it allocates when it shouldn't, I also have to initialize it  
>> explicitly!
>>
>> I think far better solution would be to make it non-copyable.
>>
>> TL;DR Mimicking reference semantics with a struct without default ctors  
>> is unreliable since you must initialize your object lazily. Moreover,  
>> you have to check that your struct is not initialized yet on every single  
>> function call, and that's error prone and bad for code clarity and  
>> performance. I'm opposed to that practice.
>
> This is a point I've brought up before.  As of yet there is no  
> solution.  There have been a couple of ideas passed around, but there  
> hasn't been anything decided.  The one idea I remember (but didn't  
> really like) is to have the copy constructor be able to modify the  
> original.  This makes it possible to allocate the underlying  
> implementation in Appender for example, even on the data being passed.   
> There are lots of problems with this solution, and I don't think it got  
> much traction.
>
> I think the default constructor solution is probably never going to  
> happen.  It's very nice to always have a default fast way to initialize  
> structs, and there is precedent (C# has the same rule).
>

I think there is a solution, but it goes far beyond the default ctor
problem (it solves many other issues, too).
Currently, an object is initialized by copying T.init (structs) or
T.classinfo.init (classes).
Pros:
simple initialization - malloc, followed by memcpy
there is always an immutable instance of an object in memory, and you can  
use it as default/not initialized state

Cons:
you can't initialize class/struct variables with runtime values
increased file size (every single class/struct now has a copy of its own)

In Java, they use another approach. Instead of memcpy'ing T.init on top of  
allocated data, they invoke a so-called cctor (as opposed to ctor). This  
is a method that initializes memory so that a ctor can be called.  
memcpy'ing T.init has the same idea, however it is not moved into a  
separate method. In general, cctor can be implemented the way it is in D  
without sacrificing anything. However, a type-unique method is a lot  
better than that:

1) Most structs initialize all of their members to 0. For these, the
compiler can use memset instead.
2) The killer feature, in my opinion: it allows initializing members with
non-constant expressions:

class Foo
{
	ubyte[] buffer = new ubyte[BUFFER_SIZE];
}

This also solves an Appender issue:

struct Appender
{
	Data* data = new Data();
}

3) It allows getting rid of T.init, significantly reducing the resulting
file size.
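
For comparison, here is a rough sketch of the Appender issue from point 2 as
it looks without such a cctor. All names (Data, Handle, make, fill) are
hypothetical, and this is not how std.array.Appender is actually written; it
only assumes a struct that wraps a pointer to its state:

```d
import std.stdio : writeln;

// Hypothetical names throughout: Data, Handle, make, fill.
struct Data { int[] items; }

struct Handle
{
    Data* data; // null in Handle.init: no default ctor can allocate it

    // Explicit factory that allocates the shared state up front.
    static Handle make() { return Handle(new Data); }

    void put(int v)
    {
        if (data is null) data = new Data; // lazy init, checked on every call
        data.items ~= v;
    }

    size_t length() const { return data is null ? 0 : data.items.length; }
}

void fill(Handle h) { h.put(42); } // takes the handle by value

void main()
{
    Handle lazyOne;          // data is null
    fill(lazyOne);           // only the copy allocates
    writeln(lazyOne.length); // 0

    auto eager = Handle.make();
    fill(eager);             // the copy shares the same Data
    writeln(eager.length);   // 1
}
```

With a per-type initializer that could run "new Data" for every instance,
neither the null check in put nor the explicit factory would be needed.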

I'm not sure Walter will agree to such a radical change, but it can be  
achieved in small steps. D doesn't even have to get rid of T.init, it can  
still be there (but I'd like to get rid of it eventually)

a) Keep T.init/T.classinfo.init, introduce a compiler-generated cctor that
memcpy's T.init over the object
(Optionally) Make the cctor smarter, and generate proper class/struct
initialization code that doesn't rely on T.init
b) Allow non-constant expressions as initializers and initialize such  
members in the cctor
(Optionally) Get rid of T.init altogether

> My suggestion would be to have it be an actual reference type -- i.e. a  
> class.  I don't see any issues with that.  In that respect, you could  
> even have it be stack-allocated, since you have emplace.  But I don't  
> have a say in that.  I was the last one to update Appender, since it had  
> a bug-ridden design and needed to be fixed, but I tried to change as  
> little as possible.
>
> -Steve

