Is mimicking a reference type with a struct reliable?
Denis Koroskin
2korden at gmail.com
Sat Oct 16 09:23:50 PDT 2010
On Sat, 16 Oct 2010 20:16:40 +0400, Steven Schveighoffer
<schveiguy at yahoo.com> wrote:
> On Sat, 16 Oct 2010 11:52:29 -0400, Denis Koroskin <2korden at gmail.com>
> wrote:
>
>> First I'd like to say that I don't really like (or rather use) Appender
>> because it always allocates (at least an internal Data instance) even
>> when I provide my own buffer.
>> I mean, why would I use Appender if it still allocates? Okay, you have
>> to store a reference to an internal representation so that Appender
>> would feel like a reference type.
>
> Appender needs to be a reference type. If it's not then copying the
> appender will stomp data. Let's say appender is not a reference type,
> you might expect the data members to look like:
>
> struct Appender(T)
> {
> uint capacity;
> T[] data;
> }
>
> Now, if you copy an appender to another instance, it gets its *own* copy
> of capacity. You append to a1, no problems. You then append to a2 and
> it overwrites the data you put in a1.
>
> It might be possible to do an unsafe appender that uses a pointer to a
> stack variable for its implementation. But returning such an appender
> would escape stack data. This would however obviate the need to
> allocate extra data on the heap.
>
> A final option is to disable the copy constructor of such an unsafe
> appender, but then you couldn't pass it around.
>
> What do you think? If you think it's worth having, suggest it on the
> phobos mailing list, and we'll discuss.
>
> Note that Appender is supposed to be fast at *appending* not
> initializing itself. In that respect, it's very fast.
>
>> I'm not sure it's worth the trade-off, and as such I defined and use
>> my own set of primitives that don't allocate when a buffer is provided:
>>
>> void put(T)(ref T[] array, ref size_t offset, const(T) value)
>> {
>> ensureCapacity(array, offset + 1);
>> array[offset++] = value;
>> }
>>
>> void put(T)(ref T[] array, ref size_t offset, const(T)[] value)
>> {
>> // Same but for an array
>> }
>>
>> void ensureCapacity(ref char[] array, size_t minCapacity)
>> {
>> // ...
>> }
>
> I'm not sure what ensureCapacity does, but if it does what I think it
> does (use the capacity property of arrays), it's probably slower than
> Appender, which has a dedicated variable for capacity.
>
No, it doesn't use capacity, it uses length as a capacity instead:
void ensureCapacity(T)(ref T[] array, size_t minCapacity)
{
size_t capacity = array.length;
if (minCapacity < capacity) {
return;
}
// need resize
capacity *= 2;
if (capacity < 16) {
capacity = 16;
}
if (capacity < minCapacity) {
capacity = minCapacity;
}
array.length = capacity;
}
The usage pattern is as follows:
dchar[] toUTF32(string s, dchar[] buffer = null)
{
size_t size = 0;
foreach (dchar d; s) {
buffer.put(size, d);
}
return buffer[0..size];
}
>> Back to my original question, can we mimick a reference behavior with a
>> struct? I thought why not until I hit this bug:
>>
>> import std.array;
>> import std.stdio;
>>
>> void append(Appender!(string) a, string s)
>> {
>> a.put(s);
>> }
>>
>> void main()
>> {
>> Appender!(string) a;
>> string s = "test";
>>
>> append(a, s); // <
>>
>> writeln(a.data);
>> }
>>
>> I'm passing an appender by value since it's supposed to have a
>> reference type behavior and passing 4 bytes by reference is an overkill.
>>
>> However, the code above doesn't work for a simple reason: structs lack
>> default ctors. As such, an appender is initialized to null internally,
>> when I call append a copy of it gets initialized (lazily), but the
>> original one remains unchanged. Note that if you append to appender at
>> least once before passing by value, it will work. But that's sad. Not
>> only it allocates when it shouldn't, I also have to initialize it
>> explicitly!
>>
>> I think far better solution would be to make it non-copyable.
>>
>> TL;DR Reference semantic mimicking with a struct without default ctors
>> is unreliable since you must initialize your object lazily. Moreover,
>> you have to check that you struct is not initialized yet every single
>> function call, and that's error prone and bad for code clarity and
>> performance. I'm opposed of that practice.
>
> This is a point I've brought up before. As of yet there is no
> solution. There have been a couple of ideas passed around, but there
> hasn't been anything decided. The one idea I remember (but didn't
> really like) is to have the copy constructor be able to modify the
> original. This makes it possible to allocate the underlying
> implementation in Appender for example, even on the data being passed.
> There are lots of problems with this solution, and I don't think it got
> much traction.
>
> I think the default constructor solution is probably never going to
> happen. It's very nice to always have a default fast way to initialize
> structs, and there is precedence (C# has the same rule).
>
> My suggestion would be to have it be an actual reference type -- i.e. a
> class. I don't see any issues with that. In that respect, you could
> even have it be stack-allocated, since you have emplace. But I don't
> have a say in that. I was the last one to update Appender, since it had
> a bug-ridden design and needed to be fixed, but I tried to change as
> little as possible.
>
> -Steve
More information about the Digitalmars-d
mailing list