A question about move() and a rant about shared

Stanislav Blinov stanislav.blinov at gmail.com
Fri Jan 24 09:07:42 PST 2014


Ok, this is going to be a long one, so please bear with me.

I'll start with a question.

1. std.algorithm.move() and std.container

TDPL describes when the compiler can and cannot perform a move automatically.
For cases where it isn't done automatically but we explicitly require a move,
we have std.algorithm.move(). This function comes in extremely handy,
especially when we want to pass around data that cannot otherwise be copied
(@disable this(this)): for example, sinking some unique value into a thread or
storing it in a container. Or even both.
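
A minimal sketch of what such an explicit move looks like. Unique and
consume() are names made up for illustration; only std.algorithm.move is the
real library function.

```d
import std.algorithm : move;

// Hypothetical non-copyable type: the postblit is disabled, so any
// attempt to copy a Unique is rejected at compile time.
struct Unique
{
    int* payload;
    @disable this(this);
}

// Takes ownership of its argument by value and reads the payload.
int consume(Unique u)
{
    return *u.payload;
}

void main()
{
    auto u = Unique(new int);
    *u.payload = 42;

    // consume(u);                  // would not compile: Unique is not copyable
    assert(consume(move(u)) == 42); // explicit move hands the value over
}
```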

But why is there no practical way of storing such uncopyable data in the
standard containers? Both Array and DList try to perform a copy when insert()
is called, and happily fail to compile when this(this) is @disabled. Same with
access: front() returns by value, so again no luck with @disabled this(this).

What is interesting, though, is that the range interfaces for containers do
allow for moveFront() et al., and for some containers they're even defined.
So it's safe to move contents *out* but not *in*? Is there some deeper
technical reasoning behind this that I fail to see?
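
To make the "move in" half of the question concrete, here is a hand-rolled
sketch of a tiny container that accepts non-copyable elements by moving them
into its storage. FixedStack and Unique are hypothetical names, and this is
not how std.container is implemented; it only illustrates that the two-argument
form of std.algorithm.move is enough to get a value in without copying it.

```d
import std.algorithm : move;

// Hypothetical non-copyable element type.
struct Unique
{
    int* payload;
    @disable this(this);
}

// Hypothetical fixed-capacity stack that never copies its elements.
struct FixedStack(T, size_t N)
{
    private T[N] slots;
    private size_t count;

    // Moves the caller's value into the next free slot.
    void push(ref T value)
    {
        assert(count < N, "stack is full");
        move(value, slots[count++]);
    }

    // Moves the top element out and returns it.
    T pop()
    {
        assert(count > 0, "stack is empty");
        return move(slots[--count]);
    }
}

void main()
{
    FixedStack!(Unique, 4) stack;

    auto u = Unique(new int);
    *u.payload = 7;

    stack.push(u);        // moved in, no copy involved
    auto v = stack.pop(); // moved out again
    assert(*v.payload == 7);
}
```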




Below is a medium rant that's somewhat unrelated to the above, aimed at
gathering insights from those who're interested, so if you're not, just skip
it :)

I'll put "shared" in quotes to distinguish the language qualifier from the
ordinary word. These are mostly my current thoughts on "shared" and its usage.
I'd like you to point out where I'm wrong on this sensitive topic; any feedback
is greatly appreciated.


2. "shared" is transitive. How transitive?

Declaring something as "shared" means that all of its representation is also
"shared". This is a good thing, right? But it does have certain implications.
Consider a shared data structure (for example, a (multi)producer
(multi)consumer queue). If it's designed to store anything that has
indirections (pointers, references), those had better be either provably
unique (not possible in D except for immutable data), or "shared". In fact,
the whole stored type should just be "shared", which is enforced by the
compiler. Thus, we come to this:

shared class Queue(T) {
	private Container!T q;  // that'll be shared(Container!T),
	                        // which will in turn store shared(T)
	alias shared(T) Type;

	// push and pop are of course synchronized
	void push(Type) { ... }
	Type pop() { ... }
}

But when there are no indirections (i.e. a primitive type or, more
practically, a struct with some primitive fields and a bunch of methods that
reason about that data or maybe do something with it), it all comes down to
usage. In the case of that queue, no two threads could possibly access the
same data simultaneously. Let me define such a type real quick (somewhat
contrived, but it should state the intent):

struct Packet {
	ulong ID;
	ubyte[32] header;
	ubyte[64] data;

	string type() inout @property { ... }
	ulong checkSum() inout @property { ... }
	Variant payload() inout @property { ... }
}

Note that I do have arrays in there, but they cannot possibly 
introduce any aliasing,
since they're static.
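
The "no indirections" claim can be checked at compile time with
std.traits.hasIndirections. In this sketch Packet is reduced to its fields
(the methods are left out), and SlicedPacket is a hypothetical counterpart
that uses a dynamic array, added only for contrast.

```d
import std.traits : hasIndirections;

// Same fields as the Packet above, methods omitted.
struct Packet
{
    ulong ID;
    ubyte[32] header;
    ubyte[64] data;
}

// Hypothetical variant: a dynamic array is a pointer plus a length,
// so copies of this struct alias the same underlying bytes.
struct SlicedPacket
{
    ulong ID;
    ubyte[] data;
}

static assert(!hasIndirections!Packet);       // static arrays live inside the struct
static assert( hasIndirections!SlicedPacket); // the slice points elsewhere

void main() {}
```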

As soon as a producer pushes such a value, it releases ownership of it, and
some consumer later gains ownership. Remember, there are no indirections, so
no two threads could race against the same data. But I cannot just declare a
plain struct and then start pushing it into that queue. It wouldn't work,
because the queue expects a "shared" type.

One solution would be to use a cast. On one hand, this is feasible: such data
is really only logically shared when it's somewhere "in-between" threads, i.e.
sits in a queue. The "shared" queue owns the data for a moment, and thus makes
the data itself "shared". As soon as a consumer pops that value off the queue,
it can be cast back to non-"shared".

This is ugly: it imposes a certain convention on handling one "shared" type
(the queue) together with another non-"shared" one (the struct), i.e. "always
cast when pushing or popping". Convention is not a reasonable justification
for bypassing the type system.

Another solution would be to instantiate those structs as "shared" in the
first place. But that won't work either: now any methods those structs have
must also have "shared" overloads. In other words, I suddenly need to provide
a "shared" interface for my struct. Well ok, I can do that trivially, by just
declaring the whole struct type as "shared". But this is wrong. "shared"
advertises a certain promise: this data is allowed to be accessed by more than
one thread at a time. This implies that access to the data had better be
synchronized. In other words, I would have to actually *write* the
synchronization for something that would never *need* synchronization. If I
don't do it and simply leave the struct declared as "shared" (or provide all
the relevant "shared" overloads), I'm shooting someone in the foot: imagine
that later someone starts using my types, sees that this struct is declared
"shared" and happily assumes that it can be used concurrently. In that
contrived example it would be easy to see that's not actually the case. But
reality is cruel.
Bang.

So ideally I'd want the queue to handle this situation for me, 
and luckily I can:

shared class Queue(T) {
	static if (hasUnsharedAliasing!T) {
		private Container!T q;           // as before
		alias shared(T) Type;
	} else {
		private __gshared Container!T q; // nothing is "shared" here
		alias T Type;
	}

	// push and pop are still synchronized :)
	void push(Type) { ... }
	Type pop() { ... }
}

But that's not the end of it. As seen from that definition, I'm using some
container (Container(T)) as the actual storage. The current definition of
Queue requires that Container(T) have a "shared" interface. Either that, or I
implement the whole storing business myself right there in the Queue. The
latter is certainly not feasible, especially since, depending on the
requirements for the Queue, I may need different storage capabilities.
In short: I don't need that container to be "shared" at all (provided it's a
sane container that doesn't do anything else with the data except for storing
it). And in fact, if it were "shared" already, I wouldn't need to define Queue
at all, I'd just use Container directly.

Therefore, the final iteration of Queue would look like this:

shared class Queue(T) {
	static if (hasUnsharedAliasing!T)
		alias shared(T) Type;
	else
		alias T Type;

	private __gshared Container!Type q;

	// still synchronized :)
	void push(Type) { ... }
	Type pop() { ... }	
}
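
The type selection in that last version can be tried out in isolation. In this
sketch StoredType is a hypothetical helper template that mirrors the static if
above, and Handle is a made-up struct with a mutable pointer; only
std.traits.hasUnsharedAliasing is the real library trait.

```d
import std.traits : hasUnsharedAliasing;

// Hypothetical helper: picks shared(T) only when T carries
// aliasing that is not already shared or immutable.
template StoredType(T)
{
    static if (hasUnsharedAliasing!T)
        alias StoredType = shared(T);
    else
        alias StoredType = T;
}

// Same fields as the Packet above, methods omitted.
struct Packet
{
    ulong ID;
    ubyte[32] header;
    ubyte[64] data;
}

// Made-up type holding a pointer to mutable, thread-local data.
struct Handle
{
    int* target;
}

static assert(is(StoredType!Packet == Packet));         // plain value type stays as is
static assert(is(StoredType!Handle == shared(Handle))); // mutable pointer forces "shared"
static assert(is(StoredType!string == string));         // immutable data needs no "shared"

void main() {}
```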

All synchronization issues are handled by Queue, Container merely 
stores the data,
which is in turn "shared" or not depending on its representation.

This is conceptually how std.concurrency works: it allows you to 
send and receive
plain value types without imposing any "shared" qualification on 
them, but as soon
as you try to send a non-"shared" reference type or a struct with 
non-"shared" pointers,
it won't allow it.
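
A small sketch of that std.concurrency behaviour, using the fields of Packet
from above (methods omitted). The worker function and the "ID + 1" reply are
made up for illustration; spawn, send, receiveOnly and ownerTid are the
library's own.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;

struct Packet
{
    ulong ID;
    ubyte[32] header;
    ubyte[64] data;
}

// Receives one Packet by value and replies with its ID plus one.
void worker()
{
    auto p = receiveOnly!Packet();
    ownerTid.send(p.ID + 1);
}

void main()
{
    auto tid = spawn(&worker);

    Packet p;
    p.ID = 41;
    tid.send(p); // plain value type: no "shared" qualification needed

    // tid.send(new int); // rejected at compile time: pointer to thread-local data

    assert(receiveOnly!ulong() == 42);
}
```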

There is, however, a nag with this: __gshared is not @safe. But getting rid of
it would mean only one thing: the Queue could only ever store shared(T), which
kind of defeats the original point.
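
One more detail that may matter for this design: as far as I understand the
language spec, __gshared on a member variable behaves like static, only
without thread-local storage. If so, every instance of a given Queue!T in the
sketches above would end up using the same container. A minimal illustration,
with Holder as a made-up class:

```d
// Made-up class with a __gshared member.
class Holder
{
    __gshared int[] items; // one array per program, not one per Holder
}

void main()
{
    auto a = new Holder;
    auto b = new Holder;

    auto before = b.items.length;
    a.items ~= 1;

    // The element appended through a is visible through b and through
    // the class name, because all three refer to the same variable.
    assert(b.items.length == before + 1);
    assert(Holder.items.length == before + 1);
}
```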

So, is "shared" really not as transitive as D wants it to be?

I imagine by now you already have a big list of "you're 
incorrect"s to stick in my face,
or you probably have already stopped reading :)

I have some more doubts regarding my handling of "shared", but 
I'll leave them for later so as
to not bore you to death.

