Passing large or complex data structures to threads

Mon May 27 18:55:28 PDT 2013

On 05/27/2013 11:33 PM, Simen Kjaeraas wrote:
> A few questions:
> 
> Why use a class? Will MyDataStore be subclassed?

It was important to me that it have reference semantics, in particular that
a = b implies a is b.
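
For illustration, a minimal sketch of that difference in D. The names Store and Record are hypothetical and not taken from the code under discussion; the point is only that assigning a class variable rebinds a reference, while assigning a struct copies the value.

```d
// Hypothetical class: variables of this type are references to objects.
class Store
{
    double[] values;
}

// Hypothetical struct: variables of this type hold the value itself.
struct Record
{
    size_t count;
}

void main()
{
    auto a = new Store;
    auto b = new Store;
    a = b;                         // rebinds the reference; nothing is copied
    assert(a is b);                // both names now refer to the same object
    b.values ~= 1.0;
    assert(a.values.length == 1);  // a change made via b is visible via a

    Record r, s;
    s.count = 1;
    r = s;                         // struct assignment copies the value
    s.count = 2;
    assert(r.count == 1);          // r is an independent copy
}
```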

> Will you have some instances of MyDataStore that will be mutated, and
> others that will always stay the same?
> 
> If the answer was yes, will these be in the same array?

I'm not sure I understand the question.  If you mean, are there certain members
of the class that will be mutated and some that won't, then yes.  So, I don't
think I can follow your example of an immutable instance of the whole class.

I'll give a longer explanation of what I'm trying to do, just for context.  I'm
carrying out Monte Carlo simulations and have been trying to write a fairly
generic set of code for that purpose.  Essentially I define a range which covers
successive steps of the Monte Carlo process.

What we're doing here is simulating a model on a system with a given
configuration.  The Monte Carlo process randomly mutates the configuration and
then either accepts or rejects the mutation depending on a given fitness function.
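
A single mutate-then-accept-or-reject step might be sketched as follows. Everything here is an assumption made for illustration: the configuration is reduced to a plain double[], the fitness function is a made-up one where higher is better, and the acceptance rule is a simple greedy one (keep the mutation only if fitness does not decrease), which may well differ from the rule used in the actual simulations.

```d
import std.random : Random, uniform;

// Hypothetical fitness function: higher is better, maximal at all zeros.
double fitness(const double[] config)
{
    double total = 0;
    foreach (x; config)
        total -= x * x;
    return total;
}

// Propose a random mutation of one element and keep it only if the
// fitness does not decrease.  Returns true if the mutation was accepted.
bool step(ref double[] current, ref Random rng)
{
    auto candidate = current.dup;
    immutable i = uniform(size_t(0), candidate.length, rng);
    candidate[i] += uniform(-0.1, 0.1, rng);
    if (fitness(candidate) >= fitness(current))
    {
        current = candidate;
        return true;
    }
    return false;
}

void main()
{
    auto rng = Random(42);
    double[] config = [1.0, -2.0, 0.5];
    immutable before = fitness(config);
    foreach (_; 0 .. 1000)
        step(config, rng);
    // With a greedy rule the fitness can never get worse.
    assert(fitness(config) >= before);
}
```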

So, the process needs to be handed two sets of data.  The first, which I call
"state", defines the variables that change when the model is run.  The second
set, which I call the "seed", holds the parameters that are constant relative to
the model being examined, but that can be mutated by the Monte Carlo process.
(So, for example, one can optimize the configuration according to certain
criteria for how we want the model to behave.)

Now, depending on the models I'm examining, obviously the contents of the state
and the seed may vary.  The solution I found was to define a struct of the form,

struct MonteCarlo(State, Seed /* some other parameters */)
{
	this(ref State st, ref Seed sd /* other stuff */)
	{
		// initialise the stored State and Seed instances from st and sd
	}
}

... and internally, this stores three different State and Seed instances: one
which stores the optimal solution found; one which stores the current selected
state and seed; and finally, one which stores the mutated state and seed.
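
One way the three stored instances could be laid out is sketched below. This is a guess at the shape rather than the actual implementation: the member names, the dup() methods on the storage classes, and the choice to keep the caller's objects as the "current" pair are all assumptions. Because State and Seed are classes, the other two slots need explicit copies, otherwise all three would alias the same object.

```d
// Hypothetical storage classes; the dup() methods are an assumption.
class StateInstance
{
    double[] a;

    StateInstance dup() const
    {
        auto copy = new StateInstance;
        copy.a = a.dup;
        return copy;
    }
}

class SeedInstance
{
    double[] d;

    SeedInstance dup() const
    {
        auto copy = new SeedInstance;
        copy.d = d.dup;
        return copy;
    }
}

struct MonteCarlo(State, Seed)
{
    State bestState, currentState, trialState;
    Seed bestSeed, currentSeed, trialSeed;

    this(ref State st, ref Seed sd)
    {
        currentState = st;      // shares the caller's object
        bestState = st.dup;     // independent copy for the optimum so far
        trialState = st.dup;    // independent copy to mutate
        currentSeed = sd;
        bestSeed = sd.dup;
        trialSeed = sd.dup;
    }
}

void main()
{
    auto st = new StateInstance;
    st.a = [1.0, 2.0];
    auto sd = new SeedInstance;
    sd.d = [3.0];

    auto mc = MonteCarlo!(StateInstance, SeedInstance)(st, sd);
    mc.trialState.a[0] = 99.0;            // mutate the trial copy only
    assert(mc.currentState.a[0] == 1.0);  // current state is unaffected
    assert(mc.currentState is st);
}
```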

I guess there could be other ways to handle these kinds of variable input data
in a generic way, but the easiest I could think of was just to define State and
Seed storage classes that would gather together all the relevant variables.
Both have forms along the lines of

	import std.typecons : Tuple;  // needed for Tuple in SeedInstance

	class StateInstance
	{
		double[] a;
		double[] b;
		size_t c;
	}

	class SeedInstance
	{
		double[] d;
		size_t[] e;
		Tuple!(size_t, size_t)[][] f;
	}

Now, _some_ of what goes into the Seed can be data imported from file, that
really will never change, and it's convenient to pass it to threads as
immutable.  But I don't want to force it to _always_ be immutable inside the
Seed class, because there could be other cases where it's that data that's being
mutated by the Monte Carlo process.
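
As a sketch of the "pass it to threads as immutable" case, using std.concurrency: the worker function and the literal data are hypothetical stand-ins for the real simulation and the data read from file. An immutable(double)[] can be handed to spawn directly, whereas a plain mutable double[] argument would be rejected at compile time because it aliases thread-local mutable data.

```d
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;

// Hypothetical worker: reads the shared immutable data and reports a
// result back to the owning thread.
void worker(Tid owner, immutable(double)[] data)
{
    double total = 0;
    foreach (x; data)
        total += x;
    owner.send(total);
}

void main()
{
    // Stand-in for data imported from file that never changes afterwards.
    immutable(double)[] data = [1.0, 2.0, 3.0];

    // No copy is made: both threads can safely read the same array.
    spawn(&worker, thisTid, data);

    immutable total = receiveOnly!double();
    assert(total == 6.0);
}
```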

All of this feels like a lot of fuss over not a lot, because I have a working
solution; it just needs to be edited and recompiled in order to run with
different input data, which is not actually that onerous for my use case.  But
it'd be nice to be able to tidy everything up a bit in case it can be useful to
other people when it gets released.

Hence the question about passing data to threads, and then the problem of how to
incorporate that data into a storage class.

> Short answer: If you will have mixed arrays, no. There's no way to make
> that safe. If you don't have mixed arrays, there are ways.

So you mean there's no way to have one member variable be immutable, the rest
mutable, without hardcoding that into the class/struct design?