Java memory efficiency and column-oriented data

Fri Feb 3 16:05:32 PST 2012

Timon Gehr:

> I like it and I think it would be a valuable addition to Phobos,

Marco Leise has asked for some benchmarks first, and I think he's right.

> as well as a nice showcase for Ds templates and string mixins.

Phobos is meant to be first of all _useful_ :-)

> You didn't 
> specify what the type of eg. a3.x_y_array is, I think it should be 
> Tuple!(int, "x", float, "y")[].

OK.

> Furthermore, I'd rather have it named a3.x_y.

The problem is that when you ask for a single field, the name becomes just "x", which I think is misleading: it's not the original x, it's a tuple with a single item. I think a name that helps you tell the original 'x' field apart from this new field is better. That's why I have added a suffix.
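To make the naming point concrete, here is a small sketch of the element types only. It assumes a hypothetical Foo with fields x, y and z, and the aliases XY and X1 and the helper makeXY are invented for this illustration; none of this is the ParallelArray implementation.

```d
import std.typecons : Tuple;

// Hypothetical struct, assumed only for this illustration.
struct Foo { int x; float y; string z; }

// Element type of the "x y" group, as agreed above.
alias XY = Tuple!(int, "x", float, "y");

// Element type of a group holding only x: a one-item tuple.
alias X1 = Tuple!(int, "x");

// A single-field group is not the original int field,
// but its only member is still an int named x.
static assert(!is(X1 == int));
static assert(is(typeof(X1.init.x) == int));

// Hypothetical helper: copy the x and y fields into an XY array.
XY[] makeXY(Foo[] items) {
    XY[] result;
    foreach (f; items)
        result ~= XY(f.x, f.y);
    return result;
}

unittest {
    auto x_y_array = makeXY([Foo(1, 2.5f, "a"), Foo(3, 4.5f, "b")]);
    assert(x_y_array[1].x == 3);
    assert(x_y_array[0].y == 2.5f);
}
```

Since X1 is a different type from int, a name such as x_array makes it harder to mistake the group for the original field.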

> There should also be a possibility to manually name the parts,

I think this makes the splitting string a bit too complex... This data structure is meant to be as plain as possible, like a POD array. I'd prefer not to give the option to change the synthetic field names, both to keep ParallelArray simpler and to make it _easy_ for its users to tell exactly what it spits out.

> as well as a way to specify 'anything that is left':
> 
> ParallelArray!(Foo, "pos: x y # rest: ...") pa;

This was my first design for ParallelArray, but I think it's not a good idea, and now I prefer the simpler design I've shown. Forcing stricter management of the names in the optional string allows for stronger static typing: without it, if you later change the original Foo struct by adding a field, the new field goes into the "rest" part, and you no longer know what's in "rest".

But ParallelArray is meant to be a transparent data structure, so it asks you to list all the original fields once and only once. So if you add or remove a field in Foo, the compiler asks you to modify every ParallelArray in your program that is defined on Foo and has a string too. If you don't give a string, as in ParallelArray!Foo, then it compiles even if you change Foo.
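A minimal sketch of how such a "once and only once" rule could be checked at compile time. The helpers specNames and listsEachFieldOnce and the three-field Foo are hypothetical names assumed for this example, and the sketch treats spaces and '#' as the only separators; a real ParallelArray could enforce the rule differently.

```d
import std.traits : FieldNameTuple;

// Hypothetical struct, assumed only for this illustration.
struct Foo { int x; float y; string z; }

// Hypothetical helper: split a spec such as "x y # z" into field names.
// It is a plain loop so that it also runs at compile time.
string[] specNames(string spec) {
    string[] names;
    string current;
    foreach (char c; spec) {
        if (c == ' ' || c == '#') {
            if (current.length) { names ~= current; current = ""; }
        } else {
            current ~= c;
        }
    }
    if (current.length) names ~= current;
    return names;
}

// True when spec names every field of T once and only once.
bool listsEachFieldOnce(T)(string spec) {
    auto names = specNames(spec);
    if (names.length != FieldNameTuple!T.length) return false;
    foreach (field; FieldNameTuple!T) {
        size_t hits = 0;
        foreach (name; names)
            if (name == field) ++hits;
        if (hits != 1) return false;
    }
    return true;
}

static assert( listsEachFieldOnce!Foo("x y # z"));
static assert(!listsEachFieldOnce!Foo("x y"));       // z is missing
static assert(!listsEachFieldOnce!Foo("x x # y z")); // x is listed twice
```

With a check like this inside the template, adding a field to Foo would make every ParallelArray!(Foo, "...") instantiation fail to compile until its string is updated.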

Anyway, if users later think it's really needed, this ParallelArray feature can be added later with a syntax like:
ParallelArray!(Foo, "x y # ...") pa;
And the syntax for naming new fields is simple, as you say:
ParallelArray!(Foo, "pos: x y # rest: ...") pa;
But I suggest leaving this out of a first implementation of ParallelArray, to keep things simpler at first.

Thank you for your comments,
bearophile