No struct extending?

Mon Sep 11 05:45:19 PDT 2006

Steve Horne wrote:
> In C++, 'struct' is (almost) a synonym for 'class'. More a declaration
> of intent than a different thing. One useful side-effect of this is
> that you can declare structs as extensions of other structs. This can
> be useful even for plain-old-data. e.g. data structures with several
> node types, but some shared fields. Inheritance doesn't always imply
> virtual tables and stuff.
> 
> Of course you can handle this using...
> 
>   struct s_Branch
>   {
>     s_Common m_Common;
> 
>     ... (branch specific stuff)
>   }
> 
> But all that 'pointer.m_Common.actual_member' is a pain.
> 
> Adding a union...
> 
>   union u_Variant
>   {
>     s_Common m_Common;
>     s_Branch m_Branch;
>     s_Leaf   m_Leaf;
>   }
> 
> just means you have to specify 'm_Branch' or 'm_Leaf' for the
> non-shared fields too.
> 
> Anonymous structs and unions can save on these extra dots and
> identifiers, but they do a different job. They can't give you a family
> of structs. Only a single struct/union combo - a variant record.
> 
> Plus, one thing I have in mind is templates that build up in layers
> (mix-in layers pattern), and there will be an arbitrary number of
> extensions applied. For example, if you want ordered keys, you apply
> the 'gimme-keys' template as a mixin layer, and it extends whatever
> structures, classes and methods it needs to.
> 
> This can still be handled - you just compose an access class in
> parallel along with the structs. But it's a hassle.
> 
> Now, truth told, this mix-in layers bit isn't important. 'static if'
> probably means I'm better off specifying things using mix-in layers
> (setting up flags and aliases), but putting most of the final code in
> one big template. It will be a lot more readable and maintainable that
> way. The mix-in layers pattern is then mostly a way of avoiding having
> too many parameters for one template.
> 
> I could use a D mixin to define the common fields, of course. And
> whatever approach I take, there's pointer casting based on the
> run-time type so that's not a big deal, though the C++ approach is
> nice in that casting to the 'base struct' is implicit.
> 
> What bothers me is the chance of this happening...
> 
>   struct c_Common
>   {
>     mixin(common fields)
>   }
> 
>   struct c_Leaf
>   {
>     int m_Misplaced_Field;  //  whoops!
>     mixin(common fields)
>   }
> 
> Or, for that matter, the same just using nested structs.
> 
>   struct s_Branch
>   {
>     int      m_Misplaced_Field;  //  whoops!
>     s_Common m_Common;
>   }
> 
> That is, there is no rule forcing the shared part into matched
> locations in all structures, so when you do the union/pointer
> casts/whatever you can end up looking at the wrong memory.
> 
> So - am I being paranoid?
> 
> It's a small thing, especially given the amount of work that D has
> already saved me. And I'm not even convinced it's real. The node
> layouts above, for instance, will all be part of the same module and
> maintained together anyway. The odds of a misplaced field like that
> should be next to zero, and the same probably applies in any family
> tree of related structs.
> 
> I just thought I'd raise it and see what others think.

I suspect this is a perfect case of the Square Triangle. Happens to me 
too. The solution sought being just a notch off the problem, concepts 
fighting for neurons, and the goal but a mirage, elusive and yet so 
tempting.

And then of course, I may be misunderstanding the whole issue, for all I 
know.

Anyway, as I understood it, you have struct instances (from somewhere, 
like a C library routine or a file, etc., let's call them 
ForeignInstances) and you need to glue some new fields to them so that 
you can process them without needing to write reams of code that keeps 
track of which YourProperties belong to which item. And this has to work 
with several (more or less) different, but still conceptually related 
kinds of ForeignInstances.

One could use a concoction of templates, mixins, unions and inheritance 
to create a module (or a library) which then lets one handle the 
situation simply and cleanly in main code. (Either in current D, or 
after Walter makes some needed tweaks.) Carefully writing the module 
would let one be reasonably sure that the fields align right, maybe even 
have suitable dynamic and/or static checks and asserts, to (almost) 
enforce integrity.
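
A minimal sketch of what such static checks might look like, written in C++ since that is what the original question compares against (D's static assert and .offsetof property could presumably play the same role). The struct names follow the quoted post; the fields inside them and the helper common_of are made up for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Shared part of every node. The two fields are illustrative assumptions.
struct s_Common {
    int m_Kind;
    int m_Count;
};

struct s_Branch {
    s_Common m_Common;       // shared part, must come first
    void*    m_Children[4];  // branch-specific stuff (hypothetical)
};

struct s_Leaf {
    s_Common m_Common;       // shared part, must come first
    int      m_Value;        // leaf-specific stuff (hypothetical)
};

// Compile-time checks: moving a field in front of m_Common breaks the build
// instead of silently reading the wrong memory at run time.
static_assert(std::is_standard_layout<s_Branch>::value, "s_Branch must be standard layout");
static_assert(std::is_standard_layout<s_Leaf>::value, "s_Leaf must be standard layout");
static_assert(offsetof(s_Branch, m_Common) == 0, "s_Branch: m_Common must be the first field");
static_assert(offsetof(s_Leaf, m_Common) == 0, "s_Leaf: m_Common must be the first field");

// View any node through its shared prefix. Only meaningful for types that
// passed the assertions above.
inline s_Common* common_of(void* node) {
    return static_cast<s_Common*>(node);
}
```

With the "whoops" field from the quoted example placed before m_Common, the matching offsetof assertion would fail at compile time, which is about as close to enforced integrity as this approach gets.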

The end result, or the goal, being that one ends up with in-memory 
(let's call them) prints, the beginning of which is exactly the same as 
in YourProperties and the rest the same as ForeignInstances.

Having got this far, one can then use functions from the module to 
handle the non-instance-type-specific manipulation of the 
ForeignInstances. Presumably one would either already have, or else 
write specific routines to do all the actual instance type specific 
stuff (mostly access and assignment).

You make it more difficult by bringing up the issue of physical field 
alignment. (See 
http://en.wikipedia.org/wiki/Fragile_binary_interface_problem which 
incidentally is on a wrong page(!!) since it discusses the Fragile Base 
Class problem. Oh well.)

I suspect you wanted all this to be more ambitious? Like having 
several alternate sets of YourProperties (and of course matching code 
sets), so that each set could be used for a different purpose, like 
sorting, selecting, serializing, combining, printing, etc. of the "prints".

---

If I got a task like this, I'd probably do it a lot simpler.

First, these ForeignInstances have to enter our process somehow. At that 
point we either have to recognize the specific type of each, or we know 
it from the context. In both cases, if we wanted to "slap on" our own 
fields to them, we need to copy them somewhere else in memory. (If we 
only have a couple of such instances coming one at a time all this 
effort is a waste, and if they come from a stream or a file then they're 
too near each other to have room for our fields anyhow.)

Thus, one of the main points of physically having our own fields 
attached to the ForeignInstances disappears: speed. And with sorting, 
unless they're very small, it will be more practical to sort linked 
lists of just pointers to them. So it seems some of the performance wins 
just vanish.
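
To illustrate the sorting point, here is a small C++ sketch that orders pointers to the records and leaves the records themselves where they are. ForeignRecord, its fields and sorted_view are hypothetical names, and a std::vector of pointers stands in for the linked list mentioned above.

```cpp
#include <algorithm>
#include <vector>

// Stand-in for a large foreign record; the layout is an assumption.
struct ForeignRecord {
    int  key;
    char payload[256];
};

// Returns pointers to the records ordered by key. The records are not
// copied or moved; only the pointers are rearranged.
inline std::vector<const ForeignRecord*>
sorted_view(const std::vector<ForeignRecord>& records) {
    std::vector<const ForeignRecord*> view;
    view.reserve(records.size());
    for (const ForeignRecord& r : records) {
        view.push_back(&r);
    }
    std::sort(view.begin(), view.end(),
              [](const ForeignRecord* a, const ForeignRecord* b) {
                  return a->key < b->key;
              });
    return view;
}
```

Each swap during the sort moves one pointer rather than a 260-byte record, so where our own fields physically live makes no difference to the sort.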

The other point for having our and their fields together was "kinda 
integrity", in other words, we wouldn't have to separately keep track of 
our and their data. Now, from above, it seems that having our and their 
data together at all times may actually incur more work for us than more 
traditional programming.

This all means we simply have to have routines for import and export. 
Now, the issues of machine word width and endianness only exist when 
carrying the data (or porting the program) between different 
architectures. As to the data, simply exactly specifying the record 
layout takes care of both word size and endianness. (Hey, a .gif file is 
a .gif file no matter what computer you have.) As to within the program, 
as long as we access the fields with their names (as opposed to bit 
twiddling), we're okay. The process of importing or exporting converts 
between the "file format" and our internal representation (whatever it 
may be), and that process is where endianness and word size are taken 
care of.
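
A sketch of such an import/export pair in C++, assuming a made-up six-byte record layout (a 16-bit kind followed by a 32-bit length, both little-endian). The layout, Record, import_record and export_record are all assumptions for illustration, not any real file format.

```cpp
#include <cstddef>
#include <cstdint>

// Internal representation: ordinary fields, accessed by name.
struct Record {
    std::uint16_t kind;
    std::uint32_t length;
};

// Hypothetical external layout, specified byte by byte:
//   bytes 0..1  kind,   little-endian
//   bytes 2..5  length, little-endian
constexpr std::size_t kRecordBytes = 6;

// Import: assemble each field from individual bytes, so the result is the
// same regardless of the host's endianness or word size.
inline Record import_record(const unsigned char* p) {
    Record r;
    r.kind = static_cast<std::uint16_t>(p[0] | (p[1] << 8));
    r.length = static_cast<std::uint32_t>(p[2])
             | (static_cast<std::uint32_t>(p[3]) << 8)
             | (static_cast<std::uint32_t>(p[4]) << 16)
             | (static_cast<std::uint32_t>(p[5]) << 24);
    return r;
}

// Export: the inverse, writing the bytes in the specified order.
inline void export_record(const Record& r, unsigned char* p) {
    p[0] = static_cast<unsigned char>(r.kind & 0xFF);
    p[1] = static_cast<unsigned char>((r.kind >> 8) & 0xFF);
    for (int i = 0; i < 4; ++i) {
        p[2 + i] = static_cast<unsigned char>((r.length >> (8 * i)) & 0xFF);
    }
}
```

Nothing here casts a byte buffer to a struct pointer, so the compiler is free to pad or reorder nothing or everything inside Record without affecting the data on disk.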

For the implementation, I'd start with simple textbook OO. We (feel entitled to) 
assume that the various ForeignInstance types do have much in common. 
This automatically suggests a Base Class that has methods to store, 
retrieve and manipulate the common fields. It would also have abstract 
methods that cater for what has to be done with the instance type 
specific fields, more or less constituting an interface specification in 
reality.

Each ForeignInstance subtype would then just be a subclass of this base 
class, implementing only the specifics. Instances of these could then be 
stored in data structures (arrays, lists, trees) and referred to as 
the base type. Thanks to polymorphism, we could then "just use" these 
instances without worrying which specific type each is. We'd also get 
adequate performance, without even trying.
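
Roughly what I have in mind, as a C++ sketch. The class names (ForeignBase, BranchItem, LeafItem), the single common field and the describe method are all invented for illustration; a real base class would carry whatever the ForeignInstance types actually share.

```cpp
#include <memory>
#include <string>
#include <vector>

// Base class: owns the common fields and declares the type-specific
// operations as abstract methods.
class ForeignBase {
public:
    explicit ForeignBase(int id) : id_(id) {}
    virtual ~ForeignBase() = default;

    int id() const { return id_; }              // common field access
    virtual std::string describe() const = 0;   // type-specific behaviour

private:
    int id_;
};

// Each subtype implements only its own specifics.
class BranchItem : public ForeignBase {
public:
    BranchItem(int id, int children) : ForeignBase(id), children_(children) {}
    std::string describe() const override {
        return "branch(" + std::to_string(children_) + ")";
    }
private:
    int children_;
};

class LeafItem : public ForeignBase {
public:
    LeafItem(int id, int value) : ForeignBase(id), value_(value) {}
    std::string describe() const override {
        return "leaf(" + std::to_string(value_) + ")";
    }
private:
    int value_;
};

// Main code handles everything through the base type.
inline std::string describe_all(
        const std::vector<std::unique_ptr<ForeignBase>>& items) {
    std::string out;
    for (const auto& item : items) {
        out += std::to_string(item->id()) + ":" + item->describe() + "\n";
    }
    return out;
}
```

The container never needs to know which concrete type it holds, and there is no shared memory layout left to get wrong.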

Clear, concise and KISS. Oh, and way more robust and maintainable.

Ok, your original issue (as I understood it anyway) was not one of 
practical implementation but rather a (in itself very intriguing) thoughtlet 
about the relationship of mixins, structs and their manipulation, both 
in source code and at runtime. My point (and I apologize) was merely 
that I can't see a suitable problem for your solution. :-)