Limited Semi-PolyMorphic (LSP) structs?

Era Scarecrow rtcvb32 at yahoo.com
Mon Aug 26 04:20:13 PDT 2013


  Been a while and out of the loop, I need to get my hands dirty 
in the code again (soon). Anyways, let's get to the matter at 
hand as I'm thinking about it. I'm working on some code (or will 
work on code again) that could use a polymorphic type, but at the 
end it will never be shared externally and can never extend 
beyond 'Known' types. (and the 'types' are behavior 95% of the 
time)


  True polymorphism will let me compile a code to work with say 
'Object' and you can plug in any code and pass it an object and 
it will be happy, or inherit and work with it. These are great as 
classes. But I want to throw a wrench in the gears, I want to use 
structs.


  I know you're asking, Why? Well the data structure I'll be 
working with have records in the hundreds of thousands, but most 
of the time looking at/sorting them I don't necessarily need to 
allocate memory (or really even interact with them beyond 
sorting). Most of the time the OO aspect is just behavior and the 
data between them never changes (behave this way because you're a 
type XXX). It can take a very long time to unpack everything 
(when you never needed to unpack in the first place, or allocate 
and then de-allocate immediately, probably without recycling).


  Alright, structs can't inherit because they don't have an object 
pointer to go to their parent structures that they are connected 
to. If we eliminate the pointer and the overhead on some of that, 
we bring it down to the following:

  To be fully polymorphic you need to:
  * Be able to tell what type it is (Probably enum identifier)
  * Extended types cannot be larger than the base/original (or the 
base has to hold extra data in reserve for it's other types).


  When can this LSP be applicable?
  * When a type can be changed on the fly with no side effects 
other than intended (This 'text' struct is now 'left aligned text 
object')
  * When allocations/pointers aren't needed (why fully build the 
object at all?)
  * When it won't be extended beyond the current type... (only a 
few fixed subtypes)
  * When teardown/deallocation/~this() isn't needed (never 
allocates? Never deallocates! Just throw it all away! POD's are 
perfect!)
  * When you have no extra 'data' to append to a class/struct and 
only override methods/behavior (Encryption type switch from DES 
to BlowFish! Same API/methods, otherwise nothing really changed)
  * When you need a half step up from structs but don't need full 
classes/objects

  Limitations: Hmmm..
  * LSP's cannot be a template or mixin (they can call on them)
  * Known at compile-time
  * Only one of them can handle setup/teardown.


  Now, I wonder. What if we make it so it CAN inherit? We need 3 
examples, where something is in A, A&B, and B. We will get to 
that. iA and iB are the class/structs and A/B/C are the 
methods/functions. So...

//original based on...
class iA {
   void A();
   void C();
}
class iB : iA {
   void A();
   void B();
}

  Breaking them down for structs we have:

struct base_AB { //the 'manager' that does redirection and known 
size
   char type = 'A'; //polymorphic type identifier, for simplicity
   //shared 'parent' baggage ie -> iA
   //baggage space for iB

   //Exists in both iA & iB
   void A() {
     switch(type) {
       case 'A': (cast(iA) this).A(); break;
       case 'B': (cast(iB) this).B(); break;
       default: throw new Exception("method A() invalid for struct 
type:" ~ type);
     }
   }

   //B only exists in inherited
   void B() {
     if (type == 'B') (cast(iB) this).B();
     else throw new Exception();
   }

   //C exists everywhere
   void C() { (cast(iA) this).C(); }
}


  So far the base is easy to make, now however as for how iA and 
iB interact with eachother... Ignoring infinite loops problems:

struct iA {
   char type;
   //baggage - Size must be identical to base_AB

   void A() {
     A();
     //B(); /*iA doesn't really know about B so can't call it from 
here... not safely anyways*/
     C();
   };
}

   Now, C() can be called regardless of the type (no checks 
needed). iB.B() can also be called anywhere in iB with no 
problems, but it would be simpler if all calls regardless went to 
the base and let the base handle it. (Optimizations would 
probably remove unneeded checks later).

struct iB {
   char type;  //and other baggage

   void A() {
     A(); // actually: (cast(base_AB) this).A();
     B(); // actually: (cast(base_AB) this).B();
     C(); // actually: (cast(base_AB) this).C();
   }
   void B() {
     //ditto as A()'s
   }
}


  This can probably be easily done having a different naming set 
which handles the redirection, but the names and boiler-plate 
seems unnecessarily large and verbose for a simple task; 
Optimization will likely remove the excess unneeded portions in 
final binary form anyways.

//different name examples:
  struct base_AB {
   char type;  //and other baggage

   void A() {
     switch(type) {
       case 'A': (cast(iA) this).realA(); break;
       case 'B': (cast(iB) this).realB(); break;
       default: throw new Exception("method A() invalid for struct 
type:" ~ type);
     }
   }

struct iB {
   char type;
   //bagage

   //boilerplate
   void A() { cast(base_AB) this.A(); }
   void B() { cast(base_AB) this.B(); }
   void C() { cast(base_AB) this.C(); }

   void realA() {
     A();
     B(); //Path really is now: this.B() -> base_AB.B() -> 
iB.realB();
     C();
   }

   void realB() {  /* ... */  }
}


   Now how would one build it without all the extra boilerplate 
overhead (also preferably with fewer confusing elements) Three 
ideas are coming to mind.

   1) Template. This can work but it's a lot of overhead, a lot of 
work and is hard to keep track of. I have an experimental version 
that semi-works but can't be used seriously. My project sorta 
relying on this has come to a halt for the moment. It's also very 
finickle where data (and struct order) are involved and refuses 
to compiler out of a certain order. Also for calling functions 
that exist in current structs require you to forcibly call 'this' 
or go around it in order for it to do the redirections elsewhere.


   2) UDA's could be possible, if you tag a struct as belonging 
and then scan/add the appropriate code. But this seems like a lot 
of work and overhead. Probably won't work, plus a new UDA for 
each type vs known fixed extensions. Seems like a lot of extra 
work. Maybe?

   struct base_AB @LSP(iA, iB) {}
   struct iA @LSP_base(base_AB) {}
   struct iB @LSP_base(base_AB) {}


   3) Compiler. This seems the most likely (and 95% of it is 
already in place for classes), but i can't implement it myself 
easily; Plus i am not sure if Walter or Andrei would want/allow 
it. If you squint your eyes you can see it looks very similar to 
the original struct/class formula for C++ (without using 
pointers/new), except that the total size reserved is known/set 
in advance to incorporate all types (fixed size, no 
growing/shrinking by casting).


  Perhaps introducing a third base type (so struct, classes and 
LSP's)? This might help but depends on the amount of extra 
complexity would be needed, or how many bugs it could 
introduce... or incorporate it as the normal struct but is a 
experimental/disabled feature unless you apply a certain 
attribute (or set of?)


   If there's a suggestion of how to make this work and feel more 
natural (class-like without allocation/pointers) then 
suggestions/ideas would be nice. It's possible this type of 
feature is just not wanted. But I'm willing to bet there's quite 
a few places this type of feature/set could be utilized when you 
consider what it can be used for.


More information about the Digitalmars-d-learn mailing list