std.data.json formal review

Mon Aug 17 12:28:17 PDT 2015

On Mon, 17 Aug 2015 20:56:18 +0200,
Sönke Ludwig <sludwig at outerproduct.org> wrote:

> On 17.08.2015 at 20:12, Andrei Alexandrescu wrote:
> > On 8/14/15 7:40 AM, Andrei Alexandrescu wrote:
> >>
> >> struct TaggedAlgebraic(U) if (is(U == union)) { ... }
> >>
> >> Interesting. I think it would be best to rename it to TaggedUnion
> >> (instantly recognizable; also TaggedAlgebraic is an oxymoron as
> >> there's no untagged algebraic type). A good place for it is
> >> straight in std.variant.
> >>
> >> What are the relative advantages of using an integral over a
> >> pointer to function? In other words, what's a side by side
> >> comparison of TaggedAlgebraic!U and Algebraic!(types inside U)?
> >>
> >> Thanks,
> >>
> >> Andrei
> >
> > Ping on this. My working hypothesis:
> >
> > - If there's a way to make a tag smaller than one word, e.g. by
> > using various packing tricks, then the integral tag has an
> > advantage over the pointer tag.
> >
> > - If there's some ordering among types (e.g. all types below 16 have
> > some property etc), then the integral tag again has an advantage
> > over the pointer tag.
> >
> > - Other than that the pointer tag is superior to the integral tag at
> > everything. Where it really wins is there is one unique tag for each
> > type, present or future, so the universe of types representable is
> > the total set. The pointer may be used for dispatching but also as
> > a simple integral tag, so the pointer tag is a superset of the
> > integral tag.
> >
> > I've noticed many people are surprised by std.variant's use of a
> > pointer instead of an integral for tagging. I'd like to either
> > figure whether there's an advantage to integral tags, or if not
> > settle for good a misconception.
> >
> >
> > Andrei
> 
> (reposting to NG, accidentally replied by e-mail)
> 
> Some more points come to mind:
> 
> - The enum is useful to be able to identify the types outside of the
> D code itself. For example when serializing the data to disk, or when 
> communicating with C code.
> 
> - It enables the use of pattern matching (final switch), which is
> often very convenient, faster, and safer than an if-else cascade.
> 
> - A hypothesis is that it is faster, because there is no function
> call indirection involved.
> 
> - It naturally enables fully statically typed operator forwarding as
> far as possible (have a look at the examples of the current version).
> A pointer based version could do this, too, but only by jumping
> through hoops.
> 
> - The same type can be used multiple times with a different enum
> name. This can alternatively be solved using a Typedef!T, but I had
> several occasions where that proved useful.
> 
> They both have their place, but IMO where the pointer approach really 
> shines is for unbounded Variant types.

I think Andrei's point is that a pointer tag can do most things an
integral tag can, since you can compare the pointer without
dereferencing it:

void* tag;
if (tag == &someFunc!A) { /* the value holds an A */ }

So the only benefit is that the compiler knows that the _enum_ (not
simply an integral) tag is bounded. So we gain:
* easier debugging (readable type tag)
* potentially better codegen (jump tables fit perfectly: ordered values,
  0-x, no gaps)
* final switch
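
To make the final switch point concrete, here is a minimal sketch of
an enum-tagged union. Value, Kind and describe are names I made up
for illustration; this is not the TaggedAlgebraic implementation.
Because Kind is a closed enum, the compiler rejects the final switch
if a member is added and not handled:

```d
// Hypothetical hand-rolled tagged union, for illustration only.
struct Value
{
    // Bounded, dense tag values (0 and 1): a good fit for a jump table.
    enum Kind { integer, text }

    Kind kind;
    union
    {
        long integer;
        string text;
    }

    this(long v)   { kind = Kind.integer; integer = v; }
    this(string v) { kind = Kind.text;    text = v; }
}

// final switch must cover every member of Kind; adding a new member
// without a matching case is a compile-time error.
string describe(const Value v)
{
    final switch (v.kind)
    {
        case Value.Kind.integer: return "integer";
        case Value.Kind.text:    return "text";
    }
}
```

Calling describe(Value(42)) takes the integer branch, and
describe(Value("hi")) the text branch.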

In some cases enum tags might also be smaller than a pointer.
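
A rough sketch of both points, under my own assumptions: handler,
Tag, holds, PointerTagged and EnumTagged are hypothetical names, not
what std.variant does internally. The address of a per-type function
works as a tag by plain comparison, but a one-byte enum tag can make
the whole struct smaller when the payload is small:

```d
// Hypothetical per-type function; its address serves as the type tag.
// Returning T.stringof keeps each instantiation distinct.
string handler(T)() { return T.stringof; }

alias Tag = string function();

// Compare tags by address only; the function is never called here.
bool holds(T)(Tag tag)
{
    Tag expected = &handler!T;
    return tag is expected;
}

enum Kind : ubyte { integer, text }

struct PointerTagged { Tag  tag;  ubyte payload; }
struct EnumTagged    { Kind kind; ubyte payload; }

// The pointer tag costs a full word plus alignment padding.
static assert(EnumTagged.sizeof < PointerTagged.sizeof);
```

With Tag t = &handler!int, holds!int(t) is true and holds!string(t)
is false. The pointer tag still has the debugging advantage only if
you call it (here it returns the type name), whereas the enum value
is readable directly.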