Struct-typing in Phobos

Chris Williams yoreanon-chrisw at yahoo.co.jp
Wed Feb 5 15:47:02 PST 2014


On Tuesday, 4 February 2014 at 21:07:01 UTC, Dicebot wrote:
> "alias this" for the rescue again. You can a struct private 
> member of a class and alias it.

So it seems. I have a vague recollection of using alias this a 
long time ago, so I presume that I knew all of its rules back 
then, but had since forgotten.

Here's an updated version of the writeup. Again, I'd like to 
recommend it as one of the How-tos or Articles on the top page, 
to help people out with using Phobos. Is that something I can do 
via GitHub and a pull request, or would the dlang.org admin need 
to do that (if he felt it worthwhile)?

----
The core library of a language should be small, fast, powerful,
and generic. This last item, however, causes a problem for
library architects. Traditionally, the more generic something is,
the larger and slower it is - with several layers of abstraction
and indirection to allow many disparate implementations of a
generic idea to all serve under the same interface. D supports
all of the features that would allow one to write a library like
this, but it also supports features that allow one to write a
library which does not sacrifice performance for generality.

For example, here would be a standard declaration of a Set
interface and a simple implementation as a class:

interface Set(T) {
     size_t length();
     size_t insert(T);
     T front();
     T back();
     void removeFront();
     void removeBack();
}

class HashSet(T) : Set!(T) {
private:
     void[0][T] data;

public:
     size_t length() {
         return data.length;
     }

     size_t insert(T value) {
         data[value] = [];
         return data.length;
     }

     T front() {
         foreach (k, v; data) {
             return k;
         }
         return T.init;
     }

     T back() {
         foreach_reverse (k, v; data) {
             return k;
         }
         return T.init;
     }

     void removeFront() {
         if (data.length == 0) return;

         T key;
         foreach (k, v; data) {
             key = k;
             break;
         }
         data.remove(key);
     }

     void removeBack() {
         if (data.length == 0) return;

         T key;
         foreach_reverse (k, v; data) {
             key = k;
             break;
         }
         data.remove(key);
     }
}

When we write a function which accepts a Set, we can write:

void foo(Set someSet) {
     someSet.insert(a);
     writeln(someSet.front);
}

This will accept and operate on any class which implements the
Set interface (as you would expect).

To reduce overhead, in the D standard library, it is standard to
try and implement this as a struct instead of as a class. But if
we implement HashSet as a struct (which cannot implement an
interface), then we would traditionally have no way to write
functions which can accept generic types, like Sets.

This is solved firstly by use of the templating system. If I have
a function which accepts a templated type, then so long as the
code compiles once the type has been resolved, any arbitrary
class or struct which implements the correct functions will
compile and operate correctly.

void foo(Set)(Set someSet) { // "Set" is just a templated type
name
     someSet.insert(a); // Only compiles if Set resolves to a type
with insert and front defined
     writeln(someSet.front);
}

In a sense, this is a method for duck-typing in D, though not a
very good one since if we look at the function declaration, it is
not clear what types are or are not valid as parameters to foo().
We must look through the implementation of foo() to find useages
of someSet to determine what is required. Likewise, there is
nothing like "interface Set" which is mandating a standardized
set of methods that all of our functions can expect and which
developers of new implementations of Set types can know to
implement.

This is corrected by use of compile-time duck-type checking
templates. First we declare a template which implements a static
test of compilability of the type for our chosen interface.

template isSet(Set) {
     enum bool isSet = is(typeof(
     (inout int = 0)
     {
         Set!int s = void; // can define an instance
         if (s.length == 0) {} // can test size
         s.insert(1); // can insert
         auto h = s.front; // can access front
         h = s.back; // can access back
         s.removeFront; // can remove front
         s.removeBack; // can remove back
     }));
}

Following this, we first want to validate that our implementation
is compliant.

struct HashSet(T) {
     static assert( isSet!(HashSet!T) );
     ...
}

We can then upgrade foo() to also require only types which
conform:

void foo(Set)(Set someSet) if (isSet!Set) {
     ...
}

While more verbose, this sort of duck-typing offers advantages
that allow for further optimization beyond just using structs 
instead of classes. Since the isSet() validation only checks 
'compilability', anyone wishing to implement the Set type can 
write any code that is visually equivalent to what is written in 
the checker.

For example, one type that is already defined in the Phobos 
standard library is InputRange (defined via the isInputRange() 
validator in std.range). Any sort of stream of data should be 
able to be defined as an InputRange. You can grab the current 
value, move to the next, and test whether it is empty.

Pseudo-random number generators can be considered to be 
InputRanges. You can grab the current value, ask for the next, 
and test whether there are more values to be retrieved. Since it 
will infinitely generate more values, any implementation of the 
"empty" check will return false. If InputRange was defined as an 
interface, empty would need to be defined as a method. But since 
isInputRange tests for the compilability of the code, "if 
(r.empty) {}", the random number generators in std.random are 
able to define "empty" as a constant enum with the value of 
"false", rather than a method, improving the performance time for 
any empty tests.

Another similar benefit is the ability to allow undefined return 
types. For example, asymmetric cryptography algorithms should 
always generate Public and Private keys, but there is nothing 
shared between an RSA public key and an El-Gamal public key in 
terms of properties nor interface. In both cases, the values are 
black boxes of use only to the algorithm in question. A checker 
for isAsymmetricCrypt can verify that an object contains 
properties "publicKey" and "privateKey" without having to check 
that they are of any particular type:

template isAsymmetricCrypt(Crypt) {
     enum bool isAsymmetricCrypt = is(typeof(
     (inout int = 0)
     {
         Crypt c = void; // can define an instance
         c.generate; // Can generate a key pair
         auto pub = c.publicKey; // Has a public key
         auto prv = c.privateKey; // Has a private key
         c.publicKey = pub; // Keys can be set
         c.privateKey = prv;
     }));
}

Using interfaces, all public and private keys would need to have 
a shared type, even if it was just an empty interface.

Theoretically, because we are using structs instead of classes or 
interfaces, inheritence would be impossible. D's "alias this" 
feature allows one to bypass this problem, so that any 
struct-based type can be inserted into a second struct or class 
and gain all of the features of the aliased type.

For example, if we wanted to convert our HashSet struct into a 
class and add methods which allow us to track insertions and 
deletions, we can do the following:

class LogSet(T) {
     private HashSet!T base;
     alias base this;

public:
     size_t insert(T value) {
         writefln("Inserting %s", value);
         return base.insert(value);
     }

     void removeFront() {
         writeln("Removing front");
         base.removeFront();
     }

     void removeBack() {
         writeln("Removing back");
         base.removeBack();
     }
}

We can override the methods that we wish to override while 
allowing calls to LogSet(T).front, LogSet(T).back, and 
LogSet(T).length all go direct to HashSet(T).


More information about the Digitalmars-d mailing list