New set class

Sun Feb 25 16:19:51 PST 2007

Michiel wrote:
> Bill Baxter wrote:
> 
>>> I have written a new class to represent sets, with several useful
>>> features. Uploaded for your review and free use. I don't know much about
>>> open source and licensing, but I request that if you use the class, that
>>> you keep my name in the author section of the documentation.
>>>
>>> http://files.michiel.eu/set.d
>>>
>>> Tell me what you think.
>> If you add an opApply with the signature
>>    int opApply(int delegate(uint i, inout Type) dg) { ... }
>>
>> it will allow this kind of foreach too:
>>    foreach(i, e; myset) {
>>        ...
>>    }
> 
> That's why I didn't do it. What meaning would it have for a set, since
> they have no index?

It's useful to the user who wants to print out the items numbered or 
something.  Or they have a set of 10 items and they want to copy them to 
a preallocated 10-item list.  Or they want to make another associative 
array that maps numbers to the items in the set.  Or they want to do 
something with only the first five items in the set then if(i>5) break;. 
  Or they want to append a number to every item in the set.  Or they 
want to split the set into two smaller sets arbitrarily using if (i%2).

Yes the user can always declare their own damn counter, but it doesn't 
make your code much more complex, and it does make their lives a little 
bit easier.

> 
>> Also I would just leave out opApplyReverse.  That way using it is a
>> compile-time error.  Or if you're dead set on it, at least have it throw
>> an exception.  I believe assert() is a no-op in release builds, meaning
>> your program will fail in strange ways in release.  Actually I'm
>> surprised that compiles without a return statement.
> 
> assert(0) is evaluated by D as a point in the code you can't ever reach.
> I got the idea from the opCmp example:
> 

Interesting.  I didn't realize that.  That's really not obvious from the 
docs that that's what happens:

"""
The expression assert(0) is a special case; it signifies that it is 
unreachable code. Either AssertError is thrown at runtime if it is 
reachable, or the execution is halted (on the x86 processor, a HLT 
instruction can be used to halt execution). The optimization and code 
generation phases of compilation may assume that it is unreachable code.
"""  -- http://www.digitalmars.com/d/expression.html#AssertExpression

Nowhere does it say it that it is "unconditional regardless of 
debug/release flags".

This page talks about asserts but doesn't mention debug/release behavior 
either.
http://www.digitalmars.com/d/dbc.html

Actually I can't find where disabling asserts for release builds is 
documented, though I'm sure I've seen it somewhere before.  A quick 
test, however, reveals that you are indeed correct: "regular" asserts 
turn off in a release build, but assert(0) does not.

Well that's good to know.  Still, it generates a runtime error, when you 
really would prefer a compile-time error.   So I think it's a bad idea 
to start putting in lots of
    opFoo() { assert(0,"This method doesn't exist"); }
Once you start putting those in, where do you stop?  I'm sure there are 
other operators and functions that Set does not implement.

It's almost like comments along the lines of:
    // this is an integer that does the counting:
    int counter;

The fact that the method is not there is all the documentation you need.

But if you really want to do put such things in your code you could try 
this:
struct Set(T) {
     ...
     void opApplyReverse()() {static assert(0,"don't be silly");}
}

Since foo is a template member, no attempt is made to compile it unless 
it's actually called from somewhere.  If you do try to call it then you 
trip the static assert at compile-time.

>> Finally
>>    Set!(T[0]) set(T...)(T elements)
>> why not just
>>    Set!(T) set(T)(T elements...) ??
> 
> You only use that syntax if T is an aggregate type. So the resulting set
> would actually be Set!(T[0]) anyway. I think I tried this before and had
> some problems, but I'm not certain now.

I don't follow what you mean about "aggregate type".  The unittest uses 
int as the type.  I don't think 'int' is considered an aggregate.

>> Have you looked at the HashSet implementation in Tango for comparison?
> 
> I had not. Is this the one?
> 
> http://www.dsource.org/projects/tango/docs/current/tango.util.collection.HashSet.html

Yes.

> Though I haven't looked at the source code, I see it exposes many of its
> internal workings to the outside. When you are working with a set you
> don't want to know if it's using a hash table or a balanced tree. Just
> that it implements a set.

It also doesn't include set operations like you have.  (union, 
intersection, difference, symmetric difference).  And I agree with you 
that it looks like it exposes too much of its innards for most users.

> Though I'm sure that they put more work into it than I and that it's
> more efficient. I just sneakily used an associative array on the inside.
> Which uses its own hash table, if I'm not mistaken.

It looks to be based on code by Doug Lea, who wrote dlmalloc, one of the 
fastest implementations of malloc around.  So I suspect performance is 
pretty good.

Another thing you might want to add to your set class is opCat (~) and 
opCatAssign (~=).  Those are pretty commonly used in D for appending 
something to something else.  They'd just be synonyms for add and union.

--bb