New set class
Bill Baxter
dnewsgroup at billbaxter.com
Sun Feb 25 16:19:51 PST 2007
Michiel wrote:
> Bill Baxter wrote:
>
>>> I have written a new class to represent sets, with several useful
>>> features. Uploaded for your review and free use. I don't know much about
>>> open source and licensing, but I request that if you use the class, that
>>> you keep my name in the author section of the documentation.
>>>
>>> http://files.michiel.eu/set.d
>>>
>>> Tell me what you think.
>> If you add an opApply with the signature
>> int opApply(int delegate(uint i, inout Type) dg) { ... }
>>
>> it will allow this kind of foreach too:
>> foreach(i, e; myset) {
>> ...
>> }
>
> That's why I didn't do it. What meaning would it have for a set, since
> they have no index?
It's useful to the user who wants to print out the items numbered or
something. Or they have a set of 10 items and they want to copy them to
a preallocated 10-item list. Or they want to make another associative
array that maps numbers to the items in the set. Or they want to do
something with only the first five items in the set then if(i>5) break;.
Or they want to append a number to every item in the set. Or they
want to split the set into two smaller sets arbitrarily using if (i%2).
Yes the user can always declare their own damn counter, but it doesn't
make your code much more complex, and it does make their lives a little
bit easier.
>
>> Also I would just leave out opApplyReverse. That way using it is a
>> compile-time error. Or if you're dead set on it, at least have it throw
>> an exception. I believe assert() is a no-op in release builds, meaning
>> your program will fail in strange ways in release. Actually I'm
>> surprised that compiles without a return statement.
>
> assert(0) is evaluated by D as a point in the code you can't ever reach.
> I got the idea from the opCmp example:
>
Interesting. I didn't realize that. That's really not obvious from the
docs that that's what happens:
"""
The expression assert(0) is a special case; it signifies that it is
unreachable code. Either AssertError is thrown at runtime if it is
reachable, or the execution is halted (on the x86 processor, a HLT
instruction can be used to halt execution). The optimization and code
generation phases of compilation may assume that it is unreachable code.
""" -- http://www.digitalmars.com/d/expression.html#AssertExpression
Nowhere does it say it that it is "unconditional regardless of
debug/release flags".
This page talks about asserts but doesn't mention debug/release behavior
either.
http://www.digitalmars.com/d/dbc.html
Actually I can't find where disabling asserts for release builds is
documented, though I'm sure I've seen it somewhere before. A quick
test, however, reveals that you are indeed correct: "regular" asserts
turn off in a release build, but assert(0) does not.
Well that's good to know. Still, it generates a runtime error, when you
really would prefer a compile-time error. So I think it's a bad idea
to start putting in lots of
opFoo() { assert(0,"This method doesn't exist"); }
Once you start putting those in, where do you stop? I'm sure there are
other operators and functions that Set does not implement.
It's almost like comments along the lines of:
// this is an integer that does the counting:
int counter;
The fact that the method is not there is all the documentation you need.
But if you really want to do put such things in your code you could try
this:
struct Set(T) {
...
void opApplyReverse()() {static assert(0,"don't be silly");}
}
Since foo is a template member, no attempt is made to compile it unless
it's actually called from somewhere. If you do try to call it then you
trip the static assert at compile-time.
>> Finally
>> Set!(T[0]) set(T...)(T elements)
>> why not just
>> Set!(T) set(T)(T elements...) ??
>
> You only use that syntax if T is an aggregate type. So the resulting set
> would actually be Set!(T[0]) anyway. I think I tried this before and had
> some problems, but I'm not certain now.
I don't follow what you mean about "aggregate type". The unittest uses
int as the type. I don't think 'int' is considered an aggregate.
>> Have you looked at the HashSet implementation in Tango for comparison?
>
> I had not. Is this the one?
>
> http://www.dsource.org/projects/tango/docs/current/tango.util.collection.HashSet.html
Yes.
> Though I haven't looked at the source code, I see it exposes many of its
> internal workings to the outside. When you are working with a set you
> don't want to know if it's using a hash table or a balanced tree. Just
> that it implements a set.
It also doesn't include set operations like you have. (union,
intersection, difference, symmetric difference). And I agree with you
that it looks like it exposes too much of its innards for most users.
> Though I'm sure that they put more work into it than I and that it's
> more efficient. I just sneakily used an associative array on the inside.
> Which uses its own hash table, if I'm not mistaken.
It looks to be based on code by Doug Lea, who wrote dlmalloc, one of the
fastest implementations of malloc around. So I suspect performance is
pretty good.
Another thing you might want to add to your set class is opCat (~) and
opCatAssign (~=). Those are pretty commonly used in D for appending
something to something else. They'd just be synonyms for add and union.
--bb
More information about the Digitalmars-d
mailing list