Sets

Bill Baxter dnewsgroup at billbaxter.com
Mon Feb 5 17:08:51 PST 2007


Michiel wrote:
>> The lack of a set type and the use of void[KeyType] for sets has been
>> discussed a few times in the past, I think.  It makes sense to me too,
>> at first glance.  But I suspect it would require a whole different
>> codepath in the compiler to deal with because you can't have arrays of
>> voids etc. You really want a different data structure to store a set.

> 
> I think creating a whole different data structure just for the set isn't worth the
> trouble.

Maybe not for casual use, but if for some reason you are creating huge 
sets, you'll probably care whether it uses 1GB or 2GB of memory.  It's 
not a "whole different" data structure that's required, just a modified 
version of the AA data structure that doesn't store values.  But still 
that makes it a different data structure that will require different 
code in D.

I'm kinda just playing devil's advocate here.  I was actually one of the 
people who previously commented that void[KeyType] should just work. 
But if it's going into the core language it better be a slam dunk.

> For now, I've created the following simple code, which I will use for now. It
> still uses dummy values, but hides this behind a template and an array function:
> 
> ---------------------------------
> template Set(T) {
> 	alias bit[T] Set;
> }
> 
> void add(T)(inout bit[T] set, T element) {
> 	set[element] = 1;
> }
> ---------------------------------
> Set!(char[]) set;
> 
> set.add("one");
> set.add("two");
> 
> assert("one" in set);
> assert("two" in set);
> assert(!("fortytwo" in set));
> ---------------------------------

I think the type 'bit' is deprecated.  Anyway, it's just an alias for 
'bool' (see phobos/object.d) so you might as well use bool rather than a 
type that isn't even listed in the spec:
      http://www.digitalmars.com/d/type.html

> 
>> And as you mentioned it should have an 'add' method which would be
>> meaningless for ordinary AA's.
> 
> For ordinary AA's, another parameter could be passed for the value, which could
> default to the type's init value.

True.

> 
>> Following AA syntax could also lead to
>> annoying special cases for generic templates that would have to be
>> prepared for the V==void case:
>>     foo(K,V)(V[K] aa) { ...V val; // whoops V could be void! }
> 
> Isn't the same thing true for many situations? What if you wanted to compare a V
> object with an integer? The compiler would generate an error for any V for which
> no such comparison is defined. I would simply expect the compiler to complain in
> your example.

Yeh, despite what I said, you could also see sets having AA syntax as 
being a *good* thing for templates, since it would let any template that 
  only cares about keys to match both sets and AA's.

But really it's kind of a bogus argument anyway.  Templates should use 
duck typing wherever possible.  I.e. the template should be written such 
that it supports any object with the necessary interface, like .keys and 
.values methods or a[x] syntax.

> I'm not saying it's easy. But it should be worth it for the same reason that
> associative arrays are included into the core language.

One issue with sets as AAs is that really one typically thinks of sets 
as a collection of "values" not a collection of "keys".  If you make a 
set type from scratch, you'd be more likely to have a "set.values" 
member than a "set.keys" member.

Here's a thought - one way to deal with that would be for the 
implementation of void[KeyType] to just return the key itself for any 
attempts to access an existing key.  So myset["key"] would return "key" 
if key was in the array, or throw an exception if not.  That way you 
could also use ".values" on a void[KeyType] set instead of ".keys", and 
handle the issue of what  auto = foo in myset; should return.

It looks like a number of things in dmd/src/phobos/internal/aaA.d would 
have to be rewritten with special versions for handling void.  But it 
doesn't look like it would be a huge amount of code.  In particular the 
main data structure already seems to allocate nodes as a big void* with 
size of keysize+valuesize, so that should still be ok with valuesize==0. 
  Then you'd also need some conditionals added to the code that calls 
into the aaA.d code.

--bb



More information about the Digitalmars-d mailing list