hashed array?

Mon Nov 12 09:36:18 PST 2012

11/12/2012 8:52 PM, Joseph Rushton Wakeling пишет:
> On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
>>> The current "workaround" is to just dummy it.
>>> alias bool Dummy;
>>> Dummy[string] names;
>>> names["Paul"] = true; //Ew.
>>
>> Make a wrapper?
>
> The problem with wrapped versions of associative arrays is that they
> just don't scale with number of keys.  If I want to store (say) a set of
> 3 million size_t's, then it costs a lot more than 3_000_000 *
> size_t.sizeof() to do so this way.
>
> Do dedicated set implementations do better than this?

Yes, sure they can. Starting with e.g. a bitset.
If the distribution is sparce then inversion lists can do wonders. They 
work especially well if your distribution has contiguous intervals, 
searching would be logN though.

I use one for unicode character sets and it saves a LOT of space.

Otherwise if it's sparce but doesn't have contiguous interval it seems 
entirely doable to combine the two in a universal datastructure for 
integral sets. Say a sorted array of 32/64-element bitsets (depending on 
architecture):

//denotes a small bit set covering interval of [start..start+32/64)
struct node{
	size_t start;
	size_t bits;
};

Should be quite compact & fast.

>
> I should say that my own interests in an implementation of set are
> twofold -- first, efficient storage; and second, being able to do an
> efficient foreach across the stored values, without any concern for order.
>
>> Or rather call them sets and have them in the library.
>
> The whole builtin-vs.-library thing seems a big distraction -- if
> something needs to be implemented, the library seems the natural place
> unless there's a very good reason otherwise.

-- 
Dmitry Olshansky