hashed array?

Dmitry Olshansky dmitry.olsh at gmail.com
Mon Nov 12 09:58:14 PST 2012


11/12/2012 9:44 PM, H. S. Teoh wrote:
> On Mon, Nov 12, 2012 at 05:52:32PM +0100, Joseph Rushton Wakeling wrote:
>> On 11/12/2012 05:27 PM, Dmitry Olshansky wrote:
>>>> The current "workaround" is to just dummy it.
>>>> alias bool Dummy;
>>>> Dummy[string] names;
>>>> names["Paul"] = true; //Ew.
>>>
>>> Make a wrapper?
>>
>> The problem with wrapped versions of associative arrays is that they
>> just don't scale with number of keys.  If I want to store (say) a
>> set of 3 million size_t's, then it costs a lot more than 3_000_000 *
>> size_t.sizeof() to do so this way.
>>
>> Do dedicated set implementations do better than this?
>
> Probably.
>
> For storing integer types, you could optimize for space by dividing your
> key space into blocks of, say, 32 entries each. Then your hash function
> will hash only the upper (n-5) bits of the key, and each hash slot will
> be a bit array of 32 bits, which are indexed by the lower 5 bits of the
> key. Assuming your keys are relatively densely populated, this will save
> quite a significant amount of space. In the sparse case, you probably
> don't waste that much space either, since each slot is only one 32-bit
> int wide.

The problem is that keys will collide, so each slot will have to hold the 
upper bits of the key *and* the small bit set. And the bucket itself 
still sits inside an intrusive linked list.

Otherwise the idea is great. About 3 words of overhead per word-sized set.
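
For reference, a minimal sketch of that layout in D (hypothetical, nobody's 
actual code; it rides on the built-in AA instead of a hand-rolled bucket, so 
the block id becomes the AA key and the per-bucket overhead discussed below 
still applies). Dense keys then cost roughly one bit each plus one AA entry 
per block of 32:

struct BitBlockSet
{
    // block id (key >> 5) -> 32-bit membership mask indexed by the low 5 bits
    private uint[size_t] blocks;

    void insert(size_t key)
    {
        // a missing block is created zero-initialized, then the bit is set
        blocks[key >> 5] |= 1u << (key & 31);
    }

    bool contains(size_t key) const
    {
        auto p = (key >> 5) in blocks;
        return p !is null && (*p & (1u << (key & 31))) != 0;
    }

    void remove(size_t key)
    {
        auto p = (key >> 5) in blocks;
        if (p is null) return;
        *p &= ~(1u << (key & 31));
        if (*p == 0)
            blocks.remove(key >> 5); // drop empty blocks to reclaim slots
    }
}

unittest
{
    BitBlockSet s;
    s.insert(3_000_000);
    s.insert(3_000_001);
    assert(s.contains(3_000_000) && s.contains(3_000_001));
    assert(!s.contains(42));
    s.remove(3_000_000);
    assert(!s.contains(3_000_000));
}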

>
> You can probably also increase the size of the slots to, say, 64 bits or
> more, if you want to maximize the usage to GC allocation block overhead
> ratio. Yes, each slot is only 1 int wide, but the GC does have to
> maintain bookkeeping information that's probably a lot more than that.

16 bytes. Though you'd have to know what a bucket entry looks like, which 
is sadly impossible with the built-in hash map.

> Allocating a large number of very small blocks is generally worse than a
> smaller number of medium-sized blocks.
>
[snip other good points]


-- 
Dmitry Olshansky

