khash associative array / hash map / hash set

ikod igor.khasilev at gmail.com
Mon Aug 24 21:11:46 UTC 2020


On Monday, 24 August 2020 at 20:31:42 UTC, James Blachly wrote:
> On Monday, 24 August 2020 at 05:51:59 UTC, ikod wrote:
>> On Monday, 24 August 2020 at 01:39:26 UTC, James Blachly wrote:
>>> OH, I almost forgot the best part. It is crazy fast.
>>>
>>> https://attractivechaos.wordpress.com/2018/01/13/revisiting-hash-table-performance/
>>> https://attractivechaos.wordpress.com/2019/12/28/deletion-from-hash-tables-without-tombstones/
>>>
>>> My naive benchmark shows -- compared to 
>>> emsi_containers.HashMap -- 30% faster inserts, 80% faster 
>>> serial retrieval and 70% faster random retrieval. Perhaps I 
>>> am doing something wrong?
>>>
>>> James
>>
>> Thanks, nice job!
>>
>> You also may want to compare performance with 
>> https://github.com/ikod/ikod-containers, just add dependency 
>> from ikod-containers, then:
>>
>> import ikod.containers;
>>
>> and use ikod.containers.hashmap.HashMap as alias for 
>> container. I squeezed everything I was able from the 
>> open-addressing hash map.
>
> Nice, thank you and great job! Performance looks very 
> comparable and I would be happy to use your package as well. 
> Perhaps it is time that Dlang have a faster canonical hashmap 
> (phobos.next ?)
>

Thanks, but no )
This hashmap can't replace standard AA for next reason:
with standard AA you can safely do:

string[int] aa;
aa[0] = "null";
auto v = 0 in aa;
aa.remove(0);
assert(*v == "null");
aa[0] = "one";
assert(*v == "null");

This is because AA allocate memory in GC area for every value it 
store(and return pointer to it when "in" used), so even if you 
delete key from AA it is still safe to use pointer to value. But 
this require GC allocations.

Correct me if I'm wrong - your and mine HashMaps avoid 
allocations and store values inline, so you can't use pointer to 
values in safe code (value can be deleted, or replaced on next 
table insertion/deletion). In order to be safe my hashmap do not 
support "in" operator and always return value.

Also you may find useful safe map modification during iteration 
over map items (at the cost of creating temporary table copy).

> ```
> hashmap benchmarks
> Inserts for HashMap finished in 518 milliseconds.
> Inserts for khash finished in 549 milliseconds.
> Serial Lookups for HashMap finished in 21 milliseconds.
> Random lookups for HashMap finished in 41 milliseconds.
> Confirming stored value of last lookup: 
> 7353ece9-506c-467f-9cb4-7686426fa828
> Serial Lookups for khash finished in 12 milliseconds.
> Random lookups for khash finished in 36 milliseconds.
> Confirming stored value of last lookup: 
> 1164a2f1-e6cb-4072-89d9-23cec5cadd95
> ```
>
> Repeated tests show that ikod.containers' HashMap is 
> consistently faster on insertions, while khash is consistently 
> faster on retrieval.




More information about the Digitalmars-d-announce mailing list