Hash table element existence check

Illuminati via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Sep 4 16:25:54 PDT 2016


On Saturday, 3 September 2016 at 14:07:27 UTC, Cauterite wrote:
> On Saturday, 3 September 2016 at 12:33:26 UTC, Illuminati wrote:
>> On Saturday, 3 September 2016 at 07:44:28 UTC, Cauterite wrote:
>>> On Friday, 2 September 2016 at 19:38:34 UTC, Illuminati wrote:
>>>> I am trying to create a hash table and would like an 
>>>> efficient way to be able to know if an element exists to 
>>>> test for collisions.
>>>
>>> Just do a regular lookup on the hash? It's an O(1) operation, 
>>> like 4 instructions.
>>
>> Huh? One can look up fine, but how does one know if the result 
>> is valid or not?
>
> Okay I think I misinterpreted the question. I believe when I 
> did this my solution was to have an additional template 
> parameter specifying null key values, so the template was like 
> this:
>
> struct HashTbl(
> 	K, /* key type */
> 	V, /* value type */
> 	K NullKey, /* key placeholder value */
> 	alias hash_key, /* hash function */
> 	alias keys_eq /* equality function */
> ) {...
>
> I guess this doesn't really solve the problem, but makes it the 
> user's problem.

Yeah, and there might not be any "null" key. If there is one, then 
it is easy. But say the key is an int... what is "null"? While 
it does let the user define what null means, it assumes that it 
is possible to do so, and that might not always be the case.
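
One way to sidestep the sentinel issue entirely is an explicit 
per-slot flag, at the cost of a bool (plus padding) per entry. A 
rough sketch with made-up names and simple linear probing (not 
anyone's actual code from this thread):

import std.stdio;

// Each slot carries an explicit "occupied" flag, so no sentinel key
// value is needed -- works even when K is int and every bit pattern
// is a legal key. (Open addressing, linear probing; assumes the
// table is never completely full.)
struct FlagTable(K, V)
{
    static struct Slot { K key; V value; bool occupied; }
    Slot[] slots;

    this(size_t capacity) { slots.length = capacity; }

    void insert(K key, V value)
    {
        auto i = hashOf(key) % slots.length;
        while (slots[i].occupied && slots[i].key != key)
            i = (i + 1) % slots.length;
        slots[i] = Slot(key, value, true);
    }

    bool contains(K key) const
    {
        auto i = hashOf(key) % slots.length;
        while (slots[i].occupied)
        {
            if (slots[i].key == key) return true;
            i = (i + 1) % slots.length;
        }
        return false;
    }
}

void main()
{
    auto t = FlagTable!(int, string)(8);
    t.insert(42, "answer");
    writeln(t.contains(42)); // true
    writeln(t.contains(7));  // false
}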

I suppose one could then switch to pointers and waste a bit of 
space if necessary. The main point of the question was whether 
there are any tricks that avoid this problem. E.g., maybe the hash 
function itself could be used to determine whether a key exists. 
Not sure how, but who knows?
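
There is, as far as I know, one trick along those lines that some 
open-addressing tables use: cache the computed hash in each slot 
and reserve one hash value (say 0) to mean "empty", remapping any 
key that genuinely hashes to 0. A rough, hypothetical sketch 
(probing and collision handling left out for brevity):

import std.stdio;

// The cached hash doubles as the occupancy marker: a stored hash of 0
// is reserved to mean "empty slot", and any key that genuinely hashes
// to 0 is remapped to 1. No sentinel key and no extra flag needed.
size_t slotHash(K)(K key)
{
    auto h = hashOf(key);
    return h == 0 ? 1 : h;
}

struct Slot(K, V)
{
    size_t hash;  // 0 => empty, otherwise the cached hash of `key`
    K key;
    V value;
}

void main()
{
    Slot!(int, string)[] slots;
    slots.length = 8;

    // insert: stash the hash together with key and value
    int k = 42;
    auto h = slotHash(k);
    auto i = h % slots.length;
    slots[i] = Slot!(int, string)(h, k, "answer");

    // lookup: a zero stored hash means the slot was never filled
    writeln(slots[i].hash != 0 && slots[i].key == k);  // true
    writeln(slots[(i + 1) % slots.length].hash != 0);  // false (empty)
}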

>> I could keep a bitarray, but wasting around 12% space.
>
> That assumes your (K,V) tuples are 1 byte total, right?


Yeah. Larger (K,V) tuples reduce the relative overhead to some 
degree, but in absolute terms it is still N/8 bytes of wasted 
space. For large N that becomes a problem. Maybe not a huge deal, 
since most of the time N < 1M, which works out to around 125k 
bytes for such an array. Large arrays could probably be handled 
differently if memory constraints were really an issue.
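
For reference, a minimal sketch of that bit-array bookkeeping 
using Phobos' std.bitmanip.BitArray (names are made up), showing 
the N/8-byte cost:

import std.bitmanip : BitArray;
import std.stdio;

void main()
{
    enum N = 1_000_000;

    // One occupancy bit per slot: N/8 bytes of bookkeeping,
    // i.e. roughly 125_000 bytes for a million-slot table.
    BitArray occupied;
    occupied.length = N;

    occupied[123_456] = true;       // mark a slot as filled
    writeln(occupied[123_456]);     // true
    writeln(occupied[654_321]);     // false
    writeln(occupied.length / 8);   // 125000 bytes of flag storage
}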
