About built-in AAs

Wed Aug 17 08:59:03 PDT 2011

On Wed, 17 Aug 2011 11:47:59 -0400, Jonathan M Davis <jmdavisProg at gmx.com> wrote:

> On Wednesday, August 17, 2011 10:04:21 Steven Schveighoffer wrote:
>> On Wed, 17 Aug 2011 00:15:38 -0400, Andrei Alexandrescu
>> <SeeWebsiteForEmail at erdani.org> wrote:
>> > On 8/16/11 9:29 PM, bearophile wrote:
>> >> Walter Bright:
>> >>>> I think there are search trees, like red-black trees, that
>> >>>> guarantee an O(n ln n) worst case. Am I wrong?
>> >>>
>> >>> Just feed it more data.
>> >>
>> >> If you feed it more data, even if all items produce collisions because
>> >> they all hash to the same bucket, using red-black trees to handle the
>> >> items in the same bucket keeps the behaviour at O(n ln n), which is
>> >> usually fast enough. With Python and the new D AAs you instead get
>> >> quadratic behaviour, which causes trouble well before the physical RAM
>> >> is exhausted.
>> >>
>> >> Bye,
>> >> bearophile
>> >
>> > Let's please stop this. Many of us, including yourself, noticed the
>> > relatively poor performance of D's previous hashtables compared to other
>> > languages. Switching to singly-linked-list collision handling marked an
>> > improvement. Now, many data structure designs have a worst case that
>> > makes them perform worse than others. If you worry about attacks, please
>> > implement your own hashtable. If we switch back to the old
>> > implementation, you'll complain again about D's hashtables being slower
>> > than Python's, thus closing a years-long cycle.
>>
>> Yes, but let's not forget the one valid request out of all of this: if
>> trees are no longer being used, opEquals should be used instead of opCmp.
>> This allows more possible key types (ones that don't define an ordering).
>> I think this would be a simple druntime change.
>
> But then we can't change the hash table type to one that needs opCmp if we
> need to later. That might be acceptable, but it means we can't
> transparently change the implementation again if we decide that we need to.

I think that's a choice we should embrace.  AFAIK, no *builtin* hash
implementation in any language uses trees for buckets (I'm sure someone
will find one though :).  The precedent is to require toHash and opEquals,
not opCmp.  It just makes more sense for builtin hash tables to allow as
many key types as possible.
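For illustration, here is a minimal sketch (in Python, not D, and not druntime's actual code) of a hash table that uses separate chaining with plain lists. It only ever calls the key's hash and equality, never an ordering, so it accepts keys that have no ordering at all. The class name `ChainedHashTable` and the growth policy are hypothetical choices for the example.

```python
class ChainedHashTable:
    """Separate chaining: each bucket is a plain list of (key, value) pairs.

    Keys only need __hash__ and __eq__; no ordering (<) is ever used.
    """

    def __init__(self, nbuckets=8):
        self._buckets = [[] for _ in range(nbuckets)]
        self._size = 0

    def _bucket(self, key):
        # Python's % with a positive divisor is never negative.
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:  # equality only, no comparison
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1
        if self._size > 2 * len(self._buckets):
            self._grow()

    def __getitem__(self, key):
        for k, v in self._bucket(key):  # linear scan of one chain
            if k == key:
                return v
        raise KeyError(key)

    def __contains__(self, key):
        return any(k == key for k, _ in self._bucket(key))

    def __len__(self):
        return self._size

    def _grow(self):
        # Rehash every pair into twice as many buckets.
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(2 * len(self._buckets))]
        for k, v in pairs:
            self._bucket(k).append((k, v))


# Complex numbers are hashable but unorderable (1+2j < 3j raises TypeError),
# yet they work fine as keys here.
t = ChainedHashTable()
t[1 + 2j] = "a"
t[3 - 1j] = "b"
t[1 + 2j] = "c"
print(t[1 + 2j], len(t))  # c 2
```

The trade-off bearophile raises is visible in the scan: if every key lands in the same bucket, inserting n distinct keys costs 0 + 1 + ... + (n-1) = n(n-1)/2 equality checks. Tree buckets would cap that at O(n log n), but only by requiring an ordering on the key type.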

Also, currently, if opCmp doesn't exist, the *COMPILER MAKES ONE UP*, which
is totally unacceptable.

So if you define opEquals and not opCmp, as bearophile points out, your  
specifically defined opEquals is not even used, and some made-up  
approximation is used instead!
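A small Python sketch can show why an invented comparison is dangerous. It assumes, purely for illustration and not as a description of what druntime actually generates, that the made-up comparison looks only at the raw field contents. The key type defines a case-insensitive equality (with a matching hash), but the tree-like bucket, kept sorted and searched by binary search, orders and matches keys with the made-up comparison instead. `CaseInsensitiveKey`, `made_up_sort_key`, `OrderedBucket` and `chain_find` are hypothetical names.

```python
import bisect


class CaseInsensitiveKey:
    """User-defined equality: 'abc' == 'ABC'. The hash is consistent with it."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        if not isinstance(other, CaseInsensitiveKey):
            return NotImplemented
        return self.text.lower() == other.text.lower()

    def __hash__(self):
        return hash(self.text.lower())


def made_up_sort_key(key):
    # Hypothetical stand-in for a generated comparison: it looks at the raw
    # field and knows nothing about the user's __eq__.
    return key.text


class OrderedBucket:
    """Bucket kept sorted by made_up_sort_key and searched by binary search."""

    def __init__(self):
        self._sort_keys = []
        self._items = []

    def insert(self, key, value):
        sk = made_up_sort_key(key)
        i = bisect.bisect_left(self._sort_keys, sk)
        if i < len(self._sort_keys) and self._sort_keys[i] == sk:
            self._items[i] = (key, value)
        else:
            self._sort_keys.insert(i, sk)
            self._items.insert(i, (key, value))

    def find(self, key):
        sk = made_up_sort_key(key)
        i = bisect.bisect_left(self._sort_keys, sk)
        if i < len(self._sort_keys) and self._sort_keys[i] == sk:
            return self._items[i][1]
        return None


def chain_find(items, key):
    """Equality-based lookup, as a plain chained bucket would do it."""
    for k, v in items:
        if k == key:
            return v
    return None


bucket = OrderedBucket()
bucket.insert(CaseInsensitiveKey("abc"), 1)
print(CaseInsensitiveKey("ABC") == CaseInsensitiveKey("abc"))  # True
print(bucket.find(CaseInsensitiveKey("ABC")))  # None: the user's opEquals is ignored
print(chain_find([(CaseInsensitiveKey("abc"), 1)], CaseInsensitiveKey("ABC")))  # 1
```

Both keys hash to the same bucket, yet the ordered lookup misses, and inserting "ABC" would silently add a second entry for a key the user considers equal.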

It's one thing to make up opEquals, that is pretty easy to get reasonably  
right.  It's something entirely different to invent an opCmp, especially  
for types which have no ordering!

-Steve