Dynamic array comparison optimization: check for equal pointers?
Johan
j at j.nl
Tue Jun 2 22:01:20 UTC 2020
On Tuesday, 2 June 2020 at 20:54:39 UTC, Stanislav Blinov wrote:
> On Tuesday, 2 June 2020 at 19:38:04 UTC, Johan wrote:
>
>> I was expecting something like:
>> ```
>> if ( sizes are not equal )
>> return false;
>> if ( pointers are equal )
>> return true;
>> return memcmp;
>> ```
>
> That's, worst case, two branches and a scan, and best case one
> branch which may or may not be slower than the actual scan.
The "actual scan" is a function call, which will be super slow
(note that in druntime's implementation it cannot be optimized to
a tail call) compared to the comparison and early return.
>> I am surprised that memcmp itself does not implement the
>> pointers are equal check. Is there a big impact of adding the
>> shortcircuit check to prevent scanning memory?
>
> There can't be a definitive answer for this, it's dependent on
> architecture as well as size of data. As far as array
> comparison goes, if a compiler (talking about ldc/gdc here) can
> infer pointer/length information, it'll be free to throw away
> the scan. In all other cases, I'd rather see the comparison
> function, well, do the comparison, and leave branches up to the
> user. It's much better than to inflict the branches on everyone
> unconditionally (pun intended): if your data shows that you can
> benefit from the short-circuit, you can always do just that.
> Short of rethinking the design, of course ;) I mean, I can't
> imagine a scenario where a program would have to routinely
> compare identical memory ranges. Doesn't mean there aren't such
> scenarios though.
You are assuming that an if-statement costs significant time, but
that doesn't have to be the case. The pointer values already must
be read from memory (otherwise you cannot do the memory scan),
and with speculative execution the comparison+branch only costs
very little (codegen could be coaxed to make if(false) be the
default execution path) and potentially saves many many memory
reads.
Could someone take the effort and check the performance
difference (for example the compiler itself) with this change?
Thanks!
Johan
More information about the Digitalmars-d
mailing list