Best interface for memcpy() (and the string.h family of functions)
Jonathan Marler
johnnymarler at gmail.com
Wed May 29 19:35:36 UTC 2019
On Wednesday, 29 May 2019 at 19:06:43 UTC, Stefanos Baziotis
wrote:
> On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler
> wrote:
>>
>> You didn't answer the question.
>>
>
> I don't know how "benchmarks" does not answer a question. For me,
> it's the most important answer.
Yes, that would be an answer. I guess I got confused when you
mentioned CTFE and introspection; I wasn't sure if "benchmarks"
was referring to those features or to runtime benchmarks. And it
looks like @Mike posted the benchmarks on that github link you
sent.
>
>> How would inlining the implementation of memcpy be faster? The
>> implementation of memcpy doesn't need to know which types it
>> is copying, so every call to it can have the exact same
>> implementation. You only need one instance of the
>> implementation. This means you can fine-tune it; many libc
>> implementations implement it in assembly because it's
>> used so often and, again, it doesn't need to know what types it
>> is copying. All it needs is 2 pointers and a size. That's why in
>> D, you should only create wrappers that ensure type-safety and
>> bounds checking and then forward to the real implementation,
>> and those wrappers should be inlined but not the memcpy
>> implementation itself.
>>
>> If you want to provide your own implementation of memcpy you
>> can, but inlining your implementation into every call, when
>> the implementation is truly type agnostic, just results in code
>> bloat with no benefit.
>
> It is typed currently, with benefits. It's not the same for every
> type and our idea is not to just forward the size. By inlining, you
> can get quite better performance exactly because you inline and you
> don't just forward the size and because you know info about the
> type.
> Check this:
> https://github.com/JinShil/memcpyD/blob/master/memcpyd.d
> And preferably, run it and see the asm generated.
> Also, what should be considered is that types give you the info
> about alignment and different implementations depending on this
> alignment.
It's true that if you can assume pointers are aligned on a
particular boundary, you can be faster than memcpy, which
works with any alignment. This must be what Mike is doing.
Even so, I would create only a few instances of memcpy that
assume alignment on boundaries like 4, 8, and 16. And if you have a
pointer or an array of a particular type, you can probably assume
that the pointer/array is aligned on that type's "alignof" property.
I think I will use this in my library.
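
A rough sketch of the idea, written in C++ so it compiles on its
own (alignof(T) plays the role of D's T.alignof). The names
copy_aligned, alignment_bucket and copy_elements are made up for
illustration and do not come from memcpyD or any existing library.
The per-alignment routine just forwards to memcpy here; a real one
would presumably contain a copy loop tuned for that alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical: one copy routine per assumed alignment. Only a few
// instantiations ever exist (1, 4, 8, 16), no matter how many element
// types call into them.
template <std::size_t Align>
void copy_aligned(void* dst, const void* src, std::size_t size) {
    // Debug-only check that the caller's alignment promise holds.
    assert(reinterpret_cast<std::uintptr_t>(dst) % Align == 0);
    assert(reinterpret_cast<std::uintptr_t>(src) % Align == 0);
    // Placeholder body: an alignment-specific loop would go here.
    std::memcpy(dst, src, size);
}

// Hypothetical: round a type's alignment down to one of the few
// boundaries we keep an instance for.
constexpr std::size_t alignment_bucket(std::size_t a) {
    return a >= 16 ? 16 : a >= 8 ? 8 : a >= 4 ? 4 : 1;
}

// Thin typed wrapper, meant to be inlined: it checks bounds, converts
// the element count to a byte count, and picks the instance from the
// element type's alignment.
template <typename T>
inline void copy_elements(T* dst, std::size_t dst_len,
                          const T* src, std::size_t src_len) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copy is only valid for trivially copyable types");
    assert(dst_len >= src_len);  // bounds check
    copy_aligned<alignment_bucket(alignof(T))>(dst, src,
                                               src_len * sizeof(T));
}
```

With this layout, copy_elements on an array of 4-byte-aligned
elements and on an array of some other 4-byte-aligned struct both
end up in the same copy_aligned<4> instance, so the code size stays
bounded while the wrapper still carries the type information.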
More information about the Digitalmars-d
mailing list