Best interface for memcpy() (and the string.h family of functions)

Wed May 29 19:35:36 UTC 2019

On Wednesday, 29 May 2019 at 19:06:43 UTC, Stefanos Baziotis 
wrote:
> On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler 
> wrote:
>>
>> You didn't answer the question.
>>
>
> I don't know how "benchmarks" does not answer a question. For 
> me, it's
> the most important answer.

Yes, that would be an answer. I got confused when you mentioned 
CTFE and introspection: I wasn't sure whether "benchmarks" 
referred to those features or to runtime benchmarks. And it 
looks like @Mike posted the benchmarks at that GitHub link you 
sent.

>
>> How would inlining the implementation of memcpy be faster? The 
>> implementation of memcpy doesn't need to know which types it 
>> is copying, so every call to it can have the exact same 
>> implementation.  You only need one instance of the 
>> implementation.  This means you can fine-tune it, many libc 
>> implementations will implement it in assembly because it's 
>> used so often and again, it doesn't need to know what types it 
>> is copying.  All it needs is 2 pointers and a size.  That's why in 
>> D, you should only create wrappers that ensure type-safety and 
>> bounds checking and then forward to the real implementation, 
>> and those wrappers should be inlined but not the memcpy 
>> implementation itself.
>>
>> If you want to provide your own implementation of memcpy you 
>> can, but inlining your implementation into every call, when 
>> the implementation is truly type agnostic just results in code 
>> bloat with no benefit.
>
> It is typed currently, with benefits. It's not the same for 
> every type and our
> idea is not to just forward the size. By inlining, you can get 
> quite better
> performance exactly because you inline and you don't just 
> forward the size and
> because you know info about the type.
> Check this: 
> https://github.com/JinShil/memcpyD/blob/master/memcpyd.d
> And preferably, run it and see the asm generated.
> Also, what should be considered is that types give you the info 
> about alignment
> and different implementations depending on this alignment.

It's true that if you can assume the pointers are aligned on a 
particular boundary, you can be faster than memcpy, which has to 
work with any alignment. This must be what Mike is doing. Even 
so, I would create only a few instances of memcpy, each assuming 
alignment on a boundary like 4, 8, or 16. And if you have a 
pointer or an array of a particular type, you can probably assume 
that the pointer/array is aligned on that type's "alignof" property.

I think I will use this in my library.
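
Here is a rough sketch of what I mean, not a tuned implementation. 
The names copyAligned and copyTyped are made up for illustration, 
and the body of copyAligned just forwards to the libc memcpy as a 
placeholder; a real version would use aligned loads/stores there. 
The point is only that the typed wrapper is tiny, while the number 
of real copy routines stays fixed no matter how many types use it.

```d
import core.stdc.string : memcpy;

// Hypothetical: one non-inlined copy routine per assumed alignment.
// Only the 4, 8 and 16 instances ever exist, regardless of how many
// element types call into them.
pragma(inline, false)
void copyAligned(size_t alignment)(void* dst, const(void)* src, size_t size)
{
    static assert(alignment == 4 || alignment == 8 || alignment == 16);
    // The assumption this routine is allowed to rely on.
    assert((cast(size_t) dst & (alignment - 1)) == 0);
    assert((cast(size_t) src & (alignment - 1)) == 0);
    // Placeholder body: a real implementation would exploit the
    // alignment here instead of forwarding to libc.
    memcpy(dst, src, size);
}

// Hypothetical typed wrapper: checks lengths, picks an instance from
// T.alignof at compile time, then forwards. Small enough that the
// compiler can inline it at each call site.
void copyTyped(T)(T[] dst, const(T)[] src)
{
    assert(dst.length == src.length, "length mismatch");
    const size = src.length * T.sizeof;
    static if (T.alignof >= 16)
        copyAligned!16(dst.ptr, src.ptr, size);
    else static if (T.alignof >= 8)
        copyAligned!8(dst.ptr, src.ptr, size);
    else static if (T.alignof >= 4)
        copyAligned!4(dst.ptr, src.ptr, size);
    else
        memcpy(dst.ptr, src.ptr, size); // no useful alignment guarantee
}

unittest
{
    int[4] a = [1, 2, 3, 4];
    int[4] b;
    copyTyped(b[], a[]);
    assert(b == a);
}
```

Since a slice of T always points at an element boundary, T.alignof 
holds for sub-slices too, so the wrapper needs no runtime alignment 
check of its own.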