core.traits?

Wed Jan 9 12:31:13 UTC 2019

On Wednesday, 9 January 2019 at 11:49:40 UTC, Mike Franklin wrote:
> On Wednesday, 9 January 2019 at 11:01:46 UTC, Jacob Carlborg 
> wrote:
>> On 2019-01-09 03:32, Mike Franklin wrote:
>>
>>> Here's what I think will help:
>>> 1.  Get `alloca` or dynamic stack array allocation working.  
>>> This will help a lot because we won't have to reach for 
>>> `malloc` and friends for simple allocations like generating 
>>> dynamic assert messages.
>>
>> What's the problem with "alloca"?
>
> In DMD you can't use it without linking in the runtime, but in 
> LDC and GDC, you can.  One of the goals of implementing these 
> runtime hooks as templates is to make more features available 
> in -betterC builds, or for pay-as-you-go runtime 
> implementations.  If you need to link in druntime to get 
> `alloca`, you can't implement the runtime hooks as templates 
> and have them work in -betterC.
>
>>> 2.  Convert `memcpy`, `memset`, and `memcmp` to 
>>> strongly-typed D templates so they can be used in the 
>>> implementations when converting runtime hooks to templates.  
>>> I did some exploration on that and published my results at 
>>> https://github.com/JinShil/memcpyD.  Unfortunately, DMD is 
>>> missing an AVX512 implementation so I couldn't continue.
>>
>> What do you mean "couldn't continue"? It's possible to 
>> implement "memcpy" without AVX512. Am I missing something?
>
> Yes, it's possible, but I don't think it will ever be accepted 
> if it doesn't perform at least as well as the optimized 
> versions in C or assembly that use AVX512 or other SIMD 
> features.  It needs to be at least as good as what libc 
> provides, so we need to be able to leverage these unique 
> hardware features to get the best performance.

AVX512 is available on only a very small fraction of the 
processors on the market (Skylake-SP/X, Cannon Lake and Cascade 
Lake). AMD does not implement it, and not many people are 
upgrading from a recent chip to one of the Lake CPUs either.
I don't see why the lack of an AVX512 implementation should block 
anything. People who really need AVX512 performance will already 
have written their own memcpy, and everyone else can wait a 
little. It's not as if it couldn't be added later. I really don't 
understand the problem.

That said, another point about memcpy that very often gets lost 
is that, because of flashy benchmarks, its cost at the system 
level is often misjudged: heroic efforts go into optimizing large 
block transfers, while in reality memcpy is mostly called on small 
(postblit) to medium-sized blocks. Linus Torvalds once had a rant 
on that subject on realworldtech:
https://www.realworldtech.com/forum/?threadid=168200&curpostid=168589
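
To make the small-block point concrete, here is a minimal C sketch of a copy routine that handles blocks of up to 16 bytes itself, using two possibly overlapping fixed-size moves, and hands everything larger to the C library's `memcpy`. The function name `copy_small_first` and the 16-byte cutoff are hypothetical choices for illustration, not taken from any library. With optimization enabled, GCC and Clang typically turn the fixed-size `memcpy` calls into plain loads and stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes (non-overlapping buffers, as with memcpy), with a fast
   path for the small sizes that dominate real workloads. */
void *copy_small_first(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n <= 16) {
        if (n >= 8) {
            /* 8..16 bytes: the first and last 8 bytes cover the block;
               both are read before either is written. */
            uint64_t a, b;
            memcpy(&a, s, 8);
            memcpy(&b, s + n - 8, 8);
            memcpy(d, &a, 8);
            memcpy(d + n - 8, &b, 8);
        } else if (n >= 4) {
            /* 4..7 bytes: same trick with 4-byte moves. */
            uint32_t a, b;
            memcpy(&a, s, 4);
            memcpy(&b, s + n - 4, 4);
            memcpy(d, &a, 4);
            memcpy(d + n - 4, &b, 4);
        } else {
            /* 0..3 bytes: byte by byte. */
            for (size_t i = 0; i < n; i++)
                d[i] = s[i];
        }
        return dst;
    }
    /* Medium and large blocks: leave it to the library. */
    return memcpy(dst, src, n);
}
```

A benchmark that only measures multi-megabyte copies never exercises the first branch, which is exactly the blind spot described above.
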