overlapping copy semantics question
Bruce Carneal
bcarneal at gmail.com
Mon May 20 23:09:35 UTC 2024
On Monday, 20 May 2024 at 18:14:50 UTC, kinke wrote:
> I think the problem here is that you don't get the expected
> optimization to a memcpy (with -O3) when using the `@restrict`
> UDA, with the variant taking a D slice. So no correctness issue.
>
> This boils down to the expected memcpy, apparently needing
> unpacking of D slices:
> ```
> void cpr2(size_t srcLength, ubyte* src, @restrict ubyte* dst)
> {
>     foreach (i; 0 .. srcLength)
>         dst[i] = src[i];
> }
> ```
I don't view that missed optimization as much of a problem,
although I will note that gdc decided to issue a call to memmove
for the @restrict slice code under -O3. The LDC cpr2 call out to
memcpy for the lowered/non-slice variant seems entirely justified
given @restrict.
What seems like a problem is emitting SIMD code for the vanilla
(no attributes) cp() variant that doesn't produce the same result
as a simple scalar loop would. Consider:
values at location x: 0, 1, 2, 3, 4, ...
src at location x
dst at location x + 1
Shouldn't the vanilla scalar copy loop for the above just result
in a bunch of zeros? This is what I'd expect if a dead simple
loop body were generated. If, on the other hand, you emit SIMD
code for the loads and stores, as LDC is wont to do, you get
something different.
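To make the expectation concrete, here is a minimal sketch of the scenario. The slice signature of cp() is an assumption (the thread does not show it), and overlapDemo is a hypothetical helper added only for illustration:
```
// Assumed shape of the vanilla (no attributes) cp() variant.
void cp(const(ubyte)[] src, ubyte[] dst)
{
    // Plain element-by-element copy, one iteration after another.
    foreach (i; 0 .. src.length)
        dst[i] = src[i];
}

// Hypothetical helper: src starts at location x, dst at location x + 1.
ubyte[] overlapDemo()
{
    ubyte[] buf = [0, 1, 2, 3, 4, 5, 6, 7];
    // Both slices have length 7 and share the same underlying memory.
    cp(buf[0 .. $ - 1], buf[1 .. $]);
    return buf;
}
```
If the iterations run strictly in order, each store overwrites the element that the next load reads, so the leading zero is carried forward and overlapDemo() should return a buffer of all zeros.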
What am I missing?