DMD 1.034 and 2.018 releases

Wed Aug 13 17:45:49 PDT 2008

"Don" <nospam at nospam.com.au> wrote in message 
news:g7u36h$20j0$1 at digitalmars.com...
> Georg Lukas wrote:
>> On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
>>> Walter Bright Wrote:
>>>> This one has (finally) got array operations implemented. For those who
>>>> want to show off their leet assembler skills, the initial assembler
>>>> implementation code is in phobos/internal/array*.d. Burton Radons wrote
>>>> the assembler. Can you make it faster?
>>> Not sure if someone else has already mentioned this but would it be
>>> possible for the compiler to align these arrays on 16 byte boundaries in
>>> order to maximise any possible vector efficiency. AFAIK you can't
>>> actually specify align anything higher than align 8 at the moment which
>>> is a bit of a problem.
>>
>> From a short look at the array*.d source code, it would be better to 
>> check if source and destination have the same alignment, i.e.:
>>
>> a = 0xf00d0013 (3 mod 16)
>> b = 0xdeaffff3 (3 mod 16)
>>
>> In that case, the first 16-3 = 13 bytes can be handled using regular D 
>> code, and the aligned SSE version can be used for the rest.

Good idea. Right now that code (usually) has separate cases for aligned and 
unaligned data.

It typically goes like this:

if(cpu_has_sse2 && a.length > min_size)
{
    if(((cast(size_t) aptr | cast(size_t) bptr | cast(size_t) cptr) & 15) != 0)
    {   // Unaligned case
        asm
        {
            ...
            movdqu  XMM0, [EAX]
            ...
        }
    }
    else
    {   // Aligned case
        asm
        {
            ...
            movdqa  XMM0, [EAX]
            ...
        }
    }
}

The two blocks of asm code are basically identical except for the un/aligned 
SSE opcodes.

With your idea, one could get rid of the test for alignment, probably some 
bloat and a whole lot of duplication. I guess the question is whether the 
overhead of your approach would be lower than that of the current design.
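
To make the idea concrete, here is a minimal sketch of how the split might 
look, written in plain D2 rather than asm. The names addInPlace, headLength 
and minSize are made up for illustration and do not come from 
phobos/internal/array*.d, the threshold value is a guess, and the inner 
16-byte loop is only a stand-in for where a movdqa-based loop would go.

```d
// Hypothetical threshold below which the split is assumed not to pay off.
// The value would have to be measured; 64 is only a placeholder.
enum size_t minSize = 64;

// Number of bytes from addr up to the next 16-byte boundary (0 if aligned).
size_t headLength(size_t addr)
{
    return (16 - (addr & 15)) & 15;
}

// a[] += b[] for byte arrays, split into scalar head, aligned body, scalar tail.
void addInPlace(ubyte[] a, const(ubyte)[] b)
{
    assert(a.length == b.length);
    const size_t n = a.length;
    size_t i = 0;

    const size_t pa = cast(size_t) a.ptr;
    const size_t pb = cast(size_t) b.ptr;

    // Same remainder mod 16: one scalar prologue aligns both pointers at once.
    if (n >= minSize && ((pa ^ pb) & 15) == 0)
    {
        // head is at most 15, which is less than minSize, so it cannot exceed n.
        const size_t head = headLength(pa);
        for (; i < head; ++i)
            a[i] += b[i];

        // Aligned body: a.ptr + i and b.ptr + i are now multiples of 16.
        // Plain D stands in for the aligned SSE loop here.
        for (; i + 16 <= n; i += 16)
            foreach (j; 0 .. 16)
                a[i + j] += b[i + j];
    }

    // Tail, or the whole array when it is short or the remainders differ.
    for (; i < n; ++i)
        a[i] += b[i];
}
```

With the addresses from Georg's example, headLength gives 13 for both, so 
13 bytes would go through the scalar loop before the aligned part starts. 
When the remainders differ, this sketch just falls back to the scalar loop; 
a real version would presumably still want the movdqu path there.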

- Dave

>>
>> This would also work for slices, at least when both slices have the same 
>> alignment remainder. I'm just not sure what overhead such a solution 
>> would impose for small arrays.
>
> Just begin with a check for minimal size. If less than that size, don't 
> use SSE at all.
>
>>
>> Georg