[Issue 21027] New: Backend: DMD use 'rep stosb' even for ulong arrays
d-bugmail at puremagic.com
Wed Jul 8 02:41:59 UTC 2020
https://issues.dlang.org/show_bug.cgi?id=21027
Issue ID: 21027
Summary: Backend: DMD use 'rep stosb' even for ulong arrays
Product: D
Version: D2
Hardware: x86_64
OS: Linux
Status: NEW
Keywords: performance
Severity: normal
Priority: P1
Component: dmd
Assignee: nobody at puremagic.com
Reporter: pro.mathias.lang at gmail.com
Take the following code:
```
alias Content = ulong[256];
void main ()
{
Content v;
}
```
What DMD generates for this on Linux x86_64 (using `run.dlang.org`):
```
.text._Dmain segment
assume CS:.text._Dmain
_Dmain:
push RBP
mov RBP,RSP
sub RSP,0808h
mov ECX,0800h
mov qword ptr -8[RBP],0
lea RAX,-8[RBP]
mov AL,[RAX]
lea RDI,0FFFFF7F8h[RBP]
rep
stosb
xor EAX,EAX
leave
ret
add [RAX],AL
.text._Dmain ends
```
The best thing to do here would be to call `memset` or `memcpy`, which is what
LDC does.
The second best would be to use `rep stosq` 0x100 times (the array is 256
qwords, i.e. 0x800 bytes), as it is faster than `rep stosb` 0x800 times.
Source:
- Agner Fog, optimizing assembly
(https://www.agner.org/optimize/optimizing_assembly.pdf), 16.9 String
instructions (all processors):
> `REP MOVSD` and `REP STOSD` are quite fast if the repeat count is not too small. The largest word size (DWORD in 32-bit mode, QWORD in 64-bit mode) is preferred. Both source and destination should be aligned by the word size or better. In many cases, however, it is faster to use vector registers. Moving data in the largest available registers is faster than `REP MOVSD` and `REP STOSD` in most cases, especially on older processors. See page 150 for details.
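For illustration only, here is a minimal sketch of the size arithmetic and of a manual workaround. It is not what DMD or LDC emit. It assumes void-initializing the array and zeroing it through the C runtime's `memset` is acceptable for the use case; `zeroFill` is a hypothetical helper name, not part of any library.

```
import core.stdc.string : memset;

alias Content = ulong[256];

// 256 elements * 8 bytes each = 0x800 bytes, i.e. 0x100 qword-sized stores.
static assert(Content.sizeof == 0x800);
static assert(Content.sizeof / ulong.sizeof == 0x100);

// Hypothetical helper: zero the whole array with a single memset call.
void zeroFill (ref Content c)
{
    memset(c.ptr, 0, Content.sizeof);
}

void main ()
{
    Content v = void; // skip the compiler-generated default initialization
    zeroFill(v);      // zero it explicitly instead
    foreach (e; v)
        assert(e == 0);
}
```

Whether this is actually faster than the generated `rep stosb` depends on the C library and the CPU, so it would need to be measured.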
Related: https://issues.dlang.org/show_bug.cgi?id=14458