[Issue 12176] New: Possible std.algorithm.sum optimization for short fixed-size arrays
d-bugmail at puremagic.com
d-bugmail at puremagic.com
Sat Feb 15 03:11:43 PST 2014
https://d.puremagic.com/issues/show_bug.cgi?id=12176
Summary: Possible std.algorithm.sum optimization for short
fixed-size arrays
Product: D
Version: D2
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Phobos
AssignedTo: nobody at puremagic.com
ReportedBy: bearophile_hugs at eml.cc
--- Comment #0 from bearophile_hugs at eml.cc 2014-02-15 03:11:41 PST ---
I suggest to modify std.algorithm.sum (to add an overload) to let it accept
fixed-sized arrays too, and then add a specialization for short fixed-sized
arrays like:
int[3] a = [10, 20, 30];
immutable tot = a.sum;
This expands the places where you are willing to use sum() because with such
specialization a normal D compiler is able to produce inlined loop-free asm
code similar to writing down a manual sum of the array items:
immutable tot = a[0] + a[1] + a[2];
Generic library code should not be slower than built-in features, where
possible, and short fixed-sized arrays are not uncommon in high performance
code (and currently DMD doesn't inline code containing a loop. And sometimes
even with LDC2 loops are not unrolled if their size is a run-time value. To
unroll them it should inline the function and then see that the run-time length
is actually a compile-time one).
If you know at compile-time the length of an array, this is precious
information, and it's bad to just throw such information away, especially when
such length is small (when the length is large it makes no difference).
This specialization should be present only for small arrays (like Length <= 5).
For the other cases and to reduce template bloat the sum() specialized on
fixed-sized arrays should just call (with a "static if") the normal sum for
dynamic arrays with a slicing:
auto sum(size_t N)(T[N] arr) {
static if (N < 6) {
// short-case implementation here ...
} else {
return sum(arr[]);
}
}
Using the sum() code from std.algorithm, this main() gets compiled by the
latest ldc2:
int main() {
int[3] a = [10, 20, 30];
return cast(int)a[].sum;
}
As (-O -release -inline -noboundscheck -output-s):
__Dmain:
pushl %ebp
pushl %ebx
pushl %edi
pushl %esi
subl $28, %esp
movl $10, 16(%esp)
movl $20, 20(%esp)
movl $30, 24(%esp)
leal 16(%esp), %edi
movl %edi, 4(%esp)
movl $3, (%esp)
xorl %esi, %esi
calll __D3std5array12__T5emptyTiZ5emptyFNaNbNdNfxAiZb
subl $8, %esp
testb $1, %al
jne LBB0_3
movl $3, %eax
movl %edi, %ecx
leal 20(%esp), %ebx
xorl %esi, %esi
movl $2, %ebp
xorl %edi, %edi
.align 16, 0x90
LBB0_2:
movl %ecx, 4(%esp)
movl %eax, (%esp)
calll __D3std5array12__T5frontTiZ5frontFNaNbNcNdNfAiZi
subl $8, %esp
movl (%eax), %eax
movl %eax, %ecx
sarl $31, %ecx
addl %eax, %esi
adcl %ecx, %edi
movl %ebx, 8(%esp)
movl %ebp, 12(%esp)
movl %ebx, 4(%esp)
movl %ebp, (%esp)
calll __D3std5array12__T5emptyTiZ5emptyFNaNbNdNfxAiZb
subl $8, %esp
movl 8(%esp), %ecx
decl %ebp
addl $4, %ebx
testb $1, %al
movl 12(%esp), %eax
je LBB0_2
LBB0_3:
movl %esi, %eax
addl $28, %esp
popl %esi
popl %edi
popl %ebx
popl %ebp
ret
The idea proposed here is just an optimization, so in theory a sufficiently
smart compiler should not need it.
--
Configure issuemail: https://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
More information about the Digitalmars-d-bugs
mailing list