Performance problem in reverse algorithm
Fool via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sun Aug 24 09:45:10 PDT 2014
Below is a snippet of D code, followed by a close C++
translation.
clang++ seems to have trouble optimizing the C++ version unless
my_reverse is marked "noinline". With ldc the attribute makes no
difference: I was not able to get it to produce a fast executable
either way. g++ and gdc both produce good results without any
annotation.
The same effect is visible in the D version if one replaces
my_reverse with reverse from std.algorithm. Can someone figure
out what the problem is?
clang++ / inline: 5.7s
ldc / inline: 5.7s
clang++ / noinline: 1.4s
ldc / noinline: 5.8s
g++: 1.2s
gdc: 1.2s
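For anyone who wants to reproduce numbers like these from inside the program rather than with an external timer, here is a minimal sketch using std::chrono::steady_clock. The helper name time_reversals and its parameters are hypothetical and not part of reverse.cpp; my_reverse is repeated only so that the block compiles on its own.

```cpp
#include <chrono>
#include <numeric>
#include <utility>
#include <vector>

// Same loop as my_reverse in reverse.cpp, repeated so this block is self-contained.
void my_reverse(int* b, int* e)
{
    auto steps = (e - b) / 2;
    if (steps) {
        auto l = b;
        auto r = e - 1;
        do {
            std::swap(*l, *r);
            ++l;
            --r;
        } while (--steps);
    }
}

// Hypothetical helper (an assumption, not from the original post):
// fills `a` with 0..N-1, runs the two benchmark loops, and returns
// the elapsed time in seconds as measured by a monotonic clock.
double time_reversals(int N, int K, std::vector<int>& a)
{
    a.assign(N, 0);
    std::iota(a.begin(), a.end(), 0);

    const auto start = std::chrono::steady_clock::now();
    for (int n = 0; n <= N; ++n) {
        for (int k = 0; k <= K; ++k) {
            my_reverse(a.data(), a.data() + n);
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_reversals(2000, 10000, a) on a std::vector<int> a would correspond to the N and K used in reverse.cpp.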
Test environment: Arch Linux x86_64
clang version 3.4.2
LDC - the LLVM D compiler (0.14.0)
g++ (GCC) 4.9.1
gdc (GCC) 4.9.1
clang++ -std=c++11 -march=native -O3 -DNOINLINE
ldc2 -O3 -mcpu=native -release -disable-boundscheck
g++ -std=c++11 -march=native -O3
gdc -O3 -march=native -frelease -fno-bounds-check
Moreover, if I do not specify '-disable-boundscheck', the ldc2
build fails at link time with the following errors:
/usr/lib/liblphobos2.a(curl.o): In function `_D3std3net4curl4HTTP18_sharedStaticCtor1FZv':
(.text._D3std3net4curl4HTTP18_sharedStaticCtor1FZv+0x10): undefined reference to `curl_version_info'
/usr/lib/liblphobos2.a(curl.o): In function `_D3std3net4curl4Curl18_sharedStaticDtor3FZv':
(.text._D3std3net4curl4Curl18_sharedStaticDtor3FZv+0x1): undefined reference to `curl_global_cleanup'
/usr/lib/liblphobos2.a(curl.o): In function `_D3std3net4curl13__shared_ctorZ':
(.text._D3std3net4curl13__shared_ctorZ+0x10): undefined reference to `curl_global_init'
collect2: error: ld returned 1 exit status
Error: /usr/bin/gcc failed with status: 1
////
// reverse.d
import std.algorithm, std.range;
import ldc.attribute;

@attribute("noinline")
void my_reverse(int* b, int* e)
{
    auto steps = (e - b) / 2;
    if (steps) {
        auto l = b;
        auto r = e - 1;
        do {
            swap(*l, *r);
            ++l;
            --r;
        } while (--steps);
    }
}

void main(string[] args)
{
    immutable N = 2000;
    immutable K = 10000;
    auto a = iota(N).array;
    for (auto n = 0; n <= N; ++n) {
        for (auto k = 0; k <= K; ++k) {
            my_reverse(&a[0], &a[0] + n);
        }
    }
}
////
// reverse.cpp
#include <iterator>  // std::begin, std::end
#include <numeric>   // std::iota
#include <utility>   // std::swap
#include <vector>

#ifdef NOINLINE
__inline__ __attribute__((noinline))
#endif
void my_reverse(int* b, int* e)
{
    auto steps = (e - b) / 2;
    if (steps) {
        auto l = b;
        auto r = e - 1;
        do {
            std::swap(*l, *r);
            ++l;
            --r;
        } while (--steps);
    }
}

int main()
{
    const auto N = 2000;
    const auto K = 10000;
    std::vector<int> a(N);
    auto b = std::begin(a);
    auto e = std::end(a);
    std::iota(b, e, 0);
    for (auto n = 0; n <= N; ++n) {
        for (auto k = 0; k <= K; ++k) {
            my_reverse(&a[0], &a[0] + n);
        }
    }
}
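
Since the D version shows the same effect with reverse from std.algorithm, the analogous experiment on the C++ side would be to swap in std::reverse. Before comparing timings it is worth confirming that the two produce the same result. A minimal sketch, assuming std::reverse is the appropriate C++ counterpart; the helper name agrees_with_std_reverse is hypothetical, and my_reverse is repeated only so that the block compiles on its own.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Same loop as my_reverse in reverse.cpp, repeated so this block is self-contained.
void my_reverse(int* b, int* e)
{
    auto steps = (e - b) / 2;
    if (steps) {
        auto l = b;
        auto r = e - 1;
        do {
            std::swap(*l, *r);
            ++l;
            --r;
        } while (--steps);
    }
}

// Hypothetical helper (an assumption, not from the original post):
// returns true if my_reverse and std::reverse give the same result on v.
// v is taken by value, so the caller's data is left untouched.
bool agrees_with_std_reverse(std::vector<int> v)
{
    std::vector<int> expected = v;
    std::reverse(expected.begin(), expected.end());

    my_reverse(v.data(), v.data() + v.size());
    return v == expected;
}
```

Empty, odd-length and even-length inputs are the cases worth trying, since the loop handles an odd middle element by never touching it.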