DConf 2013 Day 3 Talk 5: Effective SIMD for modern architectures by Manu Evans

bearophile bearophileHUGS at lycos.com
Thu Jun 20 04:58:05 PDT 2013


Andrei Alexandrescu:

> http://youtube.com/watch?v=q_39RnxtkgM

Very nice.

- - - - - - - - - - - - - - - - - - -

Slide 3:

> In practise, say we have iterative code like this:
> 
> int data[100];
> 
> for(int i = 0; i < data.length; ++i) {
>   data[i] += 10; }

For code like that in D we have vector ops:

int[100] data;
data[] += 10;
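
As a quick check that the array op and the explicit loop do the same
thing, here is a minimal sketch. The function names addTenLoop and
addTenArrayOp are made up for this example, they are not from the
slides or from Phobos:

```d
// Hypothetical helper: the explicit loop from the slide, in D style.
int[100] addTenLoop() {
    int[100] data;            // int.init is 0, so every element starts at 0
    foreach (ref x; data)
        x += 10;
    return data;
}

// Hypothetical helper: the same update written as a D array operation.
int[100] addTenArrayOp() {
    int[100] data;
    data[] += 10;             // applies += 10 to every element
    return data;
}

void main() {
    auto a = addTenLoop();
    auto b = addTenArrayOp();
    assert(a == b);           // static arrays compare element-wise

    // Array ops also work between whole arrays.
    int[100] sum;
    sum[] = a[] + b[];
    assert(sum[0] == 20 && sum[$ - 1] == 20);
}
```

Whether the compiler actually emits SIMD instructions for data[] += 10
depends on the compiler and the target, so this only shows that the
two forms are equivalent.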


Regarding vector ops: currently they are implemented with handwritten 
asm that uses SIMD where possible. Once std.simd is in good shape, 
I think the array ops can be rewritten (and their missing parts 
completed) in a higher-level style of coding.

- - - - - - - - - - - - - - - - - - -

Slide 22:

> Comparisons:
> Full suite of comparisons Can produce bit-masks, or boolean 
> 'any'/'all' logic.

Maybe a little compiler support (for the syntax) would help 
here.
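
To make the two kinds of result concrete, here is a scalar sketch of
what a lane-wise comparison returns. It does not use std.simd or any
real SIMD instruction; greaterMask is a name I made up for this
example, and the four-lane float layout is an assumption:

```d
import std.algorithm.searching : all, any;

// Hypothetical helper: lane-wise a > b. Each lane of the result is
// all-ones when the comparison holds and all-zeros otherwise, which is
// the usual shape of a SIMD comparison bit-mask.
uint[4] greaterMask(const float[4] a, const float[4] b) {
    uint[4] mask;
    foreach (i; 0 .. 4)
        mask[i] = a[i] > b[i] ? 0xFFFF_FFFF : 0;
    return mask;
}

void main() {
    float[4] a = [1, 5, 3, 8];
    float[4] b = [2, 4, 3, 7];

    auto m = greaterMask(a, b);
    assert(m == [0u, 0xFFFF_FFFF, 0u, 0xFFFF_FFFF]);

    // Boolean 'any'/'all' logic reduces the mask to a single bool.
    assert(m[].any!(x => x != 0));    // at least one lane is true
    assert(!m[].all!(x => x != 0));   // but not every lane is true
}
```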

- - - - - - - - - - - - - - - - - - -

Slide 26:

> Always pass vectors by value.

Unfortunately it seems a bad idea to give a warning if you pass 
one of those by reference.

- - - - - - - - - - - - - - - - - - -

Slide 27:

> 3. Use ‘leaf’ functions where possible.

I am not sure how useful it would be to enforce leaf functions with 
a @leaf annotation.

- - - - - - - - - - - - - - - - - - -

Slide 32:

> Experiment with prefetching?

Do the D intrinsics offer instructions to perform prefetching?

- - - - - - - - - - - - - - - - - - -

LDC2 supports SIMD on 32-bit Windows too.

So for this code:


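void main() {
     alias double2 = __vector(double[2]);
     auto a = new double[200];
     // View the 200 doubles as 100 two-lane SIMD vectors.
     auto b = cast(double2[])a;
     double2 tens = [10.0, 10.0];
     // Array op over the vectors: adds 10.0 to every double.
     b[] += tens;
}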


LDC2 compiles it to:

	movl	$200, 4(%esp)
	movl	$__D11TypeInfo_Ad6__initZ, (%esp)
	calll	__d_newarrayiT
	movl	%edx, %esi
	movl	%eax, (%esp)
	movl	$16, 8(%esp)
	movl	$8, 4(%esp)
	calll	__d_array_cast_len
	testl	%eax, %eax
	je	LBB0_3
	movapd	LCPI0_0, %xmm0
	.align	16, 0x90
LBB0_2:
	movapd	(%esi), %xmm1
	addpd	%xmm0, %xmm1
	movapd	%xmm1, (%esi)
	addl	$16, %esi
	decl	%eax
	jne	LBB0_2
LBB0_3:
	xorl	%eax, %eax
	addl	$12, %esp
	popl	%esi
	ret


It uses addpd, which operates on two doubles at the same time.
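
To see that one vector add per step really covers two doubles, here is
a small sketch with an explicit loop over the vectors instead of the
array op. The helper name addTenInPairs is made up for this example,
and it assumes a compiler and target with SIMD support, plus a
16-byte-aligned array (which I believe is what the GC gives for
new double[n]):

```d
alias double2 = __vector(double[2]);

// Hypothetical helper: adds 10.0 to every element, two doubles per step.
// Assumes a.length is even and a.ptr is 16-byte aligned.
void addTenInPairs(double[] a) {
    assert(a.length % 2 == 0);
    auto b = cast(double2[])a;      // n doubles viewed as n/2 vectors
    double2 tens = [10.0, 10.0];
    foreach (ref v; b)
        v += tens;                  // one packed add covers two doubles
}

void main() {
    auto a = new double[200];
    a[] = 1.0;                      // new double[] starts as NaN, so set a value
    addTenInPairs(a);
    foreach (x; a)
        assert(x == 11.0);
}
```

With 200 doubles the loop body runs 100 times, which matches the
counted loop between LBB0_2 and LBB0_3 in the listing above.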

- - - - - - - - - - - - - - - - - - -

The Reddit thread contains a link to this page, a compiler for a 
C variant from Intel that's optimized for SIMD:
http://ispc.github.io/

Some of the syntax of ispc:

- - - - - -

The first of these statements is cif, indicating an if statement 
that is expected to be coherent. The usage of cif in code is just 
the same as if:

cif (x < y) {
     ...
} else {
     ...
}

cif provides a hint to the compiler that you expect that most of 
the executing SPMD programs will all have the same result for the 
if condition.

Along similar lines, cfor, cdo, and cwhile check to see if all 
program instances are running at the start of each loop 
iteration; if so, they can run a specialized code path that has 
been optimized for the "all on" execution mask case.

- - - - - -

foreach_tiled(y = y0 ... y1, x = 0 ... w,
               u = 0 ... nsubsamples, v = 0 ... nsubsamples) {
     float du = (float)u * invSamples, dv = (float)v * invSamples;

- - - - - -

I'll take a better look at ispc.

Bye,
bearophile

