gdc internals: weird way to translate "while" loops makes autovectorization impossible

downs default_357-line at yahoo.de
Tue Feb 6 07:25:34 PST 2007


At the moment, gdc seems to be incapable of vectorizing simple loops.
For example, the following C program, compiled with gcc -O3 -ftree-vectorize -msse, generates SSE-vectorized assembly code.
This can be verified by compiling with -ftree-vectorizer-verbose=7.
> #include "stdlib.h"
> 
> int main() {
>  float*ptr=malloc(sizeof(float)*8);
>  float *tptr=ptr+8;
>  while (ptr!=tptr) { *ptr=1.0; ptr++; } 
>  return 0;
> }
> 

I translated the program to D. Result:

> import std.c.stdlib : malloc;
> 
> int main() {
>     float* ptr = cast(float*) malloc(float.sizeof * 8);
>     float* tptr = ptr + 8;
>     while (ptr != tptr) { *ptr = 1.0; ptr++; }
>     return 0;
> }
> 

Using the same flags, gdc doesn't vectorize the loop.
I dug into gdc's vectorization pass, added a few fprintf statements, and got the following log (test.d.t77.vect):

> ;; Function main (_Dmain)
> 
> 
> test.d:6: note: ===== analyze_loop_nest =====
> test.d:6: note: === vect_analyze_loop_form ===
> test.d:6: note: not vectorized: too many BBs in loop.
> > The extra loop and block info below comes from the fprintf statements I added.
> dropping additional loop info ... 
> ;;
> ;; Loop 1:
> ;;  header 4, latch 3
> ;;  depth 1, level 1, outer 0
> ;;  nodes: 4 3 2
> dropping block info ... 
>  Block 4
> ;; basic block 4, loop depth 1, count 0
> ;; prev block 3, next block 5
> ;; pred:       3 [100.0%]  (fallthru,exec) 1 [100.0%]  (fallthru,exec)
> ;; succ:       2 [100.0%]  (fallthru,exec)
> # ivtmp.27_1 = PHI <ivtmp.27_10(3), 8(1)>;
> # TMT.5_18 = PHI <TMT.5_13(3), TMT.5_12(1)>;
> # ptr_17 = PHI <ptr_9(3), ptr_3(1)>;
> <L2>:;
> #   TMT.5_13 = V_MAY_DEF <TMT.5_18>;
> *ptr_17 = 1.0e+0;
> ptr_9 = ptr_17 + 4;
> goto <bb 2> (<L1>);
> 
>  Block 3
> ;; basic block 3, loop depth 1, count 0
> ;; prev block 2, next block 4
> ;; pred:       2 [88.9%]  (dfs_back,false,exec)
> ;; succ:       4 [100.0%]  (fallthru,exec)
> <L11>:;
> 
>  Block 2
> ;; basic block 2, loop depth 1, count 0
> ;; prev block 1, next block 3
> ;; pred:       4 [100.0%]  (fallthru,exec)
> ;; succ:       5 [11.1%]  (loop_exit,true,exec) 3 [88.9%]  (dfs_back,false,exec)
> <L1>:;
> ivtmp.27_10 = ivtmp.27_1 - 1;
> if (ivtmp.27_10 == 0) goto <L3>; else goto <L11>;
> 
> Done
> > From here on out, it's normal gdc output.
> 
> test.d:6: note: bad loop form.
> test.d:6: note: vectorized 0 loops in function.
> main ()
> {
>   uintD.1160 ivtmp.27D.1244;
>   floatD.1163 * ptrD.1186;
>   floatD.1163 * tptrD.1189;
>   intD.1177 D.1196;
>   boolD.1173 D.1195;
>   boolD.1173 D.1194;
>   voidD.1154 * D.1191;
> 
>   # BLOCK 0 freq:1111, starting at line 4
>   # PRED: ENTRY
>   [test.d : 4] D.1191_2 = [test.d : 4] malloc (32);
>   [test.d : 4] ptrD.1186_3 = (floatD.1163 *) D.1191_2;
>   [test.d : 5] tptrD.1189_4 = ptrD.1186_3 + 32;
>   [test.d : 6] if (ptrD.1186_3 == tptrD.1189_4) [test.d : 6] goto <L3>; else goto <L9>;
>   # SUCC: 5 1
> 
>   # BLOCK 1 freq:988
>   # PRED: 0
> <L9>:;
>   goto <bb 4> (<L2>);
>   # SUCC: 4
> 
>   # BLOCK 2 freq:8889, starting at line 6
>   # PRED: 4
> [test.d : 6] <L1>:;
>   ivtmp.27D.1244_10 = ivtmp.27D.1244_1 - 1;
>   [test.d : 6] if (ivtmp.27D.1244_10 == 0) [test.d : 6] goto <L3>; else goto <L11>;
>   # SUCC: 5 3
> 
>   # BLOCK 3 freq:7901
>   # PRED: 2
> <L11>:;
>   # SUCC: 4
> 
>   # BLOCK 4 freq:8889, starting at line 6
>   # PRED: 3 1
>   # ivtmp.27D.1244_1 = PHI <ivtmp.27D.1244_10(3), 8(1)>;
>   # ptrD.1186_17 = PHI <ptrD.1186_9(3), ptrD.1186_3(1)>;
> <L2>:;
>   [test.d : 6] *ptrD.1186_17 = 1.0e+0;
>   [test.d : 6] ptrD.1186_9 = ptrD.1186_17 + 4;
>   [test.d : 6] goto <bb 2> (<L1>);
>   # SUCC: 2
> 
>   # BLOCK 5 freq:1111
>   # PRED: 2 0
> <L3>:;
>   return 0;
>   # SUCC: EXIT
> 
> }

Note block 3, which consists of nothing but a label. Apparently, having three BBs instead of two prevents gdc from vectorizing the loop.
See also the loop-form check at tree-vect-analyze.c:1861.
I don't know enough about gdc to see where that empty block is generated, but could we get rid of it, possibly, please?
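In the meantime, here is a sketch of a possible source-level workaround (untested, and it assumes the label-only block really comes from the way gdc lowers "while" loops): write the guard explicitly and use a bottom-tested do-while, so the loop itself is just the body plus the exit test.

> import std.c.stdlib : malloc;
> 
> int main() {
>     float* ptr = cast(float*) malloc(float.sizeof * 8);
>     float* tptr = ptr + 8;
>     // Explicit guard plus bottom-tested loop; the hope is that this
>     // lowers to just the loop body and the exit test (two BBs),
>     // without the label-only latch block seen in the dump above.
>     if (ptr != tptr) {
>         do {
>             *ptr = 1.0;
>             ptr++;
>         } while (ptr != tptr);
>     }
>     return 0;
> }

If that still leaves the extra block in the dump, the lowering itself is to blame and it would have to be fixed inside gdc.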
Greetings .. downs


