<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <div class="moz-text-flowed" style="font-family: -moz-fixed;

      font-size: 14px;" lang="x-western">On 02.04.2013 08:57, Walter

      Bright wrote:

      <br>

      <blockquote type="cite" style="color: #000000;">

        <br>

        On 4/1/2013 10:13 PM, Kai Nacke wrote:

        <br>

        <blockquote type="cite" style="color: #000000;">On 01.04.2013

          04:31, Walter Bright wrote:

          <br>

          <blockquote type="cite" style="color: #000000;">

            <br>

            On 3/31/2013 6:56 PM, Kai Nacke wrote:

            <br>

            <blockquote type="cite" style="color: #000000;">Hi!

              <br>

              <br>

              I try to write a generic vectorized version of SHA1.

              During that I noticed that only some operations are

              allowed on vector types.

              <br>

              <br>

              For the SHA1 algorithm I need to implement a 'rotate

              left'. I like to write something like this

              <br>

              <br>

                  uint4 w = ...;

              <br>

                  uint4 v = (w << 1) | (w >> 31);

              <br>

              <br>

              which is not allowed by DMD.

              <br>

              <br>

              Is this by design or simply not implemented because the

              backend is not capable of generating code for it? The TDPL

              says nothing about vector types. My understanding of the

              language reference on the web (<a

                class="moz-txt-link-freetext"

                href="http://dlang.org/simd.html">http://dlang.org/simd.html</a>)

              is that the supported operators are CPU architecture

              dependent.

              <br>

              <br>

              I really like to see more support for vector operations in

              the language, e.g. for shifting. What is the view of the

              language designers? Is the vector type a first class type

              or just an architecture (maybe vendor) dependent type with

              limited usability?

              <br>

              <br>

              Because LLVM treats the vector type as a first class type

              supporting more operators is easy with LDC. See my pull

              request for shift operators here: <a

                class="moz-txt-link-freetext"

                href="https://github.com/ldc-developers/ldc/pull/321">https://github.com/ldc-developers/ldc/pull/321</a>

              <br>

            </blockquote>

            <br>

            The idea is if a vector operation is not supported by the

            underlying hardware, then dmd won't allow it. It

            specifically does not generate "workaround" code like gcc

            does. The reason for this is the workaround code is

            terribly, terribly slow (because moving code between the ALU

            and the SIMD unit is awful), and having the compiler

            silently insert it leaves the programmer mystified why he is

            getting execrable performance.

            <br>

          </blockquote>

          <br>

          Shifting a vector left by a single scalar e.g. v << 2 is

          then a missing operation. It is supported by the PSLLW/D/Q

          instruction. Same for shifting right. This is good news for my

          implementation. <span class="moz-smiley-s1" title=":-)">:-)</span>

          <br>

        </blockquote>

        <br>

        You can file a bugzilla for that one.

        <br>

      </blockquote>

      <br>

      Done. It's Bugzilla issue 9860.

      <br>

      <br>

      <blockquote type="cite" style="color: #000000;">

        <blockquote type="cite" style="color: #000000;">

          <blockquote type="cite" style="color: #000000;">The vector

            design philosophy in D is if you write SIMD code, and it

            compiles, you can be confident it will execute in the SIMD

            unit of your particular target processor. You won't have to

            dump the assembler output to be sure.

            <br>

          </blockquote>

          <br>

          Would it be legal for a D compiler to generate "workaround"

          code?

          <br>

        </blockquote>

        <br>

        No.

        <br>

        <br>

        <blockquote type="cite" style="color: #000000;">Otherwise the

          language changes depending on the target.

          <br>

        </blockquote>

        <br>

        That's correct.

        <br>

        <br>

        <blockquote type="cite" style="color: #000000;">Consider again

          the left shift: on an Intel CPU only v << n (v: vector;

          n: scalar) is valid. In contrast, Altivec allows v << w

          (v, w: vector). Then the same source may or may not compile

          depending on the target (with an error message saying

          'incompatible types'). As a user of a cross compiler I would

          be very surprised by this behavior.

          <br>

        </blockquote>

        <br>

        The bigger surprise would be the silent and unpredictable

        execrably bad performance. The only reason to write SIMD code is

        for performance, and the compiler ought to give an error when it

        cannot deliver SIMD performance.

        <br>

        <br>

        The workaround code can be 100x slower. This is a big deal.

        <br>

      </blockquote>

      <br>

      While I understand your argument, I still feel a bit

      uncomfortable with it. It creates a situation in which you can't

      tell whether a D program will compile just by reading the source

      unless you also know the target architecture. I think this is

      something really new.

      <br>

      Currently, conditional compilation is used when interfaces etc.

      are not available on every target. This principle is now

      broken by the "invisible" rules that determine the availability

      of vector operations.

      <br>

      <br>

      My approach would be to define the following: if D_SIMD is defined,

      then only the vector operations that the hardware supports natively

      are available. This ensures your goal of generating code with

      optimal performance. If D_SIMD is not defined but a vendor-specific

      SIMD implementation is available, then the rules of that

      implementation apply (which may include generating "workaround"

      code). This has the advantage of being explicit:

      <br>

      <br>

          version(D_SIMD)

      <br>

          {

      <br>

              uint4 w = ...;

      <br>

              uint4 v = w << 1;

      <br>

          }

      <br>

          else version(XYZ_SIMD)

      <br>

          {

      <br>

              uint4 w = ..., x = ...;

      <br>

              // Not allowed by DMD; only fast on AltiVec

      <br>

              uint4 v = x << w;

      <br>

          }

      <br>
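      <br>

      The same dependency can also be probed at compile time. This is only a minimal sketch, assuming nothing beyond core.simd and __traits(compiles); the names hasVectorRotate, rotl1Scalar and rotl1 are made up for illustration and are not part of any library:

      <br>

```d
import core.simd;

// True only if this compiler and target accept the vector rotate expression.
// If uint4 is not defined for the target at all, the probe simply yields false.
enum hasVectorRotate = __traits(compiles, (uint4 w) => (w << 1) | (w >> 31));

// Scalar reference: rotate each 32-bit lane left by one bit.
uint[4] rotl1Scalar(uint[4] w)
{
    uint[4] r;
    foreach (i, x; w)
        r[i] = (x << 1) | (x >> 31);
    return r;
}

static if (hasVectorRotate)
{
    // Compiled only where the probe above succeeded.
    uint4 rotl1(uint4 w) { return (w << 1) | (w >> 31); }
}
else
{
    // Explicit, visible fallback instead of silently generated code.
    alias rotl1 = rotl1Scalar;
}
```

      <br>

      With this, the choice between the SIMD path and the slow path is at least visible in the source, but it still has to be written by hand for every operation.

      <br>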

      <br>

      Or am I missing something?

      <br>

      <br>

      <blockquote type="cite" style="color: #000000;">

        <blockquote type="cite" style="color: #000000;">I really have

          Linux/PPC64 in mind but do most development on Windows...

          <br>

          (It feels a bit like ++ is only supported if the underlying

          hardware has an INC instruction...)

          <br>

        </blockquote>

        <br>

        That's a different issue, since the workaround code is just as

        fast.

        <br>

      </blockquote>

      <br>

      Second try: core.bitop.popcnt is a "workaround" for a missing popcnt

      instruction. LDC provides an intrinsic for popcnt, but it is

      lowered to "workaround" code if the popcnt instruction is not

      available. If we apply the same rules, then this would be forbidden as well.

      <br>
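      <br>

      To illustrate what I mean by "workaround" code here, a minimal sketch of a software bit count next to the library call. softPopcnt is a hypothetical name used only for this example; whether core.bitop.popcnt ends up as a single instruction depends on the compiler and the target:

      <br>

```d
import core.bitop : popcnt;

// Hypothetical helper: counts set bits without a popcnt instruction,
// using the usual parallel bit-summing steps.
uint softPopcnt(uint x) pure nothrow @nogc @safe
{
    x = x - ((x >> 1) & 0x5555_5555);                 // sums of 2-bit groups
    x = (x & 0x3333_3333) + ((x >> 2) & 0x3333_3333); // sums of 4-bit groups
    x = (x + (x >> 4)) & 0x0F0F_0F0F;                 // sums of 8-bit groups
    return (x * 0x0101_0101) >> 24;                   // add the four bytes
}
```

      <br>

      Both softPopcnt(x) and popcnt(x) give the same count; the difference is only in how many instructions are executed.

      <br>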

      <br>

      Regards

      <br>

      Kai

      <br>

    </div>

  </body>

</html>