<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <br>

    <div class="moz-cite-prefix">On 4/2/2013 10:58 PM, Kai Nacke wrote:<br>

    </div>

    <blockquote cite="mid:515BC4EF.10203@redstar.de" type="cite">


      <div class="moz-text-flowed" style="font-family: -moz-fixed;

        font-size: 14px;" lang="x-western"><br>

        While I understand your argumentation I still feel a bit

        uncomfortable with it. It creates a situation in which you can't

        tell me if a D program will compile by reading the source if I

        don't tell you the target architecture. I think this is

        something really new. <br>

        The current situation is that conditional compiling is used if

        interfaces etc. are not globally available.  This principle is

        now broken by the "invisible" rules which determine the

        availability of vector operations.</div>

    </blockquote>

    <br>

    Performant SIMD code is simply not portable between architectures.

    The programmer writing SIMD code ought to be guaranteed he's getting

    SIMD code, not workaround code that is 100x slower. I view a

    compiler error as being far more visible than silently generating

    unacceptably slow code.<br>

    <br>

    <br>

    <blockquote cite="mid:515BC4EF.10203@redstar.de" type="cite">

      <div class="moz-text-flowed" style="font-family: -moz-fixed;

        font-size: 14px;" lang="x-western"> <br>

        <br>

        My approach would be to define the following: if D_SIMD is

        defined then only the optimal vector operations are available.

        This ensures your goal of generating code with optimal

        performance. If D_SIMD is not defined but a vendor specific SIMD

        implementation is available then the rules of this

        implementation hold (which may include generation of

        "workaround" code). This has the advantage of being explicit: <br>

        <br>

            version(D_SIMD) <br>

            { <br>

                uint4 w = ...; <br>

                uint4 v = w &lt;&lt; 1; <br>

            } <br>

            else version(XYZ_SIMD) <br>

            { <br>

                uint4 w = ..., x = ...; <br>

                // Not allowed by DMD; only fast on AltiVec <br>

                uint4 v = x &lt;&lt; w; <br>

            } <br>

        <br>

        Or am I missing something? <br>

      </div>

    </blockquote>

    <br>

    All the programmer really needs to do is use a version statement on

    the architecture for the SIMD code for that architecture, and then

    have a default with the workaround code. The point is that

    he *knowingly* selects the slow workaround code. This is critical

    for a systems programming language where programmers writing SIMD

    code are not always experts at dumping the compiler output to see

    what was generated.<br>
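    <br>
    A minimal sketch of that pattern, for illustration only. The helper
    name add4 is hypothetical, X86_64 stands in for whichever
    architecture the SIMD code is written for, and it assumes the
    compiler provides core.simd.float4 with an assignable .array
    property on that target:<br>
    <pre>
// Hypothetical helper: adds two groups of four floats lane by lane.
float[4] add4(float[4] a, float[4] b)
{
    version (X86_64)
    {
        // SIMD path, written for this one architecture. If the vector
        // add were not supported here, this would be a compile error
        // rather than silently slow code.
        import core.simd;

        float4 va, vb;
        va.array = a;          // assumes .array is an assignable lvalue
        vb.array = b;
        float4 vr = va + vb;
        return vr.array;
    }
    else
    {
        // Default path: the programmer knowingly selects the slow
        // scalar workaround for every other target.
        float[4] r;
        foreach (i; 0 .. 4)
            r[i] = a[i] + b[i];
        return r;
    }
}
    </pre>
    Callers use add4 the same way on every target; only the body
    differs, and the choice of the slow path is visible in the source.<br>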

    <blockquote cite="mid:515BC4EF.10203@redstar.de" type="cite">

      <div class="moz-text-flowed" style="font-family: -moz-fixed;

        font-size: 14px;" lang="x-western"> 2nd try: core.bitop.popcnt

        is a "workaround" for a missing popcnt instruction. LDC provides

        an intrinsic for popcnt but this is lowered to the "workaround"

        code if the popcnt instruction is not available. If we apply the

        same rules then this is verboten.<br>

      </div>

    </blockquote>

    <br>

    The workaround code for popcnt isn't 100x slower.<br>

    <br>

  </body>

</html>