primitive vector types

Fri Feb 20 00:41:13 PST 2009

On Fri, 20 Feb 2009 08:55:16 +0300, Denis Koroskin <2korden at gmail.com>  
wrote:

> On Fri, 20 Feb 2009 06:22:40 +0300, Andrei Alexandrescu  
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> Denis Koroskin wrote:
>>> On Thu, 19 Feb 2009 23:05:34 +0300, Andrei Alexandrescu  
>>> <SeeWebsiteForEmail at erdani.org> wrote:
>>>
>>>> Denis Koroskin wrote:
>>>>> On Thu, 19 Feb 2009 22:25:04 +0300, Mattias Holm  
>>>>> <hannibal.holm at gmail.com> wrote:
>>>>>
>>>>>> Since (SIMD) vectors are so common and every reasonable system  
>>>>>> supports them in one way or the other (and scalar emulation of this  
>>>>>> is rather simple), why not have support for this in D directly?
>>>>>>
>>>>>> Yes, the array operations are nice (and one of the main reasons for  
>>>>>> why I like D :) ), but have the problem that an array of floats  
>>>>>> must be aligned on float boundaries and not vector boundaries. In  
>>>>>> my mind vectors are a primitive data type that should be exposed by  
>>>>>> the programming language.
>>>>>>
>>>>>> Something OpenCL-like:
>>>>>>
>>>>>>     float4 vec;
>>>>>>     vec.xyzw = {1.0,1.0, 1.0, 1.0}; // assignment
>>>>>>     vec.xyzw = vec.wyxz; // permutation
>>>>>>     vec[i] = 1.0; // indexing
>>>>>>
>>>>>> And then we can easily imagine some extra nice features to have  
>>>>>> with respect to operators:
>>>>>>
>>>>>>     vec ^ vec2; // 3d cross product for float vectors, for int  
>>>>>> vectors xor
>>>>>>
>>>>>> Has this been discussed before?
>>>>>>
>>>>>> / Mattias
>>>>>>
>>>>>  I don't see any reason why float4 can't be made a library type.
>>>>
>>>> Yah, I was thinking the same:
>>>>
>>>> struct float4
>>>> {
>>>>      __align(16) float[4] data; // right syntax and value?
>>>>      alias data this;
>>>> }
>>>>
>>>> This looks like something that should go into std.matrix pronto. It  
>>>> even has value semantics even though fixed arrays don't :o/.
>>>>
>>>>
>>>> Andrei
>>>  That would be great. If float4 gets its way into D, I'll share our  
>>> blazing fast math code with community (most common operations on  
>>> vectors, matrices, quaternions etc). It is written entirely in SSE  
>>> (intrinsics, not asm; there is a problem with inlining asm in D, IIRC.  
>>> Can anyone elaborate on this?) and *very* fast. According to our  
>>> benchmarks, that's the best we can squeeze out of the hardware.
>>>  I know LLVM has support for a *very* wide range of intrinsics:
>>> http://www.cs.ucla.edu/classes/spring08/cs259/llvm-2.2/include/llvm/Intrinsics.gen  
>>>   Hopefully they will get into LDC (and DMD *hint* Walter *hint*) very  
>>> soon.
>>>
>>
>> Put me down for that. What do I need to do?
>>
>> Andrei
>
> Convince Walter to add float4 type and some intrinsics to DMD (I'll post  
> a list of those we use later), LDC will follow, I believe.
> There should be some type that would be treated specially. After all,  
> intrinsics have function signatures and those should specify some  
> concrete types.
>

Here is some nice documentation on the MMX, SSE and SSE2 intrinsics:
http://msdn.microsoft.com/en-us/library/y0dh78ez(VS.80).aspx

Here are some quick statistics on which intrinsics are used in our code
and how many times each one appears.
Note that these counts don't directly map to how many times an intrinsic
is *actually* used in user code.

This may help Walter set priorities (intrinsics that are rarely used
could be given lower priority, for example).

Arithmetic Operations (Floating-Point SSE2 Intrinsics)
http://msdn.microsoft.com/en-us/library/708ya3be(VS.80).aspx
_mm_add_ss - 2
_mm_add_ps - 48
_mm_sub_ss - 4
_mm_sub_ps - 24
_mm_mul_ss - 2
_mm_mul_ps - 100
_mm_div_ss - 0
_mm_div_ps - 1
_mm_sqrt_ss - 0
_mm_sqrt_ps - 0
_mm_rcp_ss - 1
_mm_rcp_ps - 0
_mm_rsqrt_ss - 0
_mm_rsqrt_ps - 1
_mm_min_ss - 0
_mm_min_ps - 1
_mm_max_ss - 0
_mm_max_ps - 1
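
To illustrate why _mm_mul_ps and _mm_add_ps dominate the arithmetic
counts, here is a minimal sketch of a scale-and-add over four packed
floats. The function name scale_add4 is hypothetical and is not taken
from the code counted above; the pointers are assumed to be 16-byte
aligned, which is what _mm_load_ps and _mm_store_ps require.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics (__m128, _mm_*_ps) */

/* Hypothetical helper, not part of the code counted above.
   Computes out[i] = a[i] * s + b[i] for i = 0..3.
   All pointers are assumed to be 16-byte aligned. */
static void scale_add4(float *out, const float *a, const float *b, float s)
{
    assert(((uintptr_t)out % 16) == 0);
    assert(((uintptr_t)a % 16) == 0);
    assert(((uintptr_t)b % 16) == 0);

    __m128 va = _mm_load_ps(a);    /* aligned load of a[0..3] */
    __m128 vb = _mm_load_ps(b);    /* aligned load of b[0..3] */
    __m128 vs = _mm_set_ps1(s);    /* broadcast s to all four lanes */

    /* One packed multiply and one packed add handle all four lanes. */
    _mm_store_ps(out, _mm_add_ps(_mm_mul_ps(va, vs), vb));
}
```

The alignment asserts are the point of the float4 discussion: a plain
float[4] is only guaranteed float alignment, so the aligned load/store
forms are safe only if the type itself guarantees 16 bytes.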

Store Operations (SSE)
http://msdn.microsoft.com/en-us/library/ybhzf6dk(VS.80).aspx
_mm_store_ss - 1
_mm_store1_ps - 0
_mm_store_ps1 - 0
_mm_store_ps - 0
_mm_storeu_ps - 0
_mm_storer_ps - 0
_mm_move_ss - 2

Set Operations (SSE)
http://msdn.microsoft.com/en-us/library/wbzwdy6a(VS.80).aspx
_mm_set_ss - 0
_mm_set1_ps - 0
_mm_set_ps1 - 19
_mm_set_ps - 45
_mm_setr_ps - 0
_mm_setzero_ps - 2

Logical Operations (SSE)
http://msdn.microsoft.com/en-us/library/9759as73(VS.80).aspx
_mm_and_ps - 2
_mm_andnot_ps - 0
_mm_or_ps - 0
_mm_xor_ps - 3
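
A common use of the logical intrinsics on float data is sign-bit
manipulation. The sketch below is an illustration only, not a
description of how the code counted above uses them; negate4 and abs4
are hypothetical names, and the pointers are assumed to be 16-byte
aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics (__m128, _mm_*_ps) */

/* Hypothetical helper: out[i] = -v[i], done by flipping the sign bit. */
static void negate4(float *out, const float *v)
{
    assert(((uintptr_t)out % 16) == 0);
    assert(((uintptr_t)v % 16) == 0);

    __m128 sign = _mm_set_ps1(-0.0f);   /* only the sign bit set per lane */
    _mm_store_ps(out, _mm_xor_ps(_mm_load_ps(v), sign));
}

/* Hypothetical helper: out[i] = |v[i]|, done by clearing the sign bit.
   _mm_andnot_ps(m, x) computes (~m) & x. */
static void abs4(float *out, const float *v)
{
    assert(((uintptr_t)out % 16) == 0);
    assert(((uintptr_t)v % 16) == 0);

    __m128 sign = _mm_set_ps1(-0.0f);
    _mm_store_ps(out, _mm_andnot_ps(sign, _mm_load_ps(v)));
}
```

Both operations cost a single bitwise instruction per four floats, with
no branches and no multiplication by -1.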

Miscellaneous Instructions That Use Streaming SIMD Extensions
http://msdn.microsoft.com/en-us/library/dzs626wx.aspx
_mm_shuffle_ps - 124
_mm_shuffle_pi16 - 0
_mm_unpackhi_ps - 0
_mm_unpacklo_ps - 0
_mm_loadh_pi - 0
_mm_storeh_pi - 0
_mm_movehl_ps - 0
_mm_movelh_ps - 0
_mm_loadl_pi - 0
_mm_storel_pi - 0
_mm_movemask_ps - 0
_mm_getcsr - 0
_mm_setcsr - 0
_mm_extract_si64 - 0
_mm_extracti_si64 - 0
_mm_insert_si64 - 0
_mm_inserti_si64 - 0

Comparison Intrinsics (SSE)
http://msdn.microsoft.com/en-us/library/w8kez9sf(VS.80).aspx
Not used

Conversion Operations (SSE)
http://msdn.microsoft.com/en-us/library/0d4dtzhb(VS.80).aspx
Not used

Macros
_MM_SHUFFLE - 100
#define _MM_SHUFFLE(fp3,fp2,fp1,fp0) (((fp3) << 6) | ((fp2) << 4) | ((fp1) << 2) | ((fp0)))
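
As an example of how _mm_shuffle_ps and _MM_SHUFFLE combine with the
arithmetic intrinsics, here is a sketch of a 3D cross product, the
operation Mattias suggested for vec ^ vec2. This is one well-known
formulation, not necessarily the one used in the code counted above.
The name cross3 is hypothetical; vectors are assumed to be stored as
(x, y, z, w) in 16-byte aligned arrays, with w ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics and the _MM_SHUFFLE macro */

/* Hypothetical helper: out.xyz = a.xyz cross b.xyz.
   The fourth lane of out is a.w*b.w - a.w*b.w, i.e. 0 for finite input. */
static void cross3(float *out, const float *a, const float *b)
{
    assert(((uintptr_t)out % 16) == 0);
    assert(((uintptr_t)a % 16) == 0);
    assert(((uintptr_t)b % 16) == 0);

    __m128 va = _mm_load_ps(a);
    __m128 vb = _mm_load_ps(b);

    /* _MM_SHUFFLE(3, 0, 2, 1) selects lanes (1, 2, 0, 3): (y, z, x, w). */
    __m128 a_yzx = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 0, 2, 1));
    __m128 b_yzx = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 0, 2, 1));

    /* c = a * b.yzx - a.yzx * b, which holds the result as (z, x, y, 0). */
    __m128 c = _mm_sub_ps(_mm_mul_ps(va, b_yzx), _mm_mul_ps(a_yzx, vb));

    /* Rotate the lanes once more to get (x, y, z, 0). */
    _mm_store_ps(out, _mm_shuffle_ps(c, c, _MM_SHUFFLE(3, 0, 2, 1)));
}
```

Three shuffles, two multiplies and one subtract give a rough idea of
why shuffles appear more often than any other intrinsic in the counts.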