LDC 0.16.0 alpha3 is out! Get it, test it, give feedback!

Marco Leise via digitalmars-d-ldc digitalmars-d-ldc at puremagic.com
Mon Sep 21 01:22:27 PDT 2015


On Sun, 20 Sep 2015 16:54:19 +0200,
David Nadlinger via digitalmars-d-ldc
<digitalmars-d-ldc at puremagic.com> wrote:

> On 19 Sep 2015, at 20:09, Marco Leise via digitalmars-d-ldc wrote:
> > It is just not very intelligible nor portable. Is there a way
> > to turn the %rcx into a wild card or %rcx/%ecx depending on
> > pointer width?
> 
> Not that I know of. My first advice would be to use the intrinsic 
> corresponding to vpcmpistri, but you mentioned the generated asm would 
> be longer?
>
>   – David

Hell yeah. The GCC intrinsics don't differentiate between a
SIMD register argument and a memory reference. They take a
SIMD vector by value, but the compiler understands that
`*simdptr` can be encoded as a memory operand. Whether an
argument ends up in a register or as a memory operand of the
SIMD instruction comes down to compiler flags and other
circumstances. The problem arises when compilers blindly
assume that all SIMD memory is aligned, although SSE4.2
introduces a few instructions that work on unaligned octet
streams (the `pcmp(i/e)str(i/m)` family, `vpcmp...` in VEX
encoding, and at least a crc32 instruction IIRC) to make life
easier.
When such a memory reference gets preloaded into a SIMD
register with an aligned load, you get a SEGFAULT on unaligned
data; if a separate load is emitted at all, it has to be an
unaligned one. Long story short, the GCC developers knew about
this 3 years ago, and it was decided that using the intrinsic
without manually putting an unaligned load in front of it is
incorrect (unless you use it on aligned data). But it kind of
defeats the purpose of using a specialized instruction to
speed up string scanning when you have to add bloat around it.
Maybe I'll post what LLVM or GCC would have generated if used
with only intrinsics.
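
For illustration, here is a minimal sketch in C of the "unaligned
load in front of the intrinsic" pattern described above. It assumes
GCC or Clang targeting x86 with SSE4.2 available at run time.
`first_of_16` is a hypothetical helper name made up for this
example, and the caller is assumed to guarantee that 16 bytes are
readable at `p`. It is not what LDC or GCC generates, only the
source-level workaround.

```c
#include <assert.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics: _mm_cmpistri */
#include <string.h>

/* Hypothetical helper: index of the first byte among the 16 bytes at `p`
 * that occurs in `set` (at most 16 characters). Returns 16 if no byte
 * matches before a NUL or the end of the 16-byte chunk.
 * Assumption: at least 16 bytes are readable at `p`. */
__attribute__((target("sse4.2")))
static int first_of_16(const char *p, const char *set)
{
    char buf[16] = {0};
    size_t n = strlen(set);
    assert(n <= 16);
    memcpy(buf, set, n);  /* NUL-padded copy of the character set */

    __m128i needles = _mm_loadu_si128((const __m128i *)buf);

    /* Explicit unaligned load: `p` need not be 16-byte aligned.
     * Passing *(const __m128i *)p directly would let the compiler
     * assume an aligned address. */
    __m128i chunk = _mm_loadu_si128((const __m128i *)p);

    /* pcmpistri in "equal any" mode: lowest index in `chunk` of a byte
     * that equals any byte of `needles`; both operands end at NUL. */
    return _mm_cmpistri(needles, chunk,
                        _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                        _SIDD_LEAST_SIGNIFICANT);
}
```

With this sketch, `first_of_16("ey=value; other", "=;")` yields 2,
the position of the `=`. Whether the compiler folds the explicit
load back into a memory operand of the instruction is up to the
optimizer, which is exactly the bloat concern raised above.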

-- 
Marco


