Spec for the ‘locality’ parameter to the LDC and GDC builtin magic functions for accessing special CPU prefetch instructions
Cecil Ward
cecil at cecilward.com
Sat Aug 19 19:23:38 UTC 2023
I’m trying to write a cross-platform function that gives access
to the CPU’s prefetch instructions, such as x86
prefetcht0/t1/t2/prefetchnta, and their AArch64 equivalents. I’ve
found that the GDC and LDC compilers provide builtin magic
functions for this, which are what I need. I am trying to put
together a detailed plain-English spec for the respective builtin
magic functions.
My questions:
Q1) I need to compare the spec for the GCC and LDC builtin magic
functions’ "locality" parameter. Can anyone tell me if GDC and
LDC have kept mutual compatibility here?
Q2) Could someone help me turn the GCC and LDC specs into plain
English regarding the locality parameter? See (2) and (4) below.
Q3) Does the locality parameter determine which _level_ of the
data cache hierarchy data is fetched into? Or is it always
fetched into L1 data cache and the outer ones, and this parameter
affects caches’ _future behaviour_?
Q4) Will these magic builtins work on AArch64?
Here’s what I’ve found so far:
1. GCC builtin published by the D runtime:
import gcc.simd : prefetch;
prefetch!( rw, locality )( p );
2. GCC: __builtin_prefetch (const void *addr, ...)
“This function is used to minimize cache-miss latency by moving
data into a cache before it is accessed. You can insert calls to
__builtin_prefetch into code for which you know addresses of data
in memory that is likely to be accessed soon. If the target
supports them, data prefetch instructions are generated. If the
prefetch is done early enough before the access then the data
will be in the cache by the time it is accessed.
The value of addr is the address of the memory to prefetch. There
are two optional arguments, rw and locality. The value of rw is a
compile-time constant one or zero; one means that the prefetch is
preparing for a write to the memory address and zero, the
default, means that the prefetch is preparing for a read. The
value locality must be a compile-time constant integer between
zero and three. A value of zero means that the data has no
temporal locality, so it need not be left in the cache after the
access. A value of three means that the data has a high degree of
temporal locality and should be left in all levels of cache
possible. Values of one and two mean, respectively, a low or
moderate degree of temporal locality. The default is three.”
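For reference, here is a minimal C sketch of how the rw and locality arguments described in that quote are passed to __builtin_prefetch. It assumes GCC or Clang. The function name sum_prefetched and the look-ahead distance of 16 elements are made up for illustration, and nothing here shows which instruction a given target actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead distance in elements; would need tuning per target. */
#define PREFETCH_AHEAD 16

/* Sum an array while hinting that elements a little further on will be
   read once and are not worth keeping in cache afterwards. */
long sum_prefetched(const int *data, size_t n)
{
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0: preparing for a read.
               locality = 0: no temporal locality.
               Both must be compile-time constants. */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 0);
        }
        total += data[i];
    }
    return total;
}
```

Changing the last argument to 3 would instead request that the data be left in all levels of cache possible, per the quoted text.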
3. LLVM: declare void @llvm.prefetch(ptr <address>, i32 <rw>,
i32 <locality>, i32 <cache type>)
4. Regarding llvm.prefetch() I found the following spec:
“rw is the specifier determining if the fetch should be for a
read (0) or write (1), and locality is a temporal locality
specifier ranging from (0) - no locality, to (3) - extremely
local keep in cache. The cache type specifies whether the
prefetch is performed on the data (1) or instruction (0) cache.
The rw, locality and cache type arguments must be constant
integers.”
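Both quoted texts use 0/1 for rw and 0 to 3 for locality, so a wrapper can at least give those values names. The following is only a sketch in C, assuming a compiler that provides __builtin_prefetch (GCC and Clang do). The names PREFETCH, HAVE_BUILTIN_PREFETCH, the enum constants and scale_in_place are all hypothetical, and the fallback simply does nothing.

```c
#include <stddef.h>

/* Detect the builtin; __has_builtin itself is not available everywhere. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_prefetch)
#    define HAVE_BUILTIN_PREFETCH 1
#  endif
#elif defined(__GNUC__)
#  define HAVE_BUILTIN_PREFETCH 1
#endif

#ifdef HAVE_BUILTIN_PREFETCH
/* rw and locality must be integer constant expressions. */
#  define PREFETCH(addr, rw, locality) \
       __builtin_prefetch((addr), (rw), (locality))
#else
/* No builtin: evaluate the address and otherwise do nothing. */
#  define PREFETCH(addr, rw, locality) ((void)(addr))
#endif

/* Hypothetical names for the values in the quoted specs. */
enum { PREFETCH_READ = 0, PREFETCH_WRITE = 1 };
enum { LOCALITY_NONE = 0, LOCALITY_LOW = 1,
       LOCALITY_MODERATE = 2, LOCALITY_HIGH = 3 };

/* Multiply every element by k, hinting that upcoming elements will be
   written and are worth keeping in cache. */
void scale_in_place(float *v, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i) {
        if (i + 16 < n)
            PREFETCH(&v[i + 16], PREFETCH_WRITE, LOCALITY_HIGH);
        v[i] *= k;
    }
}
```

A prefetch is only a hint, so the result of scale_in_place is the same whether or not the builtin is available.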
5. I also found https://dlang.org/phobos/core_builtins.html,
which is great for the syntax of the call to the LDC builtin, but
the call for GDC is no good to me as it lacks the parameters that
I want. This D runtime routine might benefit from accepting all
the parameters that GCC’s prefetch builtin takes.
Many thanks in advance.