Spec for the ‘locality’ parameter to the LDC and GDC builtin magic functions for accessing special CPU prefetch instructions

Cecil Ward cecil at cecilward.com
Sat Aug 19 19:23:38 UTC 2023


I’m trying to write a cross-platform function that gives access 
to the CPU’s prefetch instructions, such as the x86 
prefetcht0/t1/t2/prefetchnta instructions, and the AArch64 ones too. 
I’ve found that the GDC and LDC compilers provide builtin magic 
functions for this, which are what I need. I am trying to put 
together a detailed plain-English spec for these builtin magic 
functions.

My questions:

Q1) I need to compare the specs for the "locality" parameter of 
the GDC (i.e. GCC) and LDC builtin magic functions. Can anyone 
tell me whether GDC and LDC have kept mutual compatibility here?

Q2) Could someone help me turn the GCC and LDC (LLVM) specs for 
the locality parameter into plain English? See (2) and (4) below.

Q3) Does the locality parameter determine which _level_ of the 
data cache hierarchy the data is fetched into? Or is the data 
always fetched into the L1 data cache (and the outer levels), 
with this parameter only affecting the caches’ _future behaviour_?

Q4) Will these magic builtins work on AArch64?

Here’s what I’ve found so far:

1. GCC builtin published by the D runtime:
    import gcc.simd : prefetch;
    prefetch!( rw, locality )( p );

2. GCC: __builtin_prefetch (const void *addr, ...)
“This function is used to minimize cache-miss latency by moving 
data into a cache before it is accessed. You can insert calls to 
__builtin_prefetch into code for which you know addresses of data 
in memory that is likely to be accessed soon. If the target 
supports them, data prefetch instructions are generated. If the 
prefetch is done early enough before the access then the data 
will be in the cache by the time it is accessed.
The value of addr is the address of the memory to prefetch. There 
are two optional arguments, rw and locality. The value of rw is a 
compile-time constant one or zero; one means that the prefetch is 
preparing for a write to the memory address and zero, the 
default, means that the prefetch is preparing for a read. The 
value locality must be a compile-time constant integer between 
zero and three. A value of zero means that the data has no 
temporal locality, so it need not be left in the cache after the 
access. A value of three means that the data has a high degree of 
temporal locality and should be left in all levels of cache 
possible. Values of one and two mean, respectively, a low or 
moderate degree of temporal locality. The default is three.”

3. LLVM intrinsic: declare void @llvm.prefetch(ptr <address>, i32 <rw>, 
i32 <locality>, i32 <cache type>)

4. Regarding llvm.prefetch() I found the following spec:
“rw is the specifier determining if the fetch should be for a 
read (0) or write (1), and locality is a temporal locality 
specifier ranging from (0) - no locality, to (3) - extremely 
local keep in cache. The cache type specifies whether the 
prefetch is performed on the data (1) or instruction (0) cache. 
The rw, locality and cache type arguments must be constant 
integers.”

5. I also found this page: 
https://dlang.org/phobos/core_builtins.html. It is great for 
the syntax of the call to the LDC builtin, but the GDC call is 
no use to me because it lacks the parameters that I want. This D 
runtime routine might benefit from accepting all the parameters 
that GCC’s prefetch builtin takes.

Many thanks in advance.



More information about the Digitalmars-d-learn mailing list