Address of data that is static, be it shared or tls or __gshared or immutable on o/s <x>

Sun Sep 10 14:38:03 PDT 2017

On Wednesday, 6 September 2017 at 15:55:35 UTC, Ali Çehreli wrote:
> On 09/06/2017 08:27 AM, Cecil Ward wrote:
> > If someone has some static data somewhere, be it in tls or marked shared
> > __gshared or immutable or combinations (whatever), and someone takes the
> > address of it and pass that address to some other routine of mine that
> > does not have access to the source code of the original definition of
> > the object in question, then is it possible to just use 'the address'
> > passed without knowing anything about that data? I'm assuming that the
> > answer might also depend on compilers, machine architectures and
> > operating systems?
> >
> > If this kind of assumption is very ill-advised, is there anything
> > written up about implementation details in different operating systems /
> > compilers ?
>
> Yes, they are all valid operations. Further, the object need
> not be a static one; you can do the same with any object even
> if it's on the stack. However,
>
> - The object must remain alive whenever the other routine uses 
> it. This precludes the case of the object being on the stack 
> and the other routine saving it for later use. When that later 
> use happens, there is no object any more. (An exception: The 
> object may be kept alive by a closure; so even that case is 
> valid.)
>
> - Remember that in D data is thread-local by default; e.g. a 
> module variable will appear to be on the same address to all 
> threads but each thread will have its own copy. So, if the data 
> is going to be used in another thread, it must be defined as 
> 'shared'. Otherwise, although the code will look like it's 
> working, different threads will be accessing different data. 
> (Sometimes this is exactly what is desired but not what you're 
> looking for.) (Fortunately, many high-level thread operations 
> like the ones in std.concurrency will not let you share data 
> unless it's 'shared'.)
>
> Ali
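
To check that I have understood the second point, here is a minimal sketch of how I read it. The names perThread, perProcess and worker are made up for illustration, and I am assuming core.thread and core.atomic behave the way their documentation describes:

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

int perThread = 0;         // thread-local by default: one copy per thread
shared int perProcess = 0; // a single instance for the whole process

void worker()
{
    perThread = 42;               // writes the worker thread's own copy
    atomicOp!"+="(perProcess, 1); // writes the one shared instance
}

void main()
{
    auto t = new Thread(&worker);
    t.start();
    t.join();

    assert(perThread == 0);              // main's copy was never touched
    assert(atomicLoad(perProcess) == 1); // the worker's update is visible
}
```

So the same name refers to different objects in different threads, which is a separate question from whether an address, once taken, is usable everywhere.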

Ali, I have worked on operating systems development in R&D, and I
hope my definitions of terms are the same as yours. If two threads
belong to the same process, then they share a common address space,
by my definition of 'thread' and 'process'. I use 'thread' to mean
basically a stack plus a register set, i.e. a cpu execution
context; it has nothing to do with virtual memory spaces or o/s
ownership of resources, the one exception being a tls space, which
by definition is one-per-thread. A process is one or more threads
plus an address space and the set of all resources owned by the
process according to the o/s. I'm just saying this so you know how
I'm used to approaching this.

I suppose tls could be dealt with in several ways.

(1) The tls regions are all allocated within the common address
space and are visible to one another. Objects inside a tls region
are referenced by absolute virtual addresses that are meaningful to
all the threads in the process, but not to threads belonging to
other processes (by definition of 'process').

(2) Objects are referenced most often by section offsets, i.e.
addresses relative to the start of a tls section. The tls base
virtual address constantly has to be added to such an offset before
it can be dereferenced, adding a big runtime cost and making tls
very bad news. I have worked on a system like (2). But even in (2),
the address of a tls object can still be converted to a readily
usable absolute virtual address, which any thread in the process
can then use with zero overhead.

(3) Processor segmentation is used, so tls objects have to be
dereferenced using a segment-prefixed operation. It is then
impossible to have a single dereference operation such as '*'
without knowing whether to use the segment prefix or not. If it is
nevertheless possible, using forbidden or official knowledge, to
convert the segmented form into a straight address that is
meaningful process-wide (as with 8086 20-bit addresses), I will
term this 3a. If this is not possible because vm hardware
translation is in use, I will term this 3b.

In 3a I am going to assume that vm hardware is used merely to
provide relocation (address offsetting), so the segment prefix
basically just adds a fixed per-thread offset to the virtual
address; if you can discover that offset, you don't need to bother
with the segment prefix. In 3b, vm hardware maps virtual addresses
to a set of per-thread tls pages using who-knows-what mechanism, in
any case something that apps cannot bypass using forbidden
knowledge to generate a single process-wide virtual address. This
means that 3b threads probably break my definition of thread vs
process, although the threads of one process do also have a common
address space and share resources.
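
In D terms, the question is whether something like the following sketch is guaranteed to work. The names tlsValue, bump and bumpFromOtherThread are hypothetical; bump stands in for the routine that receives an address and knows nothing about how the object was declared. Under (1), (2) or 3a I would expect the asserted result; under 3b the plain dereference in the worker could not reach the caller's copy.

```d
import core.thread : Thread;

int tlsValue; // thread-local by default: one copy per thread

// Hypothetical callee: it is handed an address and nothing else.
void bump(int* p)
{
    *p += 1;
}

// Returns the calling thread's tlsValue after another thread has bumped it.
int bumpFromOtherThread()
{
    tlsValue = 1;
    int* callerCopy = &tlsValue; // address of the caller's copy

    auto t = new Thread({
        tlsValue = 100;   // the worker's own copy, a different object
        bump(callerCopy); // plain dereference of the caller's tls object
    });
    t.start();
    t.join();

    return tlsValue;
}

void main()
{
    assert(bumpFromOtherThread() == 2);
}
```

The join() is what keeps this free of a data race; the sketch is only about whether the address is meaningful in the other thread, not about synchronisation.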

I don't know what D's assumptions are, if any. I have very briefly
looked at some code generated by GDC and LDC for Linux x64. It
seems to me that these are 3a systems, but the compilers optimise
strongly enough to remove most of the 3a inefficiency, so that they
are nearly (1). I must admit I haven't looked into it properly; I
just noted a few things in passing, and I haven't written any test
cases as I don't know D well enough yet. I haven't seen the code
these compilers generate for Windows.
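
A first test case might look something like the sketch below. All the names (tlsVar, gVar, readTls, readGlobal, readThrough, report) are made up for illustration. The idea is to compare the code generated for the three small read functions, for example with gdc -O2 -S, or with what I believe is the equivalent ldc2 -O -output-s, and to see which addresses each thread reports at run time.

```d
import core.thread : Thread;
import std.stdio : writefln;

int tlsVar = 1;         // thread-local by default
__gshared int gVar = 2; // one instance per process

// Three small functions whose generated code can be compared.
int readTls() { return tlsVar; }              // direct tls access
int readGlobal() { return gVar; }             // direct process-wide access
int readThrough(const(int)* p) { return *p; } // access via a bare address

// Prints the addresses as seen by the calling thread.
void report(string who)
{
    writefln("%s: &tlsVar = %s, &gVar = %s", who, &tlsVar, &gVar);
}

void main()
{
    report("main");
    auto t = new Thread({ report("worker"); });
    t.start();
    t.join();

    // A bare address reads the same object as the direct access.
    assert(readThrough(&tlsVar) == readTls());
    assert(readThrough(&gVar) == readGlobal());
}
```

I would expect &gVar to be the same in both threads and &tlsVar to differ, but the actual values will vary from run to run and between systems.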

[Many thanks for your superb book btw, which I am just reading 
for the second time round. I wouldn't have got very far without 
it.]