[dmd-internals] Regarding deprecation of volatile statements

Wed Aug 1 10:25:31 PDT 2012

On 7/31/2012 6:59 PM, Alex Rønne Petersen wrote:
> On Wed, Aug 1, 2012 at 2:55 AM, Walter Bright <walter at digitalmars.com> wrote:
>> On 7/31/2012 10:02 AM, Alex Rønne Petersen wrote:
>>> On Wed, Jul 25, 2012 at 1:20 AM, Walter Bright <walter at digitalmars.com>
>>> wrote:
>>>> On 7/24/2012 3:18 PM, Alex Rønne Petersen wrote:
>>>>> On Wed, Jul 25, 2012 at 12:11 AM, Walter Bright <walter at digitalmars.com>
>>>>> wrote:
>>>>>> On 7/24/2012 2:53 PM, Alex Rønne Petersen wrote:
>>>>>>> But shared can't replace volatile in kernel space. shared means
>>>>>>> atomics/memory fences which is not what I want - that would just give
>>>>>>> me unnecessary overhead. I want the proper, standard C semantics of
>>>>>>> volatile,
>>>>>>
>>>>>> C does not have Standard semantics for volatile. It's a giant mess.
>>>>> Right, it leaves the exact definition of a volatile access to the
>>>>> compiler.
>>>>
>>>> Right, that's why it is incorrect to refer to it as "standard" behavior.
>>>> Behaviors I've seen include various combinations of:
>>>>
>>>> 1. disallowing enregistering
>>>> 2. preventing folding multiple loads/stores together
>>>> 3. preventing reordering across expressions with volatiles
>>>> 4. inserting memory load/store fences
>>> As Martin already said, 1 and 2 are exactly what I need,
>>
>> Why do you need something not to be enregistered? It's usually loaded into a
>> register before use, anyway. Also, why would you need 2?
> I think there may be a misunderstanding. By enregistering I thought
> you meant moving something off the stack and into registers
> completely.

That's what it means. But also, I have no idea what problem is addressed by
disallowing register allocation.

>   But if I think about it, even that seems unnecessary. 2
> and 3 should be enough, as Sean said.

To reiterate, this is why I need to know what problem you are trying to address, 
rather than going at it from the solution point of view.

>
> For 2, see below (same reason why order matters).
>
>>
>>
>>>    maybe with
>>> the added clarification that volatile operations cannot be reordered
>>> with respect to each other as David pointed out is the LLVM (and
>>> therefore GCC, as LLVM is GCC-compatible) behavior.
>>
>> The only reason you'd need reordering prevention is if you had shared
>> variables.
> No. It's very common to use memory-mapped I/O (be it in kernel space
> or via files in user space) to create stateful communication.
> Reordered or folded operations would completely mess up the protocol.

Communication between what?

>
>>
>>>>
>>>>
>>>>>     But most relevant C compilers have a fairly sane definition
>>>>> of this. For example, GCC:
>>>>> http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html
>>>>>
>>>>>>> not the atomicity that people seem to associate with it.
>>>>>>
>>>>>> Exactly what semantics are you looking for?
>>>>> GCC's volatile semantics, pretty much. I want to be able to interact
>>>>> with volatile memory without the compiler thinking it can optimize or
>>>>> reorder (or whatever) my memory accesses. In other words, tell the
>>>>> compiler to back off and leave volatile code alone.
>>>>
>>>> Unfortunately, this is rather vague. For example, how many read/write
>>>> operations are there in v++? Optimizing is a terminally nebulous concept.
>>> How many reads/writes there are is actually irrelevant from my
>>> perspective. The semantics that I'm after will simply guarantee that,
>>> no matter how many, it'll stay at that number and in the defined order
>>> of the v++ operation in the language.
>>
>> At that number? At what number? And why do you need a defined order, unless
>> you're doing shared memory?
> Of course memory-mapped I/O can be called "shared" memory, but it's
> not shared in the traditional sense of concurrency, since memory
> barriers wouldn't matter; this is all about constraining the compiler.
> While memory barriers can do that, it would be inefficient.
>
> Let's look at it this way. Suppose I have this code:
>
>      class C { int i; }
>      C c = ...;
>
>      foo();
>      c.i++;
>      bar();
>      c.i++;
>      baz(c);
>
> A clever compiler could trivially spot that c isn't being shared
> between threads, assigned to a global, nor passed to any function. So
> it's not an unreasonable optimization to rewrite this to:
>
>      C c = ...;
>
>      foo();
>      bar();
>      c.i += 2;
>      baz(c);
>
> However, this would be invalid if some part of c was mapped to some
> device or file. Now, when I tack volatile on it like this:
>
>      C c = ...;
>
>      foo();
>      volatile { c.i++; }
>      bar();
>      volatile { c.i++; }
>      baz(c);
>
> I'm telling the compiler that these two increments matter. Rewriting
> them to a single addition of 2 is not okay. Rewriting them to two
> additions of 1 (which is how most compiler IRs represent it anyway) is
> perfectly fine if the compiler so desires. Further, by tacking
> volatile on here, I'm telling the compiler that the order matters as
> well, so the volatile statements may not be reordered with respect to
> *each other* (but may be reordered with respect to other statements).
>
> I suppose you have a point about numbers of reads and writes (which
> emphasizes order being very important). So, to be precise, in an
> operation like c.i++, there should be exactly one read and one write
> from/to the memory location c.i. Whether it's done in a single
> instruction, or whatever, is irrelevant, as long as the desired effect
> on memory is achieved.

This is quite incorrect. i++ can be one read and one write, or two reads and one 
write. There's nothing about volatile or the C standard that says anything about 
read/write cycles. The C compiler you're using may happen to do what you want, 
but you wouldn't be relying on any sort of guarantee, portable or not.

>   It's worth noting that excessive reads from
> volatile memory *are* acceptable, however, since they do not alter any
> state. Only excessive writes can be problematic.

The standard doesn't say anything about how many write cycles an operation may 
or may not do.

>
>>
>>>>
>>>> D volatile isn't implemented, either.
>>> It is in LDC and GDC.
>>>
>>>>> It doesn't insert a compiler reordering fence? Martin Nowak seemed to
>>>>> think that it does, and a lot of old druntime code assumed that it
>>>>> did...
>>>>
>>>> dmd, all on its own, does not reorder loads and stores across accesses to
>>>> globals or pointer dereferences. This behavior is inherited from dmc, and
>>>> was very successful. dmc generated code simply did not suffer from all
>>>> kinds of heisenbugs common with other compilers because of that. I've
>>>> always considered reordering stores to globals as a bad thing to rely on,
>>>> and not a significant source of performance improvements, so deliberately
>>>> disabled them.
>>>>
>>>> However, I do understand that the D spec does allow a compiler to do this.
>>> Right. What you just described is an undocumented implementation
>>> detail of one particular D compiler that I simply cannot rely on.
>>>
>>>> Even though shared is not implemented at the low level, I suggest using it
>>>> anyway as it currently does work (with or without shared). You should
>>>> anyway, as the only way there'd be trouble is for multithreaded access to
>>>> that memory anyway.
>>> ... with DMD.
>>>
>>> And even if we ignore the fact that this will only work with DMD,
>>> shared will eventually imply either memory fences or atomic
>>> operations, which means unnecessary pipeline slowdown. In a kernel.
>>> Not acceptable.
>>
>>
>>
>>>> As for exact control over read and write cycles, the only reliable way to
>>>> do that is with inline assembler.
>>> Yes, that would perhaps work if I targeted only x86. But once a kernel
>>> expands beyond one architecture, you want to keep the assembly down to
>>> an absolute minimum because it makes maintenance and porting a
>>> nightmare. I specifically want to target ARM once I'm done with the
>>> x86 parts.
>>
>> It's not a nightmare to write an asm function that takes a pointer as an
>> argument and returns what it points to. You're porting a 2 line function
>> between systems.
> Not between systems. Between systems and compilers. It quickly turns
> into quite a few functions, especially if you're going to handle
> different sizes (1, 2, 4, 8 bytes, etc),

Just one if you use a template.

>   heck, you're going to have to
> handle anything that a pointer can point to. You can of course define
> some primitives to do this, but that not only results in inefficient
> code generation, it also leads to overly verbose and unmaintainable
> code because you have to read out each member of a structure manually.

I think this is an exaggeration.

>
> Can we please not use the inline assembler as an excuse to not
> implement a feature that is rather essential for a systems language,
> and especially in the year 2012? I understand that there are
> difficulties in working out the exact semantics, but I'm sure we can
> get there, and I think we should, instead of just hack-fixing a
> problem like this with inline assembly as some sort of "avoid the
> optimizing compiler completely" solution, which results in
> unreasonable amounts of code to maintain and port across
> configurations.

I don't think it is an unreasonable amount of code at all.

However, I can see it as a compiler builtin function, like bsr() and inp() are. 
Those are fairly straightforward, and are certainly a lot easier to understand 
than volatile semantics, which cause nothing but confusion. They are also as 
efficient as can be once implemented by the compiler (and you can use them with 
your own implementation in the meanwhile).

>
>>
>>
>>> I don't see why implementing volatile in D with the semantics we've
>>> discussed here would be so problematic. Especially considering GCC and
>>> LLVM already do the right thing, and it sounds like DMD's back end
>>> will too (?).
>>
>> I'd rather work on "what problem are you trying to solve" rather than
>> starting with a solution and then trying to infer the problem.
> It's always been about safe memory-mapped files and I/O in the face of
> optimizing compilers.
>

Well, you didn't say that until now :-). But now that I know what you're trying
to do, I think that a couple of compiler intrinsics can do the job.
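
For illustration only (this code was not posted in the thread): a minimal
sketch of how a pair of load/store intrinsics, wrapped in one template per
direction, might cover the memory-mapped I/O case. It assumes a druntime that
provides core.volatile with volatileLoad and volatileStore for ubyte, ushort,
uint and ulong. The names isVolatileWidth, peek, poke and incrementTwice are
hypothetical and exist only for this example.

```d
import core.volatile : volatileLoad, volatileStore;

// Hypothetical helper: true for the integer widths the intrinsics accept.
enum isVolatileWidth(T) =
    is(T == ubyte) || is(T == ushort) || is(T == uint) || is(T == ulong);

// Hypothetical wrapper: a single explicit load that the compiler
// is not supposed to elide, fold, or reorder against other volatile accesses.
T peek(T)(T* p) if (isVolatileWidth!T)
{
    return volatileLoad(p);
}

// Hypothetical wrapper: a single explicit store with the same constraints.
void poke(T)(T* p, T value) if (isVolatileWidth!T)
{
    volatileStore(p, value);
}

// The two increments from the quoted example, written so that each one is
// spelled out as one explicit load followed by one explicit store.
uint incrementTwice(uint* reg)
{
    poke(reg, peek(reg) + 1); // first increment
    poke(reg, peek(reg) + 1); // second increment, not folded into "+= 2"
    return peek(reg);
}
```

Spelling the accesses out this way sidesteps the question of how many reads
and writes an expression like c.i++ performs, since every access is an
explicit call. Reading a whole structure would still need one call per member.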