[dmd-internals] Regarding deprecation of volatile statements

Tue Jul 31 18:59:42 PDT 2012

On Wed, Aug 1, 2012 at 2:55 AM, Walter Bright <walter at digitalmars.com> wrote:
>
> On 7/31/2012 10:02 AM, Alex Rønne Petersen wrote:
>>
>> On Wed, Jul 25, 2012 at 1:20 AM, Walter Bright <walter at digitalmars.com>
>> wrote:
>>>
>>> On 7/24/2012 3:18 PM, Alex Rønne Petersen wrote:
>>>>
>>>> On Wed, Jul 25, 2012 at 12:11 AM, Walter Bright <walter at digitalmars.com>
>>>> wrote:
>>>>>
>>>>> On 7/24/2012 2:53 PM, Alex Rønne Petersen wrote:
>>>>>>
>>>>>> But shared can't replace volatile in kernel space. shared means
>>>>>> atomics/memory fences which is not what I want - that would just give
>>>>>> me unnecessary overhead. I want the proper, standard C semantics of
>>>>>> volatile,
>>>>>
>>>>>
>>>>> C does not have Standard semantics for volatile. It's a giant mess.
>>>>
>>>> Right, it leaves the exact definition of a volatile access to the
>>>> compiler.
>>>
>>>
>>> Right, that's why it is incorrect to refer to it as "standard" behavior.
>>> Behaviors I've seen include various combinations of:
>>>
>>> 1. disallowing enregistering
>>> 2. preventing folding multiple loads/stores together
>>> 3. preventing reordering across expressions with volatiles
>>> 4. inserting memory load/store fences
>>
>> As Martin already said, 1 and 2 are exactly what I need,
>
>
> Why do you need something not to be enregistered? It's usually loaded into a
> register before use, anyway. Also, why would you need 2?

I think there may be a misunderstanding. By enregistering I thought
you meant moving something off the stack and into registers
completely. But if I think about it, even that seems unnecessary. 2
and 3 should be enough, as Sean said.

For 2, see below (same reason why order matters).

>
>
>
>>   maybe with
>> the added clarification that volatile operations cannot be reordered
>> with respect to each other as David pointed out is the LLVM (and
>> therefore GCC, as LLVM is GCC-compatible) behavior.
>
>
> The only reason you'd need reordering prevention is if you had shared
> variables.

No. It's very common to use memory-mapped I/O (be it in kernel space
or via files in user space) to create stateful communication.
Reordered or folded operations would completely mess up the protocol.
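
A minimal C sketch of such a stateful protocol, assuming a made-up
device with a "select" register and a "data" register (the struct,
field, and function names are hypothetical, not from any real
hardware). With mainstream compilers such as GCC, accesses through a
volatile-qualified pointer are emitted as written, so the two stores
below are neither merged nor swapped:

```c
#include <stdint.h>

/* Hypothetical register layout of an imaginary device (assumption). */
struct dev_regs {
    uint32_t select; /* which internal slot the next data write targets */
    uint32_t data;   /* value to store into the selected slot */
};

/* Two-step protocol: pick a slot, then send the value.
   The pointer is volatile-qualified, so the compiler is expected to
   emit both stores, in this order, without folding or dropping one. */
void dev_write_slot(volatile struct dev_regs *regs,
                    uint32_t slot, uint32_t value)
{
    regs->select = slot;  /* must reach the device first */
    regs->data = value;   /* only meaningful after the select write */
}
```

If the compiler swapped the two stores, the value would land in
whatever slot was selected previously, which is exactly the kind of
breakage described above.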

>
>
>>
>>>
>>>
>>>
>>>>    But most relevant C compilers have a fairly sane definition
>>>> of this. For example, GCC:
>>>> http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html
>>>>
>>>>>> not the atomicity that people seem to associate with it.
>>>>>
>>>>>
>>>>> Exactly what semantics are you looking for?
>>>>
>>>> GCC's volatile semantics, pretty much. I want to be able to interact
>>>> with volatile memory without the compiler thinking it can optimize or
>>>> reorder (or whatever) my memory accesses. In other words, tell the
>>>> compiler to back off and leave volatile code alone.
>>>
>>>
>>> Unfortunately, this is rather vague. For example, how many read/write
>>> operations are there in v++? Optimizing is a terminally nebulous concept.
>>
>> How many reads/writes there are is actually irrelevant from my
>> perspective. The semantics that I'm after will simply guarantee that,
>> no matter how many, it'll stay at that number and in the defined order
>> of the v++ operation in the language.
>
>
> At that number? At what number? And why do you need a defined order, unless
> you're doing shared memory?

Of course memory-mapped I/O can be called "shared" memory, but it's
not shared in the traditional sense of concurrency, since memory
barriers wouldn't matter; this is all about constraining the compiler.
Memory barriers can do that too, but they would be needlessly expensive.

Let's look at it this way. Suppose I have this code:

    class C { int i; }
    C c = ...;

    foo();
    c.i++;
    bar();
    c.i++;
    baz(c);

A clever compiler could trivially spot that c isn't shared between
threads, assigned to a global, or passed to any function before the
second increment. So it's not an unreasonable optimization to rewrite
this to:

    C c = ...;

    foo();
    bar();
    c.i += 2;
    baz(c);

However, this would be invalid if some part of c were mapped to some
device or file. Now, when I tack volatile on it like this:

    C c = ...;

    foo();
    volatile { c.i++; }
    bar();
    volatile { c.i++; }
    baz(c);

I'm telling the compiler that these two increments matter. Rewriting
them to a single addition of 2 is not okay. Rewriting them to two
additions of 1 (which is how most compiler IRs represent it anyway) is
perfectly fine if the compiler so desires. Further, by tacking
volatile on here, I'm telling the compiler that the order matters as
well, so the volatile statements may not be reordered with respect to
*each other* (but may be reordered with respect to other statements).
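
For comparison, a rough C counterpart of the snippet above, assuming
GCC-style volatile semantics. The names counter, bump_twice, foo and
bar are hypothetical stand-ins, and foo/bar are empty placeholders:

```c
/* Hypothetical counter that might live in device- or file-mapped memory. */
struct counter { int i; };

/* Empty placeholders for the unrelated work between the increments. */
void foo(void) {}
void bar(void) {}

void bump_twice(volatile struct counter *c)
{
    foo();
    c->i++;   /* one volatile read followed by one volatile write */
    bar();
    c->i++;   /* a second read/write pair; not to be folded into += 2 */
}
```

The final value is the same as with a single addition of 2; the
difference is only visible to whatever observes the individual
memory accesses, such as a device behind the mapping.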

I suppose you have a point about numbers of reads and writes (which
emphasizes order being very important). So, to be precise, in an
operation like c.i++, there should be exactly one read and one write
from/to the memory location c.i. Whether it's done in a single
instruction, or whatever, is irrelevant, as long as the desired effect
on memory is achieved. It's worth noting that excessive reads from
volatile memory *are* acceptable, however, since they do not alter any
state. Only excessive writes can be problematic.

>
>
>>
>>>
>>>
>>> D volatile isn't implemented, either.
>>
>> It is in LDC and GDC.
>>
>>>
>>>> It doesn't insert a compiler reordering fence? Martin Nowak seemed to
>>>> think that it does, and a lot of old druntime code assumed that it
>>>> did...
>>>
>>>
>>> dmd, all on its own, does not reorder loads and stores across accesses to
>>> globals or pointer dereferences. This behavior is inherited from dmc, and
>>> was very successful. dmc generated code simply did not suffer from all
>>> kinds of heisenbugs common with other compilers because of that. I've
>>> always considered reordering stores to globals as a bad thing to rely on,
>>> and not a significant source of performance improvements, so deliberately
>>> disabled them.
>>>
>>> However, I do understand that the D spec does allow a compiler to do this.
>>
>> Right. What you just described is an undocumented implementation
>> detail of one particular D compiler that I simply cannot rely on.
>>
>>> Even though shared is not implemented at the low level, I suggest using
>>> it anyway as it currently does work (with or without shared). You should
>>> anyway, as the only way there'd be trouble is for multithreaded access to
>>> that memory anyway.
>>
>> ... with DMD.
>>
>> And even if we ignore the fact that this will only work with DMD,
>> shared will eventually imply either memory fences or atomic
>> operations, which means unnecessary pipeline slowdown. In a kernel.
>> Not acceptable.
>
>
>
>
>>
>>> As for exact control over read and write cycles, the only reliable way to
>>> do that is with inline assembler.
>>
>> Yes, that would perhaps work if I targeted only x86. But once a kernel
>> expands beyond one architecture, you want to keep the assembly down to
>> an absolute minimum because it makes maintenance and porting a
>> nightmare. I specifically want to target ARM once I'm done with the
>> x86 parts.
>
>
> It's not a nightmare to write an asm function that takes a pointer as an
> argument and returns what it points to. You're porting a 2 line function
> between systems.

Not between systems. Between systems and compilers. It quickly turns
into quite a few functions, especially if you're going to handle
different sizes (1, 2, 4, 8 bytes, etc.). Heck, you're going to have to
handle anything that a pointer can point to. You can of course define
some primitives to do this, but that not only results in inefficient
code generation, it also leads to overly verbose and unmaintainable
code because you have to read out each member of a structure manually.
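
To illustrate the verbosity, here is a sketch of the primitives
approach in C. The helper names (mmio_read8 and friends) and the desc
structure are hypothetical. The bodies use volatile casts so the
example stays portable and runnable; in the inline-assembly variant
under discussion, each body would instead be written per architecture
and per compiler:

```c
#include <stdint.h>

/* Hypothetical helpers: one primitive is needed per access width. */
uint8_t mmio_read8(const volatile void *p)
{
    return *(const volatile uint8_t *)p;
}

uint16_t mmio_read16(const volatile void *p)
{
    return *(const volatile uint16_t *)p;
}

uint32_t mmio_read32(const volatile void *p)
{
    return *(const volatile uint32_t *)p;
}

/* Hypothetical device descriptor (assumption, for illustration only). */
struct desc {
    uint8_t  flags;
    uint16_t len;
    uint32_t addr;
};

/* With primitives, every member has to be fetched by hand. */
struct desc desc_read(const volatile struct desc *d)
{
    struct desc out;
    out.flags = mmio_read8(&d->flags);
    out.len   = mmio_read16(&d->len);
    out.addr  = mmio_read32(&d->addr);
    return out;
}
```

Every new structure needs another hand-written reader like desc_read,
and every new member width needs another primitive.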

Can we please not use the inline assembler as an excuse to not
implement a feature that is rather essential for a systems language,
and especially in the year 2012? I understand that there are
difficulties in working out the exact semantics, but I'm sure we can
get there, and I think we should, instead of just hack-fixing a
problem like this with inline assembly as some sort of "avoid the
optimizing compiler completely" solution, which results in
unreasonable amounts of code to maintain and port across
configurations.

>
>
>
>>
>> I don't see why implementing volatile in D with the semantics we've
>> discussed here would be so problematic. Especially considering GCC and
>> LLVM already do the right thing, and it sounds like DMD's back end
>> will too (?).
>
>
> I'd rather work on "what problem are you trying to solve" rather than
> starting with a solution and then trying to infer the problem.

It's always been about safe memory-mapped files and I/O in the face of
optimizing compilers.

>
>
>>
>>> Use these two techniques, and your code should be future proofed.
>>
>> Not so. It would make it worse (read: less portable and less
>> performant) than writing C.
>>
>>
>>
>> I think you're a bit too focused on what DMD does. I need well-defined
>> semantics that I can rely on, not implementation details of one
>> compiler's code generator and optimizer.
>>
>> In fact, if you're willing to recognize this need, I'm willing to
>> write up a DIP with well-defined volatile semantics - provided that it
>> won't be ignored (other DIPs are stagnating). This will of course mean
>> undeprecating volatile, but this time, defining its semantics
>> precisely.
>>
>> Regards,
>> Alex
>>
>
> _______________________________________________
> dmd-internals mailing list
> dmd-internals at puremagic.com
> http://lists.puremagic.com/mailman/listinfo/dmd-internals