[dmd-internals] Regarding deprecation of volatile statements

Wed Aug 1 10:46:09 PDT 2012

On Wed, Aug 1, 2012 at 7:25 PM, Walter Bright <walter at digitalmars.com> wrote:
>
> On 7/31/2012 6:59 PM, Alex Rønne Petersen wrote:
>>
>> On Wed, Aug 1, 2012 at 2:55 AM, Walter Bright <walter at digitalmars.com> wrote:
>>>
>>> On 7/31/2012 10:02 AM, Alex Rønne Petersen wrote:
>>>>
>>>> On Wed, Jul 25, 2012 at 1:20 AM, Walter Bright <walter at digitalmars.com> wrote:
>>>>>
>>>>> On 7/24/2012 3:18 PM, Alex Rønne Petersen wrote:
>>>>>>
>>>>>> On Wed, Jul 25, 2012 at 12:11 AM, Walter Bright <walter at digitalmars.com> wrote:
>>>>>>>
>>>>>>> On 7/24/2012 2:53 PM, Alex Rønne Petersen wrote:
>>>>>>>>
>>>>>>>> But shared can't replace volatile in kernel space. shared means
>>>>>>>> atomics/memory fences, which is not what I want - that would just give
>>>>>>>> me unnecessary overhead. I want the proper, standard C semantics of
>>>>>>>> volatile,
>>>>>>>
>>>>>>>
>>>>>>> C does not have Standard semantics for volatile. It's a giant mess.
>>>>>>
>>>>>> Right, it leaves the exact definition of a volatile access to the
>>>>>> compiler.
>>>>>
>>>>>
>>>>> Right, that's why it is incorrect to refer to it as "standard" behavior.
>>>>> Behaviors I've seen include various combinations of:
>>>>>
>>>>> 1. disallowing enregistering
>>>>> 2. preventing folding multiple loads/stores together
>>>>> 3. preventing reordering across expressions with volatiles
>>>>> 4. inserting memory load/store fences
>>>>
>>>> As Martin already said, 1 and 2 are exactly what I need,
>>>
>>>
>>> Why do you need something not to be enregistered? It's usually loaded into
>>> a register before use, anyway. Also, why would you need 2?
>>
>> I think there may be a misunderstanding. By enregistering I thought
>> you meant moving something off the stack and into registers
>> completely.
>
>
> That's what it means. But also, I have no idea what problem is addressed by
> disallowing register allocation.
>
>
>
>>   But if I think about it, even that seems unnecessary. 2
>> and 3 should be enough, as Sean said.
>
>
> To reiterate, this is why I need to know what problem you are trying to
> address, rather than going at it from the solution point of view.
>
>
>
>>
>> For 2, see below (same reason why order matters).
>>
>>>
>>>
>>>>    maybe with
>>>> the added clarification that volatile operations cannot be reordered
>>>> with respect to each other as David pointed out is the LLVM (and
>>>> therefore GCC, as LLVM is GCC-compatible) behavior.
>>>
>>>
>>> The only reason you'd need reordering prevention is if you had shared
>>> variables.
>>
>> No. It's very common to use memory-mapped I/O (be it in kernel space
>> or via files in user space) to create stateful communication.
>> Reordered or folded operations would completely mess up the protocol.
>
>
> Communication between what?

Typically processes doing different things. One process might be
receiving data from the network, while another processes it. It
depends entirely on application design.

>
>
>
>>
>>>
>>>>>
>>>>>
>>>>>>     But most relevant C compilers have a fairly sane definition
>>>>>> of this. For example, GCC:
>>>>>> http://gcc.gnu.org/onlinedocs/gcc/Volatiles.html
>>>>>>
>>>>>>>> not the atomicity that people seem to associate with it.
>>>>>>>
>>>>>>>
>>>>>>> Exactly what semantics are you looking for?
>>>>>>
>>>>>> GCC's volatile semantics, pretty much. I want to be able to interact
>>>>>> with volatile memory without the compiler thinking it can optimize or
>>>>>> reorder (or whatever) my memory accesses. In other words, tell the
>>>>>> compiler to back off and leave volatile code alone.
>>>>>
>>>>>
>>>>> Unfortunately, this is rather vague. For example, how many read/write
>>>>> operations are there in v++? Optimizing is a terminally nebulous concept.
>>>>
>>>> How many reads/writes there are is actually irrelevant from my
>>>> perspective. The semantics that I'm after will simply guarantee that,
>>>> no matter how many, it'll stay at that number and in the defined order
>>>> of the v++ operation in the language.
>>>
>>>
>>> At that number? At what number? And why do you need a defined order, unless
>>> you're doing shared memory?
>>
>> Of course memory-mapped I/O can be called "shared" memory, but it's
>> not shared in the traditional sense of concurrency, since memory
>> barriers wouldn't matter; this is all about constraining the compiler.
>> While memory barriers can do that, it would be inefficient.
>>
>> Let's look at it this way. Suppose I have this code:
>>
>>      class C { int i; }
>>      C c = ...;
>>
>>      foo();
>>      c.i++;
>>      bar();
>>      c.i++;
>>      baz(c);
>>
>> A clever compiler could trivially spot that c isn't being shared
>> between threads, assigned to a global, or passed to any function before
>> baz(c). So it's not an unreasonable optimization to rewrite this to:
>>
>>      C c = ...;
>>
>>      foo();
>>      bar();
>>      c.i += 2;
>>      baz(c);
>>
>> However, this would be invalid if some part of c was mapped to some
>> device or file. Now, when I tack volatile on it like this:
>>
>>      C c = ...;
>>
>>      foo();
>>      volatile { c.i++; }
>>      bar();
>>      volatile { c.i++; }
>>      baz(c);
>>
>> I'm telling the compiler that these two increments matter. Rewriting
>> them to a single addition of 2 is not okay. Rewriting them to two
>> additions of 1 (which is how most compiler IRs represent it anyway) is
>> perfectly fine if the compiler so desires. Further, by tacking
>> volatile on here, I'm telling the compiler that the order matters as
>> well, so the volatile statements may not be reordered with respect to
>> *each other* (but may be reordered with respect to other statements).
>>
>> I suppose you have a point about numbers of reads and writes (which
>> emphasizes order being very important). So, to be precise, in an
>> operation like c.i++, there should be exactly one read and one write
>> from/to the memory location c.i. Whether it's done in a single
>> instruction, or whatever, is irrelevant, as long as the desired effect
>> on memory is achieved.
>
>
> This is quite incorrect. i++ can be one read and one write, or two reads and
> one write. There's nothing about volatile or the C standard that says
> anything about read/write cycles. The C compiler you're using may happen to
> do what you want, but you wouldn't be relying on any sort of guarantee,
> portable or not.

That wasn't meant to be in the context of C, but just memory-mapped
I/O in general.
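
In C terms (C is used for the sketches in this post because the thread keeps measuring D against C compilers), the closest one can get to "exactly one read and one write" is to spell the increment as an explicit volatile read followed by an explicit volatile write, rather than relying on how the compiler lowers `(*reg)++`. A minimal sketch; `mmio_increment` is a hypothetical helper, not an existing API, and since C leaves what constitutes an access to a volatile object implementation-defined, this pins down the source-level accesses, not the bus cycles.

```c
#include <stdint.h>

/* Hypothetical helper (not from any library): increment a 32-bit
 * memory-mapped location using exactly one volatile read and one
 * volatile write at the source level. The compiler may not drop or
 * merge either access, nor reorder them relative to other volatile
 * accesses; how each access maps onto bus cycles is still up to the
 * implementation. */
static inline void mmio_increment(volatile uint32_t *reg)
{
    uint32_t value = *reg;   /* one volatile read  */
    *reg = value + 1u;       /* one volatile write */
}
```

Calling it twice, as in the `c.i++` example quoted above, produces two separate read/write pairs that the compiler is not allowed to collapse into a single `+= 2`.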

>
>
>
>>   It's worth noting that excessive reads from
>> volatile memory *are* acceptable, however, since they do not alter any
>> state. Only excessive writes can be problematic.
>
>
> The standard doesn't say anything about how many write cycles an operation
> may or may not do.

Same here.

>
>
>
>>
>>>
>>>>>
>>>>> D volatile isn't implemented, either.
>>>>
>>>> It is in LDC and GDC.
>>>>
>>>>>> It doesn't insert a compiler reordering fence? Martin Nowak seemed to
>>>>>> think that it does, and a lot of old druntime code assumed that it
>>>>>> did...
>>>>>
>>>>>
>>>>> dmd, all on its own, does not reorder loads and stores across accesses to
>>>>> globals or pointer dereferences. This behavior is inherited from dmc, and
>>>>> was very successful: dmc-generated code simply did not suffer from all the
>>>>> kinds of heisenbugs common with other compilers. I've always considered
>>>>> reordering stores to globals a bad thing to rely on, and not a significant
>>>>> source of performance improvements, so I deliberately disabled it.
>>>>>
>>>>> However, I do understand that the D spec does allow a compiler to do this.
>>>>
>>>> Right. What you just described is an undocumented implementation
>>>> detail of one particular D compiler that I simply cannot rely on.
>>>>
>>>>> Even though shared is not implemented at the low level, I suggest using it
>>>>> anyway, since the code currently works with or without shared. You should
>>>>> use it regardless, as the only way there'd be trouble is multithreaded
>>>>> access to that memory.
>>>>
>>>> ... with DMD.
>>>>
>>>> And even if we ignore the fact that this will only work with DMD,
>>>> shared will eventually imply either memory fences or atomic
>>>> operations, which means unnecessary pipeline slowdown. In a kernel.
>>>> Not acceptable.
>>>
>>>
>>>
>>>
>>>>> As for exact control over read and write cycles, the only reliable way to
>>>>> do that is with inline assembler.
>>>>
>>>> Yes, that would perhaps work if I targeted only x86. But once a kernel
>>>> expands beyond one architecture, you want to keep the assembly down to
>>>> an absolute minimum because it makes maintenance and porting a
>>>> nightmare. I specifically want to target ARM once I'm done with the
>>>> x86 parts.
>>>
>>>
>>> It's not a nightmare to write an asm function that takes a pointer as an
>>> argument and returns what it points to. You're porting a 2 line function
>>> between systems.
>>
>> Not between systems. Between systems and compilers. It quickly turns
>> into quite a few functions, especially if you're going to handle
>> different sizes (1, 2, 4, 8 bytes, etc),
>
>
> Just one if you use a template.

That's a good point, but that template still has to handle all kinds
of weird struct layouts correctly, including those with custom
alignment specifiers.

>
>
>>   heck, you're going to have to
>> handle anything that a pointer can point to. You can of course define
>> some primitives to do this, but that not only results in inefficient
>> code generation, it also leads to overly verbose and unmaintainable
>> code because you have to read out each member of a structure manually.
>
>
> I think this is an exaggeration.

I don't follow. How else would you do it?
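
For comparison, this is what a language-level volatile buys in C: volatile-qualify the pointer to a whole register block, and every member access through it becomes a volatile access to that member, with the compiler computing the layout, including any `_Alignas` padding. No per-member primitives are needed. A minimal sketch; `struct dev_regs`, its fields and `dev_enable` are a hypothetical register layout and helper, not any real device.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block; the layout is illustrative only.
 * _Alignas (C11) forces `ctrl` onto a 16-byte boundary, so the
 * compiler inserts padding after `status`. */
struct dev_regs {
    uint32_t data;                 /* offset 0  */
    uint32_t status;               /* offset 4  */
    _Alignas(16) uint32_t ctrl;    /* offset 16 */
};

/* Read the status register, then set the low bit of ctrl. Because
 * `regs` points to a volatile struct, each member access below is a
 * volatile access to that member; no per-member helper is needed. */
static inline uint32_t dev_enable(volatile struct dev_regs *regs)
{
    uint32_t status = regs->status;   /* volatile read */
    regs->ctrl = regs->ctrl | 1u;     /* volatile read, then volatile write */
    return status;
}
```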

>
>
>>
>> Can we please not use the inline assembler as an excuse to not
>> implement a feature that is rather essential for a systems language,
>> and especially in the year 2012? I understand that there are
>> difficulties in working out the exact semantics, but I'm sure we can
>> get there, and I think we should, instead of just hack-fixing a
>> problem like this with inline assembly as some sort of "avoid the
>> optimizing compiler completely" solution, which results in
>> unreasonable amounts of code to maintain and port across
>> configurations.
>
>
> I don't think it is an unreasonable amount of code at all.

It really depends on how much of such code you're gonna have to write.

Now, I certainly can't say from experience since I'm far from having
an even remotely usable kernel, but I expect most device drivers to
work through memory-mapped I/O, and hardware vendors aren't known for
following standards of any kind...
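
A typical driver pattern where both the number and the order of accesses matter is polling a status register before writing a data register. A minimal sketch with hypothetical register pointers, a hypothetical `STATUS_READY` bit and a hypothetical `dev_send_byte` routine; volatile only constrains the compiler here, and on weakly ordered CPUs a real driver may additionally need whatever device-memory ordering the architecture requires.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_READY 0x1u   /* hypothetical "device ready" bit */

/* Hypothetical transmit routine: spin until the device reports ready
 * (bounded by max_polls), then write one byte to the data register.
 * Every iteration re-reads *status because it is volatile; a plain
 * load could legally be hoisted out of the loop. The compiler also may
 * not move the data write ahead of the status reads, since both are
 * volatile accesses. */
static bool dev_send_byte(const volatile uint32_t *status,
                          volatile uint32_t *data,
                          uint8_t byte, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status & STATUS_READY) {
            *data = byte;
            return true;
        }
    }
    return false;   /* device never became ready */
}
```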

>
> However, I can see it as a compiler builtin function, like bsr() and inp()
> are. Those are fairly straightforward, and are certainly a lot easier to
> understand than volatile semantics, which cause nothing but confusion. They
> are also as efficient as can be once implemented by the compiler (and you
> can use them with your own implementation in the meanwhile).
>
>
>
>>
>>>
>>>
>>>> I don't see why implementing volatile in D with the semantics we've
>>>> discussed here would be so problematic. Especially considering GCC and
>>>> LLVM already do the right thing, and it sounds like DMD's back end
>>>> will too (?).
>>>
>>>
>>> I'd rather work on "what problem are you trying to solve" rather than
>>> starting with a solution and then trying to infer the problem.
>>
>> It's always been about safe memory-mapped files and I/O in the face of
>> optimizing compilers.
>>
>
> Well, you didn't say that until now :-). But now that I know what you're
> trying to do, I think that a couple of compiler intrinsics can do the job.

Sorry, I guess this thread has been more than a little unclear/confusing.

I'm not opposed to intrinsics if they can be reasonably implemented by
GDC and LDC as well. Also, they should be templated so they work with
arbitrary pointer types. I would recommend names like "volatileLoad"
and "volatileStore" since that would be immediately familiar to C
programmers and it would do what they expect their C compiler to do
(even if not standardized in C land).
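
C has no templates, but the proposed pair can be approximated with type-generic macros, which is roughly what a stopgap "own implementation in the meanwhile" could look like until compilers provide the intrinsics. A minimal sketch, assuming GCC or Clang: `__typeof__` is a GNU extension (C23 standardizes it as `typeof`), and `volatileLoad`/`volatileStore` are the names proposed above, not an existing API. For struct pointees the macros compile, but how many accesses, and of what width, the compiler uses to copy the struct is not specified.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the proposed intrinsics. Both cast the
 * pointer to a volatile-qualified pointer to the same pointee type, so
 * the access cannot be elided, merged, or reordered with respect to
 * other volatile accesses. Requires GCC/Clang (__typeof__). */
#define volatileLoad(p)      (*(volatile __typeof__(*(p)) *)(p))
#define volatileStore(p, v)  ((void)(*(volatile __typeof__(*(p)) *)(p) = (v)))
```

Usage mirrors the proposal: `uint32_t v = volatileLoad(reg); volatileStore(reg, v | 1u);` works for any pointer `reg`, whatever its pointee size.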

Regards,
Alex