DIP 1024--Shared Atomics--Community Review Round 1

Manu turkeyman at gmail.com
Tue Oct 15 04:09:52 UTC 2019


On Mon, Oct 14, 2019 at 2:50 PM Ola Fosheim Grøstad via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
>
> On Monday, 14 October 2019 at 18:04:28 UTC, Manu wrote:
> > And it's not at hominem, I really do think you're just making
> > stuff up
> > (although I'm sure informed by ideas in competing ecosystems)...
>
> If you go after the person and not the statement then that is "ad
> hominem", isn't it?  Claiming that people are making stuff up
> does not compel me to expand on the topic, but I'll try to give
> you an idea of the landscape as I see it, anyway.
>
> I'm not sure what you mean by «competing eco systems», but I am
> talking about theoretical and applied research. And it is not a
> field I am highly interested in, but I have a passing interest in
> it. It makes a lot of sense to use such approaches if you are
> designing communication systems or critical control systems. It
> might be wasted on games.
>
> Anyway, there are various ways of reasoning formally about
> concurrency, and it has been done for a long time... And there
> are many approaches.  Ranging from dedicated tools for
> programming in various niches where it is beneficial, to formal
> methods and tools for analyzing distributed protocols. It is a
> big field, and I think the latter theme is easier to get into,
> and you'll find plenty on this if you search on keywords like:
> formal methods, formally verified, distributed computing,
> distributed protocols...

I'm aware of a lot of work in this area, but I haven't seem how it
influences a language at the lowest level (beyond ownership based
solutions, where transfer of unique ownership is used to model
thread-safe data transitions); it's usually in the realm of what I
would call 'framework', although there are many examples of such being
blessed by the language.
I don't think D has a great history with blessing an implementation as
sacred and installing it into the syntax, and D tends to be expressive
enough to make library solutions for this sort of thing quite
satisfying.

> > The closest thing you've said to a real implementation proposal
> > so far
> > is strong-ownership based message passing like rust, but that's
> > not
> > really in contest with this DIP, and I can't imagine how
> > `shared`
> > would have any influence on that toolbox.
>
> My argument was that D would not have it, so it would not be
> possible to make significant optimization based on modelling
> concurrency. Therefore "shared" as a library type ought to be
> sufficient.  That was my line of argument.
>
> I do not think D should model concurrency formally.

Then how do you define thread-local by default, and how do you
mark/identify data that violates that axiom?
D doesn't "model concurrency", that's not what `shared` does, it
simply establishes thread-locality.

> I belive such languages should be created from a clean slate.
> Though, some appears to do stuff with annotated C, but it does
> not sound very compelling to just slap it onto an existing
> language.

Are you talking about D here, or just a hypothetical application of your POV?

> > That's all thread-local concepts. There's nothing reasonable
> > you can
> > determine between threads without synchronising the threads.
>
> You might need synchronization, but you can potentially have less
> of it if you have a proof that thread 1 is always in state B when
> thread 2 is in state A, or that thread 1 is never in state B if
> thread 2 is not in state D, etc.

This is well into 'framework' territory; a thread is associated with
static knowledge of a state machine, known threads are collected into
a registry, and then state transitions can assert validity against the
other threads in the pool.
This sounds super complex, hard to express, and not worth of language
syntax. It also has nothing to do with `shared`.

> It is not uncommon to modell processes as statemachines and
> analyze them formally.

Sure, but we don't do that, or anything that's even remotely in that
ballpark. It's not a functional language, and we don't have *code* or
data access introspection like that.

> When you have that model, you can
> implement it at a lower level if you want, and make a
> correspondence proof that tie each chunk of low level code to the
> high level model. Then there are languages where everything is
> handled for you in a "regular" programming language, but they
> will have a particular focus, with a particular set of trade-offs.

That sort of language goes all in on one very restrictive programming
model. It's not anything like D, and I don't think it's a
consideration when talking about `shared`.
If you want to resist `shared`'s existence, then I don't think ideas
like this will move us forward.

> > The only reasonable CS model of threading I've ever seen
> > requires you
> > to start talking about strong ownership and transfering
> > ownership
> > between threads.
>
> There are many ways to model and reason about concurrency. One
> approach was actually inspired by the modelling of chemical
> reactions using PetriNets.
>
> One "impractical", but interesting, low level
> instruction-by-instruction approach uses Hoare logic. While
> possible, you might be limited to very few instructions. Anyway,
> with this approach you might in theory be able to avoid explicit
> synchronization in some cases as you can then prove whether or
> not all possible interleaving of instructions during execution
> comply with the specification! Still, the combinatorial explosion
> makes this very limiting and hard to reason about.  Anyway, Hoare
> logic is a very low level approach that has been researched in
> the 70s or 80s, so as you can see there is a history of various
> approaches.  Maybe applicable to a tight low level scope like a
> simple high performance hardware driver.

I've read about ideas like this before, but like, it's strictly
fantasy. I'm interested in what's real, where we are, what we have,
and where we need to be in practical terms.

> > While I agree that strong ownership and message passing is
> > robust and
> > worth having in any async ecosystem, it's also tremendously
> > inefficient for certain workloads, and if that were all we had
> > baked
> > into the language, it excludes other (useful) possibilities.
>
> Ok, but I didn't argue for it being built into the language...

There's gotta be something at the bottom of the stack, or you can't
write anything above.

> I am arguing that if it is built into the language then there
> should be a semantic reason for doing so. And static analysis of
> locking patterns might be one such reason.

There is; D specifies thread-local by default, and that absolutely
requires a marker (and associated semantics) to attribute the data
that violates that principle.

This has nothing to do with locking, that's a much higher-level
problem and completely outside the language.
Locking is this:
  * I have this shared data, I want to access it...
  * I must create a context where I have an exclusive window on the data.
  * One such implementation involves a mutex of some form, where I
assert that I am the unique lease-holder, and then in effect the data
becomes thread-local for the duration of that lease (ie, cast shared
away, because the data is thread-local for that duration).

In D, we can implement semantics like this by scope-guarded lock tools
of various kinds. It is also possible to achieve thread-locality by
deferring to a scheduling infrastructure with knowledge of
access-control. Such frameworks will cast to thread-local when issuing
access controlled data to the user.
We should make use of D's recent escape analysis tricks to guarantee
that thread-local leases remain properly scoped, and mutually
exclusive with other leases.

> But in general I think D would be better off by providing generic
> mechanisms rather than adding more and more limited features.

What does this mean to you? We can write very rich libraries with
`shared` as a safety mechanism. Nothing more is needed from the
language, except perhaps some future implicit cast rules.

> So maybe not add borrowing, but perhaps add some sort of limited
> linear/affine typing mechanism than can be used for implementing
> libraries that provides it to some extent. So people can write
> libraries that statically check that files that are opened also
> are closed. Or that you follow specific patterns when setting up
> hardware (register protocols or something like that).

What does this have to do with `shared`, and why is the DIP at odds with this?

> If meta-programming and library-authoring is D2's main selling
> point then that ought to be front and center.

I agree. I wrote that borrow-inspired guarded access type which I lank above.

> >> Oh. People do it. Sure, it is hard in the most general case,
> >> but you do have languages with high level concurrency
> >> constructs that provide type systems that makes it possible to
> >> write provably correct concurrent programs.
> >
> > That's not so much language at that point, it becomes blessed
> > library,
> > or a 'framework' even.
>
> No, you cannot run a solver from a library. Well, I guess you
> could, but that is not how it is usually done.

If this relates to the ideas you mentioned at the top; it's dependent
on threads with rich statically-associated data.
D could do this sort of thing in library; we have user-defined
attributes, and heavy-weight thread definition objects could carry
that around and be consumed by solvers.

In some ways it's a bit like Andrei's Big-O library, which carries
metadata around with definitions and used for static analysis.

> > The way you initially described the premise was something along
> > the lines of "the language knows the *state* of interacting
> > threads and does magic thread-safe stuff", which I'm skeptical
> > exists anywhere without a very elaborate framework to manage
> > any such orchestration.
>
> Not sure what you mean by elaborate framework. You need a
> language for expressing it. Then you need a solver for solving
> the constraints (analyzing/checking the model).

This is exactly what I would call a framework library. An elaborated
and opinionated implementation of a particular model.

> > I think now that you might be talking about ownership and
> > message passing...
>
> It could be, but I am talking about formal verification of
> concurrency models. Or languages that are designed in such a way
> that they translates into such models that can be verified.
>
> What you are talking about might be the other direction where you
> have languages that are designed in such a way that some of the
> issues provably cannot arise (e.g. Pony, maybe Go).
>
> But that is not what I am referring to. I am referring to
> inference using a solver.

Your ideas rely on intensive attribution or extremely sophisticated
introspection. We can easily do such attribution in D, but not
introspection of code, and where to be practical, like you say, the
low-level set of operations would need to be radically simplified into
a small and reasonable set.
You're not talking about D at that point.

> > Incidentally, my code that I linked to back up this thread is
> > effectively an implementation of a rust-like shared object
> > where it maintains a 'borrow' ref count, and guards access
> > safely... it's easy to write those tools in D.
>
> Alright, but if you had access to more advanced typing machinery
> in D then you could go further down that road yourself. Maybe not
> to the extent that it could keep up with Rust, but closer.

I'm not sure what rust offers to my little lib up there that I can't
do; I mean, rust basically does exactly that, in basically the same
way.
The only advantage of rust is that it has some language to track
lifetime knowledge through function calls which can be used for escape
analysis of the scoped lease object.

> > if a _language_ did that, it almost certainly
> > comes at a high cost, and it's almost certainly not what I want
> > it to
> > do... but I'm willing to be surprised.
>
> I don't think you want to do it. You'd have to change how you
> approach it, and it is not really needed for more mundane tasks.

Is it needed for *any* task? I can't think of a more complex
interactive environment in software than simulation/video games.
Nowhere else are so many completely different distinct solutions all
interacting in an asynchronous (and realtime) environment.
Maybe you want a proof-able solution that guarantees progress, and
willing to sacrifice efficiency for it... but in that world, why do
you need threads at all? If such an environment exists, I think it's
very niche.

> >> That's right. So those benefits are not avilable... which was
> >> my point. :-)
> >
> > I can't predict what you're talking about, and you haven't
> > shown anything substantial. I suspect if you show what you
> > mean, I may find the term 'benefits' to be problematic ;)
>
> What do you mean by substantial?

By 'substantial' I mean 'at all'. You didn't reference examples or
specifics, or describe your vision until now.

> The benefits is that you prove that the code follows the model,
> which in turn can be fed into the optimizer and assumed to always
> hold.
>
> It is not realistic that D will ever go there.
>
> It is more realistic that D would expand the type system or
> meta-programming mechanisms in ways that are useful to library
> authors and that they could create concurrent frameworks.

Right. That's exactly what we're doing. I'm glad you agree :)



More information about the Digitalmars-d mailing list