Second Draft: Coroutines

Sat Jan 25 18:14:09 UTC 2025

On 26/01/2025 5:44 AM, Mai Lapyst wrote:
> On Saturday, 25 January 2025 at 13:41:24 UTC, Richard (Rikki) Andrew 
> Cattermole wrote:
>> The ``await`` keyword has been used for multithreading longer than 
>> I've been alive. To mean what it does.
>> It's also very uncommon and does not see usage in druntime/phobos.
> 
> So "preventing breaking" is reserved for phobos only, and any
> user-written code is fine to break at any moment. I find that a very
> problematic approach to implementing / enhancing a language. "Don't break
> userspace" comes to mind; we should first and foremost be concerned with
> users interacting with the feature (which you seem to be concerned with
> as well), and as such I wouldn't want to break all existing asynchronous
> libraries out there when the new edition rolls around. This makes dlang
> seem even more broken and "too niche" for people to use, as any async
> library used in examples, tutorials etc. up to this point will horribly
> break.

The ``await`` statement only works in a coroutine, so it should not break
anything.

It applies only to entirely new code.

Old code that uses that identifier won't be compatible with the new 
eventloop anyway, and probably won't be desirable to call (i.e. blocking 
where you don't want it to block).

We have strict rules these days on breaking code, which is to not do it.
The breaking changes and deprecations section reflects this.

I have no intention of breaking anything in this proposal as it isn't
needed.

>> As it has no meaning outside of a coroutine, it'll be easy to handle I 
>> think.
> 
> Then the DIP should specify it. Either the token `await` becomes a
> hard keyword, disallowing any identifier usage of it, or it becomes a
> soft one, where it only acts as a keyword in `@async` contexts and like
> a normal identifier outside of them. You even link C#'s definition of it,
> which has (more or less) the exact wording needed for it:
> ```
> Inside an async function, await shall not be used as an 
> available_identifier although the verbatim identifier @await may be 
> used. There is therefore no syntactic ambiguity between 
> await_expressions and various expressions involving identifiers. Outside 
> of async functions, await acts as a normal identifier.
> ```

It depends.

If we get editions, then it can be a keyword in a new edition, but not 
in an old one.

If we don't get them, it can be a soft keyword that only applies in
the context of a coroutine.

Whatever is picked, it will be tuned towards "non-breaking".
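
To make "non-breaking" concrete, here is a minimal sketch of the kind of existing code that has to keep compiling. In D today ``await`` is an ordinary identifier, so it can name both a method and a variable. ``LegacyTask`` and ``legacyCaller`` are hypothetical names made up for this example; how the DIP ends up treating the identifier (edition keyword or soft keyword) is still open.

```d
// Valid D today: `await` is an ordinary identifier, not a reserved word.
// LegacyTask is a hypothetical pre-coroutine library type.
struct LegacyTask
{
    int result;

    // A blocking-style method that happens to be called `await`.
    int await() const
    {
        return result;
    }
}

int legacyCaller()
{
    auto task = LegacyTask(42);
    int await = task.await(); // also fine as a local variable name
    return await;
}

void main()
{
    assert(legacyCaller() == 42);
}
```

None of this is inside a coroutine, so under either option it would be expected to keep its current meaning.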

>> Stuff like this is why I added the multiple returns support, even 
>> though I do not believe it is needed.
> 
> Which multiple return support? The DIP states clearly that it is **NOT** 
> supported.

For C#, not the proposed feature.

I am adding "This is a return that does not complete the coroutine, to
enable multiple value returns." to make it very explicit that this is
what it is offering.

>> Its also a good example of why the language does not define the 
>> library, so you have the freedom to do this stuff!
> 
> Yes, but honestly you do the same: your dependency system defines how
> libraries need to interact with coroutines, the same way wakers do. I
> don't want to argue that wakers don't define library usage as well, but
> dependencies do so as well.

That isn't what I mean.

The DIP only defines the language transformation; you are responsible
for how it gets called, what can be waited upon, etc.

I.e. if you don't support ``await`` statements, you can static assert 
out if they are used.

```d
__generatedName generatedFromCompilerStateStruct = ...;
...
static assert(co.WaitingON.__tags.length == 1);
```

Or something akin to it.

It could return a waker, socket or anything else. You control what can 
be waited upon. The language isn't filtering it.
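
A minimal sketch of that kind of library-side filtering, written in today's D with hand-written stand-ins for what the compiler would generate. The member name ``WaitTypes`` and every type below are assumptions for the example, not DIP wording.

```d
import std.meta : AliasSeq;

// Hypothetical stand-ins for compiler-generated state structs. `WaitTypes`
// lists the types the coroutine can wait on (an assumed name).
struct StateWithoutAwait
{
    alias WaitTypes = AliasSeq!(); // this coroutine never awaits
}

struct StateWithAwait
{
    alias WaitTypes = AliasSeq!(int); // this coroutine awaits one kind of thing
}

// A hypothetical library entry point that refuses coroutines using `await`.
void runWithoutAwait(State)(State state)
{
    static assert(State.WaitTypes.length == 0,
        "this runner does not support await statements");
    // ... drive `state` to completion here ...
}

// True when the runner accepts the given state struct.
enum accepted(State) = __traits(compiles, runWithoutAwait(State.init));

static assert(accepted!StateWithoutAwait);
static assert(!accepted!StateWithAwait);

void main()
{
    runWithoutAwait(StateWithoutAwait());
}
```

The rejection happens entirely in library code; a different library could accept any set of waitable types it likes.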

>> It is not part of the DIP. Without the operator overload example, it 
>> wouldn't be understood.
> 
> Then do not put it into the DIP. It should **only** contain your design
> and what's possible with it, without having to rely on possible future
> DIPs to add some operators to make your DIP actually work.

The operator overload ``opConstructCo`` is part of the DIP.

Therefore there are examples for it.

But the library types such as ``GenericCoroutine``,
``InstantiableCoroutine``, and ``Future`` are not part of the DIP; they
are only needed to show how the language feature can be used.

>> The compiler using just the parse tree can see the function 
>> ``opConstructCo`` on the library type ``InstantiableCoroutine``. 
>> Allowing it to flag the type as an instantiable coroutine.
> 
> Again: this description says that the compiler treats `opConstructCo`
> differently from other functions. What would happen if I want to use
> another name? What will happen if I have multiple functions with the 
> same signature but different names?

It is an operator overload, like any other.

You use what the language specifies, end of story.

It has the ``op`` prefix, which is established for use by operator 
overload methods.

>> See above, it can see that it is a coroutine by the parameter, rather 
>> than on the argument.
> 
> So the argument (lambda) would not be a coroutine and could not use 
> `await` or `@async return`? This seems counter-intuitive, as I clearly 
> can see that code as this will exist:
> ```d
> ListenSocket ls = ListenSocket.create((Socket socket) @async {
>      auto line = await socket.readLine();
>      // ...
> });
> ```

That is almost a good example, but ``await`` is a statement, not an
expression.

As a statement it'll be easier to transform into the state machine.

```d
ListenSocket ls = ListenSocket.create((Socket socket) @async {
      auto line = socket.readLine();
      await line;
      // ...
});
```

Lambdas are actually templates if you do not specify the types in the
parameter list.

In this case it is explicitly required that the lambda takes the
``@async`` attribute from the parameter of ``create``, based upon the
parameter type.

This does imply that we cannot limit ``await`` statements and
``@async`` returns during parsing. That shouldn't be a problem thanks to
the whitespace: it is ``await ...;``, not ``await;``, and there are
currently no attributes on statements (but there are on declarations).

> therefore the function should be annotated as `async`; especially because
> you say time and time again it should be usable by users without prior
> knowledge of the insides of the system. Making it so that functions can
> only have `await` if they're `@async` but lambdas are whatever they want
> to be seems like a huge booby trap.
> 
>> You don't win a whole lot by requiring it. Especially when they are 
>> templates and they look like they should "just work".
> 
> It makes things clearer for the writer (and future readers), and by
> extension the compiler, as it now knows for certain to slice the lambda
> as well, since this is the intention of the developer.

We infer attributes on templates.

I see no difference here.

Not doing it here seems like it would create more surprises than doing it.
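
For reference, this is what the existing inference looks like in today's D; nothing here is new syntax. The function names are made up for the example. Inferring ``@async`` for a lambda would be the same idea applied to one more attribute, which is my assumption about how it would feel to users, not something this snippet can prove.

```d
// Template function: @safe, pure, nothrow and @nogc are inferred from the
// body when it is instantiated.
T twice(T)(T x)
{
    return x + x;
}

// Ordinary function with no annotations: nothing is inferred, so it is
// @system by default.
int twicePlain(int x)
{
    return x + x;
}

// Compiles only because the attributes of twice!int were inferred.
int strictCaller() @safe pure nothrow @nogc
{
    return twice(21);
}

// Calling the unannotated, non-template function from @safe code is rejected.
static assert(!__traits(compiles, () @safe { twicePlain(1); }));

// A lambda without parameter types is itself a template.
alias untyped = x => x + 1;
alias typed = (int x) => x + 1;
static assert(__traits(isTemplate, untyped));
static assert(!__traits(isTemplate, typed));

void main()
{
    assert(strictCaller() == 42);
    assert(untyped(41) == 42);
}
```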

>> It was heavily discussed
> 
> Where exactly? Haven't seen it yet, sorry. And even then: these should be
> part of the DIP under a section "non-goals" or "discarded ideas" so
> people know that a) they were considered and b) what the considerations
> were that led to the decision.

You will have to trust me on this: adding such a section is not helpful.

It ends up derailing things for the D community.

>> See the ``Prime Sieve`` example for one way you can do this.
> 
> I've seen it, but again: it uses undeclared things that aren't as clear
> as day if you're **not** the writer of the DIP.
> ```d
> InstantiableCoroutine!(int) ico = &generate;
> Future!int ch = ico.makeInstance();
> ```

"Given the following _potential_ shell of a library struct that is used 
for the purpose of examples only:"

I added a clarification at the end that it is used only for the
examples, although this was already stated as part of
``Constructing Library Representation``.

> Why does this work? `generate` is a coroutine, but why can it be "just"
> assigned to a library shell? Does it "just work"? That's not how
> programming works or how standards should be written. I **could** see
> that you meant that a constructor that takes a template parameter with
> the `__descriptorco` should be used, but again: it is not stated in the
> DIP and as such should not be taken for granted just because you expect
> people to come to the conclusion themselves. Look at C++ papers, they are
> **huge** for a reason: EVERYTHING gets written down so no confusion can
> happen.

This is described in ``Constructing Library Representation``.

The relevant lowering is:

```d
// The location of this struct is irrelevant, as long as compile time
// accessible things remain available
struct __generatedName {
}

InstantiableCoroutine!(int, int) co = InstantiableCoroutine!(int, int)
	.opConstructCo!__generatedName;
```
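
To show that there is no magic beyond the operator overload, here is a hand-written sketch of that lowering that compiles in today's D. ``GeneratedDescriptor`` stands in for the compiler-generated struct, and the ``descriptorName`` field is an invention for this example so there is something to observe; a real library type would store whatever it needs.

```d
// Hypothetical stand-in for the compiler-generated descriptor. A real one
// would carry the state struct and the sliced function bodies.
struct GeneratedDescriptor
{
    enum name = "generate";
}

// Example-only library type; its design is left to the library.
struct InstantiableCoroutine(ResultType, Args...)
{
    string descriptorName;

    // The operator overload the compiler would call during the lowering.
    static InstantiableCoroutine opConstructCo(Descriptor)()
    {
        return InstantiableCoroutine(Descriptor.name);
    }
}

void main()
{
    // Roughly what assigning `&generate` to the library type lowers to.
    InstantiableCoroutine!(int, int) co = InstantiableCoroutine!(int, int)
        .opConstructCo!GeneratedDescriptor();
    assert(co.descriptorName == "generate");
}
```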

>> The ``await`` statement does two things.
>> 1. It assigns the expression's value into the state variable for 
>> waiting on.
>> 2. It yields.
> 
> Then please, for the love of god, put it into the DIP! I'm sorry that I'm
> so picky about this, but a **specification** (which is what your DIP is)
> should contain **every detail of your idea**, not only the bits Gemini
> deemed important. We're humans, and as such we should be especially
> careful to give each other as much information as possible.

Gemini is a test to see how well it could be understood prior to humans 
having to review it. If it cannot pass that, it cannot pass a human.

Hmm, ``Yielding`` does cover the tag side of things, but not the 
variable assignment in the state.

``// If we yield on a coroutine, it'll be stored here``

It was indeed added to the generated state struct, just not at the 
yielding side of it.

It has also been added to exceptions.
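
A hand-written approximation of the two steps, in today's D, for the ``readLine`` lambda from earlier. This is only a sketch of the shape of the transformation: ``FakeFuture``, ``HandlerState``, the field names and the ``resume`` function are all assumptions for the example, not what the compiler would emit.

```d
// Example-only stand-in for a library future.
final class FakeFuture
{
    bool ready;
    string value;
}

// Hand-written approximation of a generated state struct.
struct HandlerState
{
    int stage;            // which slice of the function body runs next
    FakeFuture line;      // the local `line`, hoisted out of the stack frame
    FakeFuture waitingOn; // if we yield on something, it is stored here
    string processed;     // result of the code after the await
}

enum Step { yielded, completed }

Step resume(ref HandlerState state)
{
    switch (state.stage)
    {
    case 0:
        state.line = new FakeFuture;  // auto line = socket.readLine();
        state.waitingOn = state.line; // await line; step 1: store what we wait on
        state.stage = 1;
        return Step.yielded;          // await line; step 2: yield
    case 1:
        state.waitingOn = null;
        state.processed = "got: " ~ state.line.value;
        state.stage = -1;
        return Step.completed;
    default:
        return Step.completed;
    }
}

void main()
{
    HandlerState state;
    assert(resume(state) == Step.yielded);
    assert(state.waitingOn !is null);

    // The event loop completes the read, then resumes the coroutine.
    state.waitingOn.value = "hello";
    state.waitingOn.ready = true;
    assert(resume(state) == Step.completed);
    assert(state.processed == "got: hello");
}
```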

>> Whereas the other approaches including C++ is still after much reading 
>> not in my mental model.
> 
> I'm starting to get a grasp of yours: while in your model you try
> to just "throw" the awaited-on back to anyone interested in it and use
> a sumtype to do it, other languages define a stricter interface that
> needs to be followed: C++ with awaiters and Rust with its `Future<>`s
> and `Waker`s. Both ways prevent splits in the ecosystem, or that only one
> library gets on top while everything else just dies. That's what I
> honestly fear with the current approach: there will be one way to use
> dependencies and that's it. The problems it has will extend to all async
> code, and an outside viewer will declare async in dlang broken without
> anyone realising that it's just the library that's broken. Take dlang's
> std.regex for example: it's very slow in comparison with others and you
> could easily roll your own, but nobody does, so everybody just assumes
> it's a "dlang" problem and moves on. While this has only minimal impact
> because it's just regex, with an entire language feature that will be
> presented through the lens of the most used or most "present" library
> (not popular! big difference), this will make people say "Hey, dlang's
> async is so bad because of that and that". I want to prevent such a thing.

Talking about regex engines... guess what I've been writing over the 
last two months :) And no, I cannot confirm that it is easy, especially 
with the Unicode stuff.

Other languages define the library stuff and directly tie it into the 
language lowering.

This proposal does not do that. It is purely the transformation.

How you design the library is on the library author, not the language!

One of the lessons we have learned about tying the language to a
specific library is that it tends to err on the side of not working for
everyone.

D classes are a great example of this, forcing you to use the root class
``Object``, and hit issues with attributes, the monitor, etc.

I don't intend for us to make the same mistake here, especially on a 
subject where people have such different views on how it should work.

> With a stricter protocol on how things are awaited (C++) or how a
> coroutine can be "retried" / woken up (Rust) these problems go away. Any
> executor can rely on the fact that any io / waiting structure **will**
> follow protocol, and as such they're interchangeable, which is a
> **big** benefit for user and application code as no one needs to re-invent
> the whole wheel.

So do it that way. Neither I, nor the language will stop you!

> Another benefit is also that it (somewhat) helps in ensuring that the
> coroutine is actually in a good state without the executor needing to 
> know about that state itself.
> 
> To help understanding a bit more the two models lets take a look at a 
> "typical" flow of a coroutine:
> - starts coroutine
> - initiate `read_all()` of a file
> - `await`s the `read_all()` and pauses the coroutine
> - gets re-called since the waited on part is now resolved
> - processes data
> 
> In your proposal this works by setting a dependency on the 
> `read_all()`'s return type. If the executor now simply ignores the
> dependency, it recalls the coroutine and the coroutine is in a bad 
> state, as it does not validate if the dependency is actually resolved 
> (how would it?). As a result, you would need to put it inside a loop:

Sounds like a bug, if it allows you to ``await`` and not actually 
respect it.

> ```d
> ReadDependency r = ...;
> while (!r.isReady) {
>    await r;
> }
> ```
> Which is boilerplate best avoided.

Agreed.

I do not like this waker design. It seems highly inefficient.

I prefer the dependency design, as you will only be executed if you have 
what you need to make progress.

But if you, the library author, want to do it differently, all I can say
is go for it!
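
To illustrate what I mean by the dependency design, here is a minimal scheduler sketch in today's D. It is not how my event loop is implemented; ``Dependency``, ``Task`` and ``runReady`` are hypothetical names, and a resumed task simply finishes at once to keep the example short.

```d
// Example-only dependency: something a task can wait on.
final class Dependency
{
    bool ready;
}

struct Task
{
    string name;
    Dependency waitingOn; // null means the task can run immediately
    int resumeCount;
    bool done;
}

// One scheduler pass: resume only tasks whose dependency is satisfied.
// Returns the number of tasks that were resumed.
size_t runReady(Task[] tasks)
{
    size_t resumed;
    foreach (ref task; tasks)
    {
        if (task.done)
            continue;
        if (task.waitingOn !is null && !task.waitingOn.ready)
            continue; // cannot make progress, so it is not woken to re-check

        ++task.resumeCount;
        task.done = true; // in this sketch a resumed task finishes at once
        ++resumed;
    }
    return resumed;
}

void main()
{
    auto socketRead = new Dependency;
    auto tasks = [Task("reader", socketRead), Task("timer", null)];

    assert(runReady(tasks) == 1); // only "timer" runs
    assert(tasks[0].resumeCount == 0);

    socketRead.ready = true;      // the event loop completes the read
    assert(runReady(tasks) == 1); // now "reader" runs, exactly once
    assert(tasks[0].resumeCount == 1);
}
```

The task that is waiting is never resumed until its dependency is ready, so the ``while (!r.isReady)`` loop is not needed on the coroutine side.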

> Secondly the read_all itself: it and the executor would need to agree on
> an out-of-language protocol on how to actually handle the dependency;
> this will most likely mean that a library would expose an interface like
> `Awaitable` that any dependency would need to implement, but with the
> downside that any dependent now has an explicit dependency on said
> library. Sure, maybe over time a standard set of interfaces would arise
> that the community would adopt, but then we have just re-invented the
> API-dependency hell of Java.

That is correct, the language level transformation that this DIP 
proposes does not deal with this library stuff. The usage in examples is
just that: example code to show how it can be utilized.

If I were to propose a specific approach to this, I would have people 
complaining that it doesn't work the way that they want it to and for 
good reason.

My library uses the ``GenericCoroutine`` and ``Future`` to do all of this.

This is done with the help of what I call a future completion, which is
a ``Future`` in API but isn't actually a coroutine. That is how my
socket reads return.

https://github.com/Project-Sidero/eventloop/blob/master/source/sidero/eventloop/coroutine/future_completion.d#L216

> In C++ the `co_await` dictates that the coroutine is blocked as long as
> the `Awaiter` protocol says it is, since any user **expects** that the
> `await`ed thing is actually resolved after it's `await`ed. It doesn't
> matter if successfully or not; the key point is that it's **not pending**
> anymore.

Yeah, the way I view it is that a coroutine has to be complete (error,
have a value, etc.), or have a value (multiple returns), before
continuation can occur.

But the language transformation isn't responsible for guaranteeing it.
Although I would recommend it.

> In Rust it's even simpler: polling is a concept that even kids
> understand: when you want your parents to give you something, you "poll"
> until they give it to you or tell you no in a way that keeps you from
> continuing what you originally wanted to do. Same thing in Rust: a
> coroutine is "polled" by the executor and can either resolve with the
> data you expected, or tell you that it's still waiting and to come back
> later. The compiler ensures that only a ready state is ever allowed to
> continue the coroutine. If you want to be more performant and not
> spin-lock in the executor in the hopes that someday the future will
> resolve, you can give it a waker and say: "hey, if you say you are still
> not done, I will do other things; if you think you're ready for me to try
> again, just call this and I will come to you!".

Yes, that is a kind of dependency approach. But it is done by means 
other than how I do it.

The DIP, as far as I know (and I've done some minimal exploration in
this thread), should work for this, since the language knows nothing
about how your scheduler works.
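
As a rough illustration that a poll-and-wake scheduler can live entirely in library code, here is a minimal sketch in today's D. It only mimics the general shape of the polling model described above; it is not Rust's actual API, and ``Poll``, ``CountdownFuture`` and ``runToCompletion`` are names invented for the example.

```d
enum Poll { pending, ready }

// Example-only future that becomes ready after a fixed number of polls.
struct CountdownFuture
{
    int remaining;

    Poll poll(void delegate() wake)
    {
        if (remaining == 0)
            return Poll.ready;
        --remaining;
        wake(); // ask the executor to poll again later
        return Poll.pending;
    }
}

// A minimal executor: poll, and poll again only after the waker was called.
// Returns how many polls it took.
int runToCompletion(ref CountdownFuture future)
{
    int polls;
    bool woken = true;
    while (woken)
    {
        woken = false;
        ++polls;
        if (future.poll(() { woken = true; }) == Poll.ready)
            break;
    }
    return polls;
}

void main()
{
    auto future = CountdownFuture(2);
    assert(runToCompletion(future) == 3); // two pending polls, then ready
}
```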