DIP69 - Implement scope for escape proof references

Sat Dec 13 16:44:55 PST 2014

On 13 December 2014 at 15:11, Walter Bright via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
> On 12/12/2014 6:55 PM, Manu via Digitalmars-d wrote:
>>
>> I did just give some examples, I'll repeat; auto ref fails when the
>> function is extern.
>
>
> Don't make it extern, then.

Do you think I just interact with other languages for fun or something?

>> It is also wrong that when I pass an int or float, it is passed by ref
>> instead of by value... that is never what I want!
>
>
> If there's source to the function, it'll often be inlined which will remove
> the indirection.

Not if it's extern (applicable to ref, not auto-ref), or wrapped, or
if I ever want to capture a function pointer.
It's also semantically different; I can change the caller's value.
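
A minimal sketch of that semantic difference, in D. The names `wasRef` and `bump` are hypothetical, chosen only for illustration; `__traits(isRef, ...)` reports whether an auto-ref parameter was bound by reference in a given instantiation.

```d
// Hypothetical helpers, for illustration only.

// Reports how the auto-ref parameter was bound for this call.
bool wasRef(T)(auto ref T x)
{
    return __traits(isRef, x);
}

// With an lvalue argument the parameter is a ref, so the callee
// can modify the caller's variable.
void bump(T)(auto ref T x)
{
    x += 1;
}

void main()
{
    int i = 1;

    assert(wasRef(i));   // lvalue int: passed by reference
    assert(!wasRef(2));  // rvalue int: passed by value

    bump(i);
    assert(i == 2);      // the caller's value changed
}
```

So even a plain int is taken by reference whenever the argument happens to be an lvalue.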

>> What do you get when you take a pointer of a function with an auto-ref
>> arg? ...an error, because it's not actually a function!
>> So in a context where I'm dealing with functions and function pointers
>> (very, very frequent), I suddenly have something that's not a function
>> find its way into my meta and I have to special-case the hell out of
>> it.
>
>
> Why are function pointers and ints going to the same argument of a function?
> I thought you weren't using templates?

I'm confused. I'm talking about capturing function pointers. Not about
function arguments.
Capturing pointers of functions with auto-ref args (aka, template
functions) is a serious nuisance, and impossible outside the site of
instantiation.
You need information that's only available at the point of
instantiation. That changes the landscape quite a bit.

>> The solution in this case is to wrap it in a non-auto-ref shim with
>> the ref-ness explicitly stated in the way I expect... which is in
>> itself another problem, because 'ref' is not part of the type! So the
>> annoying de-auto-ref-ing shim must be a text mixin with some very
>> complex, practically unreadable, definitely unmaintainable logic
>> surrounding it. It's insanity on top of insanity!
>>
>> I also run the invisible risk of generating 2^num_args instances of
>> the code for functions containing auto-ref args.
>
>
> I wonder what is the need for the code that you are writing.

General function adaptation. There are lots of reasons to wrap
functions. Adaptation to existing or external APIs is the most common
in my experience.

Functions are what programs are! I really struggle to understand why
you have such trouble sympathising with my view here. Languages
generate code, which is packaged into blocks we call 'functions'...
and they are required to adhere to strict ABI requirements. That is
programming in a nutshell. Surely it's reasonable to want to retain
complete control over this most primitive and basic of tasks.
One important aspect of that control is where the code is generated;
is it 'here', or 'there'? And that's a critical distinction between
functions and templates.

Look at LuaD. Lots of awkward cases have emerged there. There are many
more to come too; the known bug list is mostly nasty issues of this
nature that I've been putting off.
I can't offer any insight into my commercial code sadly, and I no
longer have access to it :/

Situations like this appear frequently:
https://github.com/JakobOvrum/LuaD/pull/76/files#diff-bcb370a5bc6fe75a9d5c04f2e1c17eb0R178

And something like this tends to appear as one aspect of the solution:
https://github.com/JakobOvrum/LuaD/pull/76/files#diff-bcb370a5bc6fe75a9d5c04f2e1c17eb0R68
'struct Ref(T)' leads to its own problems though, in that it's a
localised concept. No external code anywhere understands it as a
'ref', so if 3rd party code has any special treatment for ref, that
code now fails.

Then more magic like this:
https://github.com/JakobOvrum/LuaD/pull/76/files#diff-ec8c532aeca798240de4d70ee639fc16R90
Since we need to recognise 'ref' and substitute it for our magic
'struct Ref(T)'.

I've probably spent more hours wrangling D meta of this sort than most
people. This sort of thing always happens.
If Ref!T is the tool that resolves the situation, then it's a clear
demonstration that ref should be part of the type.
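
A small sketch of what "ref is not part of the type" looks like to introspection code, assuming the `std.traits` helpers `Parameters` and `ParameterStorageClassTuple`. The functions `byRef` and `byVal` are hypothetical.

```d
import std.traits : Parameters, ParameterStorageClass,
                    ParameterStorageClassTuple;

// Hypothetical functions, for illustration only.
void byRef(ref int x) {}
void byVal(int x) {}

alias RefParams = Parameters!byRef;
alias ValParams = Parameters!byVal;

// The parameter's type is plain int in both cases...
static assert(is(RefParams[0] == int));
static assert(is(ValParams[0] == int));

// ...and ref-ness has to be queried separately, as a storage class.
static assert(ParameterStorageClassTuple!byRef[0] == ParameterStorageClass.ref_);
static assert(ParameterStorageClassTuple!byVal[0] == ParameterStorageClass.none);

// The function types do differ, but there is no standalone
// 'ref int' type that a wrapper could name for a variable or field.
static assert(!is(typeof(&byRef) == typeof(&byVal)));

void main() {}
```

Any generic wrapper therefore has to carry the storage class alongside the type by hand, which is where something like Ref!T comes from.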

>> When I'm writing a function that's not a template, I intend, and
>> expect, to write a function that's _not a template_.
>> Templates and functions are different things. I think it's a massive
>> mistake to have created a way to write a template that looks nothing
>> like a template.
>
>
> A function template is a function that takes both compile-time args and
> run-time args. C++ tried to make them completely different animals, when
> they are not. Don't think "template", think "compile-time argument to a
> function". I convinced Andrei to never mention "template" in his D book, as
> it brings forth all kinds of connotations and expectations that impede
> understanding what D templates actually are. I think the result was
> successful.

You can spin it however you like, but it's exactly the same thing.
They are completely different animals. A function template is not a
function at all until it's instantiated. When it's instantiated, it
becomes a function, or even, one of a suite of functions. And it's not
known to the author where the function is.
(Note: I define 'function' in this context to mean 'some code that is emitted')

This one:many relationship between definition and code creates some
sorts of problems that aren't present with functions, which relate 1:1
with their codegen.
It is possible to take a pointer to a function. It is not possible to
take a pointer to a template, unless your code exists at the callsite
where the parameters for instantiation are known.
This makes certain things more complicated.
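
A minimal sketch of the one:many relationship, using a hypothetical template `twice`. Each instantiation is its own function, and a pointer can only be taken once the instantiation arguments are spelled out.

```d
// Hypothetical template, for illustration only.
T twice(T)(T x)
{
    return x + x;
}

void main()
{
    // Each instantiation is a separate function with its own type
    // and its own address.
    int function(int) pi = &twice!int;
    double function(double) pd = &twice!double;

    assert(pi(21) == 42);
    assert(pd(1.5) == 3.0);

    // The point made above: without instantiation arguments there is
    // no emitted function, so there is nothing to point at.
    // auto p = &twice;
}
```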

I reject that 'a template is a function with compile time args'.
Template functions have rather different characteristics, which result
in special consideration, and some restrictions on usage.
It's a cute idea, and something that might sound nice in a book... but
it's not the reality.
'functions' are a much simpler, more fundamental concept, and also one
that is easily portable between languages.

>> auto-ref is not, and never has been a tool I have wanted. I don't have
>> any use for auto-ref, and it has only proven to make an already severe
>> problem worse. I've tried to use it before in some instances, but it
>> made refs out of ints and floats, so even in the rare potentially
>> useful cases, I had to reject it.
>
>
> If it's a rare useful case, why is it a pervasive problem?

I never argued that auto-ref wasn't rare, but it was *a mistake*. What
it did was cement an already very shaky language feature into the
foundation.
Once there are layers of language built on top of something, it becomes
that much harder to refactor at some later time.

I'm not saying auto-ref is a pervasive problem, I'm saying (over and
over again) that *ref* is a pervasive problem, and auto-ref turns out
to be a further (fairly rare) nuisance. It can't be presented as a
solution to any problem that I have (how we got onto this tangent),
because it's not.
I also don't have any idea why it exists! What's it for?
You love to make people justify exactly what things are for; you're
doing it to me in almost every paragraph.

In my case here though, I'm not asking you to invent a language
feature like auto-ref, I'm just commenting on the usefulness of an
existing feature. Trouble is, my cases are really hard to define and
highly context specific.
I'm sharing my anecdotal experience that D's ref is an awkward pain in
the arse at best, and by my judgement, a failed experiment.

>> At the time, you introduced auto-ref as a 'solution' to an issue for
>> which I was the primary complainant (although the solution was mainly pushed
>> by Andrei iirc?). I absolutely rejected it then, and said it would
>> have the disastrous effect of setting ref in stone, which I was
>> already very critical about. I reject it even more now that I have had
>> some experience with it.
>> Who was the customer that you designed it for? (It wasn't me!)
>> What issue was it designed to address?
>
>
> I still have no idea what code you are developing that you need to send ints
> and function pointers to the same argument of a function template, yet you
> don't use function templates. Nor do I understand what pattern you need that
> simply must mix up ref and value parameters, and why that pattern appears
> pervasively in your code.

I was talking about taking function pointers of functions. Not passing
function pointers to functions.
The auto-ref detail was on a slight tangent; explaining that it's not
useful to me to solve the problem of dealing with ref-ness of incoming
functions args or results.

I think I said that clearly: "What do you get when you take a pointer
of a function with an auto-ref arg? ...an error, because it's not
actually a function!"

WRT mixing ref and value parameters; I'm wrapping/adapting existing
APIs. I have no control over the code/API that exists. If that API
uses ref, for whatever reason that it makes sense to do so, I need to
handle the case.
There's no 'pattern', only a mechanical process.

>>> struct Tree {
>>>     RefCount!(Tree*) left;
>>>     RefCount!(Tree*) right;
>>>     ...
>>> }
>>
>>
>> ... I don't think I'd ever have a use for this code.
>
>
> You have no use for tree structures?

I don't have use for a tree where every node is ref counted... tree
nodes should only have one reference; their parent.

>> I've been using trees for a long time, and I can't imagine a situation
>> where each node in a tree would want to be ref counted.
>
>
> You have more than one parent of a node. You never write data structures
> like that? dmd uses such pervasively (Type and Expression).

That's not a tree, that's a graph.
I made a comment on such graph structures: "Perhaps in some complex
graph structure... but I expect that sort of thing would be far more
specialised."
And I still think that.

I'm not going to rule out that an RC graph node might be useful, but I
was responding to your example, and that wasn't your example.

>> It sounds like a disaster for performance, and I can't imagine any
>> advantage it would offer?
>
>
> How do you propose to manage the memory explicitly otherwise? Use a GC? :-)

Tree elements only have one reference; their parent. Management of
trees is very simple.

Graphs may be a little more complex, and I can see that an RC node
might be useful... but I've been using various graph structures for a
long time, and never wanted to RC the nodes.
I expect traditional patterns would remain. I don't think an RC is
required to know if a graph node is still referenced within the graph;
graph nodes typically have bi-directional linkage, so when there are
no links to the node, then it is unreferenced, and can easily be
destroyed/returned to a pool.
I think an RC mechanism is effectively implicit in graph structures I
can imagine.

The cases where I use RC are for general 'resources'. In my line of
work, this is textures, mesh, renderstate, sounds, etc... things that
find themselves being shared around arbitrarily.
It is important that tree/graph nodes may contain a ref-counted thing,
but I have trouble visualising how that affects 'scope' in this case.

>> I can see common cases where nodes may contain a refcounted object as
>> node data, but that's different than the situation you demonstrate
>> here.
>
>
> That would fail if scope were transitive.

I'm not sure I see why...
The situation is: a whole transitive scope graph is received as scope,
how does that inhibit my accessing a resource contained in the graph?
It inhibits my _escaping_ said resource... as it should. Right?

The problem you're addressing is that, if the graph isn't received as
scope (ie, I want the ability to escape its internals), then RC
doesn't work efficiently.
My reaction is that RC optimisation is perhaps not strongly associated
with 'scope'. scope may offer a mechanism by which it can work some of
the time, but it's not precisely the same thing.

I kinda feel like you're trying to make scope into RC optimisation,
rather than RC optimisation into scope...?

I was once arguing for: int^ rcInt;
I think I'm going back in that direction. That's what other languages
do. And scope would create additional opportunity here; it would allow
implicit casting of T^ -> T*.

>> But there are more things than pointers which need to be protected
>> against escaping their scope, in particular, things that contain
>> pointers...
>
>
> Solve it in the general case (this proposal) and RC now works.

Perhaps scope and RC are different things?
Solve scope purely for RC, and you've broken scope for the other purpose
it's useful for; that is, inhibiting escaping of data.

>> Maybe there's a compromise. If we say scope isn't 'transitive', but it
>> is transitive when it comes to aggregates?
>
>
> One thing I've tried very hard to make work in D is have basic types /
> aggregates / arrays be interchangeable.

I'm looking for ways to make scope fit your proposed mould and be useful to me.
If I can't do this then my most common use case is unsatisfied:

struct Wrap
{
  OpaqueThing *ptr;

  this() {}
  ~this() {}  // <- constructors that twiddle RC effectively.
  this(this) {}

  this() scope {}
  ~this() scope {}  // <- Overloadable for 'scope', in which case, don't twiddle the RC!
  this(this) scope {}

  // lots of operations on OpaqueThing wrapped up here...
}

void f(scope Wrap x)  // <- RC twiddling is effectively elided here due to constructor overloads
{
}

Can you propose an alternative solution?
This is representative of my 99% case. Sometimes the struct is more
than a single pointer though.
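
For reference, a rough sketch of the cost the scope overloads above are meant to remove, written in D as it stands (without the proposed `scope` overloads). The `count` field and the `twiddles` counter are assumptions added for illustration; they stand in for whatever refcount OpaqueThing really carries.

```d
// Hypothetical stand-in for the Wrap struct above; 'twiddles' counts
// reference-count updates so the cost of a by-value pass is visible.
int twiddles;

struct Wrap
{
    int* count;  // assumed: points at the wrapped object's refcount

    this(this)   // copy: increment
    {
        if (count) { ++*count; ++twiddles; }
    }

    ~this()      // destroy: decrement
    {
        if (count) { --*count; ++twiddles; }
    }
}

void f(Wrap x) {}  // takes a copy, so the RC goes up and back down

void main()
{
    int rc = 1;
    auto w = Wrap(&rc);
    assert(twiddles == 0);

    f(w);
    assert(rc == 1);        // net effect on the count is nothing...
    assert(twiddles == 2);  // ...but it took two updates to get there
}
```

With a scope-qualified parameter and the overloads sketched above, the intent is that both updates would disappear.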

>> Ie, you can apply scope to a by-val struct, and all aggregate members
>> have scope applied.
>> It is not transitive in the sense that it does not follow through
>> pointer members, but any such pointer members may not escape as if it
>> were a pointer argument alone.
>>
>> This is unlike any other behaviour in D, but without that, I think
>> this proposal fails to suit my most common use cases.
>
>
> I have no idea what your use cases are, but when you're explicitly managing
> memory, every damn pointer needs to be carefully evaluated as to what
> exactly it's pointing to and who owns it. Scope is not a magic solution to
> this, neither is shared_ptr, neither are Rust's annotations.
>
> The only scheme that absolves the user of having to deal with this is - GC.

scope just needs to say "I will not let this memory escape". Then it
finally allows us to safely put data on the stack, something we can't
do in D today. I don't think scope should aim to do anything more than
that.

If scope can be useful to RC, that's great! But from this, it's
starting to look to me like scope might be partially useful to RC, but
scope and RC aren't exactly the same thing.
RC seems to require something like head-scope in order to not place
awkward restrictions on usage of RC objects. That said, full-scope
isn't without use cases; it allows us to safely use the stack,
potentially eliminating much GC load.