Proposal for design of 'scope' (Was: Re: Opportunities for D)

Sat Jul 12 11:01:36 PDT 2014

On Friday, 11 July 2014 at 21:04:05 UTC, H. S. Teoh via 
Digitalmars-d wrote:
> On Thu, Jul 10, 2014 at 08:10:36PM +0000, via Digitalmars-d wrote:
> Hmm. Seems that you're addressing a somewhat wider scope than what
> I had in mind. I was thinking mainly of 'scope' as "does not escape
> the body of this block", but you're talking about a more general
> case of being able to specify explicit lifetimes.
>
>

Indeed, but it includes what you're suggesting. For most use 
cases, just `scope` without an explicit lifetime annotation is 
fully sufficient.

> [...]
>> A problem that has been discussed in a few places is safely
>> returning a slice or a reference to an input parameter. This can
>> be solved nicely:
>>
>>     scope!haystack(string) findSubstring(
>>         scope string haystack,
>>         scope string needle
>>     );
>>
>> Inside `findSubstring`, the compiler can make sure that no
>> references to `haystack` or `needle` can escape (an unqualified
>> `scope` can be used here, no need to specify an "owner"), but it
>> will allow returning a slice from it, because the signature says:
>> "The return value will not live longer than the parameter
>> `haystack`."
>
> This does seem to be quite a compelling argument for explicit
> scopes. It does make it more complex to implement, though.
>
>
> [...]
>> An interesting application is the old `byLine` problem, where the
>> function keeps an internal buffer which is reused for every line
>> that is read, but a slice into it is returned. When a user naively
>> stores these slices in an array, she will find that all of them
>> have the same content, because they point to the same buffer. See
>> how this is avoided with `scope!(const ...)`:
>
> This seems to be something else now. I'll have to think about this
> a bit more, but my preliminary thought is that this adds yet
> another level of complexity to 'scope', which is not necessarily a
> bad thing, but we might want to start out with something simpler
> first.

It's definitely an extension and not as urgently necessary, 
although it fits well into the general topic of borrowing: 
`scope` by itself provides mutable borrowing, but `scope!(const 
...)` provides const borrowing, in the sense that another object 
temporarily takes ownership of the value, so that the original 
owner can only read the object until it is "returned" by the 
borrowed value going out of scope. I mentioned it here because it 
seemed to be an easy extension that could solve an interesting 
long-standing problem for which we only have workarounds today 
(`byLineCopy` IIRC).

And I have to add that it's not completely thought out yet. For 
example, might it make sense to have `scope!(immutable ...)`, 
`scope!(shared ...)`, and if yes, what would they mean...

>
>
> [...]
>> An open question is whether there needs to be an explicit
>> designation of GC'd values (for example by `scope!static` or
>> `scope!GC`), to say that a given value lives as long as it's
>> needed (or "forever").
>
> Shouldn't unqualified values already serve this purpose?
>
>

Likely yes. It might, however, be useful to contemplate, especially
with regard to allocators.

> [...]
>> Now, for the problems:
>>
>> Obviously, there is quite a bit of complexity involved. I can
>> imagine that inferring the scope for templates (which is
>> essential, just as for const and the other type modifiers) can be
>> complicated.
>
> I'm thinking of aiming for a design where the compiler can infer
> all lifetimes automatically, and the user doesn't have to. I'm not
> sure if this is possible, but based on what Walter said, it would
> be best if we infer as much as possible, since users are lazy and
> are unlikely to be thrilled at the idea of having to write
> additional annotations on their types.

I agree. It's already getting ugly with `const pure nothrow @safe
@nogc`; adding another annotation should not be done lightly.
However, if the compiler could infer all the lifetimes (which I'm
quite sure isn't possible, see the haystack-needle example), I
don't see why we'd need `scope` at all. It would at most be a way
not to break backward compatibility, but that would be another
case where you could say that D has it backwards, like un-@safe by
default...

>
> My original proposal was aimed at this; that's why I didn't put in
> explicit lifetimes. I was hoping to find a way to define things
> such that the lifetime is unambiguous from the context in which
> 'scope' is used, so that users don't ever have to write anything
> more than that. This also makes the compiler's life easier, since
> we don't have to keep track of who owns what, and can just compute
> the lifetime from the surrounding context. This may require
> sacrificing some precision in lifetimes, but if it helps simplify
> things while still giving adequate functionality, I think it's a
> good compromise.

I agree it looks a bit intimidating at first glance, but as far 
as I can tell it should be relatively straightforward to 
implement. I'll explain how I think it could be done:

The obvious things: The parser needs to recognize the new syntax, 
and scope needs to be turned into a type modifier and stored in 
the internal data structures accordingly.

It is then possible to define a hierarchy of lifetimes. At the
top are global and static variables and the GC heap
(`scope!static` or just unannotated), then come function
parameters, then local variables in function bodies, and finally
local variables in lower scopes like `if` blocks. This is purely
based on lexical scope and order of declaration (local variables
are destroyed in inverse order of construction, for example); it
can be derived from the AST. Furthermore, it is a strict
hierarchy; lifetimes higher in the hierarchy are strict supersets
of lower lifetimes.

A variable's effective lifetime is then its place in this
hierarchy, or the lifetime of its owner if one is specified.

Once that's done, the semantic phase needs to be extended to
check for scope correctness. This seems complicated, but actually
needs to touch only a few places. Any time a scope value is
copied, by assignment, returning from a function, passing to a
function, throwing, and whatever else I may have missed, the
compiler needs to check that the destination's effective lifetime
is not wider than that of the source.

For function calls, an additional step is necessary, but it isn't 
really complicated either. Let's take `findSubstring` as an 
example:

     scope!haystack(string) findSubstring(
         scope string haystack,
         scope string needle
     );

     void foo() {
         string[$] h = "Hello, world!";
         auto found = findSubstring(h, ", ");
         // `typeof(found)` is now `scope!h(string)`
     }

As owners in function signatures may refer to other parameters 
(or `this`), the compiler needs to match up these parameters with 
what is passed in, and substitute them accordingly for type 
deduction (only for `auto` return values).
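
Rust performs a comparable substitution for lifetime parameters, so it can serve as a rough reference for the intended call-site behaviour. In this sketch the signature ties the result to `'h` only, which plays the role of `scope!haystack`. Returning the rest of `haystack` starting at the first match is an assumption, since the post does not say what `findSubstring` returns.

```rust
/// Rust analogue of `scope!haystack(string) findSubstring(...)`: the result
/// may borrow from `haystack` (lifetime `'h`) but never from `needle`.
fn find_substring<'h>(haystack: &'h str, needle: &str) -> Option<&'h str> {
    haystack.find(needle).map(|i| &haystack[i..])
}

/// Mirrors `foo` above. At the call, `'h` is instantiated with the borrow of
/// `h`, so `found` may outlive `needle`. If the result were also tied to
/// `needle`, this would fail to compile because `needle` is dropped first.
fn foo() -> Option<String> {
    let h = String::from("Hello, world!");
    let found;
    {
        let needle = String::from(", ");
        found = find_substring(&h, &needle);
    } // `needle` is dropped here; `found` is still usable
    found.map(|s| s.to_string())
}
```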

And that's it, AFAICS. Notice that none of this requires control
flow analysis or inter-procedural analysis; it can all be decided
locally at the place of assignment/calling/etc.

>
>
> [...]
>> I also have a few ideas about owned types and move semantics, but
>> this is mostly independent from borrowing (although, of course, it
>> integrates nicely with it). So, that's it, for now. Sorry for the
>> long text. Thoughts?
>
> It seems that you're addressing the full borrowed reference/pointer
> problem, which is something necessary. But I was thinking more in
> terms of the baseline functionality -- what is the simplest design
> for 'scope' that still gives useful semantics that cover most of
> the cases? I know there are some tricky corner cases, but I'm
> wondering if we can somehow find an easy solution for the easy
> parts (presumably the more common parts), while still allowing for
> a way to deal with the hard parts.
>
> At least for now, I'm thinking in the direction of finding
> something with simple semantics that, at the same time, produces
> complex (interesting) effects when composed, that we can use to
> solve the borrowed pointer problem.

I already wrote this in a reply to Walter. I believe in some 
cases we can allow automatic borrowing without any annotation at 
all, not even bare `scope`. The most obvious examples are pure 
functions with signatures that guarantee that nothing can be 
escaped from them:

     void foo(int[] p) pure;    // obvious, function has no opportunity
                                // to keep a reference to `p`
     int bar(int[] p) pure;     // returns an `int` but that's a value
                                // type, and that's ok
     int[] baz(const(int)[] p) pure;
                                // the return type is not `const` and thus
                                // cannot come from `p`
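
These rules are simple enough to state as a predicate over a signature. The toy model below, in Rust, is hypothetical: `Ty`, `Sig` and `param_cannot_escape` are illustrative names, and it only handles single-parameter functions with the kinds of types used in the three examples (with several parameters, a pure function could also store one argument into another).

```rust
/// Hypothetical, simplified D types: just enough for the examples above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    Void,       // `void`
    Value,      // e.g. `int`: no indirections
    Slice,      // e.g. `int[]`: mutable indirection
    ConstSlice, // e.g. `const(int)[]`
}

/// A single-parameter function signature.
struct Sig {
    is_pure: bool,
    param: Ty,
    ret: Ty,
}

/// True if the signature alone guarantees the argument is not escaped, so it
/// could be borrowed automatically without a `scope` annotation.
fn param_cannot_escape(sig: &Sig) -> bool {
    if !sig.is_pure {
        // An impure function could stash the reference in a global.
        return false;
    }
    // A pure one-parameter function can only leak through its return value.
    match (sig.param, sig.ret) {
        // The parameter has no indirections, so there is nothing to leak.
        (Ty::Value, _) => true,
        // The return type has no indirections to leak through.
        (_, Ty::Void) | (_, Ty::Value) => true,
        // `const` data cannot implicitly become a mutable `int[]`.
        (Ty::ConstSlice, Ty::Slice) => true,
        _ => false,
    }
}
```

For example, `const(int)[] qux(const(int)[] p) pure` would not qualify, because it may return `p` itself.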

Maybe there are some cases with non-pure functions, too. But on 
the other hand, I also think that in the end we won't get around 
introducing explicit annotations, because the above rules can 
never cover enough cases to disregard the remaining ones.

Anyway, I don't believe that explicit annotations will be needed 
often enough to turn the users away. It will be mostly library 
writers who have to use them, and Phobos can set a good example 
there and work out a good style, just as it has done for other 
matters.

It also helps to take a glance at Rust's standard library, to see 
how frequent or infrequent lifetime annotations will be. They 
keep popping up here and there, but they are not littered all 
over the source code. They're frequent enough to confirm my 
suspicion that they cannot be disregarded, but they're also 
infrequent enough not to be an annoyance. (I only looked at a few 
modules, though.)