<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 4 February 2014 17:50, Adam Wilson <span dir="ltr"><<a href="mailto:flyboynw@gmail.com" target="_blank">flyboynw@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div>On Mon, 03 Feb 2014 22:12:18 -0800, Manu <<a href="mailto:turkeyman@gmail.com" target="_blank">turkeyman@gmail.com</a>> wrote:<br>
</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><br></div><div>
So, the way I see this working in general is that, because in the majority<br>
case ARC would release memory immediately and so free it up regularly, an<br>
alloc that would usually have triggered a collect will happen far, far less<br>
often.<br>
Practically, this means that the mark phase, which you say is the longest<br>
phase, would be performed far less often.<br>
<br>
</div></blockquote>
<br>
Well, if you want the ARC memory to share the heap with the GC, the ARC memory will need to be tracked and marked by the GC. Otherwise the GC might try to allocate over the top of ARC memory and vice versa. This means that every time you run a collection you're still marking all ARC+GC memory, and that will induce a pause. The GC will still do a stop-the-world (STW) collect on random allocations, and it will still have to mark all ARC memory to make sure it's still valid. So yes, there will be fewer pauses, but they will still be there.<br>
</blockquote><div><br></div><div>I'm not bothered in the least. At that stage, I will have turned the GC off, and I'll handle weak references myself. The GC crowd are then welcome to go on and continue improving the GC in whatever way they plan to do so.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div>
For me and my kind, I think the typical approach would be to turn off the<br>
backing GC, and rely on marking weak references correctly.<br>
This satisfies my requirements, and I also lose nothing in terms of<br>
facilities in Phobos or other libraries (assuming that those libraries have<br>
also marked weak references correctly, which I expect phobos would<br>
absolutely be required to do).<br>
<br>
This serves both worlds nicely, I retain access to libraries since they use<br>
the same allocator, the GC remains (and is run less often) for those that<br>
want care-free memory management, and for RT/embedded users, they can<br>
*practically* disable the GC, and take responsibility for weak references<br>
themselves, which I'm happy to do.<br>
<br>
<br></div><div>
Going the other way, GC is default with ARC support on the side, is not as<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
troublesome from an implementation standpoint because the GC does not have<br>
to be taught about the ARC memory. This means that ARC memory is free of<br>
being tracked by the GC and the GC has less overall memory to track which<br>
makes collection cycles faster. However, I don't think that the RT/Embedded<br>
guys will like this either, because it means you are still paying for the<br>
GC at some point, and they'll never know for sure if a library they are<br>
using is going to GC-allocate (and collect) when they don't expect it.<br>
<br>
</blockquote>
<br></div><div>
It also means that phobos and other libraries will use the GC because it's<br>
the default. Correct, I don't see this as a valid solution. In fact, I<br>
don't see it as a solution at all.<br>
Where would implicit allocations like strings, concatenations, closures be<br>
allocated?<br>
I might as well just use RefCounted, I don't see this offering anything<br>
much more than that.<br>
<br></div><div>
The only way I can see to make the ARC crowd happy is to completely replace<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
the GC entirely, along with the attendant language changes (new keywords,<br>
etc) that are probably along the lines of Rust. I strongly believe that the<br>
reason we've never seen a GC backed ARC system is because in practice it<br>
doesn't completely solve any of the problems with either system but costs<br>
quite a bit more than either system on its own.<br>
</blockquote>
<br>
<br></div><div>
Really? [refer to my first paragraph in the reply]<br>
It seems to me like ARC in front of a GC would result in the GC running far<br>
fewer collect cycles. And the ARC opposition would be absolved of having to<br>
tediously mark weak references. Also, the GC opposition can turn the GC<br>
off, and everything will still work (assuming they take care of their<br>
cycles).<br>
I don't really see the disadvantage here, except that the<br>
only-GC-at-all-costs-I-won't-even-consider-ARC crowd would gain a<br>
ref-count, but they would also gain the advantage where the GC would run<br>
fewer collect cycles. That would probably balance out.<br>
<br>
I'm certain it would be better than what we have, and in theory, everyone<br>
would be satisfied.<br>
</div></blockquote>
<br>
I'm not convinced, mostly because it's not likely to be good news for the GC crowd. First, there are now two GC algos running unpredictably at different times, so while you *might* experience a perf win in ARC-only mode, we'll probably pay for it in GC-backed ARC mode: you still have the chance of non-deterministic pause lengths with ARC, you still have the GC pauses, and they happen at different times (GC pause on allocate, ARC pause on delete).</blockquote>
<div><br></div><div>I don't understand. How is ARC non-deterministic? It seems entirely deterministic to me. And you can defer destruction to idle time if you fancy.</div><div>Sure, the GC may pause from time to time, but you already have that anyway. In this case, it'll run much less often.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> Each individual pause length *might* be shorter, but there is no guarantee of that, but you end up paying more time on the whole than you would otherwise, remembering that with the GC on, the slow part of the collection has to be performed on all memory, not just the GC memory. So yes you might delete a bit less, but you're marking just as much, and you've still got those pesky ARC pauses to deal with.</blockquote>
<div><br></div><div>Again, what ARC pauses? You mean object destruction time? Defer destruction if you find cleaning up on the spot to be expensive. GC will always have to scan all memory that is allocated. The fact that it's scanning ARC memory is precisely the point (to catch cycles), and it adds no extra scan load, since that memory would all be GC memory anyway if ARC wasn't present.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> And in basically everything but games you measure time spent on resource management as a portion of CPU cycles over time, not time spent per frame.<br>
</blockquote><div><br></div><div>I suspect that spending less time doing GC scans will result in a win overall. I have nothing to back that up, but it's a strong suspicion. Object destruction, which you seem to have a problem with under ARC, still happens even with a GC, just at some unknown time. It's not clear to me what the additional ARC cost is (other than the obvious inc and dec)... except it facilitates spending less time doing GC collection, which is probably a significant saving.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
That ref-count you hand-wave can actually cost quite a lot. Implementing ARC properly can have some serious perf implications on pointer-ops and count-ops due to the need to make sure that everything is performed atomically.</blockquote>
<div><br></div><div>I don't think this is true. There's no need to perform the ref fiddling with atomic operations unless it's shared.</div><div>Everyone expects additional costs for synchronisation of shared objects.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> And since this is a compiler thing, you can't say "Don't atomically operate here because I will never do anything that might race." because the compiler has to assume that at some point you will and the compiler cannot detect which mode it needs, or if a library will ruin your day. The only way you could get around this is with yet more syntax hints for the compiler like '@notatomic'.<br>
</blockquote><div><br></div><div>Ummm. I think D makes a clear assumption that if something isn't marked shared, that it doesn't have to compile code to protect against races.</div><div>That's the whole point of making shared an explicit attribute.</div>
<div>What you say is true in C++, which can't distinguish the cases; I don't think it applies in D.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
Very quickly ARC starts needing a lot of specialized syntax to make it work efficiently. And that's not good for "Modern Convenience".<br></blockquote><div><br></div><div>Other than a weak attribute, what does it need? I'm not aware of anything else.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
However, you don't have to perform everything atomically with a GC as the collect phase can always be performed concurrently on a separate thread and in most cases,</blockquote><div><br></div><div>You don't have to perform everything atomically in ARC, and the GC is definitely like that now.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"> the mark phase can do stop-the-thread instead of stop-the-world and in some cases, it will never stop anything at all.</blockquote>
<div><br></div><div>If I only have one core?</div><div>ARC doesn't need to mark at all; that cost is effectively distributed among the inc/dec of refs, and I'm confident that ++ and -- operations, performed only on relevant data and only when it's copied, are much cheaper than the GC scanning the whole heap and applying all that logic to determine which things are pointers it needs to follow and which are not.<br>
</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">That can very easily result in pause times that are less than ARC on average. So now I've got a marginal improvement in the speed of ARC over the GC at best, and I still haven't paid for the GC.<br>
</blockquote><div><br></div><div>What pause does the ARC produce?</div><div><br></div><div>Are you paying an ambient cost for the GC? When the ARC is doing its job, the GC wouldn't run. When too many un-handled cycles add up, the GC might run a scan. If/when the GC does run a scan, it's precisely because you _haven't_ already paid the cost via the ARC; it missed it due to a cycle, no cost was paid, no time was lost, it was just collected by the GC instead. I don't see how you can sum the costs of the 2 collection mechanisms in this case.</div>
<div>Either the ARC cleans it up, and the GC doesn't. Or the GC cleans it up because the ARC didn't.</div><div><br></div><div>ARC destruction can easily be deferred, and unlike a mark phase which MUST be executed entirely in one step, it is possible to process deferred ARC object destruction at leisure, using only idle time for instance, and arbitrary lengths of time are easily supported.</div>
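<div><br></div><div>A rough sketch of what deferring destruction could look like, in Python purely for illustration. The class and method names are hypothetical, not an existing D or druntime API: when a ref-count hits zero the destructor is queued instead of run, and the queue is drained in bounded chunks whenever there is idle time.</div>
<pre>
```python
from collections import deque


class DeferredReleaseQueue:
    """Hypothetical queue that postpones destructor work to idle time."""

    def __init__(self):
        self._pending = deque()

    def defer(self, destructor):
        # Called when a ref-count reaches zero: record the work, do not run it.
        self._pending.append(destructor)

    def drain(self, max_items):
        # Run at most `max_items` queued destructors, oldest first,
        # and report how many were actually run.
        count = max(0, min(max_items, len(self._pending)))
        for _ in range(count):
            self._pending.popleft()()
        return count

    def __len__(self):
        return len(self._pending)
```
</pre>
<div>A frame loop could then call something like drain(32) once per frame when there is time left over, so the cost per frame is capped by the caller rather than by how much garbage happens to exist.</div>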
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
And if we disable the GC to get the speed back</blockquote><div><br></div><div>I still don't follow, we never lost anything, we only moved it.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
we now require that everyone on the team learns the specialized rules and syntax for cyclic-refs. That might be relatively easy for someone coming from C++, but it will be difficult to teach someone coming from C#/Java, which is statistically the more likely person to come to D. And indeed would be more than enough to stop my company moving to D.<br>
</blockquote><div><br></div><div>Don't turn the GC off. Indeed, that would be silly for most applications.</div><div>You need to clarify how moving some collection cost from the GC to ARC makes it more expensive. I still can't see it. As far as I can see, everything the ARC cleans up is a burden lifted from the GC; it could only result in the GC running less often, and ARC is not by nature more expensive than GC. I suspect ARC has a lower net cost, since ++/--, only on relevant things and only when they're copied, is probably a lot less work than a full mark phase, which touches everything and follows many pointers, mostly unnecessarily. The GC mark phase is quite a large workload that increases proportionally to the working data set, and it's also an absolute dcache disaster. No way inc/dec could compare to that workload, particularly as the heap grows large or nears capacity.</div>
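<div><br></div><div>To put toy numbers on the "GC runs less often" argument, here is a minimal model, in Python purely for illustration. Everything in it is an assumption, not a measurement of any real D runtime: every object becomes garbage right after allocation, a collection is triggered only when the heap is full and reclaims all garbage, and every cyclic_every-th object sits in a cycle that ARC cannot free.</div>
<pre>
```python
def count_collections(allocations, heap_capacity, cyclic_every, use_arc):
    """Count GC collections for a stream of short-lived allocations.

    Toy model: each object is garbage as soon as it has been allocated.
    With ARC, garbage is released at once unless it is part of a cycle
    (every `cyclic_every`-th object); cyclic garbage waits for the GC.
    """
    used = 0
    collections = 0
    for i in range(1, allocations + 1):
        if used == heap_capacity:
            # Heap is full: run a collection, which reclaims all garbage.
            collections += 1
            used = 0
        used += 1
        is_cyclic = (i % cyclic_every == 0)
        if use_arc and not is_cyclic:
            used -= 1  # ARC releases the object immediately
    return collections


gc_only = count_collections(10000, 100, 10, use_arc=False)
arc_plus_gc = count_collections(10000, 100, 10, use_arc=True)
```
</pre>
<div>With these made-up parameters (10000 allocations, room for 100 objects, one object in ten stuck in a cycle) the model gives 99 collections for the GC alone against 9 with ARC in front. It says nothing about the cost of the inc/dec work itself; it only illustrates how the collection count scales with the fraction of garbage ARC misses.</div>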
<div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
I've seen you say more than once that you can't bond with the GC, and believe me, I understand; if you search back through the forums, you'll find one of the first things I did when I got here was complain about the GC. But what you're saying is "I can't bond with this horrible GC so we need to throw it out and rebuild the compiler to support ARC." All I am saying is "I can't bond with the horrible GC, so why don't we make a better one, that doesn't ruin responsiveness, because I've seen it done in other places and there is no technical reason D can't do the same, or better." Now that I've started to read the GC Handbook I am starting to suspect that using D, there might be a way to create a completely pause-less GC. Don't hold me to it, I don't know enough yet, but D has some unique capabilities that might just make it possible.</blockquote>
<div><br></div><div>Well when you know, I'm all ears. Truly, I am. But I can't imagine a GC that will work acceptably in environments such as limited memory, realtime, single core, or various combinations of those.</div>
<div>I also get the feeling you haven't thought through the cost distribution of a GC backed ARC solution. Either that, or I haven't done my math correctly, in which case I'd be happy to be shown where I went wrong.</div>
</div></div></div>