Smart pointers instead of GC?

Sat Feb 1 18:12:59 PST 2014

On Sat, 01 Feb 2014 17:58:01 -0800, Manu <turkeyman at gmail.com> wrote:

> On 1 February 2014 22:04, JR <zorael at gmail.com> wrote:
>
>> On Saturday, 1 February 2014 at 05:36:44 UTC, Manu wrote:
>>
>>> I write realtime and memory-constrained software (console games), and  
>>> for
>>> me, I think the biggest issue that can never be solved is the
>>> non-deterministic nature of the collect cycles, and the unknowable  
>>> memory
>>> footprint of the application. You can't make any guarantees or  
>>> predictions
>>> about the GC, which is fundamentally incompatible with realtime  
>>> software.
>>>
>> (tried to manually fix ugly linebreaks here, so apologies if it turns  
>> out
>> even worse.)
>>
>> (Maybe this would be better posted in D.learn; if so I'll crosspost.)
>>
>> In your opinion, of how much value would deadlining be? As in, "okay
>> handyman, you may sweep the floor now BUT ONLY FOR 6 MILLISECONDS;
>> whatever's left after that you'll have to take care of next time, your
>> pride as a professional Sweeper be damned"?
>>
>
> This has been my only suggestion for years. Although 6ms is way too much
> (almost half a frame), 500us is more realistic (a little over 3% of a
> frame, still quite a lot of time).
>
> It obviously doesn't address memory footprint, but you would get the
>> illusion of determinism in cases similar to where race-to-idle  
>> approaches
>> work. Inarguably, this wouldn't apply if the goal is to render as many
>> frames per second as possible, such as for non-console shooters where
>> tearing is not a concern but latency is very much so.
>>
>
> If it were running a small amount every frame, maybe the memory footprint
> would stay within a predictable level... don't know. I'd be interested in
> experimenting with this, but as far as I know, nobody knows how to do it.
>
> I'm very much a layman in this field, but I'm trying to soak up as much
>> knowledge as possible, and most of it from threads like these. To my
>> uneducated eyes, an ARC collector does seem like the near-ideal  
>> solution --
>> assuming, as always, the code is written with the GC in mind. But am I
>> right in gathering that it solves two-thirds of the problem? You don't  
>> need
>> to scan the managed heap, but when memory is actually freed is still
>> non-deterministic and may incur pauses, though not necessarily a
>> program-wide stop. Aye?
>>
>
> I'm not sure what you mean. If you mean pauses because it's simply doing
> work to free things, then that's deterministically issued workload just
> like anything else and can (should) be budgeted. It doesn't disappear in  
> a
> GC context, it just happens in bursts at unpredictable times.
> Like you say, it doesn't stop the world, no need for that. Additionally,  
> it
> leaves room for case-by-case flexibility in destruction approach. If
> something takes some time to destroy, and you're okay to have it hang
> around for a while, you can easily plonk it in a dead list waiting for  
> some
> idle time to work through it; thus truly allows the opportunity to use  
> idle
> time to clean stuff up since you don't need to scan the heap every time,  
> a
> list of things waiting to be destroyed is already known.
> You could also add dead objects to a list which can be processed by  
> another
> thread (without stopping all threads).
> There is far more opportunity to implement the destruction pattern that
> suits your application, and for those not interested in implementing  
> their
> own destruction patterns, it would be easy to offer a library function
> setDestructionPattern(patternType); which would configure a particular
> policy application-wide, and they'd never have to think about it again.
>
> At the same time, Lucarella's dconf slides were very, very attractive. I
>> gather that allocations themselves may become slower with a concurrent
>> collector, but collection times in essence become non-issues.  
>> Technically
>> parallelism doesn't equate to free CPU time; but that it more or less  
>> *is*
>> assuming there is a cores/thread to spare. Wrong?
>>
>
> IIRC, it assumes a particular operating system, and required significant
> resource overhead. I recall being impressed, but concluding that it  
> wasn't
> a solution that could be merged into D across the board.
> I'd need to go back and revisit the slides...
>
> Lastly, am I right in understanding precise collectors as identical to  
> the
>> stop-the-world collector we currently have, but with a smarter  
>> allocation
>> scheme resulting in a smaller managed heap to scan? With the additional
>> bonus of less false pointers. If so, this would seem like a good
>> improvement to the current implementation, with the next increment in  
>> the
>> same direction being a generational gc.
>>
>
> I think the reduction in heap to scan is due to only following what are
> known to be pointers, but I don't think that reduces the scan volume, the
> memory footprint of your app is the same (more because there's new
> metadata), but it wouldn't chase false pointers anymore.
> I recall most agreed that it was a good improvement (although it
> benchmarked something like 5% slower), I would appreciate it simply for  
> the
> improved precision... but it doesn't solve the problem in any way. The
> practical application for me is that it wouldn't leak (much) memory.
>
> I would *dearly* love to have concurrency in whatever we end up with,
>> though. For a multi-core personal computer threads are free lunches, or
>> close enough so. Concurrentgate and all that jazz.
>>
>
> The kicker for me with the whole GC thing, is that as long as I've been
> talking about it around here (among a whole bunch of experts), I've been
> saying that an acceptable GC would require to 1. not stop the world, 2.
> support incremental collection/time-slicing, so I can budget it a maximum
> amount of time per frame.
> I'm yet to hear anyone suggest how they can even IMAGINE writing a
> collector like that. As far as I can tell, the consensus is, that it's
> impossible.
> Certainly, nobody has ever given any suggestions how it could possibly be
> done in the future, or steps in that direction, which says a lot to me.
> I've abandoned the idea as unrealistic fantasy.

It is impossible with a conservative GC such as the current D  
implementation. Boehm tried with his C++ collector but could never get  
beyond the basic Mark-LazySweep because C++ enforces a conservative  
mindset.

When you have precision a number of options open up. Firstly, you can now  
parallelize all phases because pointers are precisely known. And second,  
you can start to do things like moving (a prereq for generational  
collection) because you don't have to pin things that you conservatively  
think are pointers. Now that you have a generational collector, it's  
trivial to do an incremental collection on one or more generations.

So therefore precision is the first step, and more to the point it's  
already been done in D, VisualD uses a precise version of the stock D GC  
and special build of the compiler with precise information emitted at  
compile time. That was what Rainer's talk was about at DConf last year.  
Getting this merged into the mainline will mean that there is finally  
enough information for us to go nuts building an actually useful GC in D,  
but until then you really can't progress beyond what we have right now.

It's quite possible, it's just a matter of getting the worked vetted at  
merged.

-- 
Adam Wilson
GitHub/IRC: LightBender
Aurora Project Coordinator