More radical ideas about gc and reference counting

Xavier Bigand via Digitalmars-d digitalmars-d at puremagic.com
Sat May 10 07:47:41 PDT 2014


On 10/05/2014 15:25, Manu via Digitalmars-d wrote:
> On 10 May 2014 19:07, Xavier Bigand via Digitalmars-d
> <digitalmars-d at puremagic.com> wrote:
>> On 10/05/2014 01:31, Francesco Cattoglio wrote:
>>
>>> On Friday, 9 May 2014 at 21:05:18 UTC, Wyatt wrote:
>>>>
>>>> But conversely, Manu, something has been bothering me: aren't you
>>>> restricted from using most libraries anyway, even in C++? "Decent" or
>>>> "acceptable" performance isn't anywhere near "maximum", so shouldn't
>>>> any library code that allocates in any language be equally suspect?
>>>> So from that standpoint, isn't any library you use in any language
>>>> going to _also_ be tuned for performance in the hot path?  Maybe I'm
>>>> barking up the wrong tree, but I don't recall seeing this point
>>>> addressed.
>>>>
>>>> More generally, I feel like we're collectively missing some important
>>>> context:  What are you _doing_ in your 16.6ms timeslice?  I know _I'd_
>>>> appreciate a real example of what you're dealing with without any
>>>> hyperbole.  What actually _must_ be done in that timeframe?  Why must
>>>> collection run inside that window?  What must be collected when it
>>>> runs in that situation?  (Serious questions.)
>>>
>>> I'll try to guess: if you want something running at 60 frames per
>>> second, 16.6ms is the time you have to do everything between frames.
>>> This means that in that timeframe you have to:
>>> - update your game state.
>>> - possibly process all network I/O.
>>> - prepare the rendering pipeline for the next frame.
>>>
>>> Updating the game state can involve computations on lots of stuff:
>>> physics, animations, creation and deletion of entities and particles,
>>> AI logic... pick your poison. At every frame you will have a handful
>>> of objects being destroyed and a few resources that might go
>>> forgotten. Any one frame would probably need very few objects
>>> collected, but over time the amount of junk can easily grow out of
>>> control. Your code will end up stuttering at some point (because of
>>> random collections at random times), and this can be really bad.
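>>>
>>> To make the budget concrete, here is a minimal sketch of such a
>>> frame loop in C++; updateGameState, processNetwork and render are
>>> hypothetical placeholders for the work listed above:
>>>
>>>     #include <chrono>
>>>     #include <thread>
>>>
>>>     // Hypothetical stand-ins for the per-frame work.
>>>     void updateGameState(double) { /* physics, animation, AI... */ }
>>>     void processNetwork()        { /* drain pending network I/O */ }
>>>     void render()                { /* build and submit the frame */ }
>>>
>>>     int main() {
>>>         using clock = std::chrono::steady_clock;
>>>         // ~16.6ms per frame at 60fps.
>>>         const auto budget = std::chrono::microseconds(16667);
>>>         auto last = clock::now();
>>>         for (;;) {
>>>             const auto start = clock::now();
>>>             const double dt =
>>>                 std::chrono::duration<double>(start - last).count();
>>>             last = start;
>>>             updateGameState(dt);
>>>             processNetwork();
>>>             render();
>>>             // Everything above -- including any GC pause that lands
>>>             // here -- must fit in the budget, or the frame is late.
>>>             const auto spent = clock::now() - start;
>>>             if (spent < budget)
>>>                 std::this_thread::sleep_for(budget - spent);
>>>         }
>>>     }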
>>
>>
>> As far as I know, AAA game engines are reputed to do zero allocations
>> during frame computation, but I think that's less the case nowadays
>> because of the dynamism of scenes and the huge environments that are
>> streamed.
>
> Running a game in a zero-alloc environment was a luxury that ended
> about 10 years ago, for the reasons you give.
> Grand Theft Auto proved that you don't need loading screens, and now
> that's a basic requirement. We also have a lot more physics,
> environmental destruction, and other dynamic behaviour.
>
> It was also something we achieved in the past at the cost of extreme
> complexity, and it was the source of many (most?) bugs. Technically we
> still allocated, but we had to micro-manage every little detail, with
> mini pools and regions for everything.
> Processors are better now; in the low-frequency code, we can afford to
> spend a tiny bit of time using a standard allocation model. The key is
> the growing separation between low-frequency and high-frequency code.
> On a 33MHz PlayStation, there wasn't really much difference between
> the two worlds. Now there is, and we can afford to allow 'safety' into
> the language at the cost of a little memory management. We just can't
> have that cost include halting threads all the time for lengthy
> collect cycles.
> I know this because we already use RC extensively in games anyway;
> DirectX uses COM, and most resources use manual RC because we need
> things to release eagerly, and we need destructors to work properly. I
> don't see how ARC would add any significant cost over the manual RC
> that has been standard practice for many years. It would add
> simplicity and safety, which are highly desirable.
>
>
>> I recently fixed a performance issue caused by a code design that
>> forced destruction of walls (I am working on an architecture
>> application) and re-created them whenever the user moved them. gprof
>> showed me that this point took around 30% of the CPU time in a frame,
>> while allocations/destructions themselves were only about 5%. That 5%
>> covers destruction of objects in the architecture part and in the 3D
>> engine, and likewise for construction. Construction also adds
>> operations like uploading new geometry, so I don't think the cost of
>> new and delete themselves would be high without the work done by
>> constructors and destructors. Reserving memory (malloc) isn't really
>> an issue IMO, but the operations tied to construction and destruction
>> of objects can be expensive.
>
> It's better to have eager destructors (which give you the ability to
> defer them if you need to) than to have no destructors at all, which
> has been the center of the argument these last few days.
> It's not hard to defer resource destruction to a background thread in
> an eager-destruction model, as sketched below.
>
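> A minimal sketch of that deferral, with hypothetical names; the
> destructor stays eager, but the expensive cleanup runs on a worker:
>
>     #include <condition_variable>
>     #include <functional>
>     #include <mutex>
>     #include <queue>
>     #include <thread>
>
>     // Destructors run eagerly, but push slow cleanup work (GPU
>     // resource release, etc.) onto a background thread.
>     class DestructionQueue {
>         std::queue<std::function<void()>> jobs;
>         std::mutex m;
>         std::condition_variable cv;
>         bool done = false;
>         std::thread worker{[this] {
>             for (;;) {
>                 std::unique_lock<std::mutex> lock(m);
>                 cv.wait(lock, [this] { return done || !jobs.empty(); });
>                 if (jobs.empty()) return; // done and fully drained
>                 auto job = std::move(jobs.front());
>                 jobs.pop();
>                 lock.unlock();
>                 job(); // the slow release, off the frame thread
>             }
>         }};
>     public:
>         void defer(std::function<void()> job) {
>             { std::lock_guard<std::mutex> g(m); jobs.push(std::move(job)); }
>             cv.notify_one();
>         }
>         ~DestructionQueue() {
>             { std::lock_guard<std::mutex> g(m); done = true; }
>             cv.notify_one();
>             worker.join();
>         }
>     };
>
> An object's destructor then just calls defer() with its cleanup
> lambda and returns immediately.
>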
Yep, to fix this performance bottleneck I just took a look at the call
graph with costs generated by gprof, and it showed me the easiest
points to fix with a cache insertion. I didn't just defer the
destruction, because the majority of the time the walls stay the same
with only a new position; in that case I can directly reuse the same
geometry, I just have to update it. The other case is when a wall is
merged with another one (when you put 2 rooms against each other).
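
Roughly the pattern, with hypothetical Wall/Geometry names; moving a
wall reuses the cached geometry instead of destroying and re-creating
it:

    #include <memory>
    #include <unordered_map>

    struct Geometry {              // hypothetical GPU mesh wrapper
        void updateTransform(float, float) { /* re-upload only */ }
    };

    class WallGeometryCache {
        std::unordered_map<int, std::unique_ptr<Geometry>> byWallId;
    public:
        // User moved a wall: reuse and update the cached geometry.
        void moveWall(int wallId, float x, float y) {
            auto it = byWallId.find(wallId);
            if (it == byWallId.end())
                it = byWallId.emplace(wallId,
                                      std::make_unique<Geometry>()).first;
            it->second->updateTransform(x, y);
        }
        // Only the merge case really needs to drop geometry.
        void removeWall(int wallId) { byWallId.erase(wallId); }
    };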

One other issue we have is the use of std containers, which are really
slow in debug builds (from 60fps down to 15fps for parts that otherwise
run smoothly), and on Android it's hard to build some parts of the
application with optimization and others without.
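
One partial workaround, at least with a GCC-based NDK toolchain, is to
opt individual hot translation units back into optimization; Clang
ignores this pragma, so it is not a general fix:

    // hot_path.cpp -- part of an otherwise -O0 debug build.
    #if defined(__GNUC__) && !defined(__clang__)
    #pragma GCC optimize ("O2")
    #endif

    #include <vector>

    // std containers are templates instantiated in this file, so this
    // code gets -O2 codegen while the rest keeps full debug info.
    int sumAll(const std::vector<int>& v) {
        int total = 0;
        for (int x : v) total += x;
        return total;
    }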


