A slice can lose capacity simply by calling a function

Jonathan M Davis via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon May 4 10:36:20 PDT 2015


On Monday, 4 May 2015 at 06:23:42 UTC, Ali Çehreli wrote:
> On 05/03/2015 06:06 PM, Jonathan M Davis via 
> Digitalmars-d-learn wrote:

> (I am eagerly waiting for your DConf talk to see how you make 
> sense of it all.)

Well, we'll see how much I'm able to cover about arrays. The 
focus of the talk is on ranges, not arrays, so I don't know if 
talking much about non-range stuff like array capacity is going 
to fit in with everything else that needs to be discussed that 
_is_ specific to ranges. I'd like to discuss it though.

Regardless, I keep meaning to write an article on ranges, and I'm 
increasingly convinced that I should take a crack at writing one 
on arrays, since while Steven's article is quite enlightening, I 
think that it approaches things the wrong way (e.g. it focuses on 
the memory buffers that the runtime manages rather than the 
dynamic arrays themselves) and uses the wrong terminology (e.g. 
talking about the memory buffers that the runtime manages as 
being dynamic arrays, when according to the language spec T[] is 
a dynamic array, and what it refers to really doesn't matter with 
regard to whether it's a dynamic array or not). So, I'll 
probably turn some portion of my talk into an article or two, and 
there won't be the same time pressures there.

At this point, I feel like I have a well-ironed-out picture in my 
head of how dynamic arrays work and that it's actually pretty 
straightforward, but in general, we seem to do a poor job of 
explaining it in a straightforward manner, which results in far 
more confusion on the topic than I think there should be.

> > For the most part, D's dynamic arrays just
> > work.
>
> I know you are not trolling but I can't take your brushing off 
> this issue with phrases like "for the most part". That's the 
> frigging problem! "For the most part" is not sufficient. Unless 
> somebody explains the semantics in a consistent way, I will 
> continue to try to do it myself. (Remember: Never append to a 
> parameter slice. Good function, good!)

Aside from performance considerations, you can pretty much ignore 
the capacity issue. The only other concern that it raises is 
whether two dynamic arrays still refer to the same memory block, 
and once you append to either of them, you can't assume that they 
do, and you can't assume that they don't (though it's easy enough 
to check via their ptr properties). That can be managed on some 
level by checking the capacity ahead of time, but really, once 
you start appending, you have to treat each slice as possibly 
separate, and if you want to guarantee it, you really need to use 
dup or idup.
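
To make that concrete, here is a minimal sketch in D. The helper 
sameStart is a hypothetical name introduced only for this 
illustration; it just compares the ptr properties. The one append 
whose outcome depends on the runtime is printed rather than 
asserted.

```d
import std.stdio : writeln;

/// Hypothetical helper: true if both slices start at the same address.
bool sameStart(const(int)[] x, const(int)[] y)
{
    return x.ptr is y.ptr;
}

void main()
{
    int[] a = [1, 2, 3];
    int[] b = a;                 // b refers to the same memory block as a
    assert(sameStart(a, b));

    b[0] = 42;
    assert(a[0] == 42);          // writes through b are visible through a

    // A slice that stops short of the block's used data has no room to
    // grow in place, so appending to it has to reallocate.
    int[] head = a[0 .. 2];
    assert(head.capacity == 0);
    head ~= 5;
    assert(!sameStart(head, a));
    assert(a[2] == 3);           // a's third element was not overwritten

    // Appending to a may or may not reallocate, so after this line b
    // may or may not still share memory with a. Check ptr, don't assume.
    a ~= 4;
    writeln("a and b still share memory: ", sameStart(a, b));

    // dup guarantees an independent copy.
    int[] c = a.dup;
    assert(!sameStart(a, c));
    c[0] = 7;
    assert(a[0] == 42);
}
```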

But most code just doesn't need to care about capacity. And if 
you really do need to care, odds are that you can either fix the 
problem with a reserve call or with Appender, or you should 
just not use dynamic arrays directly. In general, I'd consider 
code that worries much about the capacity of dynamic arrays 
to be error-prone, or at least not to be going about things 
in the best way. By its very nature, it's likely to end up being 
inefficient and is too likely to care about whether two dynamic 
arrays refer to the same memory or not.
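
As a rough sketch of those two options: the function names 
buildWithReserve and buildWithAppender are made up for this 
example, while reserve and std.array.appender are the existing 
runtime and Phobos facilities.

```d
import std.array : appender;

/// Hypothetical example: reserve room once, then append with ~=.
int[] buildWithReserve(size_t n)
{
    int[] arr;
    arr.reserve(n);              // request capacity for n elements up front
    foreach (i; 0 .. n)
        arr ~= cast(int) i;
    return arr;
}

/// Hypothetical example: let Appender manage the buffer while building.
int[] buildWithAppender(size_t n)
{
    auto app = appender!(int[])();
    app.reserve(n);
    foreach (i; 0 .. n)
        app.put(cast(int) i);
    return app.data;             // the finished dynamic array
}

void main()
{
    assert(buildWithReserve(5) == [0, 1, 2, 3, 4]);
    assert(buildWithAppender(5) == [0, 1, 2, 3, 4]);
}
```

Either way, the code that builds the array is the only code that 
has to think about capacity; everything downstream just gets a 
finished slice.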

Dynamic arrays are badly designed for situations where you can 
have random stuff appending to your array. They just are, because 
there's no ownership and they're not full reference types, 
which makes it trivial to end up with something appended to one 
dynamic array but not to the one you actually wanted it on. 
As such, I'd argue that anything that's doing a lot of random 
appending to arrays shouldn't be using dynamic arrays (at least, 
not without wrapping them so that there's clear ownership of the 
memory).
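
A small sketch of the problem and of one possible wrapper. All of 
the names here (appendByValue, appendByRef, IntList, addTo) are 
hypothetical and exist only for this illustration; IntList is one 
way to get clear ownership, not the only one.

```d
/// The parameter is a copy of the caller's slice, so the append
/// changes only the local copy's length.
void appendByValue(int[] arr)
{
    arr ~= 99;
}

/// With ref, the caller's own slice is the one being appended to.
void appendByRef(ref int[] arr)
{
    arr ~= 99;
}

/// Hypothetical wrapper: a class is a full reference type, so every
/// holder of the reference sees every append.
final class IntList
{
    private int[] data;

    void put(int x) { data ~= x; }

    const(int)[] items() const { return data; }
}

void addTo(IntList list)
{
    list.put(99);
}

void main()
{
    int[] a = [1, 2, 3];

    appendByValue(a);
    assert(a.length == 3);           // the caller never sees the 99

    appendByRef(a);
    assert(a == [1, 2, 3, 99]);

    auto list = new IntList;
    addTo(list);
    assert(list.items == [99]);      // the append is visible to the caller
}
```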

So, ultimately, I see array capacity as being pretty much a 
non-issue, because most code that would care much about it is 
going about things in the wrong way. But maybe what we need is a 
clear set of guidelines about how dynamic array slices should be 
managed so that they're generally used efficiently and without 
risking weird behavior due to expecting two dynamic arrays to 
refer to the same memory when they don't.

In general though, I'd argue that code should be constructing 
arrays up front and then processing them as ranges and not doing 
a lot of appending to them later. In particular, if you do a lot 
of appending and removals as you go along, it's going to be a 
performance hit, and you seriously risk having trouble due to 
having operated on a different slice than the dynamic array you 
actually wanted to operate on.
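
As one possible shape for that style (the function names 
squaresBelow and sumOfEven are invented for this sketch), the 
array is built once and then only read through range algorithms:

```d
import std.algorithm : filter, map, sum;
import std.array : array;
import std.range : iota;

/// Hypothetical example: construct the whole array in one step.
int[] squaresBelow(int n)
{
    return iota(n).map!(x => x * x).array;
}

/// Hypothetical example: process the array as a range, without
/// appending to it or removing from it.
int sumOfEven(const(int)[] values)
{
    return values.filter!(x => x % 2 == 0).sum;
}

void main()
{
    auto squares = squaresBelow(5);
    assert(squares == [0, 1, 4, 9, 16]);
    assert(sumOfEven(squares) == 20);    // 0 + 4 + 16
}
```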

- Jonathan M Davis

