Array Operations: a[] + b[] etc.

Dmitry Olshansky dmitry.olsh at gmail.com
Fri Nov 23 07:58:14 PST 2012


11/23/2012 6:06 AM, John Colvin wrote:
> On Thursday, 22 November 2012 at 21:37:19 UTC, Dmitry Olshansky wrote:
>> Array ops are supposed to be overhead-free loops that transparently leverage
>> the SIMD parallelism of modern CPUs. No more and no less. It's like
>> auto-vectorization, but guaranteed and obvious from the form.
>

> I disagree that array ops are only for speed.

Well, that and intuitive syntax.

> I would argue that their primary significance lies in their ability to
> make code significantly more readable and, more importantly, writable.
> For example, the displacement vector between two position vectors can be
> written as:
> dv[] = v2[] - v1[];
> or
> dv = v2[] - v1[];
> Anyone with an understanding of mathematical vectors instantly
> understands the general intent of the code.

The mathematical view doesn't take into account that arrays occupy memory,
or the cost of operations in general.
Also:
dv = v2 - v1;
is just as obvious, so structs + operator overloading cover the
usability side of this problem. Operating directly on raw arrays
as N-dimensional vectors is fine, but it hardly helps
maintainability/readability as the program grows over time.
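To illustrate the structs + operator overloading route, here is a minimal sketch. The `Vec3` type and its fixed length of 3 are hypothetical, chosen only for the example; `opBinary` is D's standard hook for overloading binary operators, and its body simply reuses a built-in array op on the static array field.

```d
// Hypothetical fixed-size 3D vector, for illustration only.
struct Vec3
{
    double[3] v;

    // Component-wise + and - via opBinary; the string mixin expands to
    // e.g. "result.v[] = v[] - rhs.v[];", i.e. an ordinary array op.
    Vec3 opBinary(string op)(const Vec3 rhs) const
        if (op == "+" || op == "-")
    {
        Vec3 result;
        mixin("result.v[] = v[] " ~ op ~ " rhs.v[];");
        return result;
    }
}

void main()
{
    import std.stdio : writeln;

    auto v1 = Vec3([1.0, 2.0, 3.0]);
    auto v2 = Vec3([4.0, 6.0, 8.0]);
    auto dv = v2 - v1; // reads like the math, no slicing at the call site
    writeln(dv.v);     // component-wise difference: 3, 4, 5
}
```

The result is a value type with a fixed size, so `dv = v2 - v1` needs no hidden heap allocation.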

> With documentation something vaguely like this:
> "An array is a reference to a chunk of memory that contains a list of
> data, all of the same type. v[] means the set of elements in the array,
> while v on its own refers to just the reference. Operations on sets of
> elements e.g. dv[] = v2[] - v1[] work element-wise along the arrays
> {insert mathematical notation and picture of 3 arrays as columns next to
> each other etc.}.
....
So far so good, but I'd rather not use 'list' to define an array, nor
'set' for its elements. Semantically, v[] means a slice of the whole
array, nothing more and nothing less.

> Array operations can be very fast, as they are sometimes lowered
> directly to CPU vector instructions. However, be aware of situations
> where a new array has to be created implicitly, e.g. dv = v2[] - v1[];
> Let's look at what this really means: we are asking for dv to be set to
> refer to the vector difference between v2 and v1. Note we said nothing
> about the current elements of dv, it might not even have any! This means
> we need to put the result of v2[] - v1[] in a new chunk of memory, which
> we then set dv to refer to. Allocating new memory takes time,
> potentially taking a lot longer than the array operation itself, so if
> you can, avoid it!",

IMHO I'd shoot this kind of documentation on sight: "There is a fast tool,
but here is our peculiar set of rules that makes certain constructs as slow
as a pig. So, watch out! Isn't that convenient?"

> anyone with the most basic programming and mathematical knowledge can
> write concise code operating on arrays, taking advantage of the
> potential speedups while being aware of the pitfalls.
>
People typically aren't aware of the pitfalls as long as the code seems to work.
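A sketch of what `dv = v2[] - v1[]` would have to do behind the scenes, written out explicitly: allocate a fresh GC array, then run the array op into it. The helper names `difference` and `differenceInto` are hypothetical; the point is that the caller-owned-buffer variant makes the allocation visible and lets it be hoisted out of hot loops.

```d
import std.stdio : writeln;

// Hypothetical helper: the allocation the proposed syntax would hide,
// followed by the element-wise array op into the new array.
double[] difference(const(double)[] v2, const(double)[] v1)
{
    assert(v2.length == v1.length, "array op operands must have equal length");
    auto dv = new double[v1.length]; // explicit GC allocation
    dv[] = v2[] - v1[];
    return dv;
}

// Hypothetical allocation-free variant: the caller owns the destination.
void differenceInto(double[] dv, const(double)[] v2, const(double)[] v1)
{
    assert(dv.length == v1.length && v2.length == v1.length);
    dv[] = v2[] - v1[];
}

void main()
{
    double[] v1 = [1, 2, 3];
    double[] v2 = [4, 6, 8];

    auto fresh = difference(v2, v1); // allocates on every call

    auto buf = new double[v1.length]; // allocated once
    foreach (i; 0 .. 10)
        differenceInto(buf, v2, v1);  // reuses buf, no per-call allocation

    writeln(fresh, " ", buf);
}
```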

> In short:
> Vector syntax/array ops is/are great. Concise code that's easy to read
> and write. They fulfill one of the guiding principles of D: the most
> obvious code is fast and safe (or if not 100% safe, at least not too
> error-prone).

This change fits a scripting language more than a systems language.
For me,
a[] = b[] + c[];
implies:
a[0..$] = b[0..$] + c[0..$];
so it's obvious that the lengths had better match and that 'a' must be preallocated.
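The reading above can be checked with a small sketch: the array op over whole-array slices and a plain index loop produce the same result into an already allocated `a`. The `addInto` function is hypothetical and only documents the semantics; it makes no claim about how the compiler actually lowers the array op (which may use SIMD).

```d
// Hypothetical reference loop: what a[] = b[] + c[] means element by element.
void addInto(double[] a, const(double)[] b, const(double)[] c)
{
    assert(a.length == b.length && b.length == c.length);
    foreach (i; 0 .. a.length)
        a[i] = b[i] + c[i];
}

void main()
{
    double[] b = [1, 2, 3];
    double[] c = [10, 20, 30];

    auto a = new double[b.length];     // destination must already exist
    a[0 .. $] = b[0 .. $] + c[0 .. $]; // identical to a[] = b[] + c[]

    auto check = new double[b.length];
    addInto(check, b, c);
    assert(a == check);
}
```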


> More vector syntax capabilities please!

It would have been nice to write things like:
a[] = min(b[], c[]);
where min is a regular function.

But again, I don't see a pressing need:
- if speed is the concern, an arbitrary function can't be sped up much
by hardware
- if flexibility is the concern, range-style operations are far more flexible
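For comparison, a range-style sketch of the element-wise `min` wished for above, written into a preallocated array with Phobos `zip`, `map` and `copy`. The function name `minInto` is hypothetical, and this is an ordinary lazy range pipeline, not an array op, so no SIMD lowering is implied.

```d
import std.algorithm.comparison : min;
import std.algorithm.iteration : map;
import std.algorithm.mutation : copy;
import std.range : zip;

// Hypothetical helper: a[i] = min(b[i], c[i]) for every i.
void minInto(int[] a, const(int)[] b, const(int)[] c)
{
    assert(a.length == b.length && b.length == c.length);
    // zip pairs up elements, map applies the regular min function,
    // copy writes the results into the existing storage of a.
    zip(b, c).map!(t => min(t[0], t[1])).copy(a);
}

void main()
{
    int[] b = [1, 5, 3];
    int[] c = [4, 2, 6];
    auto a = new int[b.length];
    minInto(a, b, c);
    assert(a == [1, 2, 3]);
}
```

Swapping `min` for any other binary function only changes the lambda, which is the flexibility argument in the bullet above.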
-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list