Sort Associative Array by Key

berni someone at somewhere.com
Wed Aug 28 07:26:57 UTC 2019


On Tuesday, 27 August 2019 at 16:25:00 UTC, Samir wrote:
> As I've mentioned on the list before, I really struggle to 
> understand how some of the std.algorithm functions such as 
> `map` work when combined with things like `array`, `sort` and 
> especially `zip` but really appreciate the support I find here 
> on the forum.

A few months ago I stuggled with that too. Meanwhile I think I 
got it.

Let's examine just the first step in this sequence:

> writeln(foo.byPair);

This leads to

> [Tuple!(string, "key", int, "value")("BZP", 5), Tuple!(string, 
> "key", int, "value")("VXE", 8), Tuple!(string, "key", int, 
> "value")("JLC", 2)]

Well, this look like an array of tuples, each tuple containing a 
key and a value. I assume, that you understand tuples, I think, 
they are not the problem here. The problem arises, because the 
result of "foo.byPair" isn't an array. Actually writeln converts 
the result to an array, before printing it (or at least it 
pretends to do so). You can see this with:

> writeln(typeof(foo.byPair).stringof);

Which is:

> MapResult!(__lambda2, Result)

Ooops, what's that? Having a look at the documentation isn't very 
instructive (at least at beginners level):

> auto byPair(AA)(AA aa)
> if (isAssociativeArray!AA);

That's almost all you get. Almost, because the description 
mentions, that the return value is a forward range. Ah, it's a 
range. I'm not sure, how much you know about ranges. If you want 
to understand that stuff, you should study them. E.g. try to 
write some ranges of your own.

When thinking of a range, I imagine it as sort of a voucher for 
something like an array. You can use that voucher to get elements 
from that array. In case of an forward range, you are restricted 
to get the elements ony by one from the beginning. You get them 
by using the function front(), which all ranges provide. Let's 
try it:

> auto something = foo.byPair;
> writeln(something.front);

And we get:

> Tuple!(string, "key", int, "value")("BZP", 5)

That's indeed the first of our elements. We don't know yet, what 
this range is doing behind the sceens. But I can tell you: When 
calling byPair it hasn't done much: It just signed that voucher 
and handed it out to you (that's called lazy evaluation, contrary 
to eager evaluation). Maybe you never use that voucher. In that 
case it would be wasted time to calculate all the elements, that 
are never used.

The moment you call front, that range starts to do some of the 
work. It assembles the first item of the range and hand's this 
out to you. Then it stops again - the other items are not yet 
calculated. That's done, when you call front again (but before 
you need to call popFront, to remove the first item, else you 
would get the item, you allready have, once again). And so on.

Now we know, how to use this range and that's what "array" does: 
It calls front and popFront as long as there are more elements 
(this can be checked by calling empty on the range), and puts 
them into an array. But what we don't understand yet is that 
strange type of the range: MapResult!(__lambda2, Result).

For that, we should have a look at the implementation (I'm using 
http://dpldocs.info/experimental-docs/std.array.byPair.html for 
that, because it has a link to the documentation at the bottom), 
we find out, that byPair is implemented using byKeyValue and map:

> return aa.byKeyValue
>        .map!(pair => tuple!("key", "value")(pair.key, 
> pair.value));

byKeyValue is defined in object, which has no source code - I 
guess it's implemented directly in the compiler somehow. By using 
typeof, we find out that the type is "Result" and by using 
writeln we find out, that the result of byKeyValue looks like an 
array of Pairs of pointers. Again it's a range, that we can use 
like above.

This range (think again of a voucher) is now send to map. Map 
needs, beside this range, a function. You could write that 
function like:

> auto make_tuple(Pair pair)
> {
>     return tuple!("key", "value")(pair.key, pair.value);
> }

Than the above could be written as

> return aa.byKeyValue.map!(make_tuple);

Beside the fact, that I cannot find the definition of Pair (and 
therefore would need to make a template instead of the function), 
it's not necessary to write that function explicitly. There is a 
way, to write functions more compressed, namely by using =>, like 
it's done above. Those functions are called lambda-functions 
(because at some places a greek lambda symbol was used for the 
function parameter).

Now, map is lazy too. It just remembers that function, the 
"voucher" it got and hand's you an other voucher. That's all. 
This new voucher is called "MapResult", which is actually a 
struct with functions "front", "popFront" and "empty" (and some 
more, but we don't care). And this struct has two 
template-parameters, namely a function and a range. We know that 
range already, it's Result from byKeyValue. And that function, 
which has no name, got the name "__lambda2" from the compiler. 
The 2 suggests, that there is an other lambda somewhere around, 
but I don't know where. Maybe somewhere inside byKeyValue.

When now calling front on this MapResult, map starts it's work by 
calling front on Result. And than it uses whatever it got (a Pair 
with two points, pointing at "VXE" and 8), to call the 
lambda-function and hands you back the result of that function 
(which is a tuple with key "VXE" and value 8).


If you now go through that whole chain once more, you can 
understand what happens: byPair takes foo and hands a voucher to 
array, which hands a voucher to sort, which hands a voucher to 
map which hands that voucher to writeln. writeln now uses the 
voucher by asking for the first element. Map calculates it's 
first element by asking sort. Sort cannot sort, when not knowing 
all elements (I think). Hence it askes for all the elements and 
array is busy asking byPair for all the elements and byPair get's 
them from foo. Now sort sorts and hand's the first element back 
to map, which transforms it and sends it's result to writeln, 
which prints.

When writeln continues and asks for the next element, things work 
a little bit different. It still asks map and map asks sort. But 
sort has allready done the whole work and now just sends back the 
next item. And so on.

Hope this helps.


More information about the Digitalmars-d-learn mailing list