Java streams Vs LINQ Vs D

Wed Mar 27 15:19:00 PDT 2013

On Reddit I've seen a link to a nice comparison of Java streams 
vs .NET LINQ:

http://blog.informatech.cr/2013/03/24/java-streams-preview-vs-net-linq/

Although they aren't a complete list, those little challenges are 
well chosen: they are commonly performed operations. So I have 
translated them to D with Phobos. For most of them I have found 
a nice D translation. But a few of them uncover holes in Phobos 
that I already knew about. (Maybe some of them are not really 
Phobos holes, but just my lack of knowledge about Phobos and D. 
So your better solutions are welcome.)

If you want to read the whole list of my translations:
http://codepad.org/0KtXu7nh

Below I list just the five troubled challenges, with the LINQ 
solution followed by one or more D solutions.

For all the solutions I import several modules:

import std.stdio, std.algorithm, std.range, std.typecons,
       std.traits, std.array, std.string;

- - - - - - - - - - - -

Challenge 2: Indexed Filtering

Find all the names in the array "names" where the length of the 
name is less than or equal to the index of the element + 1.

string[] names = { "Sam", "Pamela", "Dave", "Pascal", "Erik" };
var nameList = names.Where((c, index) => c.Length <= index + 1)
                    .ToList();

In D:

     auto names2 = ["Sam", "Pamela", "Dave", "Pascal", "Erik"];
     auto nameRange = iota(size_t.max)
                      .zip(names2)
                      .filter!q{ a[1].length <= a[0] + 1 }
                      .map!q{ a[1] };
     nameRange.writeln;

On Bugzilla I have proposed to add an enumerate():
http://d.puremagic.com/issues/show_bug.cgi?id=5550

With it the D code improves:

     auto nameRange2 = names2
                       .enumerate
                       .filter!q{ a[1].length <= a[0] + 1 }
                       .map!q{ a[1] };
     nameRange2.writeln;
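
Until something like that is available, such a helper can be 
approximated in user code. This is only a minimal sketch: the 
name myEnumerate is hypothetical (it is not a Phobos function), 
and it just wraps the zip + iota idiom shown above:

```d
import std.algorithm : equal, filter, map;
import std.range : iota, zip;
import std.stdio : writeln;

// Hypothetical helper (not a Phobos function): pairs every item
// with its 0-based index, yielding (index, item) tuples.
auto myEnumerate(R)(R items)
{
    return zip(iota(size_t.max), items);
}

void main()
{
    auto names = ["Sam", "Pamela", "Dave", "Pascal", "Erik"];
    auto shortNames = names
                      .myEnumerate
                      .filter!(t => t[1].length <= t[0] + 1)
                      .map!(t => t[1]);
    // Only "Erik" (length 4, index 4) satisfies length <= index + 1.
    assert(shortNames.equal(["Erik"]));
    shortNames.writeln;
}
```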

If D gains a syntax to unpack tuples in function signatures the 
code becomes (untested):

     auto nameRange2 = names2
                       .enumerate
                       .filter!((i, n) => n.length <= i + 1)
                       .map!q{ a[1] };
     nameRange2.writeln;

Besides adding enumerate(), which is useful in many other 
situations, another (not alternative!) idea is to add 
iFilter/iMap (meaning indexed filter and indexed map), where 
the filtering or mapping function is given both the index and 
the item:

     auto nameRange2 = names2.iFilter!((i, a) => a.length <= i + 1);

Or equivalently:

     auto nameRange2 = names2.iFilter!q{ a.length <= i + 1 };

Indexed functions of this kind (such as mapi) are present in the 
standard library of the F# language.
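
To make the idea concrete, here is a minimal sketch of how such 
an iFilter could be written in user code. The name and signature 
are hypothetical (nothing like this is in Phobos), and the 
predicate is assumed to be called as pred(index, item):

```d
import std.algorithm : equal, filter, map;
import std.range : iota, zip;
import std.stdio : writeln;

// Hypothetical indexed filter: pred is called as pred(index, item)
// and only the items for which it returns true are kept.
auto iFilter(alias pred, R)(R items)
{
    return zip(iota(size_t.max), items)
           .filter!(t => pred(t[0], t[1]))
           .map!(t => t[1]);
}

void main()
{
    auto names = ["Sam", "Pamela", "Dave", "Pascal", "Erik"];
    auto shortNames = names.iFilter!((i, a) => a.length <= i + 1);
    assert(shortNames.equal(["Erik"]));
    shortNames.writeln;
}
```

An iMap could follow the same pattern, replacing the filter step 
with a map that calls the function with both tuple fields.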

- - - - - - - - - - - -

Challenge 3: Selecting/Mapping

Say we have a list of names and we would like to print “Hello” in 
front of all the names:

List<string> nameList1 = new List<string>() { "Anders", "David", "James",
                                              "Jeff", "Joe", "Erik" };
nameList1.Select(c => "Hello! " + c).ToList()
         .ForEach(c => Console.WriteLine(c));

In Phobos there is no forEach(), so you have to use foreach:

     auto nameList1 = ["Anders", "David", "James", "Jeff", "Joe",
                       "Erik"];
     foreach (name; nameList1)
         writeln("Hello! ", name);

The only advantage I see of a forEach() over foreach is that 
it's usable at the end of a UFCS chain.
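
Such a function is short to write in user code. A minimal 
sketch, assuming a hypothetical forEach helper (it is not a 
Phobos function) that eagerly calls a function on every item:

```d
import std.algorithm : map;
import std.stdio : writeln;

// Hypothetical eager helper: calls fun once for every item of the range.
void forEach(alias fun, R)(R items)
{
    foreach (item; items)
        fun(item);
}

void main()
{
    auto nameList1 = ["Anders", "David", "James", "Jeff", "Joe", "Erik"];
    // The helper closes the UFCS chain, so no separate loop is needed.
    nameList1
        .map!(name => "Hello! " ~ name)
        .forEach!(s => writeln(s));
}
```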

- - - - - - - - - - - -

Challenge 12: Grouping by a Criterion

Group the elements of a collection of strings by their length.

string[] names = {"Sam", "Samuel", "Samu", "Ravi", "Ratna",
                  "Barsha"};
var groups = names.GroupBy(c => c.Length);

In Phobos there is a group(), but it can't be used here because 
it yields just one representative of each run of equivalent 
items (paired with a count), not the grouped items themselves. 
And I can't use std.array.assocArray for similar reasons.

     auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna",
                    "Barsha"];
     string[][size_t] groups;
     foreach (name; names3)
         groups[name.length] ~= name;
     groups.byValue.writeln;

Andrei has recently written a groupBy, not yet merged:
https://github.com/D-Programming-Language/phobos/pull/1186

Using that future groupBy the D code improves a little (untested; 
in DMD 2.063 schwartzSort accepts a string literal too):

     auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna",
                    "Barsha"];
     auto groups = names3
                   .schwartzSort!q{ a.length }
                   .groupBy!q{ a.length == b.length };
     groups.writeln;

By the way, I like Python for having a free len() function that's 
usable with higher-order functions like map and filter. In Phobos 
there is walkLength():

     auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna",
                    "Barsha"];
     auto groups = names3
                   .schwartzSort!walkLength
                   .groupBy!q{ a.walkLength == b.walkLength };
     groups.writeln;

Unlike schwartzSort, the Phobos group/groupBy take a binary 
comparison function like "a.length == b.length" instead of a less 
flexible but handier single-argument key function like 
"c => c.Length". So I'd like something like a keyGroup/keyGroupBy 
that accepts a single-argument function, as schwartzSort does. 
(And I'd like schwartzSort to be renamed "keySort".)

     auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna",
                    "Barsha"];
     auto groups = names3
                   .schwartzSort!walkLength
                   .keyGroupBy!walkLength;
     groups.writeln;

Another problem with group/groupBy is that they only merge 
adjacent items, so the input has to be sorted first. But a 
hash-based O(n) group/groupBy is also conceivable, potentially 
faster, and leads to simpler code, because you don't need to 
sort the items first:

     auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna",
                    "Barsha"];
     auto groups = names3.hashKeyGroupBy!walkLength;
     groups.writeln;

Uhm. The name "hashKeyGroupBy" is becoming a bit too complex 
:-) So maybe it's better not to go there.
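
Whatever its name, the hash-based idea itself is small. A minimal 
sketch, assuming a hypothetical hashKeyGroupBy (not a Phobos 
function) that generalizes the associative-array loop shown 
earlier and returns the associative array itself:

```d
import std.stdio : writeln;

// Hypothetical helper: one pass over the items, appending each one to
// the bucket selected by key(item). The order of the buckets in the
// resulting associative array is unspecified.
auto hashKeyGroupBy(alias key, T)(T[] items)
{
    alias K = typeof(key(T.init));
    T[][K] groups;
    foreach (item; items)
        groups[key(item)] ~= item;
    return groups;
}

void main()
{
    auto names3 = ["Sam", "Samuel", "Samu", "Ravi", "Ratna", "Barsha"];
    auto groups = names3.hashKeyGroupBy!(s => s.length);
    // Inside each bucket the items keep their input order.
    assert(groups[3] == ["Sam"]);
    assert(groups[4] == ["Samu", "Ravi"]);
    assert(groups[6] == ["Samuel", "Barsha"]);
    groups.byValue.writeln;
}
```

Unlike a lazy groupBy this version is eager and allocates, which 
is the price paid for skipping the sort.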

- - - - - - - - - - - -

Challenge 13: Filter Distinct Elements

Obtain all the distinct elements from a collection.

string[] songIds = {"Song#1", "Song#2", "Song#2", "Song#2",
                    "Song#3", "Song#1"};
var uniqueSongIds = songIds.Distinct();

This is not too bad in D: there is uniq(), but first you need to 
sort the original array/range (with .sort, .idup.sort or 
.array.sort):

     auto songIds = ["Song#1", "Song#2", "Song#2", "Song#2",
                     "Song#3", "Song#1"];
     auto uniqueSongIds = songIds.sort().uniq;
     uniqueSongIds.writeln;

A hash-based uniq that doesn't need a previous sorting is 
conceivable. But see also below.
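
A minimal sketch of that hash-based idea, assuming a hypothetical 
eager helper named hashUniq (not a Phobos function) that keeps 
the first occurrence of each item and preserves the input order:

```d
import std.stdio : writeln;

// Hypothetical helper: returns the distinct items in first-seen order,
// using an associative array to remember what was already emitted.
T[] hashUniq(T)(T[] items)
{
    bool[T] seen;
    T[] result;
    foreach (item; items)
    {
        if (item !in seen)
        {
            seen[item] = true;
            result ~= item;
        }
    }
    return result;
}

void main()
{
    auto songIds = ["Song#1", "Song#2", "Song#2", "Song#2",
                    "Song#3", "Song#1"];
    auto uniqueSongIds = songIds.hashUniq;
    assert(uniqueSongIds == ["Song#1", "Song#2", "Song#3"]);
    uniqueSongIds.writeln;
}
```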

- - - - - - - - - - - -

Challenge 14: Union of Two Sets

Join together two sets of items.

List<string> friends1 = new List<string>() { "Anders", "David", "James",
                                             "Jeff", "Joe", "Erik" };
List<string> friends2 = new List<string>() { "Erik", "David", "Derik" };
var allMyFriends = friends1.Union(friends2);

This seems a bit too complex to do in D+Phobos:

     auto friends1 = ["Anders", "David", "James", "Jeff", "Joe",
                      "Erik"];
     auto friends2 = ["Erik", "David", "Derik"];
     auto allMyFriends = friends1.sort()
                         .setUnion(friends2.sort())
                         .uniq;
     allMyFriends.writeln;

Note that you have to call uniq at the end because that's not a 
set union, it's a badly named function. A better name for it is 
"bagUnion" because it doesn't remove the duplicates, and a set 
operation should.

For Challenges 13 and 14 I suggest not adding more functions 
to std.algorithm, and instead just relying on a set data 
structure, as in Python:

>>> song_ids = ["Song#1", "Song#2", "Song#2", "Song#2", "Song#3",
...             "Song#1"]
>>> set(song_ids)
set(['Song#1', 'Song#2', 'Song#3'])

>>> friends1 = ["Anders", "David","James", "Jeff", "Joe", "Erik"]
>>> friends2 = ["Erik", "David", "Derik"]
>>> set(friends1).union(friends2)
set(['Erik', 'Joe', 'Jeff', 'Derik', 'James', 'Anders', 'David'])

In my D1 dlibs I had a Set!T data structure (with a set() helper 
function) that offered a similar syntax (here I use D2 UFCS):

     auto songIds = ["Song#1", "Song#2", "Song#2", "Song#2",
                     "Song#3", "Song#1"];
     auto uniqueSongIds = songIds.set;

     auto friends1 = ["Anders", "David", "James", "Jeff", "Joe",
                      "Erik"];
     auto friends2 = ["Erik", "David", "Derik"];
     auto allMyFriends = friends1.set.united(friends2);
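
For readers who have not seen dlibs, here is a minimal sketch of 
what such a type could look like in D2. It is not the dlibs code: 
the Set struct, its add/contains/length/united members and the 
set() helper below are all hypothetical, built on an associative 
array:

```d
import std.stdio : writeln;

// Hypothetical minimal set built on an associative array; only the
// keys matter, the bool values are placeholders.
struct Set(T)
{
    private bool[T] items;

    void add(T item) { items[item] = true; }

    bool contains(T item) { return (item in items) !is null; }

    size_t length() { return items.length; }

    // Returns a new set with the items of this set plus `others`.
    Set united(T[] others)
    {
        Set result;
        foreach (item; items.byKey)
            result.add(item);
        foreach (item; others)
            result.add(item);
        return result;
    }
}

// Hypothetical helper that builds a Set from an array.
Set!T set(T)(T[] source)
{
    Set!T result;
    foreach (item; source)
        result.add(item);
    return result;
}

void main()
{
    auto songIds = ["Song#1", "Song#2", "Song#2", "Song#2",
                    "Song#3", "Song#1"];
    auto uniqueSongIds = songIds.set;
    assert(uniqueSongIds.length == 3);

    auto friends1 = ["Anders", "David", "James", "Jeff", "Joe", "Erik"];
    auto friends2 = ["Erik", "David", "Derik"];
    auto allMyFriends = friends1.set.united(friends2);
    assert(allMyFriends.length == 7);
    assert(allMyFriends.contains("Derik"));
    writeln(allMyFriends.length);
}
```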

- - - - - - - - - - - -

Bye,
bearophile