Code layout for range-intensive D code

Sat Jun 9 03:43:31 PDT 2012

The introduction of UFCS in D offers new ways to format D code, 
especially when your code uses many higher-order functions. What is 
a good layout for D code in such situations? I have tried 
several alternative layouts, and in the end I have come to appreciate 
a layout similar to the one used in F# code. Below I show a kind 
of extreme example :-)

A textual matrix of bits like this is the input of a little 
nonogram puzzle:

0 1 1 1 1 0
1 0 0 1 1 1
1 0 1 1 1 1
1 1 1 1 1 1
0 1 1 1 1 0

A program has to produce an output like this. In the first part 
of the output it looks at the columns and counts the lengths of 
the groups of "1", and in the second part of the output it does 
the same on the rows:

3
1 2
1 3
5
5
3

4
1 3
1 4
6
4

This is a possible solution program:

import std.stdio, std.algorithm, std.string, std.range, std.conv;

void main() {

     auto t = "table.txt"
              .File()
              .byLine()
              .map!(r => r.removechars("^01".dup))()
              .array();

     const transposed = t[0]
                        .length
                        .iota()
                        .map!(i => t.transversal(i).array())()
                        .array();

     (transposed ~ [(char[]).init] ~ t)
     .map!(r => r
                .group()
                .filter!(p => p[0] == '1')()
                .map!(p => p[1].text())()
                .join(" ")
          )()
     .join("\n")
     .writeln();
}

(Note: the second argument of removechars is "^01".dup because 
removechars is a bit stupid: it requires both of its arguments to 
have the same type, and the 'r' given by byLine() is a char[]. 
Here the code performs the string->char[] conversion on every line 
because the typical inputs for this program are small enough; 
hoisting it out of the loop would be a premature optimization.)

As you see you have to break the lines, because the processing 
chains often become too long for single lines.
At first I put the dots at the end of the lines, but later I 
found that putting the dots at their start is better: it 
reminds me that we are still inside a processing chain.
Putting a single operation on each line (instead of two or three) 
helps readability, while still allowing a bit of single-line 
nesting like in ".map!(i => t.transversal(i).array())()".
And aligning the dots and the first part vertically helps the 
eye find what chain we are in. In the last part of the program 
you see a nested chain too, inside a map.
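
To try the transposition chain in isolation, here is a minimal 
sketch that works on an in-memory table instead of table.txt. The 
helper name "columns" and the two sample rows are assumptions made 
for illustration only; the chain itself follows the layout rules 
above, with one step per line and the dots aligned.

```d
import std.algorithm : equal, map;
import std.array : array;
import std.range : iota, transversal;

// Hypothetical helper (not in the original program): the columns of a
// rectangular table whose rows are already stripped of spaces.
auto columns(string[] t) {
    return t[0]
           .length                                 // number of columns
           .iota()                                 // column indices
           .map!(i => t.transversal(i).array())()  // i-th char of every row
           .array();
}

void main() {
    // The first two rows of the example matrix.
    auto cols = columns(["011110", "100111"]);

    assert(cols.length == 6);
    assert(cols[0].equal("01"));  // first column, top to bottom
    assert(cols[1].equal("10"));  // second column
}
```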

I think this code layout is similar to the one used with F# pipe 
operators. In F# code that layout is a kind of standard, I see it 
used by most F# programmers. Maybe some D programmers will want 
to use this kind of layout for 
higher-order-function-heavy code.

I have found that breaking the chains and giving variable names 
to the intermediate parts of those processing chains doesn't help 
readability much, and the names for those intermediate 
temporary variables tend to be dull and repetitive. On the other 
hand putting just one processing step on each row leaves space for 
a short comment on each row, where you think you need it:

auto t = File("table.txt", "r")                 // comment #1
          .byLine()                              // comment #2
          .map!(r => r.removechars("^01".dup))() // comment #3
          .array();                              // comment #4

In practice I think comment #3 is the only useful one here, barely.
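
For comparison, this is roughly what the named-intermediates style 
looks like on the per-row chain of the program. It is only a 
sketch: the function name "describeRow" and the variable names are 
made up for illustration, and they mostly just restate the 
operation that produced them.

```d
import std.algorithm : filter, group, map;
import std.array : join;
import std.conv : text;

// Hypothetical helper: one output line for a row such as "100111".
string describeRow(string row) {
    auto groups = row.group();                              // (char, count) pairs
    auto onesGroups = groups.filter!(p => p[0] == '1')();   // runs of '1' only
    auto lengths = onesGroups.map!(p => p[1].text())();     // counts as strings
    return lengths.join(" ");
}

void main() {
    assert(describeRow("100111") == "1 3");  // second row of the matrix
    assert(describeRow("111111") == "6");    // fourth row
}
```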

Bye,
bearophile