RFC on range design for D2

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue Sep 9 06:33:34 PDT 2008


bearophile wrote:
> I don't have enough experience to have a clear view on the whole
> subject, so I write only some general comments:

Uh-oh. That doesn't quite bode well :o).

> 1) In time I have seen that everyone that likes/loves language X
> wants D to become like X. This is true for Andrei too, you clearly
> like C++ a lot.

Stop right there. This is just a presupposition. I think I could say I 
*know* C++ a lot, which is quite different. But then I know a few other 
languages quite well, along with the advantages and disadvantages that 
made them famous.

Maybe I could say I do like the STL. Not for the quirks it takes to 
implement it in C++, but for the fact that it brings clarity and 
organization to the definition of fundamental data structures and 
algorithms. For my money, other collection/algorithm designs don't hold 
a candle to STL's.

> The worse fault of C++ is excessive complexity,
> that's the single fault that's killing it, that has pushed people to
> invent Java, and so on. Walter has clearly tried hard to make D a
> less complex language. This means that D is sometimes less powerful
> than C++, and every smart programmer can see that it's less powerful,
> but having 15% less power is a good deal if you have 1/3 of the
> complexity. As you may guess I don't like C++ (even if I like its
> efficiency I'm not going to use it for real coding), so I don't like
> D to become just a resyntaxed C++. Note that originally D was more
> like a statically compiled Java, and while that's not perfect, that's
> probably better than a resyntaxed C++. So I suggest less power, if
> this greatly decreases what the programmer has to keep in mind
> while programming. If you don't believe me, take a look at what the
> blurb of D says:
> 
>> D is a systems programming language. Its focus is on combining the
>> power and high performance of C and C++ with the programmer
>> productivity of modern languages like Ruby and Python. Special
>> attention is given to the needs of quality assurance,
>> documentation, management, portability and reliability.<
> 
> As you see it's a matter of balance.

I know it's a matter of balance. I am not sure what gave you the idea 
that I didn't.

> 2) Keep in mind that not all D programmers are lovers of C++ and they
> don't all come from C++, some of them are actually lovers of other
> languages, like Java, C#, Python, Ruby, Lisp, etc. And as you may
> expect, they may want to push D to look more like Lisp, Python, C#, etc.

I like many things in all of the languages above. Don't forget it was me 
who opened Walter to Lisp :o).

> 3) opApply has some problems, but it isn't infamous.

opApply has two problems:

1. It can't save the state of the iteration for later resumption.

2. It uses repeated function calls through a pointer, which is 
measurably slow.

Both disadvantages are major. The first fosters container design without 
true iteration. That's just bad design. (Look at built-in hashes.) The 
second is even worse because it makes foreach a toy: useful when you read 
lines from the console, but not in inner loops. Which is kind of ironic, 
because they are inner *loops*.

I think many would agree that foreach minus the disadvantages would be 
better.
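
To make the contrast concrete, here is a small illustrative sketch (mine, 
for this message only; the empty/front/popFront names are assumed as 
placeholders, not necessarily the final primitives) of the same container 
iterated once through opApply and once through a struct range whose state 
can be saved and resumed:

struct List
{
    int[] data;

    // opApply: iteration is driven through a delegate, so the current
    // position cannot be stored away and picked up later.
    int opApply(int delegate(ref int) dg)
    {
        foreach (ref x; data)
        {
            if (auto result = dg(x)) return result;
        }
        return 0;
    }

    // A range view over the same data: the iteration state lives in the
    // struct itself and can be copied and resumed at will.
    struct Range
    {
        int[] rest;
        bool empty() { return rest.length == 0; }
        ref int front() { return rest[0]; }
        void popFront() { rest = rest[1 .. $]; }
    }

    Range all() { return Range(data); }
}

void example()
{
    auto lst = List([ 1, 2, 3, 4 ]);

    // Fine for a single pass, but the position is gone once foreach ends.
    foreach (x; lst) { /* ... */ }

    // Stop midway, keep the remainder, resume whenever convenient.
    auto r = lst.all();
    r.popFront();
    auto saved = r; // a snapshot of the remaining iteration
    for (; !saved.empty; saved.popFront()) { /* ... */ }
}

The range version also hands the compiler a plain struct it can inline 
through, instead of repeated indirect calls through a delegate.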

> 4) Syntax matters, and function/method/attribute names too. So they
> have to be chosen with care. I suggest using single words when
> possible, and considering "alllowercase" names too (I know this
> contrasts with the D style guide).

It's very easy to give very general advice (which, to be honest, your 
post is replete with). It would help if you followed up with a few 
concrete points.

> 4b) Copying function/method/attribute names from C++ isn't bad if
> people think such names are good. Inventing different names just to
> be different is not good.
> 
> 5) The source code of the current algorithm module of D2 is already
> very hard to follow; it smells of over-generalization here and
> there. Sometimes it's better to reduce the generality of things, even
> if that reduces their power a little, in order to reduce complexity.
> Tango's code isn't perfect either, but it often looks more human. While
> you have created the algorithm module, I too have created something
> similar, but based on different grounds.

I am sure you like your library more than mine, because it's your design 
and your realization.

The code in std.algorithm is complex because it implements algorithms 
that are complex. I know that if someone looked over partition, rotate, 
topN, or sort without knowing how they work, they wouldn't have an easy 
time picking them up. That is fine. The purpose of std.algorithm's 
implementation is not to be a tutorial on algorithms.

On occasion std.algorithm does a few flip-flops, but always for a good 
reason. For example it takes some effort to allow multiple functions in 
reduce. Consider:

double[] a = [ 3.0, 4, 7, 11, 3, 2, 5 ];
// Compute minimum and maximum in one pass
auto r = reduce!(min, max)(double.max, -double.max, a);
// The type of r is Tuple!(double, double)
assert(r._0 == 2);  // minimum
assert(r._1 == 11); // maximum

A simpler reduce would have only allowed one function, so I would have 
had to write:

auto m = reduce!(min)(double.max, a);
auto M = reduce!(max)(-double.max, a);

On the face of it, this looks reasonable. After all, come on, why can't 
one write two lines instead of one? However, min and max are such simple 
functions that the cost of computing them is drowned by the cost of the 
looping alone. On a large array things become onerous, so I had to write 
the loop by hand.
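
For the record, "the loop by hand" is nothing fancy; something along 
these lines (an illustrative sketch, not the actual code from my 
programs):

// One pass over the data, both results, no per-element function calls.
void minMax(in double[] a, out double smallest, out double largest)
{
    smallest = double.max;
    largest = -double.max;
    foreach (x; a)
    {
        if (x < smallest) smallest = x;
        if (x > largest) largest = x;
    }
}

With two separate calls to reduce the array is traversed twice, and since 
the comparisons are practically free, the second traversal roughly 
doubles the total cost.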

How do I know that? Because I measured. Why did I measure? Because I 
care. Why do I care? Because the typical run times of my programs measure 
in hours of sheer computation, and because things like reduce and others 
in std.algorithm are in the core loops. And I am not alone.

If I can help it, I wouldn't want to write an algorithms library for a 
systems-level programming language with fundamental limitations that 
make it unsuitable for efficient computation. Costly abstractions are 
a dime a dozen. It's efficient abstractions that are harder to come by.

If you believe I am wasting time on cutesies in std.algorithm, please 
point out the places in question along with the improvements you would 
suggest.

> 6) Built-in data types are important; they aren't meant to replace a
> good standard library, where you can find more specialized and more
> efficient data structures. The built-in data types are meant to:
> - offer a very handy syntax, easy to use and remember, and short to
>   type too;
> - be efficient in a very wide variety of situations, avoiding really
>   bad performance in a large number of situations, while it's okay
>   for them not to be the fastest in any one situation;
> - be useful, for example, when you have little data, in the 90% of
>   the code where max performance isn't so important. In the other
>   situations you are supposed to be able to import things like an
>   IntrusiveRedBlackHashMap from the std lib.

I agree.

> 7) Take a look at the lazy "views" of keys/values/items in Python 3,
> and at how they fit into your view of such ranges. Java has something
> similar too. (I can give a summary if you want, but in a few words: if
> you have an associative array (dict), d.keys() doesn't return an
> array but a "view", a lazy iterable object that can be sliced lazily,
> iterated, etc. This is way more efficient than the .keys/.values of
> the current D implementation.)

To quote a classic, Lisp has had them all for 50 years. Well-defined 
ranges are the perfect stepping stone towards lazy computation. In 
particular, input ranges are really generators. I plan to implement 
things like lazyMap and lazyReduce, and also the generators that are so 
popular in functional languages.
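
As a rough illustration of the direction (the names and shapes below are 
assumed for this example only, not the final design), a lazy map over an 
input range could look like this:

// A minimal slice-backed input range, used only to drive the example.
struct Slice(T)
{
    T[] data;
    bool empty() { return data.length == 0; }
    ref T front() { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// Applies fun to each element on demand; nothing is computed until the
// result is actually iterated.
struct LazyMap(alias fun, Range)
{
    Range source;
    bool empty() { return source.empty; }
    auto front() { return fun(source.front); }
    void popFront() { source.popFront(); }
}

auto lazyMap(alias fun, Range)(Range r)
{
    return LazyMap!(fun, Range)(r);
}

int square(int x) { return x * x; }

void example()
{
    auto squares = lazyMap!square(Slice!int([ 1, 2, 3 ]));
    for (; !squares.empty; squares.popFront())
    {
        auto value = squares.front; // computed only here, one at a time
    }
}

The same skeleton, with a buffered element or a flag, covers generators 
that produce values on first access.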

> 8) Lots of functions/thingies in my mostly-functional library are
> designed to be lazy, that is, they generate items on the fly. At the
> moment D largely lacks such a view of the world. Again, take a good
> look at how Python 3 has shifted from eager to lazy in a lot of its
> style. How do such things fit in your range view? I think they can
> fit well, but the D language has to shift its philosophy a little to
> support such lazy generators/iterators more often and more commonly
> in the built-ins too.

I agree that D could use more lazy iteration instead of eager computation.

> 9) I have had a long discussion in the main D newsgroup; a partial
> conclusion was that a struct of 2 items may not be enough for the
> current implementation of dynamic arrays, because without a capacity
> field, appending is dead slow. I presume this is orthogonal to your
> range proposal, but I have mentioned it here because you keep
> talking about dynamic arrays as 2-pointer structs, and that may
> change in the future.

I saw that discussion. In my opinion that discussion clarifies why 
slices shouldn't be conflated with full-fledged containers. Handling 
storage strategy should not be the job of a slice. That's why there's a 
need for true containers (including straight block arrays as a 
particular case). Ranges are perfect for their internal implementation 
and for iterating them.
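
To make that split concrete: in this view the container owns the storage 
strategy and merely hands out slices (or ranges) for iteration. A 
deliberately naive sketch, with the Array name and its methods made up 
for illustration only:

struct Array(T)
{
    private T[] data;  // the elements currently in use
    private T[] store; // the allocated block; its length is the capacity

    // Appending is the container's business: because it remembers the
    // capacity, it can grow geometrically and amortize reallocations.
    void append(T value)
    {
        if (data.length == store.length)
        {
            auto grown = new T[store.length ? store.length * 2 : 4];
            grown[0 .. data.length] = data[];
            store = grown;
        }
        store[data.length] = value;
        data = store[0 .. data.length + 1];
    }

    // Iteration exposes only a view; no storage decisions leak out.
    T[] opSlice() { return data; }
}

The slice handed out by opSlice stays a two-word view; the capacity 
bookkeeping lives with the owner of the memory, which is where the 
appending discussion belongs.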


Andrei

