range and algorithm-related stuff

Mon Jan 26 23:07:25 PST 2009

Andrei Alexandrescu:

>Probably array and assign will make the cut as well.<

assign() of my libs is an ugly hack, I'd like such feature to be built-in in the language, with the comma syntax:

a, b = foo(...)
I use such syntax all the time in Python and it's quite handy.
Something like my Record/record can be quite useful as built-in with a destructuring syntax too.

>Other are already present in similar forms and sometimes with different names (but I think that e.g. "chain" is better than my "span").<

As you may have seen xchain allows to chain two or more lazy or eager iterables. If you nest xchains, they look inside what you give them, and simplify, so if you write:
xchain(a, xchain(b, c))
In my libs that's exactly as writing:
xchain(a, b, c)
That improves performance.

In my libs there is also the Chainable!() mixing, that you can put into your iterable classes (and all lazy generators contain it already), so you can already to:

auto d = xrange(5) ~ xfilter([1,0,3]) ~ [3, 5];

In D2 I think such chaining syntax can be built-in. There is already the relative class method to overload.

>The problem with xslice is that it has a O(n) setup time on a input or forward range, and that that cost is somewhat hidden in a non-decomposable manner. People who need something like slice can call drop to advance the range in documented linear time, and then use take.<

In the end what I was looking for is this syntax:

auto e = xfilter([1,0,3,5,7]);
// now  array(e) == [1, 3, 5, 7]
auto e2 = e[1..3];
// now  array(e2) == [3, 5]

That is, to allow the concat and slicing syntax among lazy (and eager) iterables too.
Such stuff is present in Python and Haskell (but in Python the syntax is a bit worse, you need xchain and islice functions to do that).

Also, take a look at my boolean() function, it's quite handy.

------------------

>Oh, it's the name that threw me off. I automatically assumed that "frequency" is normalized counts.<

You are right. I have an idea regarding how to fix that: I already have a count() function that given an iterable (lazy or not) and an item, counts how many of such items are present, and returns their count.

So I can remove the frequency() function and make count more flexible: when you give one item (beside the iterable), it gives its count, when you give just the iterable it returns the associative arrays of items:count.

An optimization: currently D associative arrays are very slow when they are small (you can scan an array of length about 85+ items in the same time you can access an associative array of 85 pairs), so at the beginning, when the number of item-count pairs is low it uses internally just one array (or an ArrayBuilder). And switches to using an associative array when the number of keys is more than about 50. At the end such improved count()returns an an associative array when you give it only an iterable (so creates the associative array if not already present).

--------------

Regarding such topic, there are other things that will be very handy (I have already explained such things in the past, but if you want more info ask):
- Improve a lot the == among arrays and among associative arrays and among structs. This is a large hole in the language.
- foreach scan on associative arrays keys first, and values then. This will simplify *LOT* of code.
- A very clean syntax to define lazy/eager arrays/iterables (like in Python) as single expressions.
- A clean syntax to define lazy functions, better than the current opApply.
- More intelligent printing function.

At the moment D2 needs to become a little more handy regarding those things. The most common idioms must be so simple that you don't need to look for them into the manual each time. A purpose of a good language is to "force" you to look in the manual only when you are doing something uncommon (for example for performance).

Bye,
bearophile