Few things II

Sat Aug 11 02:17:45 PDT 2007

Oskar Linde:

>Many times people have been posting improvements to the built in AA. The problem is proving that the improvements really are better for the general case. The hardest part has proven to be defining the "general case".<

I understand. We may collect a large repository of code/snippets that use AAs, so they can be tuned against that repository.

-----------------

BCS:

>I am currently working on a project that is written in C# but which we are considering converting to D. The use of yield has turned out to be a significant performance problem. This is the reson I don't like it.<

I see. Let's assume you are right, that yield is too much slow for a low level language (as D. D is a low leven language that is able to be used as medium-level language too).
I think yield is good if you use Python, that's a high level language.
I also think a language like D can have both some hi-level constructs and some lower level ones. Often many parts of a program don't need max speed (a profiler usually shows just few hot spots in a program), so there you may use yield. Where speed is more important the language can offer you means to go faster. This looks like a win-win situation to me (but the programmer must know what language constructs are slow, so they don't have to be used in speed critical parts of the code). After developing ShedSkin I think languages fit for both low level and high level programming can be designed (but a programmer has to know them well to avoid using the slow features in the speed/memory critical parts of the code).

>D's opApply syntax IMO seems much cleaner and easier to understand.<

I can't agree (but I know Python much more than D, so I am biased), yield of Python is very simple :-) Here are some examples (to reduce space I have reduced the indent from 4 to 2 spaces, I have removed most comments and all the docstrings):

Space-efficient and lazy version of sieve of Eratosthene, by Eppstein:

def xsieve():
  yield 2         # Only even prime.  Sieve only odd numbers.

  # Generate recursively the sequence of primes up to sqrt(n).
  # Each p from the sequence is used to initiate sieving at p*p.
  roots = xsieve()
  root = roots.next()
  square = root*root

  # The main sieving loop.
  # We use a hash table D such that D[n]=2p for p a prime factor of n.
  # Each prime p up to sqrt(n) appears once as a value in D, and is
  # moved to successive odd multiples of p as the sieve progresses.
  D = {}
  n = 3
  while True:
    if n >= square:   # Time to include another square?
      D[square] = root+root
      root = roots.next()
      square = root*root

    if n not in D:    # Not witnessed, must be prime.
      yield n
    else:         # Move witness p to next free multiple.
      p = D[n]
      q = n+p
      while q in D:
        q += p
      del D[n]
      D[q] = p
    n += 2        # Move on to next odd number.

Generator that returns the integer codons of the given (string) sequence:

def xcodons(seq, overlap=0):
  assert 0 <= overlap <= 2
  len_seq = len(seq)
  pos = 0
  while len_seq - pos >= 3:
    yield seq[pos : pos+3]
    pos += 3 - overlap

Divides a given 2D mat in equal blocks, of given size, and yields them:

def xblockfy(mat, blocknrows, blockncols):
  if blocknrows <= 0: raise ValueError("blocknrows must be >= 1")
  if blockncols <= 0: raise ValueError("blockncols must be >= 1")
  if mat and mat[0]:
    nx = len(mat[0])
    ny = len(mat)
    for y in xrange(0, ny, blocknrows):
      ymax = min(y+blocknrows, ny)
      for x in xrange(0, nx, blockncols):
        block = []
        for yy in xrange(y, ymax):
          block.extend(mat[yy][x: x+blockncols])
        yield block

Arithmetic progression generator, that accepts noninteger increments too:

def xfrange(start, end=None, inc=None):
  if inc is None:
    inc = 1.0
  if end is None:
    end = start + 0.0
    start = 0.0
  else:
    start = float(start)
  count = int( ceil((end-start)/inc) )
  for i in xrange(count):
    yield start + i * inc

Lazy iterable split, with key function too (I have created a similar thing for D):

def xsplit(seq, key=bool, keepkeys=True):
  group = []
  for el in seq:
    if key(el):
      if group:
        yield group
      group = []
      if keepkeys:
        group.append(el)
    else:
      group.append(el)
  yield group

Yields all the distinct pairs of the given seq:

def xpairs(seq):
  lung = len(seq)
  for i,e1 in enumerate(seq):
    for j in xrange(i+1, lung):
      yield e1, seq[j]

Yields all the positions of item in a given iterable:

def xpositions(seq, item):
  for pos, el in enumerate(seq):
    if el == item:
      yield pos

I know you can do those things with D too, this is an example of a translation from Python that uses yield to D, a lazy generator of all the permutations of a sequence, algorithm from Phillip Paul Fuchs, modified:

def xpermutations(items, copy=True):
  if copy:
    items = list(items)
    n = len(items)
    p = range(n+1)
    i = 1
    yield items[:]
    while i < n:
      p[i] -= 1
      j = (p[i] if i & 1 else 0)
      items[j], items[i] = items[i], items[j]
      yield items[:]
      i = 1
      while p[i] == 0:
        p[i] = i
        i += 1
  else:
    items = items[:]
    n = len(items)
    p = range(n+1)
    i = 1
    yield items
    while i < n:
      p[i] -= 1
      j = (p[i] if i & 1 else 0)
      items[j], items[i] = items[i], items[j]
      yield items
      i = 1
      while p[i] == 0:
        p[i] = i
        i += 1

Xpermutations!(TyItem) xpermutations(TyItem)(TyItem[] items, bool copy=true) {
  return new Xpermutations!(TyItem)(items, copy);
}

class Xpermutations(TyItem) {
  int[] range(int stop) {
    int[] result;

    if (stop > 0) {
      result.length = stop;
      for(int i = 0; i < stop; i++)
        result[i] = i;
    }
    return result;
  }

  TyItem[] items;
  bool copy;

  this(TyItem[] items, bool copy=true) {
    this.items = items.dup;
    this.copy = copy;
  }

  int opApply(int delegate(ref TyItem[]) dg) {
    int result;

    int n = items.length;
    TyItem aux;
    auto p = range(n + 1);
    int i = 1;

    if (copy) { // yield duplicated arrays (safer)
      auto items2 = items.dup;
      result = dg(items2);  if (result) goto END; // yield items2
      while (i < n) {
        p[i] -= 1;
        int j = (i & 1) ? p[i] : 0;
        aux = items[j]; items[j] = items[i]; items[i] = aux; // swap
        items2 = items.dup;
        result = dg(items2);  if (result) goto END; // yield items2
        i = 1;
        while (p[i] == 0) {
          p[i] = i;
          i++;
        }
      }
    } else { // yield just references (faster)
      result = dg(items);  if (result) goto END; // yield items
      while (i < n) {
        p[i] -= 1;
        int j = (i & 1) ? p[i] : 0;
        aux = items[j]; items[j] = items[i]; items[i] = aux; // swap
        result = dg(items);  if (result) goto END; // yield items
        i = 1;
        while (p[i] == 0) {
          p[i] = i;
          i++;
        }
      }
    }

    END:
    return result;
  }
}

(If you spot problems in that D code you are encouraged to say it, I am a newbie of D still).
They do similar things, but I can't see how you can say that opApply of D "seems much cleaner and easier to understand" than Python yield :-)

>OTOH you can fake it quiet well with an delegate that stores most of it's state in an object.<

Something quite similar to a working version of your code is the builtin Python function iter(), that given an iterable (like an array, an iterable object, etc) return a lazy iterable object that allows to scan its elements in a flexible way (I have created some similar functions with D). But iter() doesn't replace yield.

Bye and thank you for the answers,
bearophile