Converting the nodes to tagged structs and allocating them from a memory pool (one that takes large chunks from the C heap) seems to reduce the total running time to about 0.15 seconds. I don't know a good language to write this kind of code.

Bye,
bearophile
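
For readers who want a picture of the two ideas mentioned above, here is a minimal sketch in C. It is not the code that was timed: the node kinds (`NODE_LEAF`, `NODE_PAIR`), the chunk size, and every function name are assumptions made up for illustration. It only shows the general shape of a tagged struct (an enum tag plus a union) and of a pool that takes large chunks from the C heap and hands out nodes one at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node kinds; the real program's node types are not shown. */
typedef enum { NODE_LEAF, NODE_PAIR } NodeTag;

/* Tagged struct: the tag says which member of the union is valid. */
typedef struct Node {
    NodeTag tag;
    union {
        int value;                                   /* NODE_LEAF */
        struct { struct Node *left, *right; } pair;  /* NODE_PAIR */
    } u;
} Node;

enum { NODES_PER_CHUNK = 4096 };  /* arbitrary chunk size (assumption) */

/* One large block obtained from the C heap, holding many nodes. */
typedef struct Chunk {
    struct Chunk *next;
    size_t used;
    Node nodes[NODES_PER_CHUNK];
} Chunk;

typedef struct { Chunk *head; } Pool;

/* Hand out the next free node, calling malloc only when a chunk is full. */
static Node *pool_alloc(Pool *p) {
    if (p->head == NULL || p->head->used == NODES_PER_CHUNK) {
        Chunk *c = malloc(sizeof *c);
        if (c == NULL) abort();  /* sketch: give up on out-of-memory */
        c->next = p->head;
        c->used = 0;
        p->head = c;
    }
    return &p->head->nodes[p->head->used++];
}

/* Nodes are never freed one by one; the whole pool is released at once. */
static void pool_free_all(Pool *p) {
    while (p->head != NULL) {
        Chunk *next = p->head->next;
        free(p->head);
        p->head = next;
    }
}

static Node *make_leaf(Pool *p, int value) {
    Node *n = pool_alloc(p);
    n->tag = NODE_LEAF;
    n->u.value = value;
    return n;
}

static Node *make_pair(Pool *p, Node *left, Node *right) {
    Node *n = pool_alloc(p);
    n->tag = NODE_PAIR;
    n->u.pair.left = left;
    n->u.pair.right = right;
    return n;
}

/* Dispatch on the tag instead of using virtual calls. */
static int sum(const Node *n) {
    switch (n->tag) {
    case NODE_LEAF: return n->u.value;
    case NODE_PAIR: return sum(n->u.pair.left) + sum(n->u.pair.right);
    }
    assert(0 && "unknown node tag");
    return 0;
}

/* Small usage example: build a three-leaf tree, sum it, free the pool. */
static int demo(void) {
    Pool pool = { NULL };
    Node *tree = make_pair(&pool, make_leaf(&pool, 1),
                           make_pair(&pool, make_leaf(&pool, 2),
                                     make_leaf(&pool, 3)));
    int total = sum(tree);
    pool_free_all(&pool);
    return total;
}
```

The saving in a layout like this would come from replacing one heap allocation per node with one `malloc` per 4096 nodes, and from freeing everything in a single pass. Whether that accounts for the timing quoted above depends on the original program, which this sketch does not reproduce.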