Optimizing Java using D

Sat Jul 5 20:19:39 PDT 2014

On Saturday, 5 July 2014 at 16:03:17 UTC, Dmitry Olshansky wrote:
> There are trade-offs. The world is not black and white and I 
> don't follow 'one rule everywhere'.

This is not a trade-off at all. You suggested to keep database 
records linearly, with space gaps between records to support 
"tiny updates". Without serious updates, this is major waste of 
space. With them, your design won't save the day, because any gap 
will be eventually consumed and without fragmentation/reordering, 
the storage will fail.

FYI, nowaday popular databases aren't designed this way, exactly 
for the reasons I described above. All databases I worked with 
(MySQL, Oracle and Firebird) hold records in very compact way, 
without a single byte gaps, with ability of fragmentation, and 
without total physical ordering. So at least designers of these 
DBMS have agreed that your design is not practical.

> Pointers are perfectly fine as long as there is no pointer 
> arithmetic.

Wrong. Merely holding a pointer (i.e. a physical address) is 
unsafe already. Non-deep serialization, or any other 
"preservation" of such a struct and GC is unable to keep the 
track of pointers. GC moves the object around or deletes it, and 
you have a tiny black hole in your app.

> Which is hopelessly admitting defeat. A pair may have 
> non-trivial comparator. And a minor step beyond that such as a 
> pair of double and integer and it fails completely.

I said above: in any non-trivial case, use classes instead of 
overly-clever structures. And if you really, really need 
premature optimization, there is java.nio and buffers. Create 
ByteBuffer (direct one if you need super optimized solution) and 
treat slices of it as "structs". That's possible and easy to 
implement, but really not needed in practice because all you get 
is 0.1% of memory saving and no gain in speed.

> And there Java-style indirection-happy classes "shine". For 
> instance modeling any complex stuff with classes alone would 
> lead to things like:
> House--reference->Bedroom---reference-->Bed--reference-->Pillow
>
> Which is incredible waste of memory and speed on something so 
> simple.

That's not complex. That's very simple. It would become slightly 
more complex if all house parts implemented the same root 
interface and basic operations, like examining what's around and 
finding an object given by its path or properties, or throwing an 
extra pillow or two onto the same bed. All that is just a dream 
for structs.

> Copy constructors or in D parlance postblits. There is also CoW 
> and whatnot. In fact swap doesn't create a copy in D, it just 
> does a bitwise swap in-place.

And here we have a tough choice for structs with pulled subdata 
"for efficiency": either assignment operator copies that data too 
(making routines like sort to perform horribly slow), or they 
perform only shallow copy, causing undue sharing of data and 
violating the whole struct "value" philosophy.

> Disagree - the moment a cache line is loaded it doesn't matter 
> how much you read in this line. Even a tiny read missing a 
> cache is paid in full.

But the amount of these "missed reads" is low, so less amount of 
cache is invalidated. CPU cache is not a single page that gets 
invalidated as a whole, it's more like many small subpages, each 
of them is treaten individually.

If you're really into the low-level mumbo jumbo, here are two 
more aspects for you to consider.

1. Since indirection does not require the object contents to be 
moved around while sorting, these objects can be read into cache 
once and never invalidated later - unlike "right in place" 
structs which invalidate CPU cache each time a swap is performed. 
That is a huge cache win already.

2. Don't forget that new objects are created in eden space in 
Java - which is essentially the stack area. Very fast, compact, 
sequential memory. If your array fits in eden (and that's surely 
true for the forest problem this thread has started from), then 
allocating array of objects is essentially the same as allocating 
array of structs: object contents aren't "scattered in memory" 
but are located one-after-one without gaps between them. That 
greatly aids caching as well. The main difference between eden 
and "array of structs" is that the allocation of the former never 
fails (assuming there's enough memory), only gets slightly slower 
in the worst case, and allocation of the latter can fail because 
"stack overflow error" or "too much memory fragmentation oops", 
even if there's enough free memory.

> And the only thing worse then copying a large file is copying a 
> large file fragmented in tiny splinters.

You surely meant "reading" here, not "copying". Copying large 
fragmented file is as slow as copying large unfragmented file, 
because writing operations are the bottleneck here.