Optimizing Java using D

Sat Jul 5 09:02:51 PDT 2014

05-Jul-2014 19:08, Wanderer пишет:
> On Saturday, 5 July 2014 at 14:20:33 UTC, Dmitry Olshansky wrote:
>> Provision some extra space in each record. DBs do this all the time,
>> regardless of layout.
>
> Which means waste of space you complained about just below.

There are trade-offs. The world is not black and white and I don't 
follow 'one rule everywhere'.

> Besides, you understand this is not a solution: one byte more than that
> reserve can provide, and the reordering should be done.

Which is what happens at a times, regardless of how DB was organized.

> Stuff like FAT
> was created exactly because linear storages are terribly inefficient
> unless they are read-only.

Frankly FAT is junk. But in any case file maintenance problem is all 
about balancing wasting some memory on gaps with the need defragment 
linked chains. There is no silver bullet you seem to imply.

>> It's funny to see how an _ability_ to avoid indirection can be framed
>> as a problem. Nothing prevents having an array of pointers to structs
>> or just using classes if desired.
>
> Pointers to structs? Manually add indirection for sorting? That's
> reinventing the wheel. Indeed better (and simpler) using classes
> instead. They provide the same functionality and without using unsafe
> features.

Pointers are perfectly fine as long as there is no pointer arithmetic.
Also there is no problem just constructing a sorted index (e.g. there is 
function for that in Phobos) for the array of big structs, and voila - 
the same reference to the item, but the stuff is neatly packed in a 
contiguous array.

>> On the contrary sorting an array of pairs of integers in say Java is
>> shamefully inefficient. No wonder packed objects (=value types)  are
>> coming to Java, to address these sorts of need.
>
> For pair of integers, you can use long and sort an array of longs.

Which is hopelessly admitting defeat. A pair may have non-trivial 
comparator. And a minor step beyond that such as a pair of double and 
integer and it fails completely.

> But
> if your structure is any more complex (or has nontrivial comparison
> algorithm), one should really consider using classes first.

And there Java-style indirection-happy classes "shine". For instance 
modeling any complex stuff with classes alone would lead to things like:
House--reference->Bedroom---reference-->Bed--reference-->Pillow

Which is incredible waste of memory and speed on something so simple.

>> Ehm. One can always redesign struct layout to push stuff out through
>> an indirection. And it's not like structs grow organically on their
>> own ;)
>
> You mean structs with pointers? That violates the whole principle of
> structs IMHO.

What? But again it seems to be the general direction of you post - 
puristic one size fits all. The idea that there is true holy way, is 
very much province of OOP language proponents.

> Ideally, when you assign/copy a struct, you assign/copy a
> *value*. Usage of pointers would add undue aliasing to the copy process,
> leading to bugs.

Copy constructors or in D parlance postblits. There is also CoW and 
whatnot. In fact swap doesn't create a copy in D, it just does a bitwise 
swap in-place.

>> In the same vane design of ref-to-instance is not scalable with
>> millions of items scattered across the memory causing awful cache miss
>> rate (as the indirection got to be crossed to compare things), not to
>> mention the memory wasted per each object.
>
> It depends. With classes, you have to read and write only small portions
> of memory - read only fields to compare

Strictly more then with structs to be precise as long as it's just 
compare we talk about. A reference + fields vs just fields. Also the 
cost of extra indirection is very much non-negligible.

> write only references to swap.

That much is true.

> Even with cache misses, this is still very small and efficient
> operations.

Disagree - the moment a cache line is loaded it doesn't matter how much 
you read in this line. Even a tiny read missing a cache is paid in full.

> With structs, you have to read and write much bigger regions
> of memory - to swap two items, you have to read both of them fully, then
 > write both of them fully into new locations -

I bet compare will load the whole thing into cache anyway, plus 
contiguous structs may fit into a single line. Sequential storage has 
advantages.

> which would make caching
> totally useless, whether structs are located sequentially or not.

How's that?

> It's
> like copying a large file - the cache gets exhausted instantly, and
> doesn't speed things up at all.

And the only thing worse then copying a large file is copying a large 
file fragmented in tiny splinters.

-- 
Dmitry Olshansky