Saving and loading large data sets easily and efficiently
Brett
Brett at gmail.com
Mon Sep 30 20:10:21 UTC 2019
I have done some large computations where the data set is around
10GB and takes several minutes to run. Rather than rerunning it
every time to regenerate the same data, can I simply save it to
disk easily?
The data is ordered in arrays and structs. It's just numbers/POD,
except some arrays use pointers to elements in other arrays (so
the structs are not duplicated).
Hence any saving routine would have to take this into account
and properly reference the referenced structs rather than
duplicating them... which is why it is not as simple as writing
out the raw array data.
Ideally it would write to binary to save space.
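For the plain-POD portion, std.stdio can already dump and reload an array of structs in binary; a minimal sketch (the file name and struct fields here are just examples, not from the actual data set):

```d
import std.stdio : File;

struct S { double x; int id; }

void main()
{
    auto data = new S[1000];
    data[42].x = 3.14;

    // Dump the raw bytes of the array; only safe for POD fields,
    // and the reader must use the same struct layout/architecture.
    File("data.bin", "wb").rawWrite(data);

    // Read it back into a freshly allocated array of the same length.
    auto loaded = new S[1000];
    File("data.bin", "rb").rawRead(loaded);
    assert(loaded[42].x == 3.14);
}
```

This covers only the pointer-free part; the pointer-bearing arrays still need the offset/index conversion described below.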
Essentially, pointers would get converted to file offsets, so in
some sense it is as if one had a memory map where ptr = 0 is the
start of the data structure and ptr = 34 would reference the 34th
byte. All pointers in the data structure are relative to the main
structure (no heap allocations except for the arrays of struct
pointers).
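A minimal sketch of that pointer-to-offset idea in D (the function names are mine, not from any library, and this assumes every pointer really does land inside the one contiguous block):

```d
import std.stdio : writeln;

struct S { int value; }

// Byte offset of p from the start of data; valid only while every
// pointer targets somewhere inside the single contiguous array.
size_t toOffset(const S[] data, const S* p)
{
    return cast(size_t) p - cast(size_t) data.ptr;
}

// Inverse mapping, applied after the array has been loaded back in
// (possibly at a different base address).
S* fromOffset(S[] data, size_t off)
{
    return cast(S*)(cast(size_t) data.ptr + off);
}

void main()
{
    auto data = new S[4];
    S* p = &data[2];
    immutable off = toOffset(data, p);
    assert(off == 2 * S.sizeof);
    assert(fromOffset(data, off) is p);
    writeln("offset: ", off);
}
```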
So it is more difficult than plain POD, but it should only be a
little more work to write... I'm hoping that there is something
already out there that can do this.
The way the data is structured is that I have a master array of
non-ptr structs.
E.g.,
S[] Data;
S*[] OtherStuff;
then every pointer points to an element of Data. I did not use
ints as "pointers" for a specific, non-relevant reason, but I
should be able to convert every pointer to an index by simply
removing the offset. [Technically I do not know if this is
occurring, but it should.]
OtherStuff's elements just reference Data's elements.
I imagine it wouldn't be that difficult to write out the Data:
save Data to a file, then append the rest of the info, and every
in-memory pointer can be converted to an on-disk reference by
simply computing ptr - Data.ptr. It still leaves some issues to
manage, though, as I do have some associative arrays:
S*[int] MoreStuff;
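The whole ptr - Data.ptr scheme, including the associative array, can be sketched like this. All the names (Snapshot, freeze, thaw) are hypothetical; this is not an existing library, just the conversion described above, assuming every pointer targets an element of Data:

```d
import std.stdio : writeln;

struct S { double x; }

// On-disk form: the raw Data array, plus element indices in place
// of pointers.
struct Snapshot
{
    S[] data;
    size_t[] otherStuff;   // index into data for each S* in OtherStuff
    size_t[int] moreStuff; // AA values stored as indices too
}

Snapshot freeze(S[] data, S*[] otherStuff, S*[int] moreStuff)
{
    Snapshot snap;
    snap.data = data;
    foreach (p; otherStuff)
        snap.otherStuff ~= p - data.ptr;   // pointer -> element index
    foreach (k, p; moreStuff)
        snap.moreStuff[k] = p - data.ptr;
    return snap;
}

// Rebuild the pointers after the snapshot (or a loaded copy of it)
// is back in memory.
void thaw(ref Snapshot snap, out S*[] otherStuff, out S*[int] moreStuff)
{
    foreach (i; snap.otherStuff)
        otherStuff ~= &snap.data[i];
    foreach (k, i; snap.moreStuff)
        moreStuff[k] = &snap.data[i];
}

void main()
{
    auto data = new S[3];
    S*[] other = [&data[2], &data[0]];
    S*[int] more = [7: &data[1]];

    auto snap = freeze(data, other, more);
    S*[] other2;
    S*[int] more2;
    thaw(snap, other2, more2);
    assert(other2[0] is &data[2]);
    assert(more2[7] is &data[1]);
    writeln(snap.otherStuff);
}
```

A real version would still have to pick a binary layout for the index arrays and AA keys on disk, which is where a ready-made serialization library would help.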
So I'm looking for a more robust solution that will handle any
future expansions.
More information about the Digitalmars-d-learn
mailing list