serialization library

Sat Jan 6 02:07:20 PST 2007

On Thu, 09 Nov 2006 02:06:21 +0100, Bill Baxter  
<dnewsgroup at billbaxter.com> wrote:

> Christian Kamm wrote:
>> Based on initial work from Tom S and clayasaurus, I've written this  
>> serialization library. I hope something like this doesn't already
>> exist!
>
> Great!
>
>>
>> http://www.math.tu-berlin.de/~kamm/d/serialization.zip
>>
>> Currently, it only provides binary file I/O through the Serializer
>> class. It can
>> - write/read almost (hopefully) every type through a call to  
>> Serializer.describe
>> - track class references and pointers by default
>> - serialize classes and structs through a templated 'describe' member  
>> function
>> - write derived classes from base class reference*
>> - read derived classes into base class reference*
>> - serialize not default constructible classes*
>>
>> (* for this to work, the class needs to be registered with the archive
>> type)
>>
>> It has far fewer features than boost::serialization but is already in a
>> very usable state: FreeUniverse, a D game based on the Arc library,
>> uses it for writing and loading savegames as well as other persistent
>> state information.
>
> I'm using Boost::serialization but I'm not at all happy with it.  But  
> the things that I don't like mostly have to do with versioning, which it  
> looks like you don't support anyway.
>
>> What it does not do/is missing:
>> - exception safety / multithread safety
>> - out-of-class/struct serialization methods (is it possible to check  
>> whether a specific overload exists at compile time?)
>
> I could be mistaken, but I think this is ADL / Koenig lookup
> territory, which Walter doesn't want to go into.
>
>> - static arrays need to be serialized with describe_staticarray (static  
>> arrays can't be inout, so the general-purpose template method doesn't  
>> work... is there a way around the problem?)
>> - things I forgot right now
>
> Endian issues?
>
>>
>> Documentation is still rather sparse. This short example shows the
>> basic usage
>
>
> Just a wish list item, but I'd prefer an actual "file format" library as  
> opposed to a serialization library.  Maybe a file format library would  
> build on top of the serialization library, but anyway, the key  
> difference is that a serialization lib aims to turn *particular* data  
> structures into a binary format that can be losslessly loaded back into  
> the same data structure later.
>
> But that is not the way people design generic file formats, like say the  
> Photoshop file format.  Things like that need to be very extensible and  
> shouldn't be tied to particular data structures.  I think that's where  
> boost::serialization gets into trouble.  Once you start talking about  
> versioning, you're no longer talking about one specific data structure.
>
> For instance Boost::serialization lacks a way to ignore blocks or skip  
> chunks of data that are not recognized or obsolete.  You actually have  
> to load the obsolete thing into the proper (possibly obsolete) data  
> structure and then delete the unnecessary thing you just created.  This  
> is not good from a forward/backward compatibility point of view.  Old code
> simply cannot read the file (even if it understands the majority of the  
> chunks that matter), and new code is forced to maintain old data  
> structures just for the purpose of loading up obsolete data and throwing  
> it away.
>
> How do you fix it?  Very simple really.  Just store the file as a series  
> of chunks with fixed length headers, and each header contains the length  
> of the data in that chunk.  If you get a chunk header with a tag you  
> don't understand, just ignore it.  A particular chunk can have  
> sub-chunks too.  I think it's similar in many ways to a grammar  
> definition:
>
>    file:
>      header chunklist
>
>    chunklist:
>      chunk
>      chunk chunklist
>
>    header:
>      typeIndicator versionNumber DataEndianness
>
>    chunk:
>      chunkHeader data
>
>    chunkHeader:
>      chunkType DataLength
>
>    data:
>      // Here's where you list all the types of data known to you
>
> Or something like that.
> I'd like a library that helps me read and write my data in that sort of  
> data-structure independent format.
>
> --bb

Take a look at the HDF file format that is used to serialize huge amounts  
of scientific data.
It implements a format that is very similar to the one you described.

http://www.hdfgroup.org/
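
For illustration, here is a minimal sketch of the chunk layout you describe, written in Python. It assumes a made-up 4-byte type indicator `DEMO`, 4-byte ASCII chunk tags and unsigned 32-bit little-endian lengths. This is NOT HDF's actual on-disk format, and the names `chunk`, `write_file`, `iter_chunks` and `read_file` are hypothetical. The reader skips any chunk whose tag it doesn't know by seeking past its data, so it assumes a seekable stream. A chunk's data can itself be parsed as a nested chunk list.

```python
import io
import struct

# Assumed layout for illustration only (not HDF's real format):
#   file header : 4-byte type indicator, uint16 version, uint8 endianness flag
#   chunk header: 4-byte chunk type tag, uint32 length of the chunk's data
# "<" forces little-endian packing, so files read the same on any host.
FILE_HEADER = struct.Struct("<4sHB")
CHUNK_HEADER = struct.Struct("<4sI")
MAGIC = b"DEMO"      # hypothetical type indicator
LITTLE_ENDIAN = 0    # hypothetical value for the DataEndianness field


def chunk(tag, payload):
    """Encode one chunk: fixed-size header (tag, length) followed by its data."""
    return CHUNK_HEADER.pack(tag, len(payload)) + payload


def write_file(out, version, chunks):
    """Write the file header, then each already-encoded chunk in order."""
    out.write(FILE_HEADER.pack(MAGIC, version, LITTLE_ENDIAN))
    for c in chunks:
        out.write(c)


def iter_chunks(stream, known):
    """Yield (tag, data) for tags in `known`; skip all others using their length."""
    while True:
        header = stream.read(CHUNK_HEADER.size)
        if not header:
            return  # clean end of the chunk list
        if len(header) < CHUNK_HEADER.size:
            raise ValueError("truncated chunk header")
        tag, length = CHUNK_HEADER.unpack(header)
        if tag in known:
            data = stream.read(length)
            if len(data) < length:
                raise ValueError("truncated chunk data")
            yield tag, data
        else:
            # Unknown or obsolete chunk: jump over it without decoding it.
            stream.seek(length, io.SEEK_CUR)


def read_file(stream, known):
    """Check the file header, then return (version, [(tag, data), ...])."""
    magic, version, endian = FILE_HEADER.unpack(stream.read(FILE_HEADER.size))
    if magic != MAGIC or endian != LITTLE_ENDIAN:
        raise ValueError("unsupported file")
    return version, list(iter_chunks(stream, known))


# Demo: a newer writer adds an "XTRA" chunk that this reader doesn't know.
# The "NAME" chunk holds a sub-chunk, parsed with the same iter_chunks.
layer = chunk(b"LAYR", b"background")
buf = io.BytesIO()
write_file(buf, 2, [chunk(b"XTRA", b"\x00" * 100), chunk(b"NAME", layer)])
buf.seek(0)
version, chunks = read_file(buf, known={b"NAME"})            # XTRA is skipped
sub = list(iter_chunks(io.BytesIO(chunks[0][1]), known={b"LAYR"}))
```

Old code reading a newer file only needs to know the tags it cares about. New code never has to build obsolete data structures just to throw them away. Fixing the byte order (always little-endian here) is one answer to the endianness question. Recording it in the header, as in your grammar, is another.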

Paulo
