XML Benchmarks in D

Koroskin Denis 2korden+dmd at gmail.com
Fri Mar 14 05:51:47 PDT 2008


On Fri, 14 Mar 2008 11:40:20 +0300, Alexander Panek  
<alexander.panek at brainsware.org> wrote:

> BCS wrote:
>> Reply to kris,
>>
>>> BCS Wrote:
>>>
>>>> What might be interesting is to make a version that works with slices
>>>> of the file rather than RAM (make the current version into a
>>>> template specialized on char[] and the new one on some new type?).
>>>> That way only the parsed metadata needs to stay in RAM. It would
>>>> take a lot of games mapping stuff in and out of RAM, but it would be
>>>> interesting to see if it could be done.
>>>>
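Something along these lines, perhaps -- just a sketch, with the
Document name and the Source parameter invented for illustration:

    // Hypothetical: parameterize the parser on its source type, so the
    // same parsing code can walk an in-memory char[] or some new
    // file-backed view type.
    class Document(Source = char[])
    {
        Source content;   // the parser hands out slices of this
        // ... parse methods operate on content[a .. b] slices
    }
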
>>> It would be interesting, but isn't that kinda what memory-mapped files
>>> provide? You can operate on files up to 4GB in size (on a 32-bit
>>> system), even with DOM, where the slices are virtual addresses within
>>> paged file blocks. Effectively, each paged segment of the file is a
>>> lower-level slice?
>>>
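For what it's worth, a minimal sketch of that approach with Phobos'
std.mmfile (the file name and slice bounds here are made up):

    import std.mmfile;

    void main()
    {
        // Map the whole file read-only; the OS pages it in on demand.
        auto mmf = new MmFile("huge.xml");

        // A slice of the mapping is just a range of virtual addresses;
        // nothing is copied until the memory is actually touched.
        auto chunk = cast(char[]) mmf[0 .. 4096];
    }
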
>>  Not as I understand it (I looked this up about a year ago, so I'm a
>> bit rusty). On 32 bits you can't map in 4GB, because you need space
>> for the program's code (and on Windows you only get 3GB of address
>> space, as the OS gets that last GB). Also, what about a 10GB file? My
>> idea is to make some sort of lib that lets you handle large data sets
>> (64-bit?). You would ask for a file to be "mapped in" and then get an
>> object that syntactically looks like an array. Index ops would
>> actually map in pieces; slices would generate new objects (with a
>> reference to the parent) that would, on demand, map stuff in. Some
>> sort of GC-ish thing would start unmapping/moving strings when space
>> gets tight. If you never have to actually convert the data to a
>> "real" array, you never need to copy the stuff; you can just leave it
>> in the file. I'm not sure it's even possible or how it would work,
>> but it would be cool (and highly useful).
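To make that concrete, a rough sketch of such a file-backed "array".
The class name, window size, and replacement policy are all invented
here; a real version would presumably use actual memory mapping plus
some GC-like unmapping strategy when space gets tight:

    import std.stdio;

    // Hypothetical file-backed "array": indexing reads in a window on
    // demand instead of loading the whole file.
    class FileArray
    {
        private File file;
        private ulong winStart;
        private ubyte[] window;         // the currently "mapped in" piece
        enum windowSize = 64 * 1024;    // arbitrary

        this(string path) { file = File(path, "rb"); }

        ubyte opIndex(ulong i)
        {
            // Fault in a new window when the index falls outside it.
            if (window.length == 0 || i < winStart
                || i >= winStart + window.length)
            {
                winStart = i - (i % windowSize);
                file.seek(cast(long) winStart);
                window = file.rawRead(new ubyte[windowSize]);
            }
            return window[i - winStart];
        }

        ulong length() { return file.size; }
    }

Slicing would hand back a child object holding a reference to the
parent, so the data itself never has to leave the file.
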
>
>
> I've got this strange feeling in my stomach that shouts out "WTF?!" when
> I read about >3-4GB XML files. I know, it's about the "ifs" and "whens",
> but still: if you find yourself needing such a beast of an XML file, you
> might want to consider other forms of data structuring (a database,
> perhaps?).
>


It sounds strange, but even large companies like Google or Yahoo store
their temporary search indexes in ultra-large XML files, and many of them
can easily be tens or even hundreds of GBs in size (just an ordinary
daily index) before they get "repacked" into a more compact format.


