New XML parser written for D1 and D2.

Wed Oct 14 17:24:35 PDT 2009

Justin Johansson wrote:
> Andrei Alexandrescu Wrote:
> 
>> Saaa wrote:
>>> Michael Rynn wrote
>>>> Where and to whom can I post the 56 KB source code zip?
>>> Attaching it to an enhancement in bugzilla would be best, I think. 
>> Yes please. Making the code work with ranges as input would be great.
>>
>> Andrei
> 
> Hi Andrei,
> 
> Still being a D apprentice and not 100% conversant with D terminology yet, I assume,
> and not wanting to make an *ass* out of *u* and *me* :-),
> that by "ranges" you mean making use of D sub char[] arrays over the input so as to
> minimize/obviate the need to allocate lots of small(er) strings to hold element tagnames,
> attribute names and values, text node contents and so on.

He meant range structs as found in std.range and their array wrappers in 
std.array.

> This assumption being correct, can you confirm or otherwise that the consequence of such
> a design would mean that by parsing, say, a 1MB XML in-memory document, constructing a
> node tree from the same and having the nodes directly referencing substrings in the input
> document via string "ranges", the entire 1MB would be locked into memory by the GC and not
> collectable until the node tree itself is done with?

That is not the goal of ranges; a memory-mapped file would be more 
efficient for what you describe.

A range is D's version of streams, so for example a simple reader might 
look like:

import std.range;

// The range is taken by value, not 'in': popFront() has to mutate it.
void read(T)(T range) if (isInputRange!T) {
	while (!range.empty) {
		auto elem = range.front;
		// process element
		range.popFront();
	}
}

The range implementation can be a simple 'string', a 'char[]', or a 
custom network channel that blocks on front() if the data is still loading.
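
To make that concrete, here is a minimal sketch of one generic function 
consuming both a plain string and a hand-written range. The names 
Countdown and countElements are made up for illustration and do not come 
from any library or from the parser under discussion:

```d
import std.range;

// Hypothetical range that yields n, n-1, ..., 1 (illustration only).
struct Countdown {
	int n;
	@property bool empty() const { return n <= 0; }
	@property int front() const { return n; }
	void popFront() { --n; }
}

// Hypothetical consumer: counts the elements of any input range.
size_t countElements(T)(T range) if (isInputRange!T) {
	size_t count = 0;
	while (!range.empty) {
		++count;
		range.popFront();
	}
	return count;
}

unittest {
	// A string is an input range of characters.
	assert(countElements("<a/>") == 4);
	// So is the custom struct; no array is involved.
	assert(countElements(Countdown(3)) == 3);
}
```

The function only ever touches empty, front and popFront, so it does not 
care where the data comes from.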

> Now I might be completely off track;  perhaps instead you are thinking of SAX style
> parsing and passing arguments to the SAX event handling function via the said ranges.  In
> this scenario I guess the SAX client could decide whether or not to .dup the ranges.

I think you are confusing ranges with slices. Ranges are simply an 
interface for sequential or random data access. DOM trees and SAX 
callbacks are different methods of parsing the XML; a range is a method 
of accessing the data :)
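
A rough sketch of the difference, assuming nothing beyond built-in 
arrays and std.range (the variable names are only for illustration): a 
slice is a view into existing memory, .dup makes an independent copy, 
and a range is just anything that offers empty/front/popFront.

```d
import std.range;

unittest {
	char[] doc = "<tag>text</tag>".dup;   // mutable copy of the input

	// A slice is a view into the same memory as doc; nothing is allocated.
	char[] name = doc[1 .. 4];
	assert(name == "tag");
	assert(name.ptr == doc.ptr + 1);

	// .dup allocates an independent copy that no longer refers to doc.
	char[] copy = name.dup;
	assert(copy == "tag" && copy.ptr != name.ptr);

	// A slice happens to satisfy the input range interface...
	static assert(isInputRange!(char[]));

	// ...but a range need not be backed by an array at all.
	auto numbers = iota(0, 3);            // lazily generated 0, 1, 2
	static assert(isInputRange!(typeof(numbers)));
	assert(numbers.front == 0);
}
```

So a parser that accepts ranges says nothing about whether its output 
holds slices of the input or copies of it; that is a separate decision.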

Speaking of SAX, do we have a D implementation yet? If not, I could 
write one; it sounds fun.

> Depending on your clarification, I may have further comment based upon my practical
> experience in the XML domain.
> 
> Regards
> 
> Justin Johansson
>