XML API

Tue May 26 05:22:30 PDT 2009

On 2009-05-24 20:31:05 -0400, Daniel Keep <daniel.keep.lists at gmail.com> said:

> Michel Fortin wrote:
>> On 2009-05-24 12:51:43 -0400, Daniel Keep <daniel.keep.lists at gmail.com>
>> said:
> 
> (Cutting us mostly going back-and-forth on what a callback api would
> look like.)
> 
>>> ...
>>> 
>>> Like I said, this seems like a lot of work to bolt a callback interface
>>> onto something a pull api is designed for.
>>> 
>>> ...
>>> 
>>> Except of course that you now can't easily control the loop, nor can
>>> you do fall-through on the cases.
>> 
>> Again, my definition of a callback API doesn't include an implicit loop,
>> just a callback. And I intend the callback to be a template argument so
>> it can be dispatched using function overloading and/or function
>> templates. So you'll have this instead:
>> 
>> bool keepGoing = true; // `continue` is a reserved word in D
>> do
>> keepGoing = pp.readNext!(callback)();
>> while (keepGoing);
>> 
>> void callback(OpenElementToken t) { blah(t.name); }
>> void callback(CloseElementToken t) { ... }
>> void callback(CharacterDataToken t) { ... }
>> ...
>> 
>> No switch statement and no inversion of control.
> 
> Except that you can't define overloads of a function inside a function.

I didn't know that. Interesting point.

Perhaps that's just a bug in the compiler that we could get fixed 
though. Any clue on that? I notice it also happens if you want to 
specialize a nested template function.

> Which means you have to stuff all of your code in a set of increasingly
> obtusely-named globals or private members.  Like elemAStart, elemAData,
> elemAAttr, elemAClose, elemBStart, elemBData, elemBAttr, ...

But when inside a function you can still dispatch using a nested 
function template:

	void callback(T)(T t)
	{
		static if (is(T : OpenElementToken))
		{
			blah(t.name);
		}
		static if (is(T : CloseElementToken))
		{
			...
		}
	}

It sure is a little less elegant, but you still skip a switch.
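
To make that concrete, here is a minimal, self-contained sketch of the nested-template dispatch. The token structs and the dispatchDemo function are hypothetical stand-ins invented for illustration, not the parser's real types, and I use an exact type match (is(T == ...)) instead of the implicit-conversion test above to keep the branches mutually exclusive.

```d
// Hypothetical token types, for illustration only.
struct OpenElementToken   { string name; }
struct CloseElementToken  { string name; }
struct CharacterDataToken { string data; }

// Hypothetical driver: feeds three tokens to a nested template callback.
string[] dispatchDemo()
{
    string[] log;

    // Nested function template: a single body, with the branch chosen
    // at compile time from the argument type. No runtime switch.
    void callback(T)(T t)
    {
        static if (is(T == OpenElementToken))
            log ~= "open " ~ t.name;
        else static if (is(T == CloseElementToken))
            log ~= "close " ~ t.name;
        else static if (is(T == CharacterDataToken))
            log ~= "text " ~ t.data;
        else
            static assert(false, "unhandled token type " ~ T.stringof);
    }

    callback(OpenElementToken("p"));
    callback(CharacterDataToken("hello"));
    callback(CloseElementToken("p"));
    return log;
}
```

Calling dispatchDemo() should give ["open p", "text hello", "close p"]. The trailing static assert turns a forgotten token type into a compile error instead of a silently ignored token.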

> ...
> And at that point, I've just reinvented SAX.  Well, almost.  I have
> control over the loop.  I still can't simply break out of it; I've got
> to mess around with flags to get that done.
> 
> Meanwhile, if I write that code with a PullParser, it's just a
> collection of normal functions, one per element type with all the
> related code together in one place.  Or, if I don't want them all
> bundled together, I can dispatch to smaller functions.

There's no way I'm not including a pull API, most likely implemented as 
a range.

> I have a feeling you're going to head down this path irrespective, so
> I'll just hope you can figure out a way to make the api not suck.

I want to offer at least two API options (so you can choose the most 
appropriate parser API for what you do), and I want all of them to 
share the same underlying parser (so I don't write two or three 
parsers) with no compromise on speed.

I'm now realizing that an inversion of control can increase the 
performance of the parser by not having to rebranch on the current 
state each time you ask for a new token. I don't want to force 
inversion of control on anyone, but surely an API with inversion of 
control should be possible at full speed, and it can't be built on top 
of a pull parser.

So basically, the way I see it, you'd have two APIs: the inversion of 
control callback parser (for which you can specify a stop criterion so 
that it saves its state and releases control) and the range parser. The 
range is built on top of the inversion of control parser with a stop 
criterion making it stop and save its state after each token. With 
inlining, both APIs should run at optimal speed.
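
A rough sketch of that layering, under heavy assumptions: the "parser" below just walks a pre-built token array, and Token, ParserState, run, TokenRange, viaRange and namesUntilClose are all hypothetical names made up for this example. It only shows the shape of the two APIs (the callback's return value acts as the stop criterion, and the range stops after every token); it demonstrates nothing about performance.

```d
// Hypothetical token and parser state, for illustration only.
struct Token { string kind; string text; }
struct ParserState { Token[] pending; } // stands in for the saved parser state

// Inversion-of-control API: keeps feeding tokens to `callback` until the
// callback returns false (the stop criterion) or the input is exhausted.
// Returns true if it stopped early, with its state saved in `p`.
bool run(alias callback)(ref ParserState p)
{
    while (p.pending.length)
    {
        auto t = p.pending[0];
        p.pending = p.pending[1 .. $];
        if (!callback(t))
            return true;
    }
    return false;
}

// Range API built on top: the stop criterion fires after every token.
struct TokenRange
{
    private ParserState state;
    private Token current;
    private bool done;

    this(Token[] input) { state = ParserState(input); popFront(); }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        bool got = false;
        bool one(Token t) { current = t; got = true; return false; }
        run!one(state);
        done = !got; // no token delivered means the input ran out
    }
}

// Range usage: an ordinary foreach, with the loop under your control.
string[] viaRange(Token[] input)
{
    string[] seen;
    foreach (t; TokenRange(input))
        seen ~= t.kind ~ ":" ~ t.text;
    return seen;
}

// Callback usage: collect token text and stop at the first close tag.
string[] namesUntilClose(Token[] input)
{
    string[] seen;
    bool cb(Token t) { seen ~= t.text; return t.kind != "close"; }
    auto st = ParserState(input);
    run!cb(st);
    return seen;
}
```

Note that run is a free function template here, not a member of the parser struct, so that a nested function can be passed as the alias argument without trouble.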

Perhaps you'll say that it's complicated, but if you have a better idea 
capable of extracting a maximum of performance for both parser APIs, 
then I'd like to know.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/