XML API
Michel Fortin
michel.fortin at michelf.com
Tue May 26 05:22:30 PDT 2009
On 2009-05-24 20:31:05 -0400, Daniel Keep <daniel.keep.lists at gmail.com> said:
> Michel Fortin wrote:
>> On 2009-05-24 12:51:43 -0400, Daniel Keep <daniel.keep.lists at gmail.com>
>> said:
>
> (Cutting us mostly going back-and-forth on what a callback api would
> look like.)
>
>>> ...
>>>
>>> Like I said, this seems like a lot of work to bolt a callback interface
>>> onto something a pull api is designed for.
>>>
>>> ...
>>>
>>> Except of course that you now can't easily control the loop, nor can
>>> you do fall-through on the cases.
>>
>> Again, my definition of a callback API doesn't include an implicit loop,
>> just a callback. And I intend the callback to be a template argument so
>> it can be dispatched using function overloading and/or function
>> templates. So you'll have this instead:
>>
>>     // "continue" is a reserved word in D, so the flag needs another name
>>     bool more = true;
>>     do
>>         more = pp.readNext!(callback)();
>>     while (more);
>>
>> void callback(OpenElementToken t) { blah(t.name); }
>> void callback(CloseElementToken t) { ... }
>> void callback(CharacterDataToken t) { ... }
>> ...
>>
>> No switch statement and no inversion of control.
>
> Except that you can't define overloads of a function inside a function.
I didn't know that. Interesting point.
Perhaps that's just a bug in the compiler that we could get fixed,
though. Any clue on that? I notice it also happens if you want to
specialize a nested function template.
> Which means you have to stuff all of your code in a set of increasingly
> obtusely-named globals or private members. Like elemAStart, elemAData,
> elemAAttr, elemAClose, elemBStart, elemBData, elemBAttr, ...
But when inside a function you can still dispatch using a nested
function template:
    void callback(T)(T t)
    {
        static if (is(T : OpenElementToken))
        {
            blah(t.name);
        }
        static if (is(T : CloseElementToken))
        {
            ...
        }
    }
It sure is a little less elegant, but you still skip a switch.
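
To make that concrete, here is a minimal, self-contained sketch of the nested-template dispatch. Everything around the callback is an assumption made up for illustration: the token structs and their fields, the FakeParser type, and a readNext written as a free function template (called through UFCS) that just emits three hard-coded tokens. It is not the proposed parser, only a way to see the dispatch compile and run.

```d
import std.stdio : writeln;

// Hypothetical token types; a real parser would define its own.
struct OpenElementToken   { string name; }
struct CloseElementToken  { string name; }
struct CharacterDataToken { string data; }

// Hypothetical stand-in for the parser state: it only counts tokens emitted.
struct FakeParser { int step; }

// Assumed shape of readNext: the callback is a template alias parameter,
// so each call below resolves at compile time to one instantiation.
// Returns false once there is nothing left to read.
bool readNext(alias callback)(ref FakeParser p)
{
    immutable s = p.step++;
    if (s == 0)      callback(OpenElementToken("p"));
    else if (s == 1) callback(CharacterDataToken("hello"));
    else if (s == 2) callback(CloseElementToken("p"));
    else return false;
    return true;
}

string[] collectEvents()
{
    string[] log;

    // One nested function template instead of several overloads.
    // static if selects the body for each token type at compile time.
    void callback(T)(T t)
    {
        static if (is(T : OpenElementToken))
            log ~= "open:" ~ t.name;
        else static if (is(T : CloseElementToken))
            log ~= "close:" ~ t.name;
        else static if (is(T : CharacterDataToken))
            log ~= "text:" ~ t.data;
        else
            static assert(false, "unhandled token type " ~ T.stringof);
    }

    auto pp = FakeParser();
    bool more = true;
    do
    {
        more = pp.readNext!callback();
    } while (more);
    return log;
}

void main()
{
    foreach (line; collectEvents())
        writeln(line);
}
```

The caller still owns the loop, and the callback can touch locals of the enclosing function (log here), which is the part a set of global overloads would not give you.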
> ...
> And at that point, I've just reinvented SAX. Well, almost. I have
> control over the loop. I still can't simply break out of it; I've got
> to mess around with flags to get that done.
>
> Meanwhile, if I write that code with a PullParser, it's just a
> collection of normal functions, one per element type with all the
> related code together in one place. Or, if I don't want them all
> bundled together, I can dispatch to smaller functions.
There's no way I'm not including a pull API, most likely implemented as
a range.
> I have a feeling you're going to head down this path irrespective, so
> I'll just hope you can figure out a way to make the api not suck.
I want to offer at least two API options (so you can choose the most
appropriate parser API for what you do), and I want all of them to
share the same underlying parser (so I don't write two or three
parsers) with no compromise on speed.
I'm now realizing that inversion of control can increase the
performance of the parser, because it doesn't have to re-branch on the
current state each time you ask for a new token. I don't want to force
inversion of control on anyone, but surely an API with inversion of
control should be possible at full speed, and it can't be built on top
of a pull parser.
So basically, the way I see it, you'd have two APIs: the inversion of
control callback parser (for which you can specify a stop criterion so
that it saves its state and releases control) and the range parser. The
range is built on top of the inversion of control parser with a stop
criterion making it stop and save its state after each token. With
inlining, both APIs should run at optimal speed.
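
A rough sketch of that layering, under heavy simplifying assumptions: the "parser" below only splits a string on spaces, its whole saved state is one index, and the names PushParser, run, TokenRange and pullAll are invented for this example. The stop criterion is modelled as the sink's return value (true means "stop and give control back").

```d
import std.stdio : writeln;

// Hypothetical token type; a real XML parser would have several.
struct Token { string text; }

// Toy parser state: the only thing to save between calls is `pos`.
struct PushParser
{
    string input;
    size_t pos;
}

// Inversion-of-control entry point: the parser owns the loop and hands
// each token to `sink`. When `sink` returns true (the stop criterion),
// it returns to the caller with its position saved in `p.pos`.
void run(alias sink)(ref PushParser p)
{
    while (p.pos < p.input.length)
    {
        immutable start = p.pos;
        while (p.pos < p.input.length && p.input[p.pos] != ' ')
            ++p.pos;
        auto tok = Token(p.input[start .. p.pos]);
        if (p.pos < p.input.length)
            ++p.pos; // skip the separator
        if (sink(tok))
            return;
    }
}

// Pull API as an input range, built on `run` with a sink that always
// stops after exactly one token.
struct TokenRange
{
    PushParser parser;
    Token front;
    bool empty;

    this(string input)
    {
        parser = PushParser(input);
        popFront();
    }

    void popFront()
    {
        bool got = false;
        run!((Token t) { front = t; got = true; return true; })(parser);
        empty = !got;
    }
}

// Helper used to show the range in action.
string[] pullAll(string input)
{
    string[] result;
    foreach (tok; TokenRange(input))
        result ~= tok.text;
    return result;
}

void main()
{
    // Range style: the consumer owns the loop.
    writeln(pullAll("a b c"));

    // Callback style: the parser owns the loop until the sink says stop.
    auto p = PushParser("a b c");
    run!((Token t) { writeln("got ", t.text); return t.text == "b"; })(p);
    // p.pos now sits just past "b"; calling run again resumes at "c".
}
```

Because the sink is a template alias parameter rather than a runtime delegate, the compiler is at least in a position to inline it into the parsing loop; whether it actually does depends on the compiler and flags.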
Perhaps you'll say that it's complicated, but if you have a better idea
capable of extracting a maximum of performance for both parser APIs,
then I'd like to know.
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/