Finalizing D2

Sun May 24 04:49:40 PDT 2009

On 2009-05-24 03:22:47 -0400, Daniel Keep <daniel.keep.lists at gmail.com> said:

> Callbacks are "easier" to set up, but are incredibly complicated for any
> sort of structured parsing.  The problem is that you can't easily change
> the behaviour of the parser once it's started.
> 
> I had to write a SAX parser for a structured data format a few years
> ago.  I swear that 90% of the code (and it's a monstrously huge module)
> was just boilerplate to work around the bloody callback system.  I've
> come to the conclusion that the SAX API is about the worst POSSIBLE way
> of parsing anything more complex than a flat file that shouldn't have
> been XML in the first place.

A callback API isn't necessarily SAX. A callback API doesn't 
necessarily have to parse everything until completion; it could parse 
only the next token and call the appropriate callback.

If I can construct a range class/struct over my callback API I'll be 
happy. And if I can recursively call the parser API inside a callback 
handler so I can reuse the call stack while parsing then I'll be very 
happy.
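
To make the range idea concrete, here is a minimal sketch in D. It 
assumes a toy grammar (<name>, </name> and plain text, with no 
attributes or entities), and every name in it (Handlers, TokenParser, 
parseNext, Token, TokenRange) is hypothetical, invented for 
illustration and not taken from Phobos or Tango. parseNext consumes 
exactly one token and calls the matching delegate; TokenRange wraps it 
so that each popFront parses one more token.

```d
import std.range.primitives : isInputRange;
import std.string : indexOf;

// Hypothetical handler set: one delegate per token kind.
struct Handlers
{
    void delegate(string name) onOpen;
    void delegate(string name) onClose;
    void delegate(string text) onText;
}

// Hypothetical one-token callback parser over a toy grammar.
class TokenParser
{
    private string input;
    this(string input) { this.input = input; }

    // Consumes exactly one token and calls the matching handler.
    // Returns false once the input is exhausted.
    bool parseNext(Handlers h)
    {
        if (input.length == 0)
            return false;
        if (input[0] == '<')
        {
            auto end = input.indexOf('>');
            assert(end > 0, "unterminated tag");
            auto tag = input[1 .. end];
            input = input[end + 1 .. $]; // advance before calling back
            if (tag.length > 0 && tag[0] == '/')
                h.onClose(tag[1 .. $]);
            else
                h.onOpen(tag);
        }
        else
        {
            auto end = input.indexOf('<');
            if (end < 0)
                end = input.length;
            auto text = input[0 .. end];
            input = input[end .. $];
            h.onText(text);
        }
        return true;
    }
}

enum Kind { open, close, text }
struct Token { Kind kind; string value; }

// Input range over the callback API: each popFront parses one token,
// and the handlers simply record it as the new front.
struct TokenRange
{
    private TokenParser parser;
    private Token current;
    private bool done;

    this(string input)
    {
        parser = new TokenParser(input);
        popFront();
    }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        Handlers h;
        h.onOpen  = (string n) { current = Token(Kind.open, n); };
        h.onClose = (string n) { current = Token(Kind.close, n); };
        h.onText  = (string t) { current = Token(Kind.text, t); };
        done = !parser.parseNext(h);
    }
}

static assert(isInputRange!TokenRange);

void main()
{
    import std.stdio : writeln;
    foreach (tok; TokenRange("<a>hi<b></b></a>"))
        writeln(tok.kind, ": ", tok.value);
}
```

Because parseNext stops after every token, the range never holds more 
than the current front, and the same parser could just as well be 
driven by other handlers.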

> Something like Tango's PullParser is the superior API because although
> it's more verbose up-front, that's as bad as it gets.  Plus, you can
> actually do stuff like call subroutines.

All that is really needed is a callback system that parses only one 
token at a time. The callback can then update the PullParser state or 
the token-range state, be driven in a loop to produce a SAX-like API, 
or directly do whatever you want, which may include parsing more tokens 
with different callbacks until you reach a closing tag.
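
As a rough illustration of parsing more tokens from inside a callback, 
the sketch below builds a small tree by recursing on every opening 
tag, so the call stack mirrors the nesting of the document. It assumes 
a toy grammar (<name>, </name> and plain text, no attributes), and all 
names (Handlers, TokenParser, parseNext, Node, parseElement, 
parseDocument) are hypothetical, not an existing Phobos or Tango API.

```d
import std.string : indexOf;

// Hypothetical handler set: one delegate per token kind.
struct Handlers
{
    void delegate(string name) onOpen;
    void delegate(string name) onClose;
    void delegate(string text) onText;
}

// Hypothetical one-token callback parser over a toy grammar.
class TokenParser
{
    private string input;
    this(string input) { this.input = input; }

    // Consumes exactly one token and calls the matching handler.
    // The input is advanced before the handler runs, so a handler may
    // safely call parseNext again (re-entrancy).
    bool parseNext(Handlers h)
    {
        if (input.length == 0)
            return false;
        if (input[0] == '<')
        {
            auto end = input.indexOf('>');
            assert(end > 0, "unterminated tag");
            auto tag = input[1 .. end];
            input = input[end + 1 .. $];
            if (tag.length > 0 && tag[0] == '/')
                h.onClose(tag[1 .. $]);
            else
                h.onOpen(tag);
        }
        else
        {
            auto end = input.indexOf('<');
            if (end < 0)
                end = input.length;
            auto text = input[0 .. end];
            input = input[end .. $];
            h.onText(text);
        }
        return true;
    }
}

// Hypothetical tree node filled in by the recursive handlers.
class Node
{
    string name;
    string text;
    Node[] children;
    this(string name) { this.name = name; }
}

// Called after the opening tag `name` has been consumed. Installs its
// own handlers and pulls tokens until the matching closing tag.
Node parseElement(TokenParser parser, string name)
{
    auto node = new Node(name);
    bool closed = false;

    Handlers h;
    // A nested element recurses with a fresh set of callbacks.
    h.onOpen  = (string child) { node.children ~= parseElement(parser, child); };
    h.onText  = (string t) { node.text ~= t; };
    h.onClose = (string n) {
        assert(n == name, "mismatched closing tag");
        closed = true;
    };

    while (!closed && parser.parseNext(h)) {}
    return node;
}

// Parses the first token and hands the root element to parseElement.
Node parseDocument(string input)
{
    auto parser = new TokenParser(input);
    Node root;
    Handlers top;
    top.onOpen  = (string n) { root = parseElement(parser, n); };
    top.onText  = delegate(string t) {}; // ignore text before the root
    top.onClose = delegate(string n) {}; // ignore stray closing tags
    parser.parseNext(top);
    return root;
}

void main()
{
    import std.stdio : writeln;
    auto root = parseDocument("<doc><title>D2</title><body>ranges</body></doc>");
    writeln(root.name, " has ", root.children.length, " children");
}
```

Calling parseNext in a plain loop with one fixed set of handlers would 
give the SAX-like behaviour instead; the recursion here is only one of 
the options the one-token design leaves open.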

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/