std.xml and Adam D Ruppe's dom module

Adam D. Ruppe destructionator at gmail.com
Wed Feb 8 07:19:04 PST 2012


On Wednesday, 8 February 2012 at 08:12:57 UTC, Johannes Pfau wrote:
> Use buffering, return strings (better: w/d/char[]) as slices to
> that buffer. If the user needs to keep a string, he can still
> copy it. (String decoding should also be done on-demand only).

The way Document.parse works now in my code is with slices.
I think the best way to speed mine up is to untangle the mess
of recursive nested functions.
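
For illustration only, here is a minimal sketch of what "working with slices" means in D. The helper name firstText is hypothetical and is not taken from dom.d; it only assumes the input is held in one character buffer.

```d
// Hypothetical helper, not taken from dom.d: returns the text between the
// first '>' and the following '<' as a slice of `buffer`. Nothing is copied.
const(char)[] firstText(const(char)[] buffer) {
    size_t start = 0;
    while (start < buffer.length && buffer[start] != '>') start++;
    if (start == buffer.length) return null; // no '>' found
    start++; // step past '>'
    size_t end = start;
    while (end < buffer.length && buffer[end] != '<') end++;
    return buffer[start .. end];
}

unittest {
    char[] buffer = "<title>Hello</title>".dup;
    auto text = firstText(buffer);
    assert(text == "Hello");
    assert(text.ptr == buffer.ptr + 7); // the slice points into the buffer

    string kept = text.idup;  // copy only when the text must outlive the buffer
    buffer[7] = 'J';          // overwriting the buffer...
    assert(text == "Jello");  // ...changes the slice,
    assert(kept == "Hello");  // but not the copy
}
```

The slice is just a pointer and a length, so returning it costs no allocation; the caller pays for a copy (idup) only when one is needed.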

Last time I attacked dom.d with the profiler, I found a lot
of time was spent on string decoding, which looked like this:

foreach(c; str) {
    if(isEntity) value ~= decoded(entity); // append the decoded entity
    else value ~= c;                       // append the plain character
}

basically.


This reallocation was slow, but I got a huge speedup not by
skipping decoding but by scanning the string first:

bool decode = false;
foreach(c; str) { if(c == '&') { decode = true; break; } }

if(!decode) return str;
// still uses the old decoder, which is the fastest I could find;
// ~= actually did better than appender in my tests!




But quickly scanning the string and skipping the decode loop when
there are no entities roughly tripled the parse speed, IIRC.
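
Put together, the scan-first idea can be sketched as below. This is not the decoder from dom.d: decodeEntities is a hypothetical name, and the sketch assumes only the five predefined XML entities need handling, leaving anything else untouched.

```d
import std.string : indexOf;

// Hypothetical sketch, not the dom.d decoder. Handles only the five
// predefined XML entities; anything else is copied through unchanged.
string decodeEntities(string str) {
    // Fast path: no '&' means nothing to decode, so return the same slice.
    if (str.indexOf('&') == -1)
        return str;

    string value;
    size_t i = 0;
    while (i < str.length) {
        if (str[i] == '&') {
            auto rest = str[i .. $];
            immutable end = rest.indexOf(';');
            if (end != -1) {
                string replacement;
                switch (rest[1 .. end]) {
                    case "amp":  replacement = "&";  break;
                    case "lt":   replacement = "<";  break;
                    case "gt":   replacement = ">";  break;
                    case "quot": replacement = "\""; break;
                    case "apos": replacement = "'";  break;
                    default: break; // unknown entity: copied as-is below
                }
                if (replacement.length) {
                    value ~= replacement;
                    i += cast(size_t) end + 1; // skip past the ';'
                    continue;
                }
            }
        }
        value ~= str[i]; // plain character (or an '&' that starts no known entity)
        i++;
    }
    return value;
}

unittest {
    auto plain = "no entities";
    assert(decodeEntities(plain) is plain); // same slice, no allocation
    assert(decodeEntities("a &amp; b &lt;c&gt;") == "a & b <c>");
}
```

When the input has no '&', the function returns the original slice, so the common case costs one linear scan and no allocation.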

Right now, if I comment the decode call out entirely, there's very
little difference in speed on the data I've tried, so I think
decoding like this works well.

