Which option is faster...

H. S. Teoh hsteoh at quickfur.ath.cx
Mon Aug 5 10:50:46 PDT 2013


On Mon, Aug 05, 2013 at 07:30:36PM +0200, JS wrote:
> On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
> >
> >Greetings!
> >
> >I have this code,
> >
> >foreach (...)
> >{
> >
> >  if (std.string.tolower(fext[0]) == "doc" ||
> >    std.string.tolower(fext[0]) == "docx" ||
> >    std.string.tolower(fext[0]) == "xls" ||
> >    std.string.tolower(fext[0]) == "xlsx" ||
> >    std.string.tolower(fext[0]) == "ppt" ||
> >    std.string.tolower(fext[0]) == "pptx")
> >   continue;
> >}
> >
> >foreach (...)
> >{
> >  if (std.string.tolower(fext[0]) == "doc")
> >    continue;
> >  if (std.string.tolower(fext[0]) == "docx")
> >    continue;
> >  if (std.string.tolower(fext[0]) == "xls")
> >    continue;
> >  if (std.string.tolower(fext[0]) == "xlsx")
> >    continue;
> >  if (std.string.tolower(fext[0]) == "ppt")
> >    continue;
> >  if (std.string.tolower(fext[0]) == "pptx")
> >   continue;
> >  ...
> >  ...
> >}
> >
> >thanks.
> >
> >josé
> 
> 
> Both are slow, those trying to optimize out tolower are missing the
> boat. A good compiler should optimize that out.
> 
> The issue is that n compare are done in those cases with each
> successive compare getting slower and slower. if the match is doc
> then it happens in one compare in both cases. If it's pptx then it
> is 6. So if there are many pptx matches then the code will perform
> slower than for doc.
> 
> 
> This can be good or bad behavior depending on if you are guarantee
> that your order is representative of what actually happens(many
> doc's few pptx's).
> 
> To get O(1) behavior, you must use some sort of jump list. A hash
> table can provide such behavior. If there are many comparisons
> needed then at some point this will be, on average, faster.
[...]

There's no need to reinvent the wheel here. std.regex.ctRegex will
create the equivalent of jump tables for you.

Besides, I doubt the OP's real performance bottleneck is in this code;
sounds like the program is I/O bound. Compared to I/O, even repeatedly
calling std.string.tolower repeatedly is just roundoff error in terms of
total execution time. Using a profiler will reveal the *real*
bottleneck. Based on past experience, I'm expecting that it will reveal
that the bottleneck isn't anywhere we're predicting it is. :)


T

-- 
VI = Visual Irritation


More information about the Digitalmars-d-learn mailing list