Which option is faster...

Mon Aug 5 10:30:36 PDT 2013

On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
>
> Greetings!
>
> I have this code,
>
> foreach (...)
> {
>
>   if (std.string.tolower(fext[0]) == "doc" ||
>     std.string.tolower(fext[0]) == "docx" ||
>     std.string.tolower(fext[0]) == "xls" ||
>     std.string.tolower(fext[0]) == "xlsx" ||
>     std.string.tolower(fext[0]) == "ppt" ||
>     std.string.tolower(fext[0]) == "pptx")
>    continue;
> }
>
> foreach (...)
> {
>   if (std.string.tolower(fext[0]) == "doc")
>     continue;
>   if (std.string.tolower(fext[0]) == "docx")
>     continue;
>   if (std.string.tolower(fext[0]) == "xls")
>     continue;
>   if (std.string.tolower(fext[0]) == "xlsx")
>     continue;
>   if (std.string.tolower(fext[0]) == "ppt")
>     continue;
>   if (std.string.tolower(fext[0]) == "pptx")
>    continue;
>   ...
>   ...
> }
>
> thanks.
>
> josé

Both are slow, those trying to optimize out tolower are missing 
the boat. A good compiler should optimize that out.

The issue is that n compare are done in those cases with each 
successive compare getting slower and slower. if the match is doc 
then it happens in one compare in both cases. If it's pptx then 
it is 6. So if there are many pptx matches then the code will 
perform slower than for doc.

This can be good or bad behavior depending on if you are 
guarantee that your order is representative of what actually 
happens(many doc's few pptx's).

To get O(1) behavior, you must use some sort of jump list. A hash 
table can provide such behavior. If there are many comparisons 
needed then at some point this will be, on average, faster.

If performance is critical and the string length is small enough 
then the fastest method is to use a direct lookup table.

For n character of char. It is 256^n. For n = 4, this is 4GB of 
memory! (This is one reason to use a hash table). Since you are 
not using the full ascii you could compact this a bit, say 36^n, 
for n = 4 it is only 1.6MB of memory. The cost is packing the 
strings for lookup and computing the index.

One way to speed up the loop is to also compare for what is not 
used. In this case you can compare just on the first char.

    if (fext[0][0]) == "d")
    {
       ...
    }
    if (fext[0][0]) == "x")
    {
       ...
    }
    if (fext[0][0]) == "p")
    {
       ...
    }

This pares down the results. In this case it is only 3 
comparisions + 2 comparisons = 5 for pptx rather than 6.

You can try and develop your own hash to get optimal amortized 
performance.