Which option is faster...

Mon Aug 5 10:04:34 PDT 2013

On Monday, 5 August 2013 at 15:18:42 UTC, John Colvin wrote:
>
> better:
>
> foreach (...)
> {
>     auto tmp = std.string.tolower(fext[0]);
>     if(tmp == "doc" || tmp == "docx"
>        || tmp == "xls" || tmp == "xlsx"
>        || tmp == "ppt" || tmp == "pptx")
>     {
>         continue;
>     }
> }
>
> but still not super-fast as (unless the compiler is very 
> clever) it still means multiple passes over tmp. Also, it 
> converts the whole string to lower case even when it's not 
> necessary.
>
> If you have large numbers of possible matches you will probably 
> want to be clever with your data structures / algorithms. E.g.
>
> You could create a tree-like structure to quickly eliminate 
> possibilities as you read successive letters. You read one 
> character, follow the appropriate branch, check if there are 
> any further branches, if not then no match and break. Else, 
> read the next character and follow the appropriate branch and 
> so on.... Infeasible for large (or even medium-sized) 
> character-sets without hashing, but might be pretty fast for 
> a-z and a large number of short strings.

Arguably, you'd do even better with:
foreach (...)
{
     auto tmp = std.string.tolower(fext[0]);
     switch(tmp)
     {
         case "doc", "docx":
         case "xls", "xlsx":
         case "ppt", "pptx":
             continue;
         default:
     }
}

Since it gives the compiler more wiggle room to optimize things 
as it wishes. For example, it *could* (who knows :D !) implement 
the switch as a hash table, or a tree.

BTW, a very convenient "tree-like" structure that could be used 
is a "heap": it is a basic binary tree, but stored inside an 
array. You could build it during compilation, and then simply 
search it.

A possible optimization is to first switch on string length:

foreach (...)
{
     auto tmp = std.string.tolower(fext[0]);
     switch(tmp.length)
     {
         case 3:
         switch(tmp)
         {
             case "doc", "xls", "ppt":
                 continue;
             default:
         }
         break;

         case 4:
         switch(tmp)
         {
             case "docx", "xlsx", "pptx":
                 continue;
             default:
         }
         break;

         default:
     }
}

That said, I'm not even sure this would be faster, so a bench 
would be called for. Further more, I'd really be tempted to say 
that at this point, we are in the realm of premature optimization.