Which option is faster...

Mon Aug 5 10:09:32 PDT 2013

On Monday, 5 August 2013 at 17:04:36 UTC, monarch_dodra wrote:
> On Monday, 5 August 2013 at 15:18:42 UTC, John Colvin wrote:
>>
>> better:
>>
>> foreach (...)
>> {
>>    auto tmp = std.string.tolower(fext[0]);
>>    if(tmp == "doc" || tmp == "docx"
>>       || tmp == "xls" || tmp == "xlsx"
>>       || tmp == "ppt" || tmp == "pptx")
>>    {
>>        continue;
>>    }
>> }
>>
>> but still not super-fast as (unless the compiler is very 
>> clever) it still means multiple passes over tmp. Also, it 
>> converts the whole string to lower case even when it's not 
>> necessary.
>>
>> If you have large numbers of possible matches you will 
>> probably want to be clever with your data structures / 
>> algorithms. E.g.
>>
>> You could create a tree-like structure to quickly eliminate 
>> possibilities as you read successive letters. You read one 
>> character, follow the appropriate branch, check if there are 
>> any further branches, if not then no match and break. Else, 
>> read the next character and follow the appropriate branch and 
>> so on.... Infeasible for large (or even medium-sized) 
>> character-sets without hashing, but might be pretty fast for 
>> a-z and a large number of short strings.
>
> Arguably, you'd do even better with:
> foreach (...)
> {
>     auto tmp = std.string.tolower(fext[0]);
>     switch(tmp)
>     {
>         case "doc", "docx":
>         case "xls", "xlsx":
>         case "ppt", "pptx":
>             continue;
>         default:
>     }
> }
>
> Since it gives the compiler more wiggle room to optimize things 
> as it wishes. For example, it *could* (who knows :D !) 
> implement the switch as a hash table, or a tree.
>
> BTW, a very convenient "tree-like" structure that could be used 
> is a "heap": it is a basic binary tree, but stored inside an 
> array. You could build it during compilation, and then simply 
> search it.
>
> A possible optimization is to first switch on string length:
>
> foreach (...)
> {
>     auto tmp = std.string.tolower(fext[0]);
>     switch(tmp.length)
>     {
>         case 3:
>         switch(tmp)
>         {
>             case "doc", "xls", "ppt":
>                 continue;
>             default:
>         }
>         break;
>
>         case 4:
>         switch(tmp)
>         {
>             case "docx", "xlsx", "pptx":
>                 continue;
>             default:
>         }
>         break;
>
>         default:
>     }
> }
>
> That said, I'm not even sure this would be faster, so a bench 
> would be called for. Further more, I'd really be tempted to say 
> that at this point, we are in the realm of premature 
> optimization.

:-)  Now that's funny.  He he he he...
By the way, your coding makes sense.  For 100K plus files, it may 
not mean much, but we are thinking that this amount will be only 
growing.