Which option is faster...
JS
js.mdnq at gmail.com
Mon Aug 5 10:30:36 PDT 2013
On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
>
> Greetings!
>
> I have this code,
>
> foreach (...)
> {
>
> if (std.string.tolower(fext[0]) == "doc" ||
> std.string.tolower(fext[0]) == "docx" ||
> std.string.tolower(fext[0]) == "xls" ||
> std.string.tolower(fext[0]) == "xlsx" ||
> std.string.tolower(fext[0]) == "ppt" ||
> std.string.tolower(fext[0]) == "pptx")
> continue;
> }
>
> foreach (...)
> {
> if (std.string.tolower(fext[0]) == "doc")
> continue;
> if (std.string.tolower(fext[0]) == "docx")
> continue;
> if (std.string.tolower(fext[0]) == "xls")
> continue;
> if (std.string.tolower(fext[0]) == "xlsx")
> continue;
> if (std.string.tolower(fext[0]) == "ppt")
> continue;
> if (std.string.tolower(fext[0]) == "pptx")
> continue;
> ...
> ...
> }
>
> thanks.
>
> josé
Both are slow, those trying to optimize out tolower are missing
the boat. A good compiler should optimize that out.
The issue is that n compare are done in those cases with each
successive compare getting slower and slower. if the match is doc
then it happens in one compare in both cases. If it's pptx then
it is 6. So if there are many pptx matches then the code will
perform slower than for doc.
This can be good or bad behavior depending on if you are
guarantee that your order is representative of what actually
happens(many doc's few pptx's).
To get O(1) behavior, you must use some sort of jump list. A hash
table can provide such behavior. If there are many comparisons
needed then at some point this will be, on average, faster.
If performance is critical and the string length is small enough
then the fastest method is to use a direct lookup table.
For n character of char. It is 256^n. For n = 4, this is 4GB of
memory! (This is one reason to use a hash table). Since you are
not using the full ascii you could compact this a bit, say 36^n,
for n = 4 it is only 1.6MB of memory. The cost is packing the
strings for lookup and computing the index.
One way to speed up the loop is to also compare for what is not
used. In this case you can compare just on the first char.
if (fext[0][0]) == "d")
{
...
}
if (fext[0][0]) == "x")
{
...
}
if (fext[0][0]) == "p")
{
...
}
This pares down the results. In this case it is only 3
comparisions + 2 comparisons = 5 for pptx rather than 6.
You can try and develop your own hash to get optimal amortized
performance.
More information about the Digitalmars-d-learn
mailing list