Slower than Python

Steven Schveighoffer schveiguy at yahoo.com
Mon Mar 4 07:39:00 PST 2013


On Sat, 02 Mar 2013 17:29:38 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail at erdani.org> wrote:

> On 3/2/13 2:32 PM, Steven Schveighoffer wrote:
>> This is not a personal attack, please don't take it that way.
>
> Not at all. I'm just confused about this exchange. You mention some  
> correct facts ("MySplitter finished in 6 seconds instead of 10"), then  
> some mistaken suppositions ("a string-specific version could be  
> written"), and derive conclusions from both ("Maybe there is a bad  
> constraint somewhere"). Hard to deal with the mix.

I didn't realize the situation.  I assumed that there wasn't a version of  
splitter for strings that took array-specific shortcuts.  My corrected  
statement should read "the existing string-specific version could be  
improved."

My conclusion of "Maybe there is a bad constraint somewhere" was not  
derived from that, it was based on your statement elsewhere that "I wrote  
a custom splitter specialized for arrays. It's as fast."

Given that my tests have shown I can write a faster one quite easily than  
the implementation selected by phobos, and that you said you wrote one  
that's "as fast," I took that to mean you had written one that was more  
optimized than the chosen splitter version (and logically assumed you had  
included this version in Phobos with the intent it would be chosen when  
compiling with strings), leading me to suggest possibly that due to some  
bug, the "fast" implementation wasn't being chosen.  I didn't realize that  
"as fast" didn't mean "as fast as yours".  I actually don't know what you  
meant by that now.

>> My
>> anecdotal tests with hand writing a custom splitter range to handle the
>> OP's program gave me a 40% savings. Whether it's find or not, I'm not
>> sure, but there definitely is room for improvement.
>
> I think I understand where the confusion comes from. If you're referring  
> to MySplitter, that's not comparable. It uses this at the core:
>
> for(; i + 1 < source.length; i++)
> {
>      if(source[i] == '\r' && source[i + 1] == '\n')
>      {
>          found = true;
>          break;
>      }
>      ...
> }
>
> This is not close to the code that would work with arrays. We could  
> specialize things for small arrays, but that hasn't been done yet. My  
> point is it's not comparable with the classic brute force subsequence  
> search.

Very good point, here is a new version that takes any string as a  
separator:

struct MySplitter
{
     private string s;
     private string separator;
     private string source;
     this(string src, string sep)
     {
         source = src;
         separator = sep;
         popFront();
     }

     @property string front()
     {
         return s;
     }

     @property bool empty()
     {
         return s.ptr == null;
     }

     void popFront()
     {
         if(!source.length)
         {
             s = source;
             source = null;
         }
         else
         {
             size_t i = 0;
             bool found = false;
             for(; i + separator.length <= source.length; i++)
             {
                 if(source[i] == separator[0])
                 {
                     found = true;
                     for(size_t j = i+1, k=1; k < separator.length; ++j,  
++k)
                         if(source[j] != separator[k])
                         {
                             found = false;
                             break;
                         }
                     if(found)
                         break;
                 }
             }
             s = source[0..i];
             if(found)
                 source = source[i + separator.length..$];
             else
                 source = source[$..$];
         }
     }
}

Takes 7 seconds on my machine instead of 6, but not 10 like  
std.algorithm.splitter.  I don't even like the loop that well, it looks  
crude, I can probably optimize it further.

And it does not use any of your specified tricks.

Is this sufficiently comparable, or am I missing something else?

-Steve


More information about the Digitalmars-d mailing list