Slower than Python
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Sat Mar 2 14:29:38 PST 2013
On 3/2/13 2:32 PM, Steven Schveighoffer wrote:
> This is not a personal attack, please don't take it that way.
Not at all. I'm just confused about this exchange. You mention some
correct facts ("MySplitter finished in 6 seconds instead of 10"), then
some mistaken suppositions ("a string-specific version could be
written"), and derive conclusions from both ("Maybe there is a bad
constraint somewhere"). Hard to deal with the mix.
> My
> anecdotal tests with hand writing a custom splitter range to handle the
> OP's program gave me a 40% savings. Whether it's find or not, I'm not
> sure, but there definitely is room for improvement.
I think I understand where the confusion comes from. If you're referring
to MySplitter, that's not comparable. It uses this at the core:
for(; i + 1 < source.length; i++)
{
if(source[i] == '\r' && source[i + 1] == '\n')
{
found = true;
break;
}
...
}
This is not close to the code that would work with arrays. We could
specialize things for small arrays, but that hasn't been done yet. My
point is it's not comparable with the classic brute force subsequence
search.
What is comparable is shown below. indexOf2 defines the straight
array-in-array brute force search. With dmd on my machine it runs in
some 7 seconds. So does indexOf (this was my point). Then indexOf3 uses
a simple trick that the Java implementation does: cache the first
character. That reduces the running time to 5 seconds. Then indexOf4
optimizes the loop searching that first character, bringing run time to
about 4 seconds. On that machine the Java implementation runs in 3.5
seconds.
Andrei
import std.stdio, std.string, std.datetime;
long indexOf2(string haystack, string needle)
{
if (!needle.length) return 0;
if (needle.length > haystack.length) return -1;
auto limit = haystack.length - needle.length;
bigloop:
for (size_t i; i <= limit; ++i)
{
foreach (q; 0 .. needle.length)
{
if (haystack[i + q] != needle[q]) continue bigloop;
}
return i;
}
return -1;
}
long indexOf3(string haystack, string needle)
{
if (!needle.length) return 0;
if (needle.length > haystack.length) return -1;
immutable c = needle[0];
auto limit = haystack.length - needle.length;
assert(limit < uint.max, "not handled");
bigloop:
for (uint i; i <= limit; ++i)
{
if (haystack.ptr[i] != c) continue;
foreach (q; 1 .. needle.length)
{
if (haystack[i + q] != needle[q]) continue bigloop;
}
return i;
}
return -1;
}
long indexOf4(string haystack, string needle)
{
if (!needle.length) return 0;
if (needle.length > haystack.length) return -1;
immutable c = needle[0];
auto limit = haystack.length - needle.length;
size_t i = 0;
bigloop:
for (;;)
{
while (i <= limit && haystack[i++] != c) {}
if (i-- > limit) break;
foreach (q; 1 .. needle.length)
{
if (haystack[i + q] != needle[q]) continue bigloop;
}
return i;
}
return -1;
}
unittest
{
assert(indexOf2("hello, world", ",") == 5);
assert(indexOf2("hello, world", "a") == -1);
assert(indexOf3("hello, world", ",") == 5);
assert(indexOf3("hello, world", "a") == -1);
assert(indexOf4("hello, world", ",") == 5);
assert(indexOf4("hello, world", "a") == -1);
}
void main(string[] args)
{
auto message = "REGISTER sip:example.com SIP/2.0\r\nContent-Length:
0\r\nContact:
<sip:12345 at 10.1.3.114:59788;transport=tls>;expires=4294967295;events=\"message-summary\";q=0.9\r\nTo:
<sip:12345 at comm.example.com>\r\nUser-Agent: (\"VENDOR=MyCompany\" \"My
User Agent\")\r\nMax-Forwards: 70\r\nCSeq: 1 REGISTER\r\nVia:
SIP/2.0/TLS 10.1.3.114:59788;branch=z9hG4bK2910497772630690\r\nCall-ID:
2910497622026445\r\nFrom:
<sip:12345 at comm.example.com>;tag=2910497618150713\r\n\r\n";
auto t1 = Clock.currTime();
for (int i=0; i<10000000; i++)
{
long index1 = 0;
while (true)
{
auto index2 = indexOf(message[index1 .. $], "\r\n");
if (index2 == -1)
break;
auto notused = message[index1 .. index1 + index2];
if (args.length == 42) writeln(notused);
index1 += index2 + 2;
}
}
writeln(Clock.currTime()-t1);
}
More information about the Digitalmars-d
mailing list