faster splitter
Patrick Schluter via Digitalmars-d
digitalmars-d at puremagic.com
Fri May 27 12:34:20 PDT 2016
On Friday, 27 May 2016 at 19:17:44 UTC, Chris wrote:
> Oops, I've been found out. :-) Thanks. You're right of course,
> and I've already noticed that bug as it fails on not found. I
> got the bounds wrong.
I had the same "bug" when I wrote my search function on the
project at work. I also found out that using the simplest loop is
the generally the best way to go. The processors nowaday have
loop-caches, fast level1 caches etc, all the fancy search
algorithms like Boyer-Moore etc. can only overcome their memory
and initialisation overhead for really huge haystacks.
Here's the C code of my memmem() function that replaced the
Boyer-Moore I had before. This implementation outran the BM
search for sizes up to several hundreds megaytes on SPARC64 VII+
machine. On x86-64 I'm sure it would even be worse.
void * memmem(const void *haystack, size_t haystack_len, const
void *needle, size_t needle_len)
{
if(!haystack_len || !needle_len || needle_len > haystack_len)
return NULL;
uint8_t u8 = *((uint8_t *)needle);
const uint8_t *p = (const uint8_t *) haystack;
for(size_t i = haystack_len - needle_len + 1; i; --i) {
if(*p == u8 && !memcmp(p, needle, needle_len))
return (void *)p;
else
p++;
}
return NULL;
}
(sorry for posting C code, as I'm only beginning to learn D).
More information about the Digitalmars-d
mailing list