<div class="gmail_quote">On Mon, Jul 19, 2010 at 03:19, Andrei Alexandrescu <span dir="ltr"><<a href="mailto:SeeWebsiteForEmail@erdani.org">SeeWebsiteForEmail@erdani.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
I agree that a findAll function yielding a range is nice, however it's not the simplest interface. I think there should be a function that just finds something somewhere.<div class="im"><br></div></blockquote><div><br>
Yes, I agree that's the common case. <br>What I personally would like is a way to find an element in an ordered container (binary tree...)<br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Now, for many needles... Currently, you pass the predicate "a==b" to<br>
find() and use it on all needles. A possible generalization is to have a<br>
isOneOf() predicate, called with isOneOf(a, b0, b1, ... bn) where a is<br>
the current element being tested and the b's are the needles passed to<br>
find(). This generic calling of pred with (a, needles) could be done for<br>
any predicate, like for example:<br>
<br>
find!(isBetween)(haystack, 0, 10)<br>
</blockquote>
<br></div>
Aren't such scenarios better served by putting the limits inside the predicate? Otherwise the list of what can be done could go on forever.</blockquote><div><br>The problem is that, as the predicate is passed at compile-time, the limits must be CT-defined too. I'd like the limits to be defined at runtime. I run into this regularly, while using predicates in std.algorithms.<br>
<br>A not-so-uncommon scenario for me is having one range delivering limits (or zones to explore, or any other runtime arguments), and finding a value inside those limits in another range. Ideally, I'd like to use find() on a spatial structure like an R-tree. ( <a href="http://en.wikipedia.org/wiki/R-tree">http://en.wikipedia.org/wiki/R-tree</a> ).<br>
<br>But I admit it's not that common.<br><br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div class="im">
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
The limit of this is you cannot easily get the index of the found<br>
needle. btw, could you remind us why you used a 1-based index? My<br>
opinion on this is that if you want to find any needle, then you mostly<br>
care for them being present, and much less for wich one is the first.<br>
</blockquote>
<br></div>
I used one-based index to make tests easier - as you said, most often you care for the presence so zero/nonzero is easiest to tell. Then I thought it wouldn't hurt to provide the index since it's not any extra work.</blockquote>
<div><br>Ah, I thought the scenario was: 1) test for the returned range (as the tuple._0 element) being empty or not. 2) if not empty, something was found, get the index. I forgot getting 0 as index would mean 'no element found'.<br>
If I understood you correctly, then I'd prefer a -1 index indicating failure, and a 0-based indexing on the needles if on is found. But that's a cosmetic issue.<br><br><br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Also, the found element is accessible by taking the front of the retuned<br>
range. Why provide it to the caller?<br>
</blockquote>
<br></div>
Because that doesn't generalize well to searching ranges (as oposed to individual elements).</blockquote><div><br>Good point.<br><br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
Another problem I see is to get the nice current behavior of having a<br>
needle being a range like haystack:<br>
<br>
find([0,1,2,3,4,3,2,1], [4,3]) => [4,3,2,1]<br>
<br>
And that's I particularly like.<br>
</blockquote>
<br></div>
Not sure I understand whether that's good or bad, </blockquote><div><br>That's good. I mean, the current situation is good and that's a functionality I like.<br><br> </div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
but I also want to provide the likes of findSkip() (it's been discussed here under a different name, findConsume) that positions the searched range after the found range:<br>
<br>
findSkip([0,1,2,3,4,3,2,1], [4,3]) => [2,1]<br>
<br>
It's very useful in practice.<br></blockquote><div><br>I see that. And and empty range if it found nothing? <br>Another useful one is split() to cut the range in two or three parts : before the match, the match itself (which can be one of the needles, so we do not know it in advance) and the part after that. Return that triple as a tuple. Hey, that way you can even implement replace: return chain(firstpart, replacedmatch, tailpart). <br>
<br>Anything that can mimic regex functions on generic ranges is good, IMO. I regularly find myself wanting to do some regex-like matching. We do not need real pattern matching but even with simplified predicates, we can do many things.<br>
<br> <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
*****<div class="im"><br>
Another aspect I'd like to discuss is use of Boyer-Moore and<br>
related fast finding techniques. Currently the use of B-M is explicit<br>
and a bit awkward. I was thinking instead to automatically detect its<br>
appropriatedness and use it transparently. Bearophile posted at a link<br>
to such a blend string search routine<br>
(<a href="http://effbot.org/zone/stringlib.htm" target="_blank">http://effbot.org/zone/stringlib.htm</a>) that I think can be generalized<br>
quite easily.<br></div><div class="im">
*****<br>
<br>
Indeed, reading this, it seems like it could be generalized easily. You<br>
need a random-access range with a defined length. So the usage of such<br>
an algo could indeed be done transparently. Hmm...<br>
</div></blockquote>
<br>
The current B-M implementation already works with any random-access range with length, but it's not integrated transparently with the regular find. I wonder what the threshold conditions might look like.<br></blockquote>
<div><br>I'm sorry I didn't look at your implementation, as I don't know Boyer-Moore that much. Does it need to know the alphabet size or do you use another variant?<br><br>Philippe<br></div></div>