cashew.utils.array 0.1a.2

Thu Sep 14 13:22:43 PDT 2006

Chris Nicholson-Sauls wrote:
> Mikola Lysenko wrote:
> 
>> Chris Nicholson-Sauls wrote:
>>
>>> /*********************************************************************************** 
>>>
>>>  * Return an array composed of common elements of two arrays.
>>>  */
>>> T[] intersect (T) (inout T[] alpha, inout T[] beta)
>>> body {
>>>   T[] result;
>>>
>>>   foreach (x; alpha) {
>>>     if (beta.contains(x)) {
>>>       result ~= x;
>>>     }
>>>   }
>>>   return result;
>>> }
>>> unittest {
>>>   writef("\t.intersect(T[]) ----> ");
>>>   auto foo = array!(int)(1, 2, 3, 4, 5      );
>>>   auto bar = array!(int)(      3, 4, 5, 6, 7);
>>>   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
>>>   writefln("Pass");
>>> }
>>>
>>> /*********************************************************************************** 
>>>
>>>  * Remove duplicate values from an array.
>>>  */
>>> void unique (T) (inout T[] haystack)
>>> body {
>>>   size_t ridx ;
>>>
>>>   for (int i = 0; i < haystack.length; i++) {
>>>     ridx = size_t.max;
>>>     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
>>>       haystack.removeIndex(ridx);
>>>     }
>>>   }
>>> }
>>> unittest {
>>>   writef("\t.unique() ----------> ");
>>>   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
>>>   foo.unique();
>>>   assert(foo == array!(int)(1, 2, 3, 4, 5));
>>>   writefln("Pass");
>>> }
>>>
>>
>> These algorithms can be implemented more efficiently.  The running 
>> time for the intersect is O(n * m) (where n,m are the length of the 
>> two arrays).  However, what about the case where alpha and beta are 
>> ordered?  Here is a simple algorithm for getting the intersect:
>>
>> T[] intersect (T) (inout T[] alpha, inout T[] beta)
>> body
>> {
>>
>> /*     Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
>>  *    and beta[i] < beta[i+1] for all i.
>>  */
>>
>>         T[] result;
>>         size_t a_pos = 0;
>>         size_t b_pos = 0;
>>
>>         while(a_pos < alpha.length && b_pos < beta.length)
>>         {
>>                 if(alpha[a_pos] == beta[b_pos])
>>                 {
>>                         result ~= alpha[a_pos];
>>                         ++a_pos;
>>                         ++b_pos;
>>                 }
>>                 else if(alpha[a_pos] < beta[b_pos])
>>                         ++a_pos;
>>                 else if(beta[b_pos] < alpha[a_pos])
>>                         ++b_pos;
>>         }
>>
>>         return result;
>> }
>>
>>
>> The cost for this algorithm is only O(n + m).  However, ordering the 
>> two arrays is effectively an O(nlogn + mlogm) operation.  Therefore 
>> the running time for this modified algorithm is overall O(nlogn + 
>> mlogm).  A similar argument can be made for the unique function.
> 
> 
> True, and not a bad idea.  I'm not sure how it would be with arrays of 
> classes/structs which have no defined order.  I'll try a rewrite and 
> post it later.  :)  Maybe I should just ask for a DSource entry for Cashew.
> 
> -- Chris Nicholson-Sauls

Well I just tried it and... woo, intersect is notably faster indeed.  I did a benchmark 
with arrays of 1000 elements (random values), reset and intersected 1000 times, and the 
numbers are striking:

<Benchmark old> 19.330000
<Benchmark new> 3.080000

I'll rewrite unique later, for now here's an attachment of the module with the rewritten 
intersect, which is actually almost identical to yours, so I'll be sure and include credit 
for it.  :)

-- Chris Nicholson-Sauls
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: array.d
Url: http://lists.puremagic.com/pipermail/digitalmars-d-announce/attachments/20060914/09f46c60/attachment.txt