cashew.utils.array 0.1a.2

Thu Sep 14 12:32:54 PDT 2006

Mikola Lysenko wrote:
> Chris Nicholson-Sauls wrote:
> 
>> /*********************************************************************************** 
>>
>>  * Return an array composed of common elements of two arrays.
>>  */
>> T[] intersect (T) (inout T[] alpha, inout T[] beta)
>> body {
>>   T[] result;
>>
>>   foreach (x; alpha) {
>>     if (beta.contains(x)) {
>>       result ~= x;
>>     }
>>   }
>>   return result;
>> }
>> unittest {
>>   writef("\t.intersect(T[]) ----> ");
>>   auto foo = array!(int)(1, 2, 3, 4, 5      );
>>   auto bar = array!(int)(      3, 4, 5, 6, 7);
>>   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
>>   writefln("Pass");
>> }
>>
>> /*********************************************************************************** 
>>
>>  * Remove duplicate values from an array.
>>  */
>> void unique (T) (inout T[] haystack)
>> body {
>>   size_t ridx ;
>>
>>   for (int i = 0; i < haystack.length; i++) {
>>     ridx = size_t.max;
>>     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
>>       haystack.removeIndex(ridx);
>>     }
>>   }
>> }
>> unittest {
>>   writef("\t.unique() ----------> ");
>>   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
>>   foo.unique();
>>   assert(foo == array!(int)(1, 2, 3, 4, 5));
>>   writefln("Pass");
>> }
>>
> 
> These algorithms can be implemented more efficiently.  The running time 
> for the intersect is O(n * m) (where n,m are the length of the two 
> arrays).  However, what about the case where alpha and beta are ordered? 
>  Here is a simple algorithm for getting the intersect:
> 
> T[] intersect (T) (inout T[] alpha, inout T[] beta)
> body
> {
> 
> /*     Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
>  *    and beta[i] < beta[i+1] for all i.
>  */
> 
>         T[] result;
>         size_t a_pos = 0;
>         size_t b_pos = 0;
> 
>         while(a_pos < alpha.length && b_pos < beta.length)
>         {
>                 if(alpha[a_pos] == beta[b_pos])
>                 {
>                         result ~= alpha[a_pos];
>                         ++a_pos;
>                         ++b_pos;
>                 }
>                 else if(alpha[a_pos] < beta[b_pos])
>                         ++a_pos;
>                 else if(beta[b_pos] < alpha[a_pos])
>                         ++b_pos;
>         }
> 
>         return result;
> }
> 
> 
> The cost for this algorithm is only O(n + m).  However, ordering the two 
> arrays is effectively an O(nlogn + mlogm) operation.  Therefore the 
> running time for this modified algorithm is overall O(nlogn + mlogm).  A 
> similar argument can be made for the unique function.

True, and not a bad idea.  I'm not sure how it would be with arrays of classes/structs 
which have no defined order.  I'll try a rewrite and post it later.  :)  Maybe I should 
just ask for a DSource entry for Cashew.

-- Chris Nicholson-Sauls