cashew.utils.array 0.1a.2

Thu Sep 14 08:44:04 PDT 2006

Chris Nicholson-Sauls wrote:
> /***********************************************************************************
>  * Return an array composed of common elements of two arrays.
>  */
> T[] intersect (T) (inout T[] alpha, inout T[] beta)
> body {
>   T[] result;
> 
>   foreach (x; alpha) {
>     if (beta.contains(x)) {
>       result ~= x;
>     }
>   }
>   return result;
> }
> unittest {
>   writef("\t.intersect(T[]) ----> ");
>   auto foo = array!(int)(1, 2, 3, 4, 5      );
>   auto bar = array!(int)(      3, 4, 5, 6, 7);
>   assert(foo.intersect(bar) == array!(int)(3, 4, 5));
>   writefln("Pass");
> }
> 
> /***********************************************************************************
>  * Remove duplicate values from an array.
>  */
> void unique (T) (inout T[] haystack)
> body {
>   size_t ridx ;
> 
>   for (int i = 0; i < haystack.length; i++) {
>     ridx = size_t.max;
>     while ((ridx = haystack.rindexOf(haystack[i], ridx)) != i) {
>       haystack.removeIndex(ridx);
>     }
>   }
> }
> unittest {
>   writef("\t.unique() ----------> ");
>   auto foo = array!(int)(1, 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5);
>   foo.unique();
>   assert(foo == array!(int)(1, 2, 3, 4, 5));
>   writefln("Pass");
> }
> 

These algorithms can be implemented more efficiently.  The running time 
for the intersect is O(n * m) (where n,m are the length of the two 
arrays).  However, what about the case where alpha and beta are ordered? 
  Here is a simple algorithm for getting the intersect:

T[] intersect (T) (inout T[] alpha, inout T[] beta)
body
{

/* 	Assume alpha, beta are ordered such that alpha[i] < alpha[i+1]
  *	and beta[i] < beta[i+1] for all i.
  */

         T[] result;
         size_t a_pos = 0;
         size_t b_pos = 0;

         while(a_pos < alpha.length && b_pos < beta.length)
         {
                 if(alpha[a_pos] == beta[b_pos])
                 {
                         result ~= alpha[a_pos];
                         ++a_pos;
                         ++b_pos;
                 }
                 else if(alpha[a_pos] < beta[b_pos])
                         ++a_pos;
                 else if(beta[b_pos] < alpha[a_pos])
                         ++b_pos;
         }

         return result;
}

The cost for this algorithm is only O(n + m).  However, ordering the two 
arrays is effectively an O(nlogn + mlogm) operation.  Therefore the 
running time for this modified algorithm is overall O(nlogn + mlogm).  A 
similar argument can be made for the unique function.