how to handle very large array?

H. S. Teoh hsteoh at quickfur.ath.cx
Wed Feb 9 19:48:49 UTC 2022


On Wed, Feb 09, 2022 at 11:05:23AM -0800, Ali Çehreli via Digitalmars-d-learn wrote:
> On 2/9/22 10:38, MoonlightSentinel wrote:
> 
> > There's a way to use a much smaller array to manage the lanternfish
> > population...
> 
> As soon as I read your comment, I was reminded of a certain ingenious
> sorting algorithm that is O(N). ;) After reading the problem
> description, I see your hint was useful.
[...]

If you know beforehand that your data is discrete and bounded, you can
achieve O(N) sort complexity (as opposed to O(N log N), which is the
lower bound if you cannot make any assumptions about your data beyond
their ordering).  You can use pigeonhole sort, for example: create
an M-element array of counters, where M is the number of distinct values
your elements may have, then just iterate over your data and increment
the counter corresponding to each element you find (using array indexing
for O(1) access to each counter). Then iterate over the array of
counters in order and replace the input data with that many copies of
each element. The overall complexity is then O(N+M), which is O(N) when
M is of the same order of magnitude as N or smaller.
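
To make that concrete, here's a minimal sketch in D (the function name
and sample values are mine, and it assumes every input value is known
to lie in [0, M)):

    import std.stdio;

    // Sort `input` in O(N+M), assuming every element is in [0, M).
    uint[] pigeonholeSort(const(uint)[] input, size_t M)
    {
        auto counts = new size_t[M];   // one counter per possible value
        foreach (x; input)
            ++counts[x];               // O(1) indexing per element

        auto result = new uint[input.length];
        size_t pos = 0;
        foreach (value, count; counts) // walk the counters in order
        {
            result[pos .. pos + count] = cast(uint) value;
            pos += count;
        }
        return result;
    }

    void main()
    {
        writeln(pigeonholeSort([3u, 1, 4, 1, 5, 9, 2, 6], 10));
        // prints [1, 1, 2, 3, 4, 5, 6, 9]
    }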

However, this algorithm quickly becomes impractical when M grows large,
because you will soon need to create many more slots than there are
input elements. For example, sorting a general int[] this way would
require 2^32 counters, which is usually overkill unless your input has
close to 2^32 elements. (You could do it if you knew your ints were ≤ M
for some small M, but it won't work for a general int[] that can contain
arbitrary values.) And trying to sort a general ulong[] this
way will not work because you would need an array of 2^64 counters,
which would not only exhaust all available RAM in today's computers, but
also take a ridiculously long time to iterate over in the second
step.
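
(To put numbers on it, assuming 8-byte counters: 2^32 counters already
come to 32 GiB, and 2^64 counters would need 2^67 bytes, i.e., 128 EiB.)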

The insight here is *taking advantage of additional structure in your
data*.  If you take the general, minimal-assumptions approach, the best
you can do is the general O(N log N) algorithm.  However, by exploiting
additional structure in your data, you can do better.  Similarly, if you
take the naïve approach to modelling your lanternfish, you'll end up
with an unwieldy array that's far too large to fit in RAM.  If you
exploit the additional structure in your data, however, you can
accomplish the same task with a much smaller array.
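
For the curious, here's a hedged sketch of that smaller-array idea in D,
assuming the Advent of Code 2021 "Lanternfish" rules this thread appears
to be about (each fish carries a timer in 0..8; a fish at 0 resets to 6
and spawns a newborn at 8, everyone else just counts down). Instead of
one array slot per fish, you keep one counter per timer value, i.e. nine
ulongs, no matter how large the population grows:

    import std.stdio;
    import std.algorithm : sum;

    void main()
    {
        // population[t] = number of fish whose timer is currently t
        ulong[9] population;
        foreach (timer; [3, 4, 3, 1, 2]) // sample starting timers
            ++population[timer];

        foreach (day; 0 .. 256)
        {
            immutable spawning = population[0];
            foreach (t; 1 .. 9)            // every timer counts down
                population[t - 1] = population[t];
            population[6] += spawning;     // parents reset to 6
            population[8] = spawning;      // newborns start at 8
        }

        writeln("fish after 256 days: ", population[].sum);
    }

The naïve version, one array element per fish, grows exponentially with
the number of days; this one does the same bookkeeping in constant
space.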


T

-- 
Latin's a dead language, as dead as can be; it killed off all the Romans, and now it's killing me! -- Schoolboy

