Trying to reduce memory usage
frame
frame86 at live.com
Fri Feb 12 07:57:56 UTC 2021
On Friday, 12 February 2021 at 07:23:12 UTC, frame wrote:
> On Friday, 12 February 2021 at 02:22:35 UTC, H. S. Teoh wrote:
>
>> This turns the OP's O(n log n) algorithm into an O(n) algorithm,
>> doesn't need to copy the entire content of the file into memory,
>> and also uses much less memory by storing only hashes.
>
> But this kind of hash may be insufficient to avoid hash
> collisions. For such big data, slower but stronger algorithms
> like SHA are advisable.
>
> Also, associative arrays use the same weak algorithm, so you
> can run into collision issues there too. Using the hash of
> string data as a key can therefore be a problem. I always use a
> quick hash as the key but actually hold a collection of strong
> hashes in each slot and do a lookup to be on the safe side.
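A minimal sketch of that quoted scheme in D (the names seen and
isDuplicate are made up for illustration, not from the thread): the
quick hash is only the AA key, and the strong SHA-256 digests stored
under it decide whether a line is really a duplicate.

    import std.digest.sha : sha256Of;
    import std.algorithm.searching : canFind;

    // quick hash -> all strong digests seen under that key
    ubyte[32][][size_t] seen;

    // True if this exact line was seen before; collisions on the
    // quick key are caught by comparing the strong digests.
    bool isDuplicate(const(char)[] line)
    {
        immutable quick = hashOf(line);   // druntime's built-in hash
        ubyte[32] strong = sha256Of(line);

        if (auto bucket = quick in seen)
        {
            if ((*bucket).canFind(strong))
                return true;              // genuine duplicate
            *bucket ~= strong;            // quick-hash collision
            return false;
        }
        seen[quick] = [strong];
        return false;
    }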
Forgot to mention that this kind of solution needs a better
approach if you don't want to miss a genuinely different line:
you can use a weak hash, but as a pre-process track each line's
position and count how often the same hash occurs. In the
post-process you look at those lines again and compare the actual
text to tell real duplicates from hash collisions, and correct
the result accordingly.
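A rough sketch of that two-pass idea (the file name data.txt and all
identifiers are mine, for illustration only): the pre-process stores
just a weak hash, a count, and byte offsets per line; the post-process
re-reads only lines whose hash occurred more than once and compares
the actual text.

    import std.stdio : File, writeln;

    struct Seen { size_t count; ulong[] offsets; }

    void main()
    {
        auto f = File("data.txt");       // hypothetical input
        Seen[size_t] table;

        // Pre-process: one weak hash per line, plus position and count.
        ulong offset = 0;
        foreach (line; f.byLine)
        {
            immutable h = hashOf(line);  // druntime's fast built-in hash
            if (auto p = h in table) { ++p.count; p.offsets ~= offset; }
            else table[h] = Seen(1, [offset]);
            offset = f.tell;             // start of the next line
        }

        // Post-process: only hashes seen more than once need a second
        // look; comparing the real text separates duplicates from
        // collisions.
        foreach (h, s; table)
        {
            if (s.count < 2) continue;
            bool[string] texts;          // distinct texts under this hash
            foreach (off; s.offsets)
            {
                f.seek(off);
                texts[f.readln()] = true;
            }
            if (texts.length > 1)
                writeln("collision: ", texts.length,
                        " distinct lines share hash ", h);
        }
    }

Since only repeated hashes are re-read, the second pass touches a tiny
fraction of the file in the common case.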