Trying to reduce memory usage

Josh moonburntm at gmail.com
Fri Feb 12 01:23:14 UTC 2021


I'm trying to read in a text file that has many duplicated lines
and output a file with all the duplicates removed. By the end of
the code snippet below, the memory usage is roughly 5x the size
of the input file (the files can be several GB each), and when
this runs in a loop the memory usage becomes unmanageable, often
ending in an OutOfMemory error or a complete lock-up of the
system. Is there a way to reduce the memory usage of this code
without noticeably sacrificing speed? My assumption is that the
.sort.uniq step is what needs improving, but I can't think of an
approach that is simpler, or not much slower. (One alternative I
can think of, deduplicating with an associative array while
reading, is sketched after the code.)

Windows 10 x64
LDC - the LLVM D compiler (1.21.0-beta1):
   based on DMD v2.091.0 and LLVM 10.0.0

-----------------------------------

import std.algorithm : sort, uniq;
import std.array : appender;
import std.conv : to;
import std.path : stripExtension;
import std.stdio : File;

auto filename = "path\\to\\file.txt.temp";
auto array = appender!(string[]);
File infile = File(filename, "r");
foreach (line; infile.byLine) {
   array ~= line.to!string; // byLine reuses its buffer, so copy each line
}
File outfile = File(stripExtension(filename), "w");
foreach (element; (array[]).sort.uniq) {
   outfile.rawWrite(element ~ "\n"); // rawWrite, so no \r is printed on Windows
}
outfile.close;
array.clear;       // try to release the buffered lines
array.shrinkTo(0);
infile.close;

-----------------------------------
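
The alternative mentioned above, as an untested sketch (the
function name dedupFile is just for illustration): deduplicate
while reading, using an associative array as a set, so duplicate
lines are never stored. This gives up the sorted output that
.sort.uniq produces (lines come out in first-occurrence order),
and it still has to hold one copy of every unique line:

-----------------------------------

import std.conv : to;
import std.path : stripExtension;
import std.stdio : File;

void dedupFile(string filename) {
   bool[string] seen; // set of lines already written
   File infile = File(filename, "r");
   File outfile = File(stripExtension(filename), "w");
   foreach (line; infile.byLine) {
      if (line in seen)
         continue; // duplicate: skipped without allocating a copy
      string copy = line.to!string; // byLine reuses its buffer, so copy once
      seen[copy] = true;
      outfile.rawWrite(copy ~ "\n"); // no \r on Windows
   }
   outfile.close;
   infile.close;
}

-----------------------------------

If even the unique lines don't fit in memory, something like an
on-disk merge sort would probably be needed instead.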

Thanks.

