Parsing and splitting textfile

Hugo Florentino hugo at acdam.cu
Mon Feb 24 12:19:06 PST 2014


On Mon, 24 Feb 2014 19:08:16 +0000 (UTC), Justin Whear wrote:
>
> Specifically std.regex.splitter[1] creates a lazy range over the input.
> You can couple this with lazy file reading (e.g.
> `File("mailbox").byChunk(1024).joiner`).
>

Would something like this work? (I cannot test it right now)

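import std.regex : regex, splitter;
import std.stdio : File;

void main(string[] args)
{
    auto themailbox = args[1];
    immutable size_t chunksize = 1024 * 64;
    // "m" makes `$` match at the end of each line, not only at end of input.
    auto re = regex(`\n\nFrom .+ at .+$`, "m");
    // byChunk reuses one buffer and ends the loop at end of file.
    foreach (chunk; File(themailbox).byChunk(chunksize))
    {
        // Caveat: a mail or a separator may straddle two chunks.
        auto mail = splitter(cast(const(char)[]) chunk, re);
    }
}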

If so, I have a couple of further questions:

splitter removes the matched expression from the resulting strings. How 
could I put it back at the beginning of each resulting string in an 
efficient way (i.e. without copying data that is already loaded in 
memory)?
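
To make the first question concrete, here is a minimal sketch of one possible approach. It assumes the text is already available as a single string, and it uses std.regex.matchAll instead of splitter so that the position of each separator is known. Every message is then a slice of the original string that starts at its own "From " line, so no message data is copied. The function name splitKeepingSeparator is only illustrative, and the "m" flag is my addition so that `$` matches at the end of each line.

```d
import std.regex : matchAll, regex;

/// Returns slices of `text`, each beginning with its own "From " line.
/// Only (pointer, length) pairs are stored; the message data is not copied.
/// The function name is hypothetical, chosen for this sketch.
string[] splitKeepingSeparator(string text)
{
    // "m" (multi-line) makes `$` match at the end of every line.
    auto re = regex(`\n\nFrom .+ at .+$`, "m");
    string[] mails;
    size_t start = 0;
    foreach (m; matchAll(text, re))
    {
        // m.pre is the input before the match; skip the two newlines
        // so that the next slice starts exactly at "From ".
        immutable next = m.pre.length + 2;
        mails ~= text[start .. next];
        start = next;
    }
    // Whatever follows the last separator is the final message.
    mails ~= text[start .. $];
    return mails;
}
```

Since the slices are adjacent, concatenating them gives back the original text unchanged.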

I see that the splitter function returns a struct. How could I 
progressively dump each resulting string to disk and drop it from the 
struct, so that the full mailbox does not end up loaded in memory, in 
this case as a struct?
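
For the second question, one way to keep memory use bounded is to drop the regex and scan the file line by line, which is a different technique from splitter. The sketch below assumes that a new message starts wherever a line beginning with "From " follows a blank line (or the start of the file), which is looser than the regex above. The function name splitMbox, the outPrefix parameter and the numbered ".eml" file names are hypothetical.

```d
import std.algorithm : startsWith;
import std.format : format;
import std.stdio : File, KeepTerminator;

/// Streams `mboxPath` line by line and writes each message to its own
/// file named outPrefix ~ "NNNN.eml" (hypothetical naming scheme).
/// Returns the number of messages written.
size_t splitMbox(string mboxPath, string outPrefix)
{
    File output;
    size_t count = 0;
    bool previousBlank = true; // the start of the file counts as a boundary

    foreach (line; File(mboxPath).byLine(KeepTerminator.yes))
    {
        if (previousBlank && line.startsWith("From "))
        {
            // Assigning a new File releases the previous one.
            output = File(format("%s%04d.eml", outPrefix, count), "wb");
            ++count;
        }
        if (output.isOpen)
            output.rawWrite(line); // only the current line is held in memory
        previousBlank = (line == "\n" || line == "\r\n");
    }
    return count;
}
```

Only one line is buffered at a time, so the size of the mailbox does not matter, and each message keeps its "From " line because nothing is ever removed from the stream.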

Regards, Hugo


More information about the Digitalmars-d-learn mailing list