Splitting up large dirty file
Dennis
dkorpel at gmail.com
Tue May 15 20:36:21 UTC 2018
I have a file with two problems:
- It's too big to fit in memory (apparently; I thought 1.5 GB
would fit, but I get an out-of-memory error when using
std.file.read)
- It is dirty (contains invalid Unicode characters, null bytes in
the middle of lines)
I want to write a program that splits it up into multiple files,
with the splits happening every n lines. I keep encountering
roadblocks though:
- You can't pass Yes.useReplacementDchar to `byLine`, and `byLine`
(or `readln`) throws an Exception upon encountering an invalid
character.
- decodeFront doesn't work on input ranges like
`byChunk(4096).joiner`
- std.algorithm.splitter doesn't work on input ranges either
- When you convert chunks to arrays, you risk a chunk boundary
falling in the middle of a character that spans multiple code units
Is there a simple way to do this?
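One direction that might sidestep most of these roadblocks is to skip decoding entirely and split on raw bytes. In UTF-8 the byte 0x0A only ever encodes '\n', so a chunk boundary in the middle of a multi-byte character does no harm as long as the bytes are copied through unchanged. Below is a minimal sketch, assuming '\n' line endings; the name `splitByLines`, the 64 KiB chunk size, and the `<prefix>NNNN.txt` output naming are made up for illustration.

```d
import std.conv : to;
import std.format : format;
import std.stdio : File, stderr, writeln;

/// Splits `inputPath` into files of at most `linesPerFile` lines each.
/// Works on raw bytes, so invalid UTF-8 and null bytes pass through untouched.
/// Returns the number of output files written.
/// (Hypothetical helper; name and output naming scheme are illustrative.)
size_t splitByLines(string inputPath, string outputPrefix, size_t linesPerFile)
{
    assert(linesPerFile > 0);
    auto input = File(inputPath, "rb");
    File output;               // opened lazily, so no empty trailing file
    size_t fileCount = 0;
    size_t linesInCurrent = 0;

    void ensureOpen()
    {
        if (!output.isOpen)
            output = File(format("%s%04d.txt", outputPrefix, fileCount++), "wb");
    }

    // Only one chunk is held in memory at a time.
    foreach (ubyte[] chunk; input.byChunk(64 * 1024))
    {
        size_t start = 0;      // first byte of the chunk not yet written
        foreach (i, b; chunk)
        {
            if (b != '\n')
                continue;
            ensureOpen();
            output.rawWrite(chunk[start .. i + 1]);   // line including '\n'
            start = i + 1;
            if (++linesInCurrent == linesPerFile)
            {
                output.close();
                linesInCurrent = 0;
            }
        }
        // Partial line at the end of the chunk: it continues in the next chunk.
        if (start < chunk.length)
        {
            ensureOpen();
            output.rawWrite(chunk[start .. $]);
        }
    }
    return fileCount;
}

void main(string[] args)
{
    if (args.length != 4)
    {
        stderr.writeln("usage: ", args[0], " <input> <output-prefix> <lines-per-file>");
        return;
    }
    immutable n = splitByLines(args[1], args[2], args[3].to!size_t);
    writeln("wrote ", n, " files");
}
```

Because nothing is decoded, the invalid sequences and null bytes end up in the output files as-is; cleaning them up would be a separate pass over the smaller pieces.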