Prevent opening binary/other garbage files
bauss
jj_1337 at live.dk
Sun Sep 30 06:17:20 UTC 2018
On Saturday, 29 September 2018 at 15:52:30 UTC, helxi wrote:
> I'm writing a utility that checks for specific keyword(s) found
> in the files in a given directory recursively. What's the best
> strategy to avoid opening a bin file or some sort of garbage
> dump? Check encoding of the given file?
>
> If so, what are the most popular encodings (in POSIX if that
> matters) and how do I detect them?

What I would do is read the first 512 bytes and the last 512
bytes, and if over 50% of those bytes are below 32 and not 8, 9,
10, 11, 12 or 13 (backspace, tab, line feed, vertical tab, form
feed, carriage return), then chances are you have a binary file.
Keep in mind that nothing stops someone from writing "invalid"
bytes into a text file. There are no limitations on what a file
can hold, and generally the system treats all files the same.

The reason I recommend reading both the first 512 and the last
512 bytes is that some binary files contain legitimate text
strings. By sampling two places, it is less likely that both
segments happen to be text.
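
The check described above could be sketched roughly as follows. This is only an illustration in Python, not code from the thread; the names `looks_binary`, `suspicious_ratio`, `WINDOW` and `ALLOWED_CONTROL` are made up for the example, and the 512-byte window and 50% threshold are simply the numbers suggested above, exposed as parameters so they can be tuned.

```python
import os

WINDOW = 512                              # bytes sampled from each end of the file
ALLOWED_CONTROL = {8, 9, 10, 11, 12, 13}  # BS, TAB, LF, VT, FF, CR


def suspicious_ratio(chunk: bytes) -> float:
    """Fraction of bytes below 32 that are not common text control characters."""
    if not chunk:
        return 0.0
    bad = sum(1 for b in chunk if b < 32 and b not in ALLOWED_CONTROL)
    return bad / len(chunk)


def looks_binary(path: str, window: int = WINDOW, threshold: float = 0.5) -> bool:
    """Heuristic only: True if the sampled bytes are mostly odd control bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(window)
        # For files shorter than two windows, start the tail where the head
        # ended so that no byte is counted twice.
        f.seek(max(size - window, len(head)))
        tail = f.read(window)
    return suspicious_ratio(head + tail) > threshold
```

A recursive keyword search could call `looks_binary(path)` on each file and skip the ones that return `True`. As noted above, this is a guess rather than a guarantee: a text file may contain stray control bytes, and a binary file may happen to have text at both ends.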