Prevent opening binary/other garbage files
Adam D. Ruppe
destructionator at gmail.com
Mon Oct 1 19:40:23 UTC 2018
On Monday, 1 October 2018 at 15:21:24 UTC, helxi wrote:
> I tried out https://dlang.org/library/std/utf/validate.html
> before manually checking for encoding myself so I ended up with
> the code below. I was fairly surprised that "*.o" (object)
> files are UTF encoded! Is it normal?
Yes. Any sequence of bytes <= 127 is valid UTF-8. A line-based
read stops at the first byte 10 (line feed) and cuts off there,
so only the bytes before it ever get validated.
Quite a few file formats have a 10 early on to detect text/binary
transmission corruption, but even if they don't, it is a fairly
common byte to see before too long and that cuts off your scan
for later bytes.
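To make that concrete, here is a minimal sketch in D. The helper names `passesValidate` and `firstLine` are made up for illustration, and the sample bytes are invented (loosely shaped like an object file header), not taken from a real file. It only assumes that `std.utf.validate` throws a `UTFException` on invalid input.

```d
import std.utf : validate, UTFException;

/// True if `data`, viewed as UTF-8, passes std.utf.validate.
bool passesValidate(const(ubyte)[] data)
{
    try
    {
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

/// The bytes before the first byte 10, i.e. roughly what a
/// line-based reader would hand you as the first "line".
const(ubyte)[] firstLine(const(ubyte)[] data)
{
    foreach (i, b; data)
    {
        if (b == 10)
            return data[0 .. i];
    }
    return data;
}

unittest
{
    // Invented sample: low bytes, then a 10, then bytes that are
    // never valid in UTF-8 (0xFF, 0xFE).
    immutable ubyte[] sample = [0x7F, 0x45, 0x4C, 0x46, 0x02, 0x01,
                                0x01, 0x00, 0x0A, 0xFF, 0xFE];

    // The scan stops at the 10, so only 8 low bytes are checked...
    assert(firstLine(sample).length == 8);
    // ...and those are all <= 127, so they validate fine.
    assert(passesValidate(firstLine(sample)));
    // The file as a whole is not valid UTF-8.
    assert(!passesValidate(sample));
}
```

Something like `dmd -unittest -main -run` on a file containing this should run the checks. The point is only that validating the first line says very little about the rest of the file.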
You really are better off looking for those bytes < 32 like I
described earlier. A .o file will likely have some 1's and 3's
early on, which that check will quickly detect, whereas those
same bytes pass the validate test.
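A rough sketch of that kind of check in D follows. The function name `looksBinary`, the 512-byte limit, and the choice to let tab, LF, and CR through are all assumptions made for this example, not details taken from the earlier message.

```d
/// Heuristic: true if any of the first `limit` bytes is a control
/// byte below 32 other than tab, line feed, or carriage return.
bool looksBinary(const(ubyte)[] data, size_t limit = 512)
{
    const n = data.length < limit ? data.length : limit;
    foreach (b; data[0 .. n])
    {
        if (b < 32 && b != '\t' && b != '\n' && b != '\r')
            return true;
    }
    return false;
}

unittest
{
    // Invented object-file-like prefix with 1's and a 3 early on.
    immutable ubyte[] objLike = [0x7F, 0x45, 0x4C, 0x46, 1, 1, 1, 0, 3, 0];
    assert(looksBinary(objLike));

    // Ordinary text with tab, CR, and LF is not flagged.
    assert(!looksBinary(cast(const(ubyte)[]) "hello\tworld\r\n"));

    // Empty input is not flagged either.
    assert(!looksBinary(null));
}
```

In practice you could feed it the first chunk of the file, read as raw bytes (for instance with `File.rawRead`) rather than by line, so a byte 10 does not cut the scan short.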
More information about the Digitalmars-d-learn
mailing list