Prevent opening binary/other garbage files

Adam D. Ruppe destructionator at gmail.com
Mon Oct 1 19:40:23 UTC 2018


On Monday, 1 October 2018 at 15:21:24 UTC, helxi wrote:
> I tried out https://dlang.org/library/std/utf/validate.html 
> before manually checking for encoding myself so I ended up with 
> the code below. I was fairly surprised that "*.o" (object) 
> files are UTF encoded! Is it normal?

Yes. Any random collection of bytes <= 127 is valid UTF-8. Reading 
by line stops at the first byte 10 (line feed), so nothing after 
that point gets scanned.
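
To see the first point in isolation, here is a minimal sketch. The helper name isValidUtf8 and the sample bytes are made up for illustration; only std.utf.validate and UTFException come from Phobos.

```d
import std.utf : validate, UTFException;

/// Hypothetical helper: true if `data` passes std.utf.validate.
bool isValidUtf8(const(ubyte)[] data)
{
    try
    {
        // validate throws UTFException on malformed UTF-8.
        validate(cast(const(char)[]) data);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // Made-up "binary looking" bytes, all <= 127: still valid UTF-8.
    ubyte[] lowBytes = [0x7F, 0x01, 0x03, 0x00, 0x0A, 0x02];
    assert(isValidUtf8(lowBytes));

    // 0xFF never appears in well-formed UTF-8, so this one fails.
    ubyte[] highByte = [0x41, 0xFF, 0x42];
    assert(!isValidUtf8(highByte));
}
```

So validate only rejects a file if a malformed sequence involving bytes above 127 shows up in the part you actually scan.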

Quite a few file formats have a 10 early on to detect text/binary 
transmission corruption, but even if they don't, it is a fairly 
common byte to see before too long and that cuts off your scan 
for later bytes.


You really are better off looking for those <32 bytes like I 
described earlier. A .o file will likely have some 1's and 3's 
early on, which that check will quickly detect even though those 
bytes also pass the validate test.
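
A rough sketch of that kind of check follows. The function name looksBinary is hypothetical, and treating tab, line feed and carriage return as the only acceptable bytes below 32 is my assumption here; adjust the list to whatever your text files actually contain.

```d
import std.algorithm.searching : any;

/// Hypothetical heuristic: true if `data` contains a control byte
/// (< 32) other than tab (9), line feed (10) or carriage return (13).
bool looksBinary(const(ubyte)[] data)
{
    return data.any!(b => b < 32 && b != 9 && b != 10 && b != 13);
}

void main()
{
    // Ordinary text with a tab and a newline is not flagged.
    auto text = cast(const(ubyte)[]) "int main()\n{\n\treturn 0;\n}\n";
    assert(!looksBinary(text));

    // Made-up bytes with a 1 and a 3 early on, as in a .o file.
    ubyte[] objectish = [0x7F, 0x45, 0x01, 0x03, 0x00];
    assert(looksBinary(objectish));
}
```

In practice you would probably run this over the first few hundred bytes of the file, read as raw bytes (for example with File.rawRead) rather than by line, so that a byte 10 does not end the scan early.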


