A bit of binary I/O

Chris Nicholson-Sauls ibisbasenji at gmail.com
Sat Jan 20 16:42:42 PST 2007


Heinz wrote:
> Jarrett Billingsley Wrote:
> 
>> "Heinz" <billgates at microsoft.com> wrote in message 
>> news:eou69k$8tf$1 at digitaldaemon.com...
>>
>>> The first way is to write primitives manually one by one:
>>>
>>> // primitive way
>>> ulong i = 9;
>>> char[] s = "hello world";
>>> myFile.writeExact(&i, i.sizeof);
>>> myFile.writeExact(&s, s.sizeof);
>>>
>>> Reading data:
>>> // Is done by reading each primitive.
>>> ulong i2; char[] s2;
>>> myFile.readExact(&i2, i2.sizeof);
>>> myFile.readExact(&s2, s2.sizeof);
>> You're writing the string wrong.  All you're doing is writing the length and 
>> pointer of the array data, without actually writing the data.
>>
>> The Stream class (and by extension, the File class) provides functions for 
>> writing out every basic type:
>>
>> ulong i = 9;
>> char[] s = "hello world";
>> myFile.write(i);
>> myFile.write(s);
>>
>> ...
>> ulong i2;
>> char[] s2;
>> myFile.read(i2);
>> myFile.read(s2);
>>
>>> The second way is to write a structure with all the primitives as members:
>>>
>>> // struct way
>>> struct t
>>> {
>>> ulong i;
>>> char[] s;
>>> }
>>>
>>> t mt;
>>> mt.i = 9;
>>> mt.s = "hello world";
>>> myFile.writeExact(&mt, mt.sizeof);
>>>
>>> Reading data:
>>> // We read the entire struct.
>>> t mt2;
>>> myFile.readExact(&mt2, mt2.sizeof);
>> Again, you're just writing out the array reference without writing its 
>> contents.  You have to write out each member individually.  If there were no 
>> reference types in the struct, this would work fine.
>>
>>> And the third way is to write a class with all the primitives as members:
>>>
>>> // class way
>>> class tt
>>> {
>>> ulong i;
>>> char[] s;
>>> }
>>>
>>> tt mtt = new tt();
>>> mtt.i = 9;
>>> mtt.s = "hello world";
>>> myFile.writeExact(&mtt, mtt.sizeof);
>>>
>>> Reading data:
>>> // We read the entire class.
>>> tt mtt2;
>>> myFile.readExact(&mtt2, mtt2.sizeof);
>>>
>> This is incorrect, and is only working because of how you've written your 
>> program.  You're not writing the data out at all, you're writing a class 
>> reference.  The 00913FC0 is just the memory address of the class instance 
>> that mtt points to, and when you read that address back in, you're just 
>> looking at the data in memory.  This program wouldn't work if you write the 
>> file, exited, then had another program that read the data.  You'd end up 
>> with a memory access violation, and none of the data in the class is 
>> actually written out.
>>
>> If you want to write a class out to a file, a common way is to have some 
>> kind of generic "serialize" and "unserialize" functions for the class:
>>
>> class C
>> {
>>     ulong i;
>>     char[] s;
>>
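>>     // The parameter is named "stream" so it does not shadow the member s.
>>     void serialize(Stream stream)
>>     {
>>         stream.write(i);
>>         stream.write(s);
>>     }
>>
>>     static C unserialize(Stream stream)
>>     {
>>         C c = new C();
>>         stream.read(c.i);
>>         stream.read(c.s);
>>         return c;
>>     }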
>> }
>>
>> ...
>> C c = new C();
>> c.i = 5;
>> c.s = "foo";
>> c.serialize(myFile);
>>
>> ...
>>
>> C c = C.unserialize(myFile);
>>
>>> All of these methods work perfectly. I'm able to retrieve values from all 
>>> of them. Now let's look at the outputs:
>>>
>>> // Primitive
>>>
>>> 09 00 00 00 00 00 00 00 0B 00 00 00 A0 C7 41 00
>>>
>>> // Structure
>>>
>>> 09 00 00 00 00 00 00 00 0B 00 00 00 A0 C7 41 00
>>>
>>> // Class
>>>
>>> C0 3F 91 00
>>>
>>> My questions are:
>>>
>>> 1) What's the best method to write data (in terms of data 
>>> protection/encryption against reversion). The class way seems to me at 
>>> first look the most secure way.
>> As explained before, the class method is wrong, and there is no encryption 
>> going on here.  It's just a memory address, and you should never, ever write 
>> memory addresses to a file.
>>
>> That being said, the best way is probably to just use the primitive .read 
>> and .write methods of File.  Just .. never, ever write pointers or 
>> references of any kind to a file.
>>
>>> 2) Which method is faster at retrieving data?
>> If you implement them correctly, all three sample programs should make the 
>> exact same output file using the same number of writes (and read it in the 
>> same number of reads), and so they are all the same in terms of performance. 
>>
>>
> 
> Wow, that covers all, thanks for your reply.
> 
> But can I still write an entire structure with writeExact()? Or do you suggest writing each member of the structure with write()?
> 
> Another question: does writing a char[] with write() write the string as ASCII? If so, it is a legible string, so how can I protect that data?
> 
> Thanks man

Well, technically it will write it as UTF-8, which is as near to ASCII as makes no 
nevermind.  If you don't want it readable (and this is a binary file anyway), you could 
just use some simple reversible encryption algorithm.  Something like this, for a silly random example:

<code>
module silly;

import tango .io .Stdout ;

struct SillyCrypt {

   alias process opCall ;

   static const CHUNK_SIZE = 32_U ;
   static const ROT        = 16_U ;
   static const XOR        = 24_U ;

   static char[] process (char[] src) {
     char[] result ;

     foreach (ch; chunks(src)) {
       result ~= mutate(ch);
     }
     return result;
   }

   private static char[][] chunks (char[] x) {
     char[]   source = x ;
     char[][] result     ;

     while (source.length >= CHUNK_SIZE) {
       result ~= source[0          .. CHUNK_SIZE] ;
       source  = source[CHUNK_SIZE .. $         ] ;
     }
     if (source.length) {
       result ~= source;
     }
     return result;
   }

   private static char[] mutate (char[] x) {
     char[] result ;

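     // Rotate full chunks only: rotating by ROT twice restores the original
     // only when ROT is half the chunk length, which a short tail may not satisfy.
     if (x.length == CHUNK_SIZE) {
       result = x[ROT .. $] ~ x[0 .. ROT];
     }
     else {
       result = x.dup;
     }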
     foreach (inout c; result) {
       c ^= XOR;
     }
     return result;
   }

}

const SOURCE = "I would say hello to you, but you couldn't read it even if I did."c ;

void main () {
   auto enc = SillyCrypt(SOURCE) ;
   auto dec = SillyCrypt(enc   ) ;

   Stdout
     ("Source  -> "c)(SOURCE).newline()
     ("Encrypt -> "c)(enc   ).newline()
     ("Decrypt -> "c)(dec   ).newline()
     .flush
   ;
}
</code>

The output when I tried it was this:
Source  -> I would say hello to you, but you couldn't read it even if I did.
Encrypt -> w8lw8awm48zml8awQ8owmt|8kya8p}ttql8}n}v8q~8Q8|q|m8{wmt|v?l8j}y|86
Decrypt -> I would say hello to you, but you couldn't read it even if I did.

I know I don't personally know anyone who can read 
"w8lw8awm48zml8awQ8owmt|8kya8p}ttql8}n}v8q~8Q8|q|m8{wmt|v?l8j}y|86" at all.  :)
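
As for writing a whole struct with writeExact(): that should be fine as long as every
member is a plain value type, and anything that is an array or reference gets written
separately.  A rough sketch, assuming the same Phobos std.stream File class used earlier
in the thread.  Header, save, load and the file name "data.bin" are names I made up for
illustration, not anything from your program:

```d
import std.stream;

// Hypothetical struct: value types only, so its raw bytes are the data.
// The raw layout depends on padding and byte order, so a file written this
// way is only safe to read back on the same platform and compiler.
struct Header {
    ulong i;
    uint  count;
}

void save (Stream f, Header h, char[] s) {
    f.writeExact(&h, h.sizeof);   // no references inside, so this is safe
    f.write(s);                   // Stream.write stores the length, then the bytes
}

void load (Stream f, out Header h, out char[] s) {
    f.readExact(&h, h.sizeof);
    f.read(s);                    // reads the length, then allocates and fills s
}

void main () {
    Header h;
    h.i     = 9;
    h.count = 2;

    File fout = new File("data.bin", FileMode.OutNew);
    save(fout, h, "hello world");
    fout.close();

    Header h2;
    char[] s2;
    File fin = new File("data.bin", FileMode.In);
    load(fin, h2, s2);
    fin.close();

    assert(h2.i == 9 && h2.count == 2);
    assert(s2 == "hello world");
}
```

If you want the string obscured on disk, you could pass it through something like
SillyCrypt before the write() and again after the read().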

-- Chris Nicholson-Sauls


More information about the Digitalmars-d-learn mailing list