std.hash design

Fri Jun 22 10:12:20 PDT 2012

On Fri, 22 Jun 2012 14:21:28 +0100, Johannes Pfau <nospam at example.com>  
wrote:
> On Fri, 22 Jun 2012 12:03:27 +0100,
> "Regan Heath" <regan at netmail.co.nz> wrote:
>
>>
>> It might help (or it might not) to have a glance at the "design" of
>> the hashing routines in Tango:
>> http://www.dsource.org/projects/tango/docs/current/
>> (see tango.util.digest etc)
>>
>> I contributed some of the initial code for these, though it has
>> since evolved a lot.  I started with structs, mirroring the phobos
>> MD5 code but used all sorts of unnecessary mixins to get the code
>> reuse I wanted.  The result was ugly :p
>>
>> Later someone contacted me about it, and wanted a class based
>> approach so I did some refactoring and the result was much cleaner.
>> I'm not trying to say that a struct approach cannot be clean, just
>> that I did a bad job of it initially. Structs also don't lend
>> themselves to the factory pattern, which is a nice way to use
>> hashing.
>
> I had a short look at Piotr Szturmaj's sha implementations, and it
> seems this kind of code would benefit a lot from inheritance. I
> understand that it was probably impossible to do this in D1, but don't
> you think 'alias this' could work in D2? This wouldn't solve the
> problem with the factory pattern, but that can be solved by providing
> wrapper classes.

My original code was D1 and used structs and mixins, so perhaps 'alias  
this' will solve the code reuse problem.  I haven't done enough D2 to be  
helpful here, I'm afraid.
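
For what it's worth, my understanding of the 'alias this' idea is roughly
the following.  This is only a sketch: DigestCommon and ToySum are made-up
names, and ToySum is a plain byte sum rather than a real hash.

```d
// Hypothetical shared bookkeeping that several digest structs could reuse.
struct DigestCommon
{
    ulong totalBytes; // number of bytes fed in so far

    void count(const(ubyte)[] data)
    {
        totalBytes += data.length;
    }
}

// A toy "digest" (a byte sum, NOT a real hash) that embeds the shared
// struct and exposes its members through 'alias this'.
struct ToySum
{
    DigestCommon common;
    alias common this; // members not found in ToySum are looked up in common

    uint sum;

    void put(const(ubyte)[] data)
    {
        common.count(data); // reuse the shared bookkeeping
        foreach (b; data)
            sum += b;
    }
}

unittest
{
    ubyte[] data = [1, 2, 3];
    ToySum d;
    d.put(data);
    assert(d.sum == 6);
    assert(d.totalBytes == 3); // resolved through 'alias this'
}
```

That gives the reuse without mixins, but as Johannes says it does nothing
for the factory pattern, since ToySum is still a struct.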

>> > toString doesn't make sense on a hash, as finish() has to be called
>> > before a string can be generated. So a helper function could be
>> > useful.
>>
>> toString() could output the intermediate/internal state at the time
>> of the call, which if called after "finish" would be the hash
>> result.  I can't recall if this has any specific usefulness, though I
>> have a nagging/niggling itch which says I did use this intermediate
>> result for something at some stage.
>>
>> It might be useful to have toString on a hash so that we can pass a
>> completed hash object around and repeatedly obtain the string
>> representation vs obtaining it once on "finish" and passing the
>> string around.  However, that said, it's probably more secure to
>> destroy and scrub the memory used by the hash object ASAP and only
>> retain the resulting string or ubyte[] result.
>>
>> I think I've talked myself round in a circle.. I think if we have a
>> way to obtain the current state as ubyte[] that would satisfy the
>> niggle I have. Having a separate routine for turning a ubyte[] into a
>> hex string is probably better than attaching toString to a hash
>> object.
>
> We could also provide a finishString function or something like that.
> But toString returning an intermediate state would be confusing.

Agreed.  In fact I wouldn't bother with finishString either TBH, people  
can always pass the result of finish() into the function which produces  
the hex string representation.

IIRC when I wrote my Tiger implementation the algorithm was fairly new,  
and I used a different method for formatting its hex string  
representation.  Either the Tiger spec was changed later, or I was  
confused at the time, because I have a niggling memory that I later  
"discovered" it was the same all along, or something.

In any case, we can probably have one static toHexString method for all  
digests.
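
Something along these lines is what I have in mind.  It is a minimal
sketch, not a proposal for the final API: the name toHexString, the
free-function form and the lowercase output are all assumptions on my part.

```d
// Hypothetical helper: turns the raw bytes of any digest result into a
// lowercase hex string. It knows nothing about the digest that made them.
string toHexString(const(ubyte)[] bytes)
{
    static immutable hexDigits = "0123456789abcdef";
    auto text = new char[bytes.length * 2];
    foreach (i, b; bytes)
    {
        text[2 * i]     = hexDigits[b >> 4];   // high nibble
        text[2 * i + 1] = hexDigits[b & 0x0F]; // low nibble
    }
    return text.idup;
}

unittest
{
    ubyte[] raw = [0x00, 0x1f, 0xab, 0xff];
    assert(toHexString(raw) == "001fabff");
}
```

Because it only takes a ubyte[], the same routine serves every digest and
the hash object itself can be scrubbed as soon as finish() has returned.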

> Tango doesn't seem to offer a way to peek at the current state. But if
> it's really useful, it could be added.

Probably just cobwebs in my memory, ignore me :p

> BTW: Do you know why digestSize is a function in tango? Are there
> digests that produce variable length hashes?

Not to my knowledge.  Perhaps there is a time/place where you want to know  
the size of the digest result before calculating the digest?  That might  
be useful in generic code, perhaps.
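
To illustrate the sort of generic code I mean, here is a sketch.  The
Digest interface and the ToyXor4 class are invented for the example (they
are not Tango's actual API, and ToyXor4 is not a real hash); the point is
only that the caller can size the buffer before any hashing happens.

```d
// Hypothetical class-based interface, in the spirit of a factory design.
interface Digest
{
    size_t digestSize();            // bytes that finish() will produce
    void put(const(ubyte)[] data);
    ubyte[] finish(ubyte[] buffer); // writes into buffer, returns the slice used
}

// Toy digest (NOT a real hash): XOR-folds the input into 4 bytes.
final class ToyXor4 : Digest
{
    private ubyte[4] state;
    private size_t pos;

    size_t digestSize() { return state.length; }

    void put(const(ubyte)[] data)
    {
        foreach (b; data)
        {
            state[pos] ^= b;
            pos = (pos + 1) % state.length;
        }
    }

    ubyte[] finish(ubyte[] buffer)
    {
        buffer[0 .. state.length] = state[];
        state[] = 0; // reset for reuse
        pos = 0;
        return buffer[0 .. state.length];
    }
}

// Generic code: works for any Digest, allocating the result up front.
ubyte[] digestOf(Digest d, const(ubyte)[] data)
{
    auto buffer = new ubyte[d.digestSize()]; // size known before hashing
    d.put(data);
    return d.finish(buffer);
}

unittest
{
    ubyte[] input = [1, 2, 3, 4, 5];
    ubyte[] expected = [1 ^ 5, 2, 3, 4];
    assert(digestOf(new ToyXor4, input) == expected);
}
```

With classes behind a common interface the size has to be a function (or
property) anyway, since each implementation returns a different value.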

R
