std.hash design

Regan Heath regan at netmail.co.nz
Fri Jun 22 04:03:27 PDT 2012


On Fri, 22 Jun 2012 10:11:10 +0100, Johannes Pfau <nospam at example.com>  
wrote:

> Pull request #221 and #585 both introduce modules for a new std.hash
> package. As those already change API compared to the old std.crc32 and
> std.md5 modules we should probably decide on a common interface for all
> std.hash modules.
>
> These are the imho most important questions:
>
> Free function (std.crc32) vs object(std.md5) interface
> -----------------------------------------------------
> I think we need a object based interface anyway as md5, sha-1 etc have
> too much state to pass it around conveniently.
>
> Structs and templates vs. classes and interfaces
> -----------------------------------------------------
> It's common to use a hash in a limited scope (like functions). So
> allocating on the stack is important which favors the struct approach.
> However, classes could also be allocated on the stack with scoped.
>
> Classes+interfaces have the benefit that we could provide different
> _ABI_ compatible implementations. E.g. MD5 hashes could be implemented
> with D/OpenSSL wrapper/windows crypto API and we could even add a
> configure switch to phobos to choose the default implementation.
> Doing the same with structs likely only gives us API compatibility, so
> switching the default implementation in phobos could cause trouble.
>
> Basic design:
> ---------------
> If we'll implement an object based interface (struct/class), it should
> probably be an output range. Something like this:
>
> struct/interface Hash
> {
>     void put(const(ubyte)[] data);
>     void put(ubyte data);
>     void start(); //initialize
>     void reset(); //reset
>     ubyte[] finish(ref ubyte[] buffer = null); //See below
>     enum size_t hashLength; //optional? See below
> }
>
> The finish function signature is a little controversial. The length of
> the result differs between hash implementations. For structs+templates
> we could use static arrays, but for classes+interface we'd have to use
> dynamic arrays.

It might help (or it might not) to have a glance at the "design" of the  
hashing routines in Tango:
http://www.dsource.org/projects/tango/docs/current/
(see tango.util.digest etc)

I contributed some of the initial code for these, though it has since  
evolved a lot.  I started with structs, mirroring the phobos MD5 code but  
used all sorts of unnecessary mixins to get the code reuse I wanted.  The  
result was ugly :p

Later someone contacted me about it and wanted a class-based approach, so  
I did some refactoring and the result was much cleaner.  I'm not trying to  
say that a struct approach cannot be clean, just that I did a bad job of  
it initially.  Structs also don't lend themselves to the factory pattern,  
which is a nice way to use hashing.

As Dmitry has said, we can likely get the best of both worlds with classes  
wrapping structs or similar.
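
A rough sketch of what "classes wrapping structs" could look like, purely
as an illustration. All names here (Fnv1a32, Digest, WrapperDigest,
makeDigest) are hypothetical and come from neither pull request, and the
toy 32-bit FNV-1a hash only stands in for a real MD5/SHA-1 implementation:

```d
// Hypothetical struct digest: cheap to keep on the stack, static-array result.
struct Fnv1a32
{
    enum size_t hashLength = 4;
    private uint state = 2166136261u; // FNV-1a 32-bit offset basis

    void start() { state = 2166136261u; }

    void put(const(ubyte)[] data)
    {
        foreach (b; data)
        {
            state ^= b;
            state *= 16777619u; // FNV prime; uint arithmetic wraps mod 2^32
        }
    }

    ubyte[hashLength] finish()
    {
        ubyte[hashLength] result;
        result[0] = cast(ubyte)(state >> 24);
        result[1] = cast(ubyte)(state >> 16);
        result[2] = cast(ubyte)(state >> 8);
        result[3] = cast(ubyte) state;
        start(); // leave the object ready for reuse
        return result;
    }
}

// Hypothetical runtime interface: the result length is only known at
// run time here, so finish() returns a dynamic array.
interface Digest
{
    void put(const(ubyte)[] data);
    ubyte[] finish();
}

// Generic class wrapper around any struct offering start/put/finish.
final class WrapperDigest(T) : Digest
{
    private T impl;

    this() { impl.start(); }

    void put(const(ubyte)[] data) { impl.put(data); }

    ubyte[] finish()
    {
        auto r = impl.finish();
        return r[].dup; // copy the static array to the heap
    }
}

// Hypothetical factory: pick an implementation by name at run time.
Digest makeDigest(string name)
{
    switch (name)
    {
        case "fnv1a32": return new WrapperDigest!Fnv1a32;
        default: throw new Exception("unknown digest: " ~ name);
    }
}

void main()
{
    auto d = makeDigest("fnv1a32");
    d.put(cast(const(ubyte)[]) "a");
    ubyte[] expected = [0xE4, 0x0C, 0x29, 0x2C]; // FNV-1a test vector for "a"
    assert(d.finish() == expected);
}
```

Code that only needs one known algorithm can use the struct directly and
avoid the allocation; the class wrapper is there for the factory case.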

> toString doesn't make sense on a hash, as finish() has to be called
> before a string can be generated. So a helper function could be useful.

toString() could output the intermediate/internal state at the time of the  
call, which, if called after "finish", would be the hash result.  I can't  
recall whether this has any specific usefulness, though I have a nagging  
itch which says I did use this intermediate result for something at some  
stage.

It might be useful to have toString on a hash so that we can pass a  
completed hash object around and repeatedly obtain the string  
representation vs obtaining it once on "finish" and passing the string  
around.  That said, it's probably more secure to destroy and  
scrub the memory used by the hash object ASAP and only retain the  
resulting string or ubyte[] result.

I think I've talked myself round in a circle...  If we have a way to  
obtain the current state as ubyte[], that would satisfy the niggle I have.  
Having a separate routine for turning a ubyte[] into a hex string is  
probably better than attaching toString to a hash object.
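
Something along these lines is what I have in mind for the separate
routine. This is only a sketch: the name toHex is made up for illustration
and is not an existing Phobos function.

```d
// Hypothetical free function: turn any digest result into a lowercase
// hex string, independent of which hash produced the bytes.
string toHex(const(ubyte)[] bytes)
{
    static immutable digits = "0123456789abcdef";
    auto result = new char[bytes.length * 2];
    foreach (i, b; bytes)
    {
        result[2 * i]     = digits[b >> 4];   // high nibble
        result[2 * i + 1] = digits[b & 0x0F]; // low nibble
    }
    return result.idup;
}

void main()
{
    ubyte[] digest = [0xde, 0xad, 0xbe, 0xef]; // stand-in for a finish() result
    string hex = toHex(digest);
    digest[] = 0; // scrub the raw bytes once the string has been taken
    assert(hex == "deadbeef");
}
```

Because it takes a plain ubyte[] slice, the same helper works for the
intermediate state as well as the final result, and the hash object itself
can be destroyed as soon as finish has been called.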

R

