The review of std.hash package

Tue Aug 7 12:53:48 PDT 2012

Am Tue, 07 Aug 2012 20:07:07 +0200
schrieb David <d at dav1d.de>:

> Is the API already set in stone?
No, as Jonathan already mentioned reviewing the API is an important
part of the review. (Actually the most important part for this review,
as we define one API for many implementations)

> Using .start and .finish, feels like the use of OpenSSL. Why not
> making a Hash object not "restartable" (since e.g. MD5.start() sets
> `this` just to MD5.init)
I'm not an expert in the hash/digest area. We had incompatible APIs for
CRC32, MD5 and SHA1 and I proposed we should have a unique interface.
As I had some free time and nobody else stepped up I implemented it,
asking for expert help in the newsgroup. This probably also explains
why this API isn't revolutionary, it's inspired by other designs
(tango, .net, ...).

BTW: making it possible to wrap OpenSSL and other C hash libraries was
also a goal. This is one reason why a start() function is necessary.
OK, that was the intro ;-)

start is here as structs can't have default constructors. For all
current hashes it's really mostly useless (except it can be used as a
simple way to reset a hash, which is also why the hash structs don't
have a reset member), but I just don't know if there are hashes which
would require a more complex (runtime) initialization.

Classes don't have a start method as they can do that initialization in
the default constructor. But the default constructor can't be used as a
cheap way to reset a hash, therefore the reset function was added to
the OOP interface.

> and making finish private and implementing a
> digest and hexdigest property
property is not a good idea, as finish must reset the internal state
for some objects, so you can't access the property repeatedly. It'd
have to be a function.

>which calls finish and returns an
> ubyte[] array (or string for hexdigest)

This could be done and is probably a matter of taste. I prefer the free
function in this case, it makes clear that digest() really is a generic
function which can be used with any hash. It's also one function less
which must be implemented by hash writers. I'd like to keep the number
of members required to implement a hash as low as possible.

(This is however unrelated to the actual naming of the functions. If
you think 'digest' is not a good name, please let me know)

>, this would also eliminate
> the need of the `digest` wrapper (e.g. you can mixin these properties
> with template mixins, so no code duplication)

there's actually no code duplication. 'digest' is the only
implementation, sha1Of, md5Of are just aliases.

> 
> Also I am not sure if we want to use `hash.put` instead of
> `hash.update` (which is used by python and openssl). But that's just
> a minor point (e.g. perl uses .add)

I initially called it update, I agree it's a better name. But phobos is
all about ranges, so we need the put member anyway. Providing an
'update' alias just for the name isn't worth it, imho (see above,
keeping the number of hash members low).

> 
> Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions,
> why not having a static method? e.g. MD5.hexdigest("input"), which
> returns the hexdigest and not a MD5 object.

Same reason as above, sha1Of, md5Of are just convenience aliases, it
could be argued you don't need those at all. You can use 'digest'
directly and it will provide exactly the same result. So hash writers
don't have to implement anything to support digest, providing an alias
is optional and just 1 line of code. A static method would have to be
implemented by every hash. Yes you could use mixins to make that
painless, but I think it's still to much work. (And mixins can't be
documented in DDOC, but that's an unrelated compiler issue)

BTW: it is implemented as a member for the OOP API, as we can just
implement it in the base interface, so the actual implementations don't
have to care about it.

> 
> One more thing, `.reset` does the same as `start`? If so, why do both 
> exist?
See above. It's because start is used in the template/struct API and
structs can't have default constructors, so we need start.

The OOP/class api can use a default constructor. But unlike the start
function it can't work as a reset function, so there's an explicit
reset function.

>(I also find the examples quite confusing, why do you want to 
> reset the hash onject?
My approach was to document how to use the functions, not as much when
it makes sense to use them. Maybe I should care more about that?

It's more efficient than allocating a new hash with 'new' (or do you
mean why use reset if finish resets anyway? It's useful if you put some
data into a hash, then decide you don't want the hash (because the
user aborts the operation, network connection is lost,...) but you still
need the hash object later on.) You could just call finish and
disregard the result, but the reset implementation is faster. I agree
there's probably no _strong_ need for reset, but I think it doesn't do
any harm.

>I would documentate the function but wouldn't
> use it in the examples)
> 
> Well, that's it from my side :)

Thanks for your review!