The review of std.hash package

Wed Aug 8 01:44:26 PDT 2012

Am Wed, 08 Aug 2012 00:55:47 +0200
schrieb David <d at dav1d.de>:

> Am 07.08.2012 21:53, schrieb Johannes Pfau:
> > Am Tue, 07 Aug 2012 20:07:07 +0200
> > schrieb David <d at dav1d.de>:
> >
> > start is here as structs can't have default constructors. For all
> > current hashes it's really mostly useless (except it can be used as
> > a simple way to reset a hash, which is also why the hash structs
> > don't have a reset member), but I just don't know if there are
> > hashes which would require a more complex (runtime) initialization.
> 
> Ok this point with the one above makes sense (I implemented my
> OpenSSL hashing wrapper as a class, initilaization is done in the
> constructor), it still doesn't feel right, if you have to call .start
> first. What about a struct-constructor which calls .start internally,
> so you get a bit more of a modern API (imo) and you're still able to
> implement the same interface for any kind of wrapper/own
> implementation/whatever.

You mean an external function which constructs the digest context and
calls start? That's similar to the appender function in std.array and
probably a good idea. We'd need a good name for it though.
* initContext
* initializeContext
* context
* startContext
* startHash / startDigest

> > property is not a good idea, as finish must reset the internal state
> > for some objects, so you can't access the property repeatedly. It'd
> > have to be a function.
> 
> Well, you could store the result internally, property generates on
> the first call the digest, stores it (that's not too much, 16 byte
> for a md5) and returns the already stored value on the 2nd call.
> 

yes, that could be done but to be honest I don't think it's worth it.

> >> which calls finish and returns an
> >> ubyte[] array (or string for hexdigest)
> >
> > This could be done and is probably a matter of taste. I prefer the
> > free function in this case, it makes clear that digest() really is
> > a generic function which can be used with any hash. It's also one
> > function less which must be implemented by hash writers. I'd like
> > to keep the number of members required to implement a hash as low
> > as possible.
> 
> With digest you mean: 
> http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html#digest ?
> 
> You normally always want the hexdigest (you barely need "real"
> digest),

really? I though it's the other way round and comparing/verifying
hashes is the common case?

> so it's a matter of convenience. Well, thanks to UFCS, you
> can call it like a method.
> 
a hexDigest which works like digest could probably be useful, although
it's very easy to implement this in user code as well:
string hexDigest(Hash, Order order = Order.increasing)(scope
const(void[])[] data...) if(isDigest!Hash)
{
    return digest!Hash(data).toHexString!order();
}

> > A static
> > method would have to be implemented by every hash.
> 
> Yes, you would have to implement it for every hash, but that's 3
> lines:
> 
> static string hexdigest(void[] data) {
> 	return toHexString(digest!(typeof(this))("yay"));
> }

The implementation is indeed quite simple, so it probably is a matter
of taste whether hexDigest is implemented as a member or as a free
function.

> > BTW: it is implemented as a member for the OOP API, as we can just
> > implement it in the base interface, so the actual implementations
> > don't have to care about it.
> 
> I've seen that, but I am wondering why it's not a static method, the 
> creation of the Hash Object could be hidden inside. You might say,
> this would allocate a class instance just for a single use, without
> the possibility to reuse it. Correct, but I think that's the use of
> such a function, if you just need a Hash, nothing else, otherwise you
> could use the class "directly", with all its other functionality.

I have to think about this. It's too bad we can't have static & member
functions with the same name in D.

I'd probably argue if you want the behavior of a static function, use
the struct API: digest!MD5(data);
Doesn't this do everything you need, without the GC allocation the OOP
api would cause? I mean the OOP API is mostly about polymorphism (and
about reference semantics). If you use a static method as proposed,
you can't use polymorphism and reference semantics are not useful
either, as you never get a reference to the context?

> > The OOP/class api can use a default constructor. But unlike the
> > start function it can't work as a reset function, so there's an
> > explicit reset function.
> 
> I am not sure what do think about that. On the one side it's useful
> if you really need that speed, but then I think, is it really worth
> it resetting the state hidden in a function. If you really want to
> avoid another allocation you could use emplace (if I didn't
> misunderstand the use of emplace). To quote from the Python Zen:
> "Explicit is better than implicit."

Yes, it should be possible to use emplace for that. Emplace is not
well-known though and I'm not sure how well it works now. Also I'm not
sure if emplace is enough in this case, I guess you'd have to destroy
the old instance as well (so destructors are run)?

But I'm not really sure about the importance of reset either. As long
as the main implementation is in the struct API and WrapperDigest is
used to wrap it into the OOP API there's no extra work to implement
reset. If the main implementation is in the OOP interface, the reset
function probably looks a lot like the constructor, so simply calling
reset from the constructor should avoid code duplication.
> 
> >> (I also find the examples quite confusing, why do you want to
> >> reset the hash onject?
> > My approach was to document how to use the functions, not as much
> > when it makes sense to use them. Maybe I should care more about
> > that?
> 
> When I saw the documentation/examples, I was confused, why do you
> want to reset a hash object, so does it really reset the state? So I
> had to take a look into the code before I was sure, it surely does
> reset the whole thingy (is it called context?).

Yes, the whole terminology is probably a little confusing. The old API
called the contexts like this: MD5_CTX, but that doesn't fit the phobos
naming conventions. MD5CTX, MD5Ctx, Md5Ctx and Md5CTX all look ugly
(and only the first fits our naming conventions). So MD5Context is the
next best choice, but it's a little long. It'd more precise though.

> 
> > It's more efficient than allocating a new hash with 'new' (or do you
> > mean why use reset if finish resets anyway? It's useful if you put
> > some data into a hash, then decide you don't want the hash (because
> > the user aborts the operation, network connection is lost,...) but
> > you still need the hash object later on.) You could just call
> > finish and disregard the result, but the reset implementation is
> > faster. I agree there's probably no _strong_ need for reset, but I
> > think it doesn't do any harm.
> 
> "Explicit is better than implicit."
> 

I'm just not sure if reset can be implemented in a reasonable way
outside of the class in all cases. But then it's probably not important
enough to justify an extra member...