Vote for the new std.hash (oops, std.digest)

Mon Aug 27 04:40:16 PDT 2012

Am Sun, 26 Aug 2012 23:57:24 +0400
schrieb Dmitry Olshansky <dmitry.olsh at gmail.com>:

> On 26-Aug-12 19:11, Andrei Alexandrescu wrote:> On 8/22/12 8:36 AM, 
> Dmitry Olshansky wrote:
>  >> The discussion around new unified API for digest hash functions
>  >> has subdued, just in time as the review period has ended.
>  >>
>  >> The voting for std.digest package starts today and ends on 29
>  >> August.
>  >>
>  >> Rules are simple: reply in this thread with definite "YES" or
>  >> "NO" on whether this package should go into Phobos. Including
>  >> descriptive short notes is appreciated (esp. the NO ones).
>  >
>  > I couldn't make time for a review during the window, apologies.
>  > Nevertheless I'd like to make a pass in hope that a few issues can
>  > be fixed before the code is publicly released.
>  >
> Well I don't think it'll make any harm to see what issues you have
> found that need to be addressed. If they prove to be way too much we
> may consider prolonging review/postponing voting. I'll reply to some
> points.
> 
>  > * No need for start() after default construction in the
>  > documentation example of CRC32. BTW the documentation of start()
>  > suggests that it _must_ be called even right after default
>  > construction, is that the case?
>  >
> I believe it is. FWIW that's how OpenSSL and it's ilk work and people 
> thought that being able to wrap it cleanly would good idea. And there
> is Q of being able to do initialization at C-T, say if wrapping OS 
> primitives (like Win32 Crypto Provider).

Yes, generic code should always call start for the reasons mentioned
above. Examples using the CRC32, MD5 and SHA1 digests directly could
avoid the call though.

I'll also document that the current implementations don't need a call
to start after default construction.
> 
>  > * In the template interface, no need for void peek(ref ...).
>  > Returning by value is fine due to the RVO. Same about finish() -
>  > just keep the functions returning by value.

OK, that makes sense. As it also makes implementing a digest easier,
I'm all for it.

>  >
>  > * This example:
>  >
>  > copy(oneMillionRange, &ctx); //Note: You must pass a pointer to
>  > copy!
>  >
>  > suggests we're doing something wrong. I think a better solution
>  > would be to have copy() take the target range by "auto ref", and
>  > institute this passing convention as a general rule for output
>  > ranges.
>  >
> This one actually is the most important (IMO).
> Namely the suggested pattern is error prone and needs to get fixed. 
> Interestingly though, some array code may break because it never 
> modified passed slice (and it would now).

I agree this should be fixed in copy. Should we special case arrays
then?
BTW: This does work as well, but it's still error-prone:
ctx = copy(oneMillionRange, ctx);
> 
>  > * s/digestType/DigestType/

OK
>  > * s/isDigestable/isDigestible/

OK
>  > * s/startDigest/makeDigest/ - but then again I wonder if requiring
>  > start() after default construction is needed.

OK. As makeDigest is meant to work correctly for all digests a call to
start is needed (see above, for example OpenSSL wrappers need this)

>  > * s/asArray/asStaticArray/. Better yet, we should include this
>  > primitive in std.conv. Then we get to use to!(ubyte[16])(dynarray).
>  >
> Better yet there should be (there is ?) a better way to overload to! 
> template.

I just removed asArray from the documentation for now. I'll probably
file a pull request to merge it into std.conv.
Currently the code just asserts in debug mode to make sure the dynamic
array is big enough. Would I have to use enforce to get this merged
into std.conv.to?

BTW: Is the asArray behavior really what'd be expected from
to!(ubyte[16])(dynarray)?

asArray does explicitly not make a copy, so changing the returned
static array will change the dynamic array's content.

>  > * The fact that digests aren't unique is trivial by the pigeonhole
>  > principle, and should not be listed as a bug (the BUGS: tag
>  > implies bugs in the implementation). It's more interesting if a
>  > document can be altered to produce a specific digest. If that's
>  > what you meant, include a link to the appropriate work.
> 
> I do suspect it's the latter as it is a well known issue that MD5 is 
> reversible in reasonable amount of time on modern PCs. See rainbow
> tables.

That was part of the original std.md5 documentation and I just left it
in. To be honest I'm not sure what was meant by that statement. I
always thought it referred to hash collisions, but as you said those
occur for every digest.
So I think you're right and it's better to remove this from the BUGS
section.

> 
>  > * I don't think the aliases for WrapperDigest!Xyz should be
>  > defined. Just call that Digest!Xyz and have people use it. It's
>  > about as short as the aliases.
> 
> Actually +1. I'm just wondering if there is something that'll make 
> people do class version but not the struct one (bypassing the
> wrapper).

The interface is already called Digest ;-)
I'm not sure, I'd rather leave things as they are. There might be
reasons not to use the wrapper template for some digests (e.g. special
functionality not covered by the wrapper)

> 
>  > I vote for inclusion whether or not the points above are
>  > addressed, but of course it would be great if they were minded
>  > before release. Apologies for the late review.
> 
> 

Thanks for the review, I hope I addressed most of your points.