The review of std.hash package

Wed Aug 8 01:48:46 PDT 2012

Am Wed, 08 Aug 2012 01:27:45 +0200
schrieb Piotr Szturmaj <bncrbme at jadamspam.pl>:

> Johannes Pfau wrote:
> > As I had some free time and nobody else stepped up I implemented it,
> 
> I tried it before but I wanted to create whole crypto package at
> once,
I'm sorry, I didn't want to conceal your work. What I meant with
'nobody stepped up' is this: We had a pull request which moved the crc
code into std.crc and it was already merged in for 2.060. I
complained that it had yet another API and that we should have a common
API. In the end that pull request was reverted but I felt kinda guilty
as that left us without a public crc module. In that discussion, nobody
else had time to implement that API, so I thought we'd need a
solution soon and started implementing it.

> and I guess it was a huge mistake.
We still need your crypto work ;-)
I'd personally like to see a bcrypt implementation. There's no
widespread bcrypt C library which could be used, so there's no painless
way to use bcrypt in D. (Although there _is_ public domain C source
code)

> I know you have used my code
> in your uuid module, you can use it to add it to std.hash if you
> want. There are all SHA implementations, MD5 and a Tiger (both
> versions). They all support bit hashing and work with CTFE.
Great, I'll have a look if this review works out (or as soon as we have
a standardized hash API in phobos).

BTW: How does it work in CTFE? Don't you have to do endianness
conversions at some time? According to Don that's not really supported.

Another problem with prevents CTFE for my proposal would be that the
internal state is currently implemented as an array of uints, but the
API uses ubyte[] as a return type. That sort of reinterpret cast is not
supposed to work in CTFE though. I wonder how you avoided that issue?

And another problem is that void[][] (as used in the 'digest' function)
doesn't work in CTFE (and it isn't supposed to work). But that's a
problem specific to this API.

> I made a quick look at your proposal, but currently don't have time
> to do a deep investigation :). One note though. I don't think it's 
> neccessary to maintain two different API sets. Sure, classes need
> heap allocation but usually that's only one allocation, it's more
> important to not allocate during algorithm computations. Fortunately
> crypto algorithms don't do this. So, struct allocation efficency is
> negligible here. I think separate struct based API set is not worth
> it (or I'm missing something not related to allocation cost).

I'm not sure, I guess some people would disagree. I initially asked
whether we should use a struct based API or a class based API (+emplace
where necessary). At that time the struct API was favored.

It's about the allocation cost (and the additional GC work) although
people could probably also complain about the indirection classes bring
along.

There's one important example where you really don't want any
allocation to happen (and especially no GC allocation): You have a
function which always calculates the same type of hash (e.g. md5) and
you don't care about the implementation. You only care about the
final hash, the hash context is never used outside of that function.
This is an perfect fit for stack allocation, especially if the
function is used a lot (GC). With classes you'd have to do
some caching to do it efficiently (which then gives problems with
const).
It would be possible to use a class API + emplace, but that was deemed
to be too cumbersome.