std.hash: More questions

Sun Jul 8 06:14:23 PDT 2012

On 08-Jul-12 17:09, Johannes Pfau wrote:
> Am Fri, 06 Jul 2012 01:24:04 +0400
> schrieb Dmitry Olshansky <dmitry.olsh at gmail.com>:
>
>>
>> The only thing I can think of that would require a start function is
>> using unconventional initial vectors.
>>
>
> Those could be done as template parameters though? (If the hash is
> written as a templated struct).

Well, probably, but it would lead to code duplication for no real benefit.
Maybe it would be faster with constant vectors, but I'm not so sure.
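
To make the trade-off concrete, here is a minimal sketch. ToyHash and
ToyHashT are hypothetical toy types, not the proposed std.hash API, and
the mixing step is made up. It only shows that a runtime start() serves
every initial vector with one type, while a template parameter creates
one type (and one copy of the code) per vector.

```d
// Hypothetical toy digest with a runtime initial vector (IV).
struct ToyHash
{
    uint state;

    // One instantiation handles any IV, conventional or not.
    void start(uint iv = 0x6745_2301) pure nothrow @safe
    {
        state = iv;
    }

    void put(const(ubyte)[] data) pure nothrow @safe
    {
        foreach (b; data)
            state = ((state << 5) | (state >> 27)) ^ b; // toy mixing step
    }

    uint finish() pure nothrow @safe
    {
        return state;
    }
}

// Same toy digest with the IV as a template parameter: every distinct
// IV yields a distinct type, so put/finish are compiled once per IV.
struct ToyHashT(uint iv)
{
    uint state = iv;

    void put(const(ubyte)[] data) pure nothrow @safe
    {
        foreach (b; data)
            state = ((state << 5) | (state >> 27)) ^ b; // toy mixing step
    }

    uint finish() pure nothrow @safe
    {
        return state;
    }
}

void main()
{
    immutable ubyte[3] data = [1, 2, 3];

    ToyHash a;
    a.start();
    a.put(data[]);

    ToyHashT!(0x6745_2301) b;
    b.put(data[]);

    // Same IV, same input: both variants agree.
    assert(a.finish() == b.finish());
    // Different IVs are different, unrelated types.
    static assert(!is(ToyHashT!1 == ToyHashT!2));
}
```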
>
> But e.g. OpenSSL has *_init functions as well, so we probably should
> keep the start function even if it's just to allow wrappers for
> OpenSSL?
>
>>>
>>> CRC32 sums are usually presented as a uint, not a ubyte[4]. To fit
>>> the rest of the API ubyte[4] is used. Now there's a small annoying
>>> detail: The CRC32 should be printed in LSB-first order.
>> You probably meant MSB first.
>
> The rosettacode.org site which I used to verify the CRC32 results said
> LSB-first but it seems it only describes the data layout of the uint
> value (Little Endian). The printf/writef result is indeed MSB-first.
>
>>
>>> When printing an uint like this, that works well:
>>> writefln("%#x", 4157704578); //0xf7d18982
>>> but this doesn't:
>>> toHexString(*cast(ubyte[4]*)&4157704578); //8289D1F7
>>
>> There is no problem; it's just the order of printing that is at
>> fault. So I suggest you *stop* doing a bswap.
>>
>> It's just that printing something as an array of ubytes does it from
>> least significant byte to most significant. You could try to add
>> MSB/LSB first options to toHexString.
>
> Yes, but that's not very intuitive. Most people would expect the same
> result (by default) that other languages provide:
> http://rosettacode.org/wiki/CRC-32
>
> I'll add the order option to toHexString but I think I'll also
> add an alias crcToHexString/crcHexString or something like that.
>
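
The byte-order point above can be sketched as follows. crcHex is a
hypothetical helper, not the crcToHexString alias being proposed; it
only assumes std.bitmanip and std.format from Phobos. The same uint is
laid out LSB-first or MSB-first before printing, so no bswap on the
value itself is needed.

```d
import std.bitmanip : nativeToBigEndian, nativeToLittleEndian;
import std.format : format;

// Hypothetical helper: print a CRC32 value as hex in the chosen byte order.
string crcHex(uint crc, bool msbFirst)
{
    // Pick the byte layout explicitly, independent of the host's endianness.
    ubyte[4] bytes = msbFirst ? nativeToBigEndian(crc)
                              : nativeToLittleEndian(crc);
    // "%(%02X%)" formats each byte as two hex digits with no separator.
    return format("%(%02X%)", bytes[]);
}

void main()
{
    import std.stdio : writeln;

    enum uint crc = 0xf7d18982;
    writeln(crcHex(crc, false)); // 8289D1F7 (LSB first, the byte-array view)
    writeln(crcHex(crc, true));  // F7D18982 (MSB first, matches "%#x")
}
```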
>>
>>>
>>> I can't change toHexString as it's used for all hashes and it's
>>> correct for SHA1, MD5, ...
>>> So I currently use bswap in the CRC32 finish() implementation to fix
>>> this issue.
>>>
>> no-no-no see the above ;)
>>
>>> Now the question is should I provide an additional finishUint
>>> function which avoids the bswap?
>>>
>>>
>>> Implementation issue:
>>>
>>> The current implementation of SHA1 and MD5 uses memcpy which doesn't
>>> work in CTFE IIRC and which also prevents the code from being pure.
>>> I could replace those memcpy calls with array copying but I'm not
>>> sure if memcpy was used for performance, so I'd like to keep it as
>>> long as we have no performance tests.
>>>
>> Replace memcpy with array ops:
>> ptr1[x..y] = ptr2[x2..y2];
>> note that it's better to have them be pointers, as that avoids the
>> bounds check & D runtime magic.
>>
>> If need be I can provide benchmarks but I'm certain from the days of
>> optimizing std.regex that it's faster or on par with memcpy.
>>
>
> OK great, pure is working. CTFE not yet, but that can be added later.
>
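
A minimal sketch of the memcpy replacement discussed above. copyInto and
copyRaw are hypothetical names, not functions from the SHA1/MD5 code.
The first copies between array slices (bounds-checked unless checks are
disabled); the second slices raw pointers, which is not bounds-checked.
Both can be marked pure, unlike a call to the C memcpy.

```d
// Copy `src` into `dst` starting at `offset` with a slice assignment.
// The two slices must have equal length and must not overlap.
void copyInto(ubyte[] dst, size_t offset, const(ubyte)[] src) pure nothrow
{
    dst[offset .. offset + src.length] = src[];
}

// Pointer variant: slicing a raw pointer skips the bounds check, so the
// caller is responsible for `n` being valid for both buffers.
void copyRaw(ubyte* dst, const(ubyte)* src, size_t n) pure nothrow
{
    dst[0 .. n] = src[0 .. n];
}

void main()
{
    immutable ubyte[3] src = [1, 2, 3];
    ubyte[8] buf;

    copyInto(buf[], 2, src[]);
    assert(buf[2 .. 5] == src[]);

    copyRaw(buf.ptr, src.ptr, src.length);
    assert(buf[0 .. 3] == src[]);
}
```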
> Do we want to add 'pure' as part of the functions in the Digest
> interface? This would require all implementations to be pure, I don't
> know if that's a good idea right now.
>

Some implementations may choose to call into the kernel for the respective
crypto primitives. I'd say there's no need to slap pure on top of it in a hurry.

-- 
Dmitry Olshansky