Review: std.uuid

Sat Jun 16 04:27:54 PDT 2012

Am Wed, 13 Jun 2012 18:57:51 -0700
schrieb Jonathan M Davis <jmdavisProg at gmx.com>:

> * I'm not terribly fond of having names like Variant and Version, but
> they _are_ local to UUID, so I guess that they're clear enough. But
> the fact that you have to put a note about it _not_ being
> std.variant.Variant definitely indicates that another name might be
> better. I know that the RFC uses the word variant, but it also
> indicates that type would probably be a better name, and if Variant
> is unclear enough that a note is needed, then perhaps it really isn't
> the right name.

Well I added the note as a result of this review. I think whether the
name variant is confusing depends a lot on whether you've already used
a UUID library or std.variant. Naming it 'Type' might confuse those who
already worked with UUIDs. And 'Type' is a very generic name, even more
than UUID and at some point we'll have other 'Type' symbols in phobos
as well.

So I'd rather like to keep the name consistent with the RFC.
> 
> * nameBasedMd5 should probably be nameBasedMD5, and nameBasedSha1
> should probably be nameBasedSHA1. The correct names are MD5 and
> SHA-1, not md5 and sha1, so it would be more correct to use
> nameBasedMD5 and nameBasedSHA1. With acronyms, the thing that most of
> Phobos does is to keep all of the letters the same case, so if the
> first letter is lowercase, then the whole thing is lowercase, and if
> the first letter is uppercase, then the whole thing is uppercase. I'm
> not sure if MD5 and SHA-1 are acronyms or not, but they're similar
> enough, that I'd say that the same naming scheme should be used,
> which you do in most cases, but you didn't with the enums for some
> reason.
yep seems I missed those when I refactored the names.

> * For clarity's sake, please always put const after the function
> signatures for all const functions. In same cases you do, and in
> others you don't.
OK (Imho it's no confusing though if const is part of a attribute
list, as in '@safe nothrow const pure')
> 
> * If all that's preventing toString from being nothrow is idup, then
> just do this:
> 
> string toString() @safe const pure nothrow
> {
>     try
>         return _toString().idup;
>     catch(Exception e)
>         assert(0, "It should be impossible for idup to throw.");
> }
> 
> and if there's a bug report for idup, then list it in a comment so
> that we know to get rid of the try-catch block once that's been fixed
> (and if there isn't a bug report, one should be created).

_toString is already nothrow, so yes, it's only idups fault ;-)
Added links to http://d.puremagic.com/issues/show_bug.cgi?id=5700

> 
> * Get rid of nilUUID. Just use UUID.init. It just seems like an
> extra, unnecessary complication when the two are identical.
OK

> 
> * Please use SHA-1 in the documentation rather than SHA1, since SHA-1
> is the correct name. Also, make sure that it's SHA-1 and not just
> SHA. You're not always consistent (e.g. take a look at sha1UUID).
OK

> 
> * parseUUID should be clear on whether it takes both lowercase and
> uppercase hex digits, and I'd argue that it should definitely accept
> both. [0-9a-f] implies that it only accepts lowercase hex digits.
OK

> * Use std.exception's assertThrown. It'll make your tests cleaner.
> And if you can't do that, because you need to look at the exception
> itself, then use collectException. So,
> 
> thrown = false;
> try
>     //make sure 36 characters in total or we'll get a 'tooMuch' reason
>     id = UUID("{8ab3060e-2cba-4f23-b74c-b52db3bdf6}");
> catch(UUIDParserException e)
> {
>     assert(e.reason == UUIDParserException.Reason.invalidChar);
>     thrown = true;
> }
> assert(thrown);
> 
> becomes
> 
> //make sure 36 characters in total or we'll get a 'tooMuch' reason
> auto e =
> collectException!UUIDParserException(UUID("{8ab3060e-2cba-4f23-b74c-
> b52db3bdf6}")); assert(e !is null);
> assert(e.reason == UUIDParserException.Reason.invalidChar);
> 
> It's much shorter that way and less error-prone.

OK

> * You should probably change empty's body to
> 
> enum ubyte[16] emptyUUID = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0];
> return data == emptyUUID[];
> 
> I'm not sure whether it currently allocates or not if you do that (I
> _think_ that it'll do the heap allocation at compile time and then
> only have the value type at runtime), but regardless of whether it
> does or not, I'd expect that once it's fixed so that assigning an
> array literal to a static array, it definitely _won't_ allocate on
> the heap, whereas your current implementation will regardless of that
> fix.
> 
> And if that doesn't seem like a good change, then you can just make a
> module- level or struct variable which is immutable ubyte[16] and
> have empty use that. As it stands, it's always going to allocate the
> array literal on the heap. It's probably not a big deal, since empty
> probably won't be called all that often, but if we can easily avoid
> the extra heap allocation, we should.

I totally agree, we shouldn't allocate there.

> * I know that
> 
> void toString(scope void delegate(const(char)[]) sink)
> 
> is what has been previously discussed for a toString which doesn't
> allocate, but I have to wonder why we don't just use an output range.
> I suppose that that's a whole other discussion, but that's what I
> would have suggested rather than a delegate - though in some
> circumstances it probably _is_ easier to just create a nested
> function rather than a whole output range. Maybe we should be moving
> to having 3 overloads of toString (normal, accepts delegate, and
> accepts output range).

Probably, but as you said that's a whole other discussion :-)

> 
> * rndGen returns a Random (which is already aliased to Mt19937), and
> already initializes it with an unpredictable seed. So, why not just
> us rndGen in randomUUID? It becomes
> 
> return randomUUID(rndGen());
> 
> It's much shorter and just as good as far as I can tell.

Good catch. We'd have to change rndGen to use the new seed function
though. And the documentation for Random says: "You may want to use it
if [...] you don't care for the minutiae of the method being used"

I'm not sure, do we care about the method? We need something which
generates 'good' random numbers, I don't think a LCG would be
acceptable. But rndGen could default to something like that on embedded
systems?

Then again, I know nothing about random number generators, so someone
please tell me: Do we need  Mersenne twister for random UUIDS or could
we use any RNG?

> 
> * Change randomValue in randomUUID to auto. The default random number 
> generator _does_ use uint, but others could use ubyte, ushort, or
> ulong instead. I'd also be tempted to argue that you should make it
> immutable and use a different variable for each random value (you
> won't ever accidentally fail to set it to a new value that way - that
> I don't think that reusing variables is a good idea in general), but
> the benefits of that are debatable.
> 
> Actually, just make it a loop. Something like
> 
> foreach(size_t i; iota(0, 16, 4))
> {
>     randomGen.popFront();
>     immutable randomValue = randomGen.front;
>     u.data[i .. i + 4] = *cast(ubyte[4]*)&randomValue;
> }
> 
> That's probably slightly less efficient (though I wouldn't think by
> much), but it's much cleaner. And actually, since the randomValue
> isn't guarantee to be a uint, but your code needs it to be 4 bytes,
> you should probably change that line to
> 
> immutable uint randomValue = cast(uint)randomGen.front;

I don't think that's a good idea. If the randomGen.front the is e.g.
ubyte, casting it to uint will produce 3 bytes which are not set to a
random value. That's not acceptable.

------------
@trusted UUID randomUUID(RNG)(ref RNG randomGen) if(isUniformRNG!(RNG)
&&
    isIntegral!(typeof(RNG.front)) && typeof(RNG.front).sizeof <= 16)
{
    enum elemSize = typeof(RNG.front).sizeof;
    UUID u;
    foreach(size_t i; iota(0, 16, elemSize))
    {
        randomGen.popFront();
        immutable randomValue = randomGen.front;
        u.data[i .. i + elemSize] = *cast(ubyte[elemSize]*)&randomValue;
    }
------------
?

> 
> * I'd advise giving more descriptive names to parseError's
> parameters. It's true that it's a nested function and not in the
> actual API, but it would make the function easier to understand.
OK

> * You should probably use R or Range for template parameters which
> are ranges. That's the typical thing to do. T is usually used for
> generic types rather than for ranges. The code will be somewhat
> easier to read if you use R or Range given that that's how the rest
> of Phobos is.
OK

> 
> * In parseUUID, element should be a size_t, since it's used
> specifically for indexing. As it stands, the compiler is going to
> have to do extra conversions in some of the places that it's used
> (e.g. converting it to size_t when indexing).
OK

> 
> * In the unit tests, I'd argue that it would be cleaner to just use
> the result directly in an assertion rather than saving it if it's not
> going to be used outside of the assertion. For instance,
> 
> id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d);
> assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173,
>         11, 185, 176, 165, 127, 136, 106]);
> 
> should probably be
> 
> assert(parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d).data ==
>            [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176,
> 165, 127, 136, 106]);
> 
> It's generally shorter, cleaner, and avoids the risk of reusing the
> variable without setting it again.
OK

> * For your functions which can take range types, test with
> filter!"true" so that you get a range which isn't an array or string.
Does take!(string, int) return a slice? Should have thought about
that....

I can't use filter (and map) though:
std/algorithm.d(1141): Error: nested structs with constructors are not
yet supported in CTFE (Bug 6419) std/uuid.d(1273):        called from
here: filter("00000000-0000-0000-0000-000000000000"d)

> 
> * When testing various string types, you should take advantage of
> TypeTuple and std.conv.to to be thorough. For instance, instead of
> 
> id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"d);
> assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173,
>     11, 185, 176, 165, 127, 136, 106]);
> id = parseUUID("5668122d-9df0-49a4-ad0b-b9b0a57f886a"w);
> assert(id.data == [86, 104, 18, 45, 157, 240, 73, 164, 173,
>     11, 185, 176, 165, 127, 136, 106]);
> 
> do something like
> 
> foreach(S; TypeTuple!(char, const char, immutable char,
>                       wchar, const wchar, immutable wchar,
>                       dchar, const dchar, immutable dchar))
> {
>     assert(parseUUID(to!S("5668122d-9df0-49a4-ad0b-b9b0a57f886a")).data
> == [86, 104, 18, 45, 157, 240, 73, 164, 173, 11, 185, 176, 165, 127, 
> 136, 106]);
> }
> 
> And while you're at it, you could throw _all_ of your string tests in
> that loop save for those which wouldn't really make sense to put in
> there (and you probably don't want to put very many exception tests
> in the loop, since those get to be expensive). You get much better
> testing this way.
OK (I'll put the exception tests in the loop as well - There'll be ~100
exception tests in total and throwing/catching 100 exceptions takes
~120 msec here, so it should be fast enough)

> 
> However, do make sure that some of the exception tests use multiple
> string types even if they aren't in the TypeTuple loop, since some of
> your exceptions (in parseUUID in particular) depend on the type of
> range passed in.
OK

> 
> * I'm not sure if you do or not, but if you have a spot where you
> have to create a UUIDParserException from another exception but don't
> want to have to specify the file and line number, then you should
> create an overload of UUIDParserException which takes the Throwable
> first (and probably forwards to the other constructor to avoid code
> duplication). Exception does this, and it can be quite useful. But
> UUIDParserException can't be publicly constructed, so only do it if
> it's actually useful within std.uuid.
OK
> 
> * Personally, I'd prefer UUIDParsingException to UUIDParserException,
> since you're not writing a full-on parser but rather just doing
> parsing in some of your functions. But that's fairly subjective.
OK
> 
> - Jonathan M Davis

Thanks for your feedback!