Next in Review Queue: The New std.path

Sat Jul 16 12:39:13 PDT 2011

On Fri, 15 Jul 2011 20:08:03 -0700, Jonathan M Davis wrote:

> Names
> -----------
> dirSeparator:
> I'd love to see this as something much shorter like dirSep. I think that
> dirSeparator is overly long - particularly for something which is likely
> to be used primarily in longer expressions.
> 
> currentDirSymbol and parentDirSymbol: The same here. These are _way_
> longer than the strings that they represent. If we have them, they
> should be much shorter (e.g. currDirSym and parDirSym). But really, I
> think that I have to concur with dsimcha on this one. The strings for
> these are completely standard, easy to remember, and even shorter names
> for them would still be longer than the actual strings by a fair margin.
> We should just toss them.
> 
> isDirSeparator:
> I'd love to see this shortened to isDirSep.
> 
> extension:
> I'd love to see this shortened to ext.
> 
> stripExtension:
> I'd love to see this shortened to stripExt.

This has been discussed before, and even voted upon.  While the voting 
was very informal, with hardly enough participants to claim statistical 
significance, it at least showed that the current names have (perhaps 
even majority) support in the community:

http://www.digitalmars.com/d/archives/digitalmars/D/
Proposal_for_std.path_replacement_131121.html#N131332

> setExtensions:
> I think that it should have replace in its name rather than set, since
> it's not altering the original string. It's replacing what's there, not
> setting it. And again, I think that Extension should be  shortened to
> Ext, so the renamed function would be replaceExt.

To me, "replace" has an even stronger connotation than "set" that the 
original string is being modified.  Even so, that's actually not the 
reason I chose "set".  Rather, I think that

- replaceExtension implies that an existing extension gets replaced,
- addExtension (the current std.path uses addExt) implies that an
  extension gets added -- appended, that is -- regardless of whether
  one exists already, while
- setExtension implies that the extension is set to the given value
  whether the path already has one or not.

Maybe that's just me, but at least I'd like to get more opinions before 
changing it.

> defaultExtension:
> This function is not a property, so it probably shouldn't be a noun,
> though I'm not quite sure what to call it. Technically, it's something
> like setWithDefaultExtensionIfNoExtension, but that's obviously a
> horrible name even if you abbreviate some of it. So, I don't really have
> a better name, though if it's not gonig to be completely renamed, I
> definitely think that it should be shortened to defaultExt rather than
> having the full Extension in its name.
> 
> joinPath:
> I'm not sure that this function should be renamed to joinPath. When
> discussing other modules such as as std.ascii and std.uni, we decided
> that they should have the same function names and _not_ put the module
> name in their names. Now, this function _is_ joining paths, so it's not
> purely a case of having the module's name in the function name, but I'm
> still not sure that we want to have it as joinPath instead of join.

Nick expressed my feelings about this naming choice pretty well.  My main 
reason for not going with join() here is that this function does 
something very different from the other join().

> absolutePath:
> It really shouldn't be a noun, since it's not a property. Something like
> makeAbsolute would be better.

I originally called it toAbsolute(), but someone complained about that, 
too.  Sigh.  When it comes to function names, it is very hard to please 
everyone.

> Implementation
> ------------------------
> You should probably consider testing the various functions with varying
> string types - both in size and mutability. Just put the test in a
> foreach loop like
> 
> foreach (S; TypeTuple!(string, wstring, dstring))
> 
> (but with whatever combination of string types you want to test; In the
> ideal case it would be every possible combination of character type and
> mutability, but that's not necessarily necessary). Then you'll probably
> end up using to!S() on pretty much every string. And for functions which
> take multiple strings, you can test with a combination of string types.
> It'll make sure that the functions work appropriately with the various
> types. The functions are templated, but sometimes something gets missed
> and they don't work with a particular character size or level of
> mutability. So, while it might be overkill to test _every_ combination,
> verifying at least some of them would be desirable.

Actually, I do this in most unittests, although not as systematically as 
what you suggest.  Look at the baseName unittests, for example:  
"file.ext"w is a wstring, "file.ext"d is a dstring and "file"w.dup is a 
wchar[].  The only thing I'm not testing is const(T)[].  Maybe your 
approach would be better.

> Personally, I think that it's better practice to test for empty rather
> than length == 0, but I don't suppose that it really matters given that
> we're always dealing with arrays here.

I agree.

> Your note on baseName about it adhering to the requirements for Posix's
> basename utility should probably put in the actual documentation. The
> same for dirName.

I considered doing that, but I was afraid of alienating Windows 
programmers.  It's probably a good idea.

> There's no point in checking for "== null" like you do in a few places.
> If you want to check whether it's actually null, you need to check for
> "is null", and if you want to check whether it's empty, then you should
> just use empty. "== null" has a tendancy to be misleading, because
> people tend to think that it actually checks whether an array is null,
> which it doesn't do. Personally, I'm likely to think that it's a bug
> when I see it.

Agreed.  I'll fix that.

> It would probably be more efficient for setExtension to use ~=. e.g.
> 
> auto retval = stripExtension(path);
> retval ~= '.';
> retval ~= ext;
> return cast(typeof(return))(retval);
>
> Maybe it doesn't matter (and the code _is_  a bit less clean that way),
> but it might reduce the number of necessary heap allocations. The same
> goes for pretty much anywhere else that ~ is used. It probably doesn't
> matter all that much, but if we want to get the functions as efficient
> as possible, then it's better to use ~= than ~ unless the compiler is
> actually smart enough to tranlate the ~'s into ~='s, but I doubt that it
> is (particularly since some chunk of that is in druntime rather than in
> the compiler itself).

Yeah, David pointed this out as well.

> There's no such thing as an OutOfMemoryException. It's an
> OutOfMemoryError. And being an error, I don't think that there's any
> need to mention it in the documentation. _Anything_ which allocates
> memory could cause the application to run out of memory, and being an
> Error, you're really not supposed to catch it, so there's no real point
> in mentioning it in the documentation.

I'll fix the documentation. 

> Okay. I think that that covers it, though I might have missed something.
> Aside from a few nitpicks, my main issue with the module as it stands is
> the unnecessarily long function names. The function names should be as
> long as is necessary to make them clear but not longer. And putting
> Extension and Separator in names is completely unnecessary. Ext and Sep
> are quite clear enough (especially in context) and are _much_ shorter.
> The longer names add nothing and make the functions/variables much more
> annoying to deal with in longer expressions.
> 
> So overall, I think that it looks good and is a definite improvement,
> but I really think that the function names need to be shortened.

Thanks for a thorough review! :)

-Lars