Next in Review Queue: The New std.path

Sat Jul 16 14:25:36 PDT 2011

On Saturday 16 July 2011 19:39:13 Lars T. Kyllingstad wrote:
> On Fri, 15 Jul 2011 20:08:03 -0700, Jonathan M Davis wrote:
> > Names
> > -----------
> > dirSeparator:
> > I'd love to see this as something much shorter like dirSep. I think that
> > dirSeparator is overly long - particularly for something which is likely
> > to be used primarily in longer expressions.
> > 
> > currentDirSymbol and parentDirSymbol: The same here. These are _way_
> > longer than the strings that they represent. If we have them, they
> > should be much shorter (e.g. currDirSym and parDirSym). But really, I
> > think that I have to concur with dsimcha on this one. The strings for
> > these are completely standard, easy to remember, and even shorter names
> > for them would still be longer than the actual strings by a fair margin.
> > We should just toss them.
> > 
> > isDirSeparator:
> > I'd love to see this shortened to isDirSep.
> > 
> > extension:
> > I'd love to see this shortened to ext.
> > 
> > stripExtension:
> > I'd love to see this shortened to stripExt.
> 
> This has been discussed before, and even voted upon.  While the voting
> was very informal, with hardly enough participants to claim statistical
> significance, it at least showed that the current names have (perhaps
> even majority) support in the community:
> 
> http://www.digitalmars.com/d/archives/digitalmars/D/
> Proposal_for_std.path_replacement_131121.html#N131332

Well, looking at that, it looks like it was pretty much a toss up on most of 
those, with them being one vote off from one another. There just isn't a large 
enough sample there to say much in most cases. It could easily go either way 
if a decent number of people voted on it. Personally, I'm definitely against 
putting Extension and Separator in the names because it makes them much longer 
without really adding any information.

> > setExtensions:
> > I think that it should have replace in its name rather than set, since
> > it's not altering the original string. It's replacing what's there, not
> > setting it. And again, I think that Extension should be  shortened to
> > Ext, so the renamed function would be replaceExt.
> 
> To me, "replace" has an even stronger connotation than "set" that the
> original string is being modified.  Even so, that's actually not the
> reason I chose "set".  Rather, I think that
> 
> - replaceExtension implies that an existing extension gets replaced,
> - addExtension (the current std.path uses addExt) implies that an
>   extension gets added -- appended, that is -- regardless of whether
>   one exists already, while
> - setExtension implies that the extension is set to the given value
>   whether the path already has one or not.
> 
> Maybe that's just me, but at least I'd like to get more opinions before
> changing it.

Well, it's certainly debatable, but I don't like the idea of having set in a 
function which isn't altering its arguments. It is a valid point about replace 
impliing that an extension exists, but you could also argue that every file has 
an extension but that some have an empty extension, so replace is replacing 
the empty one - though as it stands if you try and do the reverse and give 
setExtension an empty extension, you end up with "." on the end instead of 
nothing.

So, anyway, it's certainly debatable, but personally, I think that set 
shouldn't be used on functions which don't alter at least one of their 
arguments.

> > defaultExtension:
> > This function is not a property, so it probably shouldn't be a noun,
> > though I'm not quite sure what to call it. Technically, it's something
> > like setWithDefaultExtensionIfNoExtension, but that's obviously a
> > horrible name even if you abbreviate some of it. So, I don't really have
> > a better name, though if it's not gonig to be completely renamed, I
> > definitely think that it should be shortened to defaultExt rather than
> > having the full Extension in its name.
> > 
> > joinPath:
> > I'm not sure that this function should be renamed to joinPath. When
> > discussing other modules such as as std.ascii and std.uni, we decided
> > that they should have the same function names and _not_ put the module
> > name in their names. Now, this function _is_ joining paths, so it's not
> > purely a case of having the module's name in the function name, but I'm
> > still not sure that we want to have it as joinPath instead of join.
> 
> Nick expressed my feelings about this naming choice pretty well.  My main
> reason for not going with join() here is that this function does
> something very different from the other join().

Actually, it really isn't all that much different from std.array.join. It's 
just that it's always using a specific separator, and unlike std.array.join, it 
doesn't add the separator if it's already there. So, it wouldn't be at all 
amiss to call it just join. But I can understand not wanting the two to have 
the same name. As I said, I'm a bit divided on it. In general, we shouldn't be 
putting the module's naming in a function's name, but at the same time, 
joinPath _is_ joining paths rather than path just being the module name. I 
brought it up because it does fall in the camp of functions with the same name 
with essentially the same functionality (albeit tweaked for their particular 
module and use case), and it was decided that in the general case, such 
functions should have the same name and let the module system be used to 
distinguish them. But while that's what we should be doing in the general 
case, that doesn't necessarily mean that we _have_ to do it in every case. The 
pros and cons must be weighed with a major reason to use the same naming being 
that that's the general convention. But if there's enough reason to use a 
different name, then we can use a different name. I don't feel strongly about it 
in either direction in this particular case. I think that it could go either 
way. I just felt that it needed to be brought up.

> > absolutePath:
> > It really shouldn't be a noun, since it's not a property. Something like
> > makeAbsolute would be better.
> 
> I originally called it toAbsolute(), but someone complained about that,
> too.  Sigh.  When it comes to function names, it is very hard to please
> everyone.

Well, I'm not terribly hung up on what the exact name is, but in general, 
variables, properties, and types should be nouns whereas functions should be 
verbs. absolutePath is a normal function, but its name is a noun. And as such, 
I think that makeAbsolute or toAbsolute makes a lot more sense. If other 
people think that path should be in the name for clarity or whatnot, then it 
could be change to toAbsPath or something similar.

> > Implementation
> > ------------------------
> > You should probably consider testing the various functions with varying
> > string types - both in size and mutability. Just put the test in a
> > foreach loop like
> > 
> > foreach (S; TypeTuple!(string, wstring, dstring))
> > 
> > (but with whatever combination of string types you want to test; In the
> > ideal case it would be every possible combination of character type and
> > mutability, but that's not necessarily necessary). Then you'll probably
> > end up using to!S() on pretty much every string. And for functions which
> > take multiple strings, you can test with a combination of string types.
> > It'll make sure that the functions work appropriately with the various
> > types. The functions are templated, but sometimes something gets missed
> > and they don't work with a particular character size or level of
> > mutability. So, while it might be overkill to test _every_ combination,
> > verifying at least some of them would be desirable.
> 
> Actually, I do this in most unittests, although not as systematically as
> what you suggest.  Look at the baseName unittests, for example:
> "file.ext"w is a wstring, "file.ext"d is a dstring and "file"w.dup is a
> wchar[].  The only thing I'm not testing is const(T)[].  Maybe your
> approach would be better.
> 
> > Personally, I think that it's better practice to test for empty rather
> > than length == 0, but I don't suppose that it really matters given that
> > we're always dealing with arrays here.
> 
> I agree.
> 
> > Your note on baseName about it adhering to the requirements for Posix's
> > basename utility should probably put in the actual documentation. The
> > same for dirName.
> 
> I considered doing that, but I was afraid of alienating Windows
> programmers.  It's probably a good idea.

It makes it clearer what it's doing, and unless Windows has an equivalent 
function, then it's not like you're choosing the functionality of one over the 
other. And from the little poking around I just did, it seems that Windows 
does _not_ have an equivalent function (apparently you have to use something 
like _splitpath). So, I think that it's just better to put in tho docs. It 
makes it clearer, and if the link has good info, then it's that much better.

> > There's no point in checking for "== null" like you do in a few places.
> > If you want to check whether it's actually null, you need to check for
> > "is null", and if you want to check whether it's empty, then you should
> > just use empty. "== null" has a tendancy to be misleading, because
> > people tend to think that it actually checks whether an array is null,
> > which it doesn't do. Personally, I'm likely to think that it's a bug
> > when I see it.
> 
> Agreed.  I'll fix that.
> 
> > It would probably be more efficient for setExtension to use ~=. e.g.
> > 
> > auto retval = stripExtension(path);
> > retval ~= '.';
> > retval ~= ext;
> > return cast(typeof(return))(retval);
> > 
> > Maybe it doesn't matter (and the code _is_  a bit less clean that way),
> > but it might reduce the number of necessary heap allocations. The same
> > goes for pretty much anywhere else that ~ is used. It probably doesn't
> > matter all that much, but if we want to get the functions as efficient
> > as possible, then it's better to use ~= than ~ unless the compiler is
> > actually smart enough to tranlate the ~'s into ~='s, but I doubt that it
> > is (particularly since some chunk of that is in druntime rather than in
> > the compiler itself).
> 
> Yeah, David pointed this out as well.
> 
> > There's no such thing as an OutOfMemoryException. It's an
> > OutOfMemoryError. And being an error, I don't think that there's any
> > need to mention it in the documentation. _Anything_ which allocates
> > memory could cause the application to run out of memory, and being an
> > Error, you're really not supposed to catch it, so there's no real point
> > in mentioning it in the documentation.
> 
> I'll fix the documentation.

Actually, core.memory needs to be fixed on that count (that's probably where 
you got the idea about OutOfMemoryException). I should probably create a pull 
request to fix that.

- Jonathan M Davis