Andrei's std.path review

Lars T. Kyllingstad public at kyllingen.NOSPAMnet
Thu Aug 11 00:10:08 PDT 2011


I am replying to this in a new thread, so as not to pollute the voting 
thread.  (Speaking from experience as a review manager, noise in the 
voting thread makes it harder to count votes.)


On Wed, 10 Aug 2011 12:37:01 -0600, Andrei Alexandrescu wrote:

> On 8/5/11 7:26 AM, dsimcha wrote:
>> My apologies for not announcing this yesterday. For some reason I
>> thought today was the official end of review. Anyhow, Lars
>> Kyllingstad's new std.path module, which has been in review for the
>> past 3 weeks, is up for vote. Please vote yes or no in this thread.
>>
>> [...]
> 
> My vote is "yes", with a few advisory comments.
> 
> Comments on the documentation:
> 
> * "This module is used to parse path strings." -> "This module is used
> to manipulate path strings."

Ok.


> * "perform any I/O" -> "perform any actual file system actions"

Ok.


> * "use std.file.isDir and std.file.exists" -> use the XREF macro to
> generate cross-reference links.

Cool, didn't know about that one.  Will fix.  Should it also be used to 
generate internal references (such as linking to filenameCharCmp in the 
filenameCmp documentation) or is there a separate macro for this?


> * "backslashes on this platform" -> "backslashes on that platform"

Ok.


> * "The result of calling a function on an ill-formed path is undefined."
> This simplifies documentation but is a bit extreme. We could and should
> specify the behavior for strings that don't look quite like paths.

I am not so sure about this.  This is easy in the old std.path, because 
its behaviour is generally very simplistic.  "Search for some character, 
return everything to the left/right of it" and so on.

The new version is designed to correctly handle a lot of corner cases, 
and such specifications will often tend to be very convoluted.  For 
example, the old getDrive() simply searches for a colon and returns 
everything to the left of it, period.  The new driveName() also includes 
support for UNC paths, i.e.

  assert (driveName(`\\server\share\file`) == `\\server\share`);

Furthermore, making assumptions about the well-formedness of the input 
paths allows for simpler, more performant code.  Using Windows drive 
letters as an example again, only one-character drive specifiers are 
allowed -- c:, d:, etc.  It is therefore unnecessary to search the entire 
string for a colon, it suffices to look at the second character.


> * BTW we should have a validPath() function that tells whether a string
> looks like a valid path or not.

I actually had such a function in an earlier version of std.path, but I 
removed it for some reason I can't recall at the moment.  I agree such a 
function could be useful, so I'll dig it out again.

One use case for validPath() -- or isValidPath(), to follow the module's 
convention -- would be in "in" contracts for more or less every function 
in the module.  Then the statement you criticised above could be 
moderated to "well-formedness is checked in non-release mode".


> * Add example:
> 
> assert (baseName("dir/file.ext", ".xyz") == "file.ext");

Agreed.


> * defaultExtension -> setDefaultExtension?

I think it's too late to change this now that the voting is almost 
finished.


> * absolutePath calls getcwd(), negating the assertion in the beginning
> that there's no real directory access going on.

Strictly speaking, now that getcwd() is the default value for the "base" 
argument to absolutePath() and relativePath(), it is implicitly called by 
the user as he or she neglects to provide a value for this parameter.

For practical purposes, I guess there isn't a real difference between 
this and getcwd() getting called inside the functions, so I will change 
the documentation to reflect this.


> * absolutePath and others use string, others use generic characters.
> Why? (This is my strongest comment.)

absolutePath() and relativePath() depend, in the majority of cases, on 
the result of getcwd(), which returns string.  expandTilde() depends on 
the contents of $HOME or /etc/passwd, which will both also be of string 
type.  I have tried keeping the number of allocations in this module at 
an absolute minimum, and decided that if any UTF-8/16/32 transcoding is 
to happen, it should happen at the explicit request of the user, not 
behind the scenes.


> * filenameCharCmp and filenameCmp -> why long and not int?

filenameCharCmp() returns a-b, and since a and b are dchars, the 
corresponding signed type is long.  filenameCmp() returns long because 
filenameCharCmp() does.


> * Example in expandTilde uses odd ALL_UPERCASE variable names.

Ah, that would be an example from the old std.path.  Can't believe I 
didn't spot this before.  Will fix.


> Comments on the implementation:
> 
> * We're increasingly moving towards consolidating imports into one.

Seriously?  I haven't noticed this.  While I will certainly follow the 
Phobos conventions, it would be nice to know the rationale for this.  Is 
it simply for vertical compactness?  As you may have noticed, I'm not as 
worried about that as you are. ;)

Arguments in favour of separate imports are, in my opinion:

 - Adding/removing imports is easier.
 - Seeing which modules are imported is easier.


> * "." is hardcoded as a symbol for the current dir.

Yes.  Unlike dir and path separators, the symbols "." and ".." are 
uniform across all platforms.

You may also have noticed that I've removed the curdir and pardir 
constants, defined in the old std.path as "." and "..", respectively.  
Their utter uselessness was pointed out at some point during the review, 
and no one could provide a compelling use case (actually, no one could 
provide *any* use case) for them.


> * Would be interesting to figure what it would take to make pathSplitter
> reuse splitter.

It would be interesting, yes, but due to the various corner cases it 
handles, and the special nature of the front-most segment of an absolute/
rooted path, I suspect it will be impossible.  At the very least, it 
won't be worth it.


> * Misalignment in lines 2195--2238. I think you don't need the extra
> scope there anyway, but if you do, don't make a special rule for that
> case - obey normal brace indentation.

This code is not mine -- it's from the old std.path -- and I didn't 
notice the extra braces until now.  I don't think there is a good reason 
to have them there, so I'll try to remove them.

Thanks for the review!

-Lars


More information about the Digitalmars-d mailing list