std.path review: second update

Jonathan M Davis jmdavisProg at gmx.com
Tue Aug 2 01:38:13 PDT 2011


On Tuesday 02 August 2011 10:19:54 Marco Leise wrote:
> On 02.08.2011, at 08:02, Jonathan M Davis <jmdavisProg at gmx.com> wrote:
> > "file." and "file" do _not_ have the same extension. One has an empty
> > extension whereas the other has none.
> 
> Still I would expect a get extension function to return the empty string
> for both. Why is that so? As Wikipedia states the interpretation depends
> on the filesystem (or maybe on the originating OS, but you can use ext3 on
> Windows and NTFS on Linux nowadays).
> 
> But others seem to have problems as well:
> 
> Trailing dots disappear in Samba:
> http://lists.samba.org/archive/rsync/2002-September/003636.html
> 
> On Windows files ending in a dot cannot be deleted:
> http://cygwin.com/ml/cygwin/2004-01/msg00848.html
> http://blog.dotsmart.net/2008/06/12/solved-cannot-read-from-the-source-file-or-disk/
> 
> Mozilla on Linux cannot open files ending in a dot:
> https://bugzilla.mozilla.org/show_bug.cgi?id=149586
> 
> The file extension is what is following the last dot.
> On Windows it cannot be empty, thus 'foo.' will be an inaccessible file.
> Yet 'foo..bar' is perfectly fine, which is causing us trouble now, since
> 'foo.' is 'foo..bar' stripped of its extension, but 'foo.' itself -
> while valid on Posix - is an ambiguous name on Windows.
> Camp A thinks:
> - it has no extension as long as the dot isn't followed by one
> - changing the extension must result in 'foo..ext'
> - getExtension should never return null, but be either '' or include the
> dot as in '.ext'
> - disassembling and reassembling a filename by string concatenation should
> return the original filename in all cases
> 
> Camp B thinks:
> - no dot = no extension, otherwise what follows the dot is the extension
> - changing the extension must result in 'foo.ext'
> - getExtension returns null if no dot is found, an empty string if the
> file ends in a dot or otherwise what is following the dot
> - disassembling and reassembling a filename isn't a portable process
> 
> I started in camp A, but now I'm really caught in the middle. Both camps'
> arguments make equal sense.
> Funnily enough, even Sun avoided file extension methods in their Java File
> class, so I checked Python for that matter:
> os.path.splitext ( "foo.bar" ) -> ('foo', '.bar')
> os.path.splitext ( "foo." ) -> ('foo', '.')
> os.path.splitext ( "foo" ) -> ('foo', '')
> Although there is no routine to change the extension, the obvious approach
> would result in changeExt('foo.', '.bar') == 'foo.bar'.
> 
> This is what Jonathan prefers and I agree with this solution now that I
> have made up my mind. It's just inconvenient that by this convention you
> cannot change the extension of 'Keep my dot.' so that the result is 'Keep
> my dot..ext'.

Except that that's two extensions, which shouldn't pose a problem.

Actually, that raises the argument that we should have an addExtension
function. After all, files such as file.tar.gz are quite common (on Linux at
least), and std.path should be able to handle files with multiple extensions.
IIRC, both the old and the new std.path handle multiple extensions fairly well
on the whole, but I don't think that either has a function for adding an
extension to a file regardless of whether it already has one.

- Jonathan M Davis

