DMD Source Archive - Why?

Steven Schveighoffer schveiguy at gmail.com
Wed Apr 10 16:42:53 UTC 2024


On Wednesday, 10 April 2024 at 03:47:30 UTC, Walter Bright wrote:
> On 4/9/2024 4:42 PM, Steven Schveighoffer wrote:
>> I will also bet that any difference in compile time will be 
>> extremely insignificant. I don't bet against decades of 
>> filesystem read optimizations. Saving e.g. microseconds on a 
>> 1.5 second build isn't going to move the needle.
>
> On my timing on compiling hello world, a 1.412s build becomes 
> 1.375s, 35 milliseconds faster. Most of the savings appear to 
> be due to when the archive is first accessed, its table of 
> contents is loaded into the path cache and file cache that you 
> developed. Then, no stats are done on the filesystem.

Yes, the nice thing is never having to ask the filesystem for 
something you already know doesn't exist. Pre-loading the 
directory structure could achieve the same thing, but I doubt it 
would be as efficient.
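
To illustrate what I mean (a rough sketch with made-up names, 
not dmd's actual code): with the archive's table of contents in 
memory, a miss is answered by a hash lookup, while plain import 
directories still cost a stat per probe.

```d
import std.file : exists;

// Hypothetical resolver: the archive's preloaded table of
// contents answers both hits and misses without touching the
// filesystem; only plain directories need a stat per probe.
string resolveImport(string[] importDirs, bool[string] archiveTOC,
                     string relPath)
{
    if (relPath in archiveTOC)
        return relPath;            // found in the archive, no I/O
    foreach (dir; importDirs)
    {
        auto candidate = dir ~ "/" ~ relPath;
        if (exists(candidate))     // plain directories still stat
            return candidate;
    }
    return null;                   // known absent everywhere
}
```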

>> The only benefit I might see in this is to *manage* the source 
>> as one item.
>
> The convenience of being able to distribute a "header only" 
> library as one file may be significant. I've always liked 
> things that didn't need an installation program. An install 
> should be "copy the file onto your system" and uninstall should 
> be "delete the file" !
>
> Back in the days of CD software, my compiler was set up so no 
> install was necessary, just put the CD in the drive and run it. 
> You didn't even have to set the environment variables, as the 
> compiler would look for its files relative to where the 
> executable file was (argv[0]). You can see vestiges of that 
> still in today's dmd.
>
> Of course, to get it to run faster you'd XCOPY it onto the hard 
> drive. Though some users were flummoxed by the absence of 
> INSTALL.EXE and I'd have to explain how to use XCOPY.

Consider that Java archives (`.jar` files) are distributed as a 
single package instead of as individual `.class` files.

And Microsoft's compiler (and other C compilers) can produce 
"pre-compiled headers", which skip some of the initial steps of 
compilation.

I think there would be enthusiastic support for D archive files 
that reduce some of the compilation steps, or provide extra 
features (e.g. predetermined inference or matching compile-time 
switches). Especially since these archive files won't be edited 
directly but generated mechanically, why not do more inside them?

> A tar file is serial, meaning one has to read the entire file 
> to see what it is in it (because it was designed for tape 
> systems where data is simply appended).

You can index a tar file easily. Each file's data is preceded by 
a header containing its metadata (including its size), so you 
can build the catalog by seeking from header to header, skipping 
over the data in between.
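
As a sketch (plain ustar layout only, ignoring extension headers 
and error handling): every member starts with a 512-byte header, 
the size is an octal string at offset 124, and the data is padded 
to a 512-byte boundary, so the whole catalog falls out of a 
header-hopping loop.

```d
import std.algorithm.searching : countUntil;
import std.stdio : File;

struct Entry { string name; ulong offset, size; }

Entry[] catalogTar(string path)
{
    Entry[] entries;
    auto f = File(path, "rb");
    ubyte[512] header;
    ulong pos = 0;
    while (f.rawRead(header[]).length == 512 && header[0] != 0)
    {
        // name: up to 100 bytes at offset 0, NUL-terminated if shorter
        auto nameLen = header[0 .. 100].countUntil(0);
        if (nameLen < 0) nameLen = 100;
        auto name = (cast(char[]) header[0 .. nameLen]).idup;

        // size: octal digits at offset 124, NUL/space terminated
        ulong size = 0;
        foreach (b; header[124 .. 136])
        {
            if (b < '0' || b > '7') break;
            size = size * 8 + (b - '0');
        }

        entries ~= Entry(name, pos + 512, size);
        pos += 512 + (size + 511) / 512 * 512; // header + padded data
        f.seek(pos);
    }
    return entries;
}
```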

Note also that we can extend tar files with indexes while 
staying backwards compatible with existing tools. Remember, we 
are generating this *from a tool that we control*. Prepending an 
index "file" is trivial.

> The tar file doesn't have a table of contents, the filename is 
> limited to 100 characters, and the path is limited to 155 
> characters.

I'm not too worried about such things; I've never run into 
filename length problems with tar. In any case, modern tar 
formats do not have these limitations:

https://www.gnu.org/software/tar/manual/html_section/Formats.html
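
For instance, the pax format carries arbitrarily long names in 
extended header records of the form `<length> keyword=value\n`, 
where the decimal length counts the whole record including 
itself. A sketch of producing such a record:

```d
import std.conv : to;

// Build one pax extended header record, e.g. a "path" record
// for a name longer than ustar's 100-byte field allows.
string paxRecord(string keyword, string value)
{
    // record body without the length prefix: " keyword=value\n"
    auto tail = " " ~ keyword ~ "=" ~ value ~ "\n";
    // find the fixed point: len == digits(len) + tail.length
    size_t len = tail.length + 1;
    while (to!string(len).length + tail.length != len)
        ++len;
    return to!string(len) ~ tail;
}

unittest
{
    // " path=some/file\n" is 16 bytes; "18" adds 2 more
    assert(paxRecord("path", "some/file") == "18 path=some/file\n");
}
```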

>
> Sar files have a table of contents at the beginning, and 
> unlimited filespec sizes.
>
> P.S. the code that actually reads the .sar file is about 20 
> lines! (Excluding checking for corrupt files, and the header 
> structure definition.) The archive reader and writer can be 
> encapsulated in a separate module, so anyone can replace it 
> with a different format.

I would suggest we replace it with a modern tar format, for 
maximum compatibility with existing tools. We have already seen 
the drawbacks of using the abandoned `sdl` format for dub 
packages; we should not repeat that mistake.

-Steve

